A modular approach to the evaluation of Adaptive User Interfaces

A. Paramythis1, A. Totter1 and C. Stephanidis1,2

1 Institute of Computer Science, Foundation for Research and Technology - Hellas,
Science and Technology Park of Crete, GR-71110 Heraklion, Crete, Greece

2 Department of Computer Science, University of Crete, Greece

{alpar, totter, cs}@ics.forth.gr

Abstract. Adaptive User Interfaces (AUIs) are a continuously growing area of research, with numerous experimental and commercial systems reported in the literature, in diverse application areas. Although a number of established approaches and frameworks exist today for the design and implementation of AUIs, their evaluation has yet to be addressed in a comprehensive way. Arguably, the main drawback of existing evaluation methods and techniques is that they fail to provide adequate feedback into the AUI design process, or to generate empirical findings that can be reused across application boundaries. This paper proposes a new, modular approach to the evaluation of AUIs, which is specifically intended to address these problems. The adaptive Nautilus Web browser, to which the approach will be applied, is also briefly presented.

1. Introduction

The application of adaptive methods and techniques in human-computer interaction (HCI) has been gaining increasing attention in recent years. However, to date, there is limited knowledge as to which adaptation methods and techniques are appropriate for different users and for different interaction contexts. This is arguably due to the lack of reusable empirical findings coming from the evaluation of adaptive interactive systems, which, in turn, can be traced back to the way in which the evaluation of adaptive user interfaces is approached today.

To start with, adaptation is not sufficiently addressed by existing standardized evaluation frameworks (although, in some cases, it is a concern) (Stary & Totter, 1997). As a result, researchers have had to employ more "basic" evaluation tools to approach the assessment of adaptive systems. Among the best-known and widely used approaches in the field is the "with and without" adaptivity evaluation design, in which an adaptive instance of the system is compared with a non-adaptive one (see, e.g., (Kaplan, Fenwick, & Chen, 1993), (Meyer, 1994), (Boyle & Encarnacion, 1994), (Weber & Specht, 1997), (Brusilovsky & Pesin, 1998), (Brusilovsky & Eklund, 1998)). A major criticism of this evaluation approach has been that the non-adaptive instance cannot be "optimal" in any way, if adaptation is properly "designed into" the system (Höök, 2000). Another equally important problem is that, in this type of study, the reasons behind the "success", or "failure" of adaptation can only be traced back to the initial hypotheses of the adaptive system design. In other words, it is not possible to ascertain why, and under what conditions, a particular type of adaptation may be employed towards a specific goal. This situation is exemplified in the several studies that have addressed adaptive link annotation, often arriving at contradictory conclusions (Eklund & Brusilovsky, 1998).

A different perspective on the study of adaptive systems has been put forward by Oppermann (Oppermann, 1994), in the assessment of adaptation in Flexcel II (Krogsäter, Oppermann, & Thomas, 1994). Following this perspective, adaptation is treated as an integral part of the system and evaluation is not based on the presence of a non-adaptive counterpart. However, this approach is also limited with respect to assessing the degree to which the different factors influencing adaptation contribute to its "success" or "failure".

In light of the above, there is an acknowledged need for a renewed look at the evaluation of adaptation, in which the employment of traditional HCI evaluation methods and techniques will be placed on a new basis, acknowledging the particular characteristics that differentiate adaptive systems from their "static" counterparts (Höök & Svensson, 1999), (Höök, 2000).

The main idea behind the approach put forward in this paper is that the evaluation of adaptive systems should not treat adaptation as a "monolithic" / singular process happening behind the scenes; rather, adaptation should be "broken down" into its constituents, and each of these constituents should be evaluated separately where necessary and feasible. The seeds of this idea can be traced back to (Totterdell & Boyle, 1990a), who propose that a number of adaptation metrics be related to different components of a logical model of adaptive user interfaces, to provide what amounts to adaptation-oriented design feedback. Furthermore, (Totterdell et al., 1990a) present two types of assessment performed to validate what is termed "success of the user model" (note that, in their case, the "user model" is also responsible for adaptation decision making): "... an assessment of the accuracy of the model's inferences about user difficulties; and an assessment of the effectiveness of the changes made at the interface." (Totterdell et al., 1990a)

The contribution of this paper, along these lines, is the introduction of a modular approach, which offers a detailed view into the "decomposability" of adaptation, from the perspective of HCI-oriented evaluation. The main strength of this approach, which builds extensively on previous work in the field, lies with the potential it offers towards deriving detailed evaluation results that can be analyzed, extended and reused across user interfaces and application domains.

A related approach to the one presented in this paper can be found in (Weibelzahl & Lauer, 2001), where the authors introduce an evaluation framework for the assessment of interactive systems that employ Case-Based Reasoning techniques to support adaptation in their interaction with the user. Their framework bears many similarities to the approach postulated herein, especially in terms of how adaptation is "decomposed" and evaluated in a series of steps.

Although the proposed approach is presented in relation to a particular class of adaptive systems, namely adaptive user interfaces (AUIs), it should be noted that it is not exclusively relevant to AUIs; rather, the proposed evaluation approach is expected to be easily extensible to other classes / categories of adaptive systems.

The rest of this paper is structured as follows. The following section presents the proposed evaluation approach in two steps: in the first step, a high-level model for adaptation in AUIs is introduced, accompanied by a tentative classification of evaluation methods, intended to facilitate subsequent discussions; in a second step, the model is broken into (sometimes overlapping) modules and the evaluation of each module is discussed in detail. The subsequent and final section presents a brief overview of the Nautilus Web browser, which will serve as the platform for applying and further improving the proposed approach.

3. Modular evaluation of AUIs

The proposed approach is based on the premise that the evaluation of individual stages (referred to as "modules") involved in the AUI adaptation cycles enables the derivation of detailed findings, which, in turn, provide ample feedback into the AUI design process. Specifically, the proposed approach:

• identifies "modules" of AUIs that can, and should, be evaluated both separately and in combination (i.e., the evaluation objects);

• presents the evaluation rationale underlying the decomposition of AUIs into modules and the subsequent assessment of these modules, based on specific criteria (i.e., the evaluation purpose);

• circumscribes the methods and techniques that can be employed for the evaluation of the different "modules", in the different stages of the AUI development life-cycle (i.e., the evaluation process).

To that effect, the rest of this section will: (i) establish a basis for the discussion of evaluation methods / techniques; (ii) present a high-level model for adaptation in AUIs and identify the individual stages of adaptation that can be targeted as evaluation modules; and, (iii) propose specific evaluation methods and techniques that can be employed for each module.

3.1 A contextual perspective on evaluation

To facilitate a rather generalized treatment of user-based evaluation methods in the forthcoming sections, a tentative classification will be introduced. This classification is not intended to fully capture the characteristics of all existing evaluation methods, but rather to identify those dimensions of evaluation that are pertinent to the ongoing discussion. The classification scheme is based on two dimensions: (a) the types of evaluation measures that are supported by each method, and (b) the stage of the development life cycle that each method is best suited for.

The first dimension, i.e., evaluation measures (one could alternatively term this dimension "data collection methods"), is a simplification of the measures proposed by McGrath (McGrath, 1995), which are extensively used in the social and behavioral sciences (see Table 1, top row). Along this dimension, evaluation methods are separated into: self reports of participants (e.g. questionnaire responses, interview protocols, rating scales, etc.); observations; and, trace measures.

Regarding the second dimension of our classification, i.e., stage of the development life cycle that each method is best suited for, a broad categorization is employed, which distinguishes between methods that: (a) are best suited for the early (exploratory) stages of design, (b) require the existence of at least an interactive prototype, (c) are targeted towards complete ("finished") products, and (d) can be used (in variations) at any stage of the design process.

Table 1 presents a classification of thirteen empirical evaluation methods commonly used for the investigation of usability in HCI, as identified in (Jordan, 1998), using the above classification scheme.

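Table 1. Classification of empirical usability evaluation methods.

Types of measures (columns): self reports | observations | trace measures
[The per-method marks in the measure columns are not reproduced here.]

Empirical usability evaluation method | Suitability for employment at different development stages
Focus groups | at any stage of the design process
Interviews | at any stage of the design process
Questionnaires | at any stage of the design process
Private camera conversation | better suited for early design stages
Valuation methods | better suited for early design stages
User workshops | better suited for early design stages
Co-discovery * | require at least an interactive prototype
Think aloud protocols * | require at least an interactive prototype
Logging use | require at least an interactive prototype
Controlled experiments | require at least an interactive prototype
Incident diaries | better suited for "finished" products
Feature checklist | better suited for "finished" products
Field observation | better suited for "finished" products

* When applied in HCI, Co-discovery and Think aloud protocols need to be combined with some form of observation in order to obtain a meaningful record of the interaction circumstances, so as to enable the contextual interpretation of the users' comments.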

 

In addition to the above user-based evaluation methods, the proposed AUI evaluation approach will also consider expert-based ones (i.e., evaluation methods that require the participation of experts, but not of end users). Following the broad categorization of (Jordan, 1998), such methods can be classified as expert appraisals (which typically require the expert to judge the product against known principles, guidelines, rules, standards, etc.) and cognitive walkthroughs (which call upon the expert to approach the evaluation from the point of view of a typical user performing a particular task).

Finally, it should be mentioned that, following the norm in HCI, user testing in a "usability laboratory" (for hypothesis testing, or performance measurements such as error rate, task completion time, task frequency, etc.) is classified under controlled experiments (although controlled experiments are not restricted to this type of user testing).
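The second dimension of the classification can be pictured as a simple lookup from method to development stage. The following is only an illustrative sketch: the names `Stage`, `METHOD_STAGE` and `methods_for` are hypothetical, the grouping follows the row order of Table 1, and the choice to return "any stage" methods alongside the stage-specific ones is an assumption made for the example.

```python
from enum import Enum


class Stage(Enum):
    """Stage categories of the classification scheme (second dimension)."""
    ANY = "at any stage of the design process"
    EARLY = "better suited for early design stages"
    PROTOTYPE = "require at least an interactive prototype"
    FINISHED = "better suited for finished products"


# Method -> stage, following the grouping of rows in Table 1.
METHOD_STAGE = {
    "Focus groups": Stage.ANY,
    "Interviews": Stage.ANY,
    "Questionnaires": Stage.ANY,
    "Private camera conversation": Stage.EARLY,
    "Valuation methods": Stage.EARLY,
    "User workshops": Stage.EARLY,
    "Co-discovery": Stage.PROTOTYPE,
    "Think aloud protocols": Stage.PROTOTYPE,
    "Logging use": Stage.PROTOTYPE,
    "Controlled experiments": Stage.PROTOTYPE,
    "Incident diaries": Stage.FINISHED,
    "Feature checklist": Stage.FINISHED,
    "Field observation": Stage.FINISHED,
}


def methods_for(stage):
    """Methods listed for `stage`, plus those usable at any stage (assumption)."""
    return [m for m, s in METHOD_STAGE.items() if s in (stage, Stage.ANY)]


if __name__ == "__main__":
    print(methods_for(Stage.EARLY))
```

An evaluator planning an early-stage study could call `methods_for(Stage.EARLY)` to obtain the candidate user-based methods; expert-based methods are not covered by this lookup.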

3.2 A high-level model of adaptation in AUIs

In the context of the proposed approach, a base model for adaptation is required, which will reveal some important high-level architectural components of AUIs, as well as explicitly represent the fundamental stages involved in deciding upon and effecting adaptation in HCI. The model presented in Figure 1 is based on, and extends, the logical two-level architecture of adaptation in (Totterdell et al., 1990b). Our goals in deriving this model have been to: (a) make the individual stages of adaptation as concrete as possible, without, however, delving into technical issues, or implementation-oriented details, and (b) introduce details related to different approaches to AUI adaptation, which impact on the evaluation choices (affecting both the objects of evaluation and the process for evaluating them). A number of points should be noted regarding the model: no assumptions are made as to the employed technologies and the targeted platforms; no assumptions are made as to the physical distribution of user interface components (e.g., over the network); although depicted separately at the conceptual level, some of the components may actually be combined in an implemented AUI.

 

Fig. 1. High-level model of adaptation in AUIs.

The model encompasses the following components / stages of adaptation:

• Interaction monitoring: Refers to facilities that are intended to capture the exchanges between the user and the user interface, at different levels of the interaction (i.e., physical, syntactic, semantic (Hoppe, Tauber, & Ziegler, 1986)).

• Interpretation / inferences: Refers to the part(s) of the AUI that is responsible for interpreting information made available through interaction monitoring, in order to update the models maintained by the system (e.g., user model).

• Explicitly provided knowledge: Refers to information about the users' characteristics, plans, tasks, context, etc., which is explicitly provided to the system (as opposed to indirectly inferred from interaction data), typically by users themselves.

• Modeling: Refers to explicit or implicit representations of the users (including, for example, their abilities, skills, requirements, preferences), their plans with respect to a particular (portion of an) interactive session, the tasks that can be performed with the system, etc. Of particular interest in the context of the present discussion are those models that are dynamically updated during interaction, based on knowledge acquired at run-time (the user model being a typical such case).

• Adaptation decision making: Refers to the part (or parts) of the AUI that is responsible for deciding upon the necessity of, as well as the required type of, adaptations, given a particular interaction state. Seen at an abstract level, decisions made at this stage match information found in the various models maintained by the AUI, with the alternative interactions designed to cater for variations therein.

• Applying adaptations: Refers to the actual introduction of adaptations in the user-system interaction, on the basis of the related decisions. Although typically subsumed by adaptation decision making in the literature, this adaptation component may be varied independently of the decision making process, e.g., to account for different adaptation strategies.

• Transparent models & adaptation "rationale": Refers to the particular case of AUIs that enable users to review the models maintained by the AUI (at different levels of "transparency"; see (Höök et al., 1996) for a detailed discussion), or the rationale that underlies the adaptation decisions made by the system. In the case of transparent modeling, users may also be offered the capability to modify these models, so that the latter better reflect their individual or other characteristics.

• Automatic adaptation assessment: Refers to the run-time assessment of the effects of adaptations that have been decided upon and effected, with the intent of evaluating their "success" (i.e., whether the goals underlying their introduction have been met). This stage is referred to as "second-level adaptation" in (Totterdell et al., 1990b) and may further involve the modification of aspects of the lower-level adaptation cycle (e.g., by enabling or disabling rules in rule-based adaptation, or by altering the "weight" of alternatives, in decision theory-based adaptation).

It should be noted that this high-level model is not intended to capture the characteristics of all AUIs reported in the literature. On the other hand, there do not exist to date AUIs that comprise all of the identified components. However, the modular nature of the proposed evaluation approach allows one to selectively apply it, or extend it to suit the particular needs of the AUI at hand.
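To make the flow between the stages listed above more tangible, the following minimal sketch chains five of them into a single pass of an adaptation cycle. It is not an implementation of any particular AUI: all function names, the event format, the model attributes (`error_count`, `prefers_large_text`) and the toy rules are hypothetical and chosen purely for illustration. Transparent models and automatic adaptation assessment are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Dynamically updated model; attribute names are illustrative only."""
    attributes: dict = field(default_factory=dict)


def monitor(raw_events):
    """Interaction monitoring: keep events tagged with a known interaction level."""
    levels = {"physical", "syntactic", "semantic"}
    return [e for e in raw_events if e.get("level") in levels]


def interpret(events, model):
    """Interpretation / inferences: toy rule that counts observed errors."""
    errors = sum(1 for e in events if e.get("type") == "error")
    model.attributes["error_count"] = model.attributes.get("error_count", 0) + errors
    return model


def provide_explicitly(model, **knowledge):
    """Explicitly provided knowledge: the user states facts directly."""
    model.attributes.update(knowledge)
    return model


def decide(model):
    """Adaptation decision making: match model state to designed alternatives."""
    decisions = []
    if model.attributes.get("error_count", 0) >= 3:
        decisions.append("offer_guidance")
    if model.attributes.get("prefers_large_text"):
        decisions.append("enlarge_text")
    return decisions


def apply_adaptations(decisions, ui_state):
    """Applying adaptations: separate from deciding, so strategies can vary."""
    new_state = dict(ui_state)
    for decision in decisions:
        new_state[decision] = True
    return new_state


if __name__ == "__main__":
    model = provide_explicitly(UserModel(), prefers_large_text=True)
    raw = [{"level": "syntactic", "type": "error"}] * 3 + [{"level": "unknown"}]
    interpret(monitor(raw), model)
    print(apply_adaptations(decide(model), {}))
```

Keeping each stage behind its own function mirrors the premise of the modular approach: the output of one stage (e.g., the model produced by `interpret`) can be inspected and assessed without running the stages that follow it.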

3.3 Modular evaluation

In this section we will identify adaptation "modules" (comprising one or more of the adaptation stages / components in the previous section), which can be evaluated individually and in combinations. Before proceeding to the presentation of the modules and their evaluation, we would like to make the following clarifications, which hold true throughout the presentation of the approach:

• In some cases, evaluation methods that do not involve the users directly assume that the evaluator / expert takes into account the characteristics (abilities, skills, knowledge, etc.) of the "typical" user of a system. Since the concept of a "typical" user is contrary to the very notion of AUIs, this assumption cannot be applied in AUI evaluation. Thus, an explicit requirement that permeates the proposed approach is that, in all cases where users are not directly involved in the evaluation, each and every individual evaluation task takes into account a particular user (conveyed through relevant characteristics, which are encoded in some type of user profile), in a particular context of use (conveyed in a way analogous to the user).

• Expert-based evaluations in HCI are, in general, assumed to be conducted by usability experts. In the description of the proposed evaluation approach we will occasionally refer to expert-based evaluation tasks which are foreseen to be undertaken by individuals who possess expertise relevant to the application domain, the target user group(s), etc., but do not necessarily have a background in usability evaluation.

 

Fig. 2. The correspondence between evaluation modules and AUI model components.

Let us now move on to the presentation of the modules, wherein, for each identified module, the following information is provided: (i) components comprising the module; (ii) evaluation goal(s) and potential evaluation criteria; and, (iii) proposed evaluation methods, and prerequisites for supporting these methods.

3.3.1 Module A1

Comprises: interaction monitoring, interpretations / inferences, and modeling. The goal of evaluation in this module is to ensure that the models derived by the system through dynamic interaction assessment are "optimal". Optimality in this context may be related to the following evaluation criteria[1]: correctness of the interpretations / inferences (i.e., do the inferences / interpretations reflect the actual state of the entity being modeled?); comprehensiveness of the model (i.e., can the model represent in its entirety the inferred / interpreted information about the entity being modeled?); redundancy of the model (i.e., does the model contain "attributes" of the entity being modeled, which cannot be inferred from interaction?); precision of the model (i.e., how accurately does the model reflect the entity being modeled?); sensitivity of the modeling process (i.e., how fast does the modeling process converge to a comprehensive and accurate representation of the entity being modeled?); etc.

In the case of models that directly or indirectly involve the user (e.g., user modeling, plan recognition), one would need to employ a combination of evaluation methods to assess the degree to which the above criteria are met. Specifically, due to the fact that both observations and trace measures can only be used on overt behavior, not on thoughts or feelings or expectations (McGrath, 1995), methods in the self report category have to be used. Additionally, methods which allow the users to offer feedback during interaction are to be favored (to avoid remembering effects), although care should be taken that these methods are not too obtrusive with respect to the interaction itself.

Eliciting user feedback regarding the modeling process requires that at least a prototype of the system exists, with functional interaction monitoring and inferencing / interpretation components (the modeling component could be simulated). Furthermore, users should have some representation of the modeling process itself, which, in this case, can be constrained to the results of the process (i.e., the resulting model or models). If the AUI under evaluation also comprises a functional version of a transparent models component, then the latter can be used to that effect (although this might also necessitate a working modeling component). If such a component is not foreseen in the AUI (or not available at the time of the evaluation), then an alternative ad-hoc approach to the representation of the model should be sought (e.g., with an observer simulating the model, in a "Wizard of Oz" type of study).

Expert-based evaluation might also be of use in the early design and evaluation stages for Module A1. In particular, experts may be able to contribute towards the evaluation of correctness of inferencing / interpretations, and comprehensiveness and redundancy of the model. Such involvement of experts could be beneficial if part of the user model is related to the application's domain model (e.g., in student models); if the inferencing / modeling process seeks to capture some special user characteristics (e.g., user's ability to interact through a particular input device); etc.
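Some of the Module A1 criteria lend themselves to simple quantitative proxies once a reference is available, for instance values elicited from users through self reports or judged by domain experts. The sketch below shows one possible operationalization of correctness, redundancy and sensitivity; the formulas, function names and attribute names are assumptions made for illustration and are not definitions given in this paper.

```python
def correctness(inferred, reported):
    """Fraction of reference (e.g., self-reported) attributes matched by the model."""
    if not reported:
        raise ValueError("no reference attributes to compare against")
    hits = sum(1 for key, value in reported.items() if inferred.get(key) == value)
    return hits / len(reported)


def redundant_attributes(model_schema, ever_inferred):
    """Attributes present in the model that were never filled from interaction."""
    return set(model_schema) - set(ever_inferred)


def convergence_step(snapshots, reported, threshold=1.0):
    """Index of the first model snapshot from which correctness stays at or
    above `threshold` until the end of the session; None if it never does."""
    step = None
    for i, snapshot in enumerate(snapshots):
        if correctness(snapshot, reported) >= threshold:
            if step is None:
                step = i
        else:
            step = None  # the model regressed, so convergence restarts
    return step


if __name__ == "__main__":
    reported = {"expertise": "novice", "device": "mouse"}
    snapshots = [
        {},
        {"expertise": "novice"},
        {"expertise": "novice", "device": "mouse"},
        {"expertise": "novice", "device": "mouse"},
    ]
    print(correctness(snapshots[1], reported))
    print(convergence_step(snapshots, reported))
```

Here a lower `convergence_step` would indicate a more sensitive modeling process; in practice the snapshots would be taken from logged states of the (possibly simulated) modeling component.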

3.3.2 Module A2

Comprises: explicitly provided knowledge, and modeling. This module is very similar to the preceding one, with the following exceptions:

• Since there is no automatic assessment of the interaction, nor any attempt to elicit / infer information based on such assessment, any related evaluation criteria (including, for example, correctness) are not relevant.

• Additional criteria that may be considered include: the transparency of the process (i.e., whether, and to what extent, the users can understand and / or predict how the information they provide affects the models maintained by the AUI); the overhead that may be imposed on the main interaction tasks by the explicit provision of knowledge; etc.

• The involvement of experts in the evaluation of this module might not yield as valuable results as in the case of Module A1. This is due to the fact that the direct "manipulation" of the model(s) is tightly coupled to the users' mental model of what is being modeled and how, which may be quite hard to simulate or predict.

3.3.3 Module B

Comprises: adaptation decision making. The goal of evaluation in this module is to ensure that the adaptation decisions made by the respective component are "correct". Correctness in this context may be related to the following evaluation criteria: necessity of adaptation (i.e., is an adaptation indeed required in the current interaction context?); appropriateness of adaptation (i.e., is the adaptation decided upon one that can cater for the requirements posed by the current interaction context?); acceptance of adaptation (i.e., does the user think that the adaptation is both necessary and appropriate?); etc.

A fundamental difference between this module and previous ones is that it does not (initially) require that any parts of the adaptation infrastructure have been implemented (although it does require that the alternative interaction artifacts have been designed). This is due to the fact that the adaptation logic relates interaction states (as these are depicted in the maintained models) to specific adaptations; thus, if such states can be reproduced or even simulated, it is possible to evaluate the related decisions "in context". The decomposability of adaptation logic is of course constrained by the degree to which adaptation decisions affect each other (e.g., two decisions may be mutually exclusive, if they affect the same facets of interaction in different ways).
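The idea of exercising adaptation logic against simulated interaction states can be sketched as follows, assuming a rule-based form of adaptation logic. The rule names, model attributes, facets and values are all hypothetical; the sketch only shows how decisions could be listed for a hand-written state, and how pairs of decisions that set the same facet of interaction to different values could be flagged before any adaptation infrastructure exists.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over a (simulated) model state
    facet: str                         # facet of interaction the adaptation affects
    value: str                         # value the facet would be set to


# Illustrative corpus of adaptation logic (not taken from any real system).
RULES = [
    Rule("guide_novices", lambda s: s.get("expertise") == "novice", "help", "step_by_step"),
    Rule("terse_for_fast_users", lambda s: s.get("task_speed") == "fast", "help", "minimal"),
    Rule("large_targets", lambda s: s.get("pointing_errors", 0) > 5, "target_size", "large"),
]


def decisions_for(state, rules=RULES):
    """Rules whose condition holds in the given simulated state."""
    return [rule for rule in rules if rule.condition(state)]


def conflicts(fired):
    """Pairs of fired rules that set the same facet to different values."""
    return [
        (a.name, b.name)
        for i, a in enumerate(fired)
        for b in fired[i + 1:]
        if a.facet == b.facet and a.value != b.value
    ]


if __name__ == "__main__":
    simulated_state = {"expertise": "novice", "task_speed": "fast", "pointing_errors": 7}
    fired = decisions_for(simulated_state)
    print([rule.name for rule in fired])
    print(conflicts(fired))
```

The list returned by `decisions_for` is the kind of material that could be put before experts or users to judge necessity and appropriateness, while a non-empty result from `conflicts` points to decisions that cannot be evaluated in isolation from each other.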

In practical terms, in a typical adaptation design cycle a theory, a set of hypotheses, or past empirical findings, will serve as input to the initial corpus of adaptation logic. This corpus can then be validated, in a first stage, using mostly formative evaluation methods to assess the necessity and appropriateness of adaptations.

Contrary to the above, it may be difficult (or even impossible) to extrapolate the overall acceptance of an adaptation decision in the same manner. This, combined with the requirement to further explore the other two criteria, when the entire corpus of adaptation logic is "active", points to the necessity of a second stage of evaluation in this module, in which users will experience adaptation decisions in "real time".

In either stage, to enable the participation of users in the evaluation, there needs to exist an explicit representation of the decisions made. This is a non-trivial requirement, especially in the case that the components that undertake decision making and adaptation application are separate (because, then, users would have to attain an understanding of a decision, without detailed knowledge of how it would be applied in practice). If the AUI comprises a transparent adaptation rationale component, then this could be utilized to offer the users the required representation. Otherwise, like in the case