
Third Workshop on Empirical Evaluation of Adaptive Systems

in conjunction with AH2004

Eindhoven University of Technology, The Netherlands, August 23 - 26
 
 

The proceedings of the workshop are now available as part of the TU/e CS-Report 04-18 "AH2004: Workshop Proceedings - Part I" (ISSN 0926-4515). Apart from the EEAS proceedings, this report also contains the proceedings of the workshops "Engineering the Adaptive Web" and "Individual Differences in Adaptive Hypermedia".

The report will eventually be available from the TU/e on-line library at http://alexandria.tue.nl/extra1/wskrap/publichtml/200418.pdf. Please note that this URL is not yet active at the time of writing, but it is the normative URL for on-line references. In the meantime, you can download a local version of the report (PDF 3.67 MB). Be forewarned that this version has no page numbers and that its table of contents is provided as a separate document (PDF 33 KB).

Alternatively, you can download only the EEAS workshop papers as a single file (PDF 863 KB), again without page numbers.

Finally, the list below gives access to the paper abstracts, the individual papers and the presentations delivered at the workshop.


The First Click is the Deepest: Assessing Information Scent Predictions for a Personalized Search Engine, by Karen Church, Mark T. Keane & Barry Smyth

Abstract. "First-click behavior" describes one of the most commonly occurring tasks on the Web: a user submits a query to a search engine, examines a list of results and chooses a link to follow. Even though this task is carried out a billion times a day, our understanding of the factors that influence this behavior is poorly developed. In this paper, we empirically evaluate information scent predictions for first-click behavior in the use of a personalized search engine called I-SPY. Our experiments show that the predictive accuracy of current information foraging approaches is poor. To conclude, we advance a framework designed to help understand first-click behavior and to guide future research.

Downloads. Paper (PDF 127 KB), Presentation (PDF 349 KB)

Proposed Evaluation Framework for Adaptive Hypermedia Systems, by Arpita Gupta & P. S. Grover

Abstract. Although a number of frameworks exist for the evaluation of Adaptive Hypermedia Systems (AHS), the recently suggested layered frameworks have proved especially useful in identifying the exact cause of an adaptation failure or of any other error in the system. This paper presents an evaluation framework for AHS on the Internet that extends the layered frameworks and adds new dimensions to them. It treats evaluation as an integral part of the AHS development process and also evaluates how successfully the AHS can be accessed over the Internet. The framework has four mutually orthogonal dimensions: Environment, the environment in which the AHS is accessed; Adaptation, the type and level of adaptation used; Development Process, the software engineering life-cycle steps used to develop the AHS; and Evaluation Modules, the layers of the AHS that are evaluated in the context of the other dimensions.

Downloads. Paper (PDF 62 KB), Presentation (PDF 29 KB)

An Example of Evaluation applied to a course adapted to learning styles of CHAEA’s test, by Mª del Puerto Paule Ruiz, Juan Ramón Pérez Pérez & Martín González Rodríguez

Abstract. This paper presents the results of an evaluation of a course adapted to the learning styles identified by CHAEA's test. It also provides a comparative analysis between the adapted course and a course without adaptation.

Downloads. Paper (PDF 118 KB)

Evaluating Intelligent Tutoring Systems with Learning Curves, by Brent Martin & Antonija Mitrovic

Abstract. The evaluation of Intelligent Tutoring Systems, like that of any adaptive system, can be difficult. In this paper we discuss the evaluation of an extension to an existing system that uses Constraint-Based Modelling (CBM). CBM is a student modelling technique that is rapidly maturing and is suited to complex, open-ended domains. A problem with complex domain models is their large size, which necessitates a comprehensive problem set in order to provide sufficient exercises for extended learning sessions. We have addressed this issue by developing an algorithm that automatically generates new problems directly from the domain knowledge base. However, evaluation of this approach was complicated by the need for a lengthy (and therefore uncontrolled) study, as well as by other unavoidable differences between the control and experimental systems. This paper presents the evaluation and discusses those issues, together with the way in which we used learning curves as a tool for comparing disparate learning systems.

Downloads. Paper (PDF 106 KB), Presentation (PDF 86 KB)

Evaluation of Effects on Retrieval Performance for an Adaptive User Model, by Hien Nguyen, Eugene Santos Jr., Qunhua Zhao & Chester Lee

Abstract. One of the challenging problems in evaluating the effectiveness of a user model with regard to retrieval performance is the absence of an evaluation method that allows comparison with other existing approaches while also assessing the new features offered by the user model. In this paper, we report our method of using collections, procedures and metrics from the information retrieval community to evaluate a cognitive user model that captures user intent to improve retrieval performance and adapts to a user's interests, preferences, and context. Specifically, by starting with an empty user model for each query, we simulate the process of assessing the short-term effects of relevance feedback techniques in traditional information retrieval. By using a seed user model learned from relevance feedback, we assess both short- and long-term effects over the entire search session. We show how this method can be used to compare user modeling approaches against a classic information retrieval approach, Ide dec-hi, using the CACM and Medline collections. This evaluation also helps us analyze the strengths and weaknesses of our model and develop appropriate solutions.

Downloads. Paper (PDF 170 KB), Presentation (PDF 398 KB)

Evaluating a General-Purpose Adaptive Hypertext System, by Chris Staff

Abstract. We describe the evaluation process of HyperContext, a framework for general-purpose adaptive and adaptable hypertext. In particular, we are interested in users' short-term, transient interests. We cannot make any prior assumptions about a user's interest or goal, as we have no prior knowledge of the user. We evaluated two aspects of HyperContext: one evaluation was completely automated, and the other involved participants. However, a test collection with value judgements would be a considerable asset for the independent and automated evaluation of adaptive hypertext systems in terms of cost, reliability of results, and repeatability of experiments.

Downloads. Paper (PDF 221 KB), Presentation (PDF 393 KB)

Evaluation Experiments and Experience from the Perspective of Interactive Information Retrieval, by Ross Wilkinson & Mingfang Wu

Abstract. There is a long tradition of evaluating information retrieval systems with very simple user models and very simple tasks: the task is to retrieve documents relevant to a user need described by a query. TREC, the Text REtrieval Conference sponsored by NIST, raised the bar by providing large-scale collections, well-defined user needs, independently judged documents, and a specified measure of success. Groups from around the world tackled this same task, which allowed a broad analysis of just which factors influenced system performance. Yet there was concern that improvements in system performance did not always lead to improvements in human performance, so a concerted effort to study how people interact with information retrieval systems was undertaken in the Interactive Track of TREC. This paper describes this track and some of the experiments we have undertaken in it, and highlights some of the real problems in such evaluation.

We have often observed two key issues in interactive information retrieval. The first is that human preference is often not correlated with human performance; consequently, evaluation that relies solely on either measure is unreliable. The second is that genuine improvements are very difficult to validate, as system variation tends to be dominated by task variation and by variation in user performance. Consequently, the statistical power of these experiments, which are often already very expensive to conduct because of user participation, can be quite low. We therefore argue for staged experiments in which only very "obvious" system performance gains are explored.

In the end, simple performance measures have proved less helpful than deeper analysis of just how people interact with their information systems.

Downloads. Paper (PDF 169 KB), Presentation (PDF 368 KB)