The workshop's proceedings are available as a single
file, and as individual papers (the
presentations delivered during the workshop are
now also available along with their corresponding papers).
The normative reference for these proceedings is:
S. Weibelzahl, A. Paramythis, and J. Masthoff
(Eds.) Proceedings of the Fourth Workshop on the Evaluation of
Adaptive Systems, held in conjunction with the 10th International
Conference on User Modeling (UM'05), Edinburgh, UK, July 24th
to 30th, 2005.
Download the full proceedings of
the workshop (1 MB).
Using the list below, you can also access paper abstracts,
individual papers and the presentations delivered at the workshop.
Main Track
- Potentials of Eye-Movement Tracking in Adaptive
Systems
R. Bednarik, pp. 1-8
- The Evaluation of in-vehicle Adaptive Systems
Talia Lavie, Joachim Meyer, Klaus Bengler, and Joseph
F. Coughlin, pp. 9-18
- Is the ACT-value a Valid Estimate for Knowledge?
An Empirical Evaluation of the Inference Mechanism of an Adaptive
Help System
D. Iglezakis, pp. 19-26
- Impacts of User Modeling on Personalization of
Information Retrieval:
An Evaluation with Human Intelligence Analysts
E. Santos Jr., Q. Zhao, H. Nguyen, and H. Wang,
pp. 27-36
- Evaluating Scrutable Adaptive Hypertext
M. Czarkowski, pp. 37-46
- Layered Evaluation of Topic-Based Adaptation to
Student Knowledge
S. Sosnovsky and P. Brusilovsky, pp. 47-56
- Problems and Pitfalls in Evaluating Adaptive Systems
S. Weibelzahl, pp. 57-66
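Evaluation Challenge Track
- Introduction to the First Adaptive System Evaluation
Challenge
pp. 65-68
- Evaluating an Adaptive Music-Clip Recommender System
T. Zhu and R. Greiner, pp. 69-73
- Addressing Problems in the First Adaptive System Evaluation
Challenge
D. Chin, pp. 74-78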
Potentials of Eye-Movement Tracking in
Adaptive Systems
R. Bednarik
pp. 1-8
Abstract. Eye-movement tracking
has proved its potential in many areas of human-computer interaction.
Resting on the hypothesis that eye direction and mind are linked,
some HCI researchers have employed eye-movement trackers
to investigate the visual attention focus of participants completing
their tasks. Others have used eye-movement tracking in real-time
applications, either as a direct interaction device or as an input
to gaze-aware interfaces. Inspired by these previous HCI applications,
we propose to utilize eye-movement trackers in adaptive systems research
and development in two ways. First, evaluations of adaptive
systems could gain access to information otherwise unavailable,
for instance how visual attention and cognitive processing
are influenced by the adaptivity implemented in the evaluated system.
Second, we propose to employ eye-movement tracking technologies
for real-time registration of users’ loci of visual attention,
thereby increasing the awareness adaptive systems have of
their current users. We discuss the potential, difficulties
and pitfalls of eye-movement tracking when applied to adaptive systems.
We argue that a methodological framework for applying eye-tracking
to adaptive systems should be developed.
The Evaluation of in-vehicle Adaptive
Systems
Talia Lavie, Joachim Meyer, Klaus Bengler, and Joseph F. Coughlin
pp. 9-18
Abstract. Although research on
adaptive systems has begun only recently, studies have shown the
benefits of using adaptive systems. However, most of those studies
have examined systems with and without adaptive qualities, disregarding
additional factors that may influence the interaction. This study
presents a first step towards a more comprehensive evaluation of
adaptive systems. We assert that adaptive systems should be examined
with regard to different types of tasks, different situations and
different users in order to determine the conditions in which
adaptivity will be beneficial. A preliminary study evaluated adaptivity
when performing routine and infrequent tasks. The study showed that
adaptivity is beneficial for routine tasks, and that adaptivity
impairs performance of infrequent tasks. The study proposes a method
to calculate the point at which adaptivity ceases to be beneficial
as a function of the relative frequencies of different tasks and
provides a starting point for a more comprehensive understanding
of the subject.
Is the ACT-value a Valid Estimate
for Knowledge? An Empirical Evaluation
of the Inference Mechanism of an Adaptive Help System
D. Iglezakis
pp. 19-26
Abstract. This paper reports the
results of an empirical study that evaluates the inference mechanism
of an adaptive help system for web-based applications. The help
system adapts to a measure of procedural knowledge that is
computed from activity logfiles according to the ACT-theory of Anderson
and Lebière [1]. The results of the study show that the ACT-value
of procedural knowledge correlates with subjective and objective
measures of performance and proves to be a better estimate of
procedural knowledge than general computer knowledge, a measure
often used by other adaptive help systems.
Impacts of User Modeling on Personalization
of Information Retrieval: An Evaluation with Human Intelligence
Analysts
E. Santos Jr., Q. Zhao, H. Nguyen, and H. Wang
pp. 27-36
Abstract. User modeling is the
key element in assisting intelligence analysts to meet the challenge
of gathering relevant information from the massive amounts of available
data. We have developed a dynamic user model to predict the analyst’s
intent and help the information retrieval application better serve
the analyst’s information needs. To validate the effectiveness
of our user modeling approach, we conducted a user evaluation
study with actual end users, three working intelligence analysts,
and compared our user-model-enhanced information retrieval system
with a commercial off-the-shelf system, the Verity Query Language.
We describe our experimental setup and the specific metrics essential
to evaluating user modeling for information retrieval. The results
show that our user modeling approach tracked individuals’
interests, adapted to their individual searching strategies, and
helped retrieve more relevant documents than the Verity Query Language
system.
Evaluating Scrutable Adaptive
Hypertext
M. Czarkowski
pp. 37-46
Abstract. Adaptive hypertext systems
personalise documents to meet the individual’s particular
preferences, knowledge and goals. There is a debate over how much
control should be given to the user as well as how much transparency
there should be to the inner workings of the system. Some adaptive
systems make the user model available to the user. We propose that transparency
and control should extend beyond this by involving the user in the
personalisation process and granting them power to change it. Our
previous evaluations of scrutable systems have revealed that users have
difficulty understanding and controlling personalisation. We have
developed SASY with a focus on improving scrutability support tools.
This paper describes our design for the evaluation of SASY.
Layered Evaluation of Topic-Based
Adaptation to Student Knowledge
S. Sosnovsky and P. Brusilovsky
pp. 47-56
Abstract. A user modeling server
is an important part of modern distributed E-Learning architectures.
The user modeling server CUMULATE has two main levels: the event
storage and multiple inference agents. To evaluate adaptive systems
functioning as components of the common distributed architecture
and using CUMULATE as the central user modeling server we need to
evaluate the adaptation provided by those agents. Unfortunately,
there are no commonly accepted approaches to the evaluation of the
universal user modeling server. This paper describes the results
of layered evaluation of our recent topic-based adaptation engine
based on the activity students performed using the system QuizGuide.
User modeling and adaptation processes are evaluated separately.
While previous evaluation experiments of QuizGuide based on the
traditional “with-and-without” approach showed that
students like the system and benefit from it, this paper provides
evidence that large topics are unfit as knowledge assessment units
for adaptation, which calls into question the soundness of the
entire adaptation performed by the system.
Problems and Pitfalls in Evaluating
Adaptive Systems
S. Weibelzahl
pp. 57-66
Abstract. Empirical studies with
adaptive systems offer many advantages and opportunities. Nevertheless,
there is still a lack of evaluation studies. This paper lists several
problems and pitfalls that arise when evaluating an adaptive system
and provides guidelines and recommendations for workarounds or even
avoidance of these problems. Among other things, the following issues
are covered: relating evaluation studies to the development cycle;
saving resources; specifying control conditions, sample and criteria;
asking users for adaptivity effects; reporting results. An overview
of existing evaluation frameworks shows which of these problems
have been addressed in which way.
Introduction to the First Adaptive
System Evaluation Challenge
pp. 65-68
Abstract. The Fourth Workshop
on Evaluation of Adaptive Systems launched the first “evaluation
challenge”. The challenge concerns an adaptive system, which
recommends sequences of music clips to groups of users. Focusing
on a real world problem, the challenge aimed to foster the development
of innovative evaluation designs as well as encourage controversial
discussion during the workshop. Participation in the challenge entailed
proposing an empirical evaluation design purposely created to answer
specific design questions regarding the system’s modeling
component. This chapter describes the task and participation requirements
given to the participants. The following two chapters contain the
two submissions received.
Evaluating an Adaptive Music-Clip
Recommender System
T. Zhu and R. Greiner
pp. 69-73
Abstract. In this paper, we propose
an experiment design to address the three evaluation goals of the “First
Adaptive System Evaluation Challenge”, and demonstrate how
to achieve each of these goals.
Addressing Problems in the First
Adaptive System Evaluation Challenge
D. Chin
pp. 74-78
Abstract. The adaptive evaluation
challenge system makes several problematic assumptions, including
ignoring other emotions, ignoring other influences on happiness,
and assuming that ratings of 5.5 have a neutral impact on happiness. The recommended
evaluation combines established surveys of affective response to music
with heart rate monitoring to measure happiness. First, ratings are
recalibrated by asking users whether music clips affect their emotions
positively, neutrally, or negatively. Clips that affect emotions
other than happiness are filtered out, and a pilot study is used
to determine how many positive/negative clips are needed to maximize/minimize
Happiness. Finally, a custom series of clips is designed for each
user in the study following standard DJ music selection practices.
The measured/reported happiness of the users is compared to the
proposed Happiness models to determine the best model. Ethics of
the study are also discussed.