|
The proceedings of the workshop are now available as part of the
TU/e CS-Report 04-18 "AH2004: Workshop Proceedings
- Part I" (ISSN 0926-4515). Apart from the EEAS proceedings,
this report also contains the proceedings of the workshops "Engineering
the Adaptive Web" and "Individual Differences in Adaptive
Hypermedia".
The above report will eventually be available from the TU/e on-line
library at the URL http://alexandria.tue.nl/extra1/wskrap/publichtml/200418.pdf
(please note that this URL is not active at the time of writing,
but is the normative URL for on-line references). In the meantime,
you can download a local version of the report (3.67 MB), but be
forewarned that this version does not have page numbers, and its table
of contents is available as a separate document (33 KB).
Alternatively, you can download only the EEAS workshop papers as a
single file (863 KB), again without page numbers.
Finally, using the list below, you can access the paper abstracts,
individual papers, and the presentations
delivered at the workshop.
- Proposed Evaluation Framework for Adaptive Hypermedia
Systems,
by Arpita Gupta & P. S. Grover,
pp. 161-171.
- The First Click is the Deepest: Assessing Information
Scent Predictions for a Personalized Search Engine,
by Karen Church, Mark T. Keane & Barry Smyth,
pp. 173-182.
- Evaluating Intelligent Tutoring Systems with Learning
Curves,
by Brent Martin & Antonija Mitrovic,
pp. 183-192.
- Evaluation of Effects on Retrieval Performance
for an Adaptive User Model,
by Hien Nguyen, Eugene Santos Jr., Qunhua Zhao & Chester Lee,
pp. 193-202.
- An Example of Evaluation applied to a course adapted
to learning styles of CHAEA’s test,
by Mª del Puerto Paule Ruiz, Juan Ramón Pérez Pérez & Martín
González Rodríguez,
pp. 203-209.
- Evaluating a General-Purpose Adaptive Hypertext
System,
by Chris Staff,
pp. 211-220.
- Evaluation Experiments and Experience from the
Perspective of Interactive Information Retrieval,
by Ross Wilkinson & Mingfang Wu,
pp. 221-230.
The First Click is the Deepest: Assessing
Information Scent Predictions for a Personalized Search Engine,
by Karen Church, Mark T. Keane & Barry Smyth
Abstract. "First-click behavior" describes one
of the most commonly occurring tasks on the Web, where a user submits
a query to a search engine, examines a list of results and chooses
a link to follow. Even though this task is carried out a billion
times a day, our understanding of the factors influencing this behavior
is poorly developed. In this paper, we empirically evaluate information
scent predictions for first-click behavior in the use of a personalized
search engine, called I-SPY. Our experiments show that the predictive
accuracy of current information foraging approaches is not good.
To conclude, we advance a framework designed to understand first-click
behavior and guide future research.
Proposed Evaluation Framework for Adaptive
Hypermedia Systems, by Arpita Gupta & P. S. Grover
Abstract. Although a number of frameworks exist
for the evaluation of Adaptive Hypermedia Systems (AHS), recently
suggested layered frameworks have proved useful in identifying the
exact cause of the adaptation failure or any other error in the
system. This paper presents an evaluation framework for AHS on the
internet which extends the layered frameworks and adds
new dimensions to them. It treats evaluation as an integral part
of the development process of AHS and also evaluates the successful
access of AHS on the internet. The framework has four mutually
orthogonal dimensions: Environment (the environment in which the AHS
is accessed), Adaptation (the type and level of adaptation used),
Development Process (the software engineering life cycle steps used
for developing the AHS), and Evaluation Modules (the layers of the AHS
which are evaluated in the context of the other dimensions).
An Example of Evaluation applied to a
course adapted to learning styles of CHAEA’s test, by Mª
del Puerto Paule Ruiz, Juan Ramón Pérez Pérez & Martín González
Rodríguez
Abstract. This paper shows the results of an evaluation
of a course adapted to learning styles of CHAEA’s test. It is a
comparative analysis of an adapted course and a course without
adaptation.
Evaluating Intelligent Tutoring Systems
with Learning Curves, by Brent Martin & Antonija Mitrovic
Abstract. The evaluation of Intelligent Tutoring
Systems, like any adaptive system, can have its difficulties. In
this paper we discuss the evaluation of an extension to an existing
system that uses Constraint-Based Modelling (CBM). CBM is a student
modelling technique that is rapidly maturing, and is suited to complex,
open-ended domains. A problem with complex domain models is their
large size, necessitating a comprehensive problem set in order to
provide sufficient exercises for extended learning sessions. We
have addressed this issue by developing an algorithm that automatically
generates new problems directly from the domain knowledge base.
However, evaluation of this approach was complicated by the need
for a lengthy (and therefore uncontrolled) study as well as other
unavoidable differences between the control and experimental systems.
This paper presents the evaluation and discusses those issues, and
the way in which we used learning curves as a tool for comparing
disparate learning systems.
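As an illustration of how learning curves can be compared, the sketch below fits a power law to error rates. This is not the authors' actual analysis: it assumes the commonly used form error = a * n^(-b), where n counts the occasions on which a piece of knowledge was relevant. The function name fit_power_law and both data series are hypothetical and synthetic, not results from the paper.

```python
import math

def fit_power_law(error_rates):
    """Fit error = a * n**(-b) by least squares on log-log data.

    error_rates[i] is the (strictly positive) error rate observed at the
    (i+1)-th occasion. Returns the pair (a, b).
    """
    xs = [math.log(n) for n in range(1, len(error_rates) + 1)]
    ys = [math.log(e) for e in error_rates]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In log-log space the line is log(error) = log(a) - b * log(n).
    return math.exp(intercept), -slope

# Synthetic data (an assumption for illustration only): two systems with the
# same initial error rate but different learning rates.
control = [0.30 * n ** -0.4 for n in range(1, 11)]
experimental = [0.30 * n ** -0.6 for n in range(1, 11)]

a_c, b_c = fit_power_law(control)
a_e, b_e = fit_power_law(experimental)
print(f"control:      a={a_c:.2f}, b={b_c:.2f}")
print(f"experimental: a={a_e:.2f}, b={b_e:.2f}")
```

Under this assumed model, a larger exponent b means that errors fall off faster with practice, so the fitted parameters give one way to compare systems whose studies were not run under identical conditions.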
Evaluation of Effects on Retrieval Performance
for an Adaptive User Model, by Hien Nguyen, Eugene Santos
Jr., Qunhua Zhao & Chester Lee
Abstract. One of the challenging problems for
evaluating the effectiveness of a user model with regards to retrieval
performance is the absence of an evaluation method that offers the
ability to compare with other existing approaches while assessing
the new features offered by a user model. In this paper, we report
our method of using collections, procedures and metrics from the
information retrieval community to evaluate a cognitive user model
which captures user intent to improve retrieval performance and
adapts to a user's interests, preferences, and context. Specifically,
by starting with an empty user model for each query, we simulate
the process of assessing the short-term effects of relevance feedback
techniques in traditional information retrieval. By using a seed
user model learned from relevance feedback, we assess both short
and long-term effects on the entire search session. In this paper,
we show how we can compare user modeling approaches by using the
above method, against a classic information retrieval approach,
the Ide dec-hi, using CACM and Medline collections. This evaluation
also helps us analyze the strengths and weaknesses of our
model and develop appropriate solutions.
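For readers unfamiliar with the Ide dec-hi baseline named above, the following is a minimal sketch of that relevance feedback rule as it is usually described: the new query is the old query plus the vectors of all retrieved documents judged relevant, minus the vector of the single highest-ranked non-relevant document. The function name ide_dec_hi, the dictionary representation of term vectors, and the toy documents are assumptions made for illustration; they are not taken from the paper.

```python
def ide_dec_hi(query, ranked_docs, relevant_ids):
    """Apply one round of Ide dec-hi relevance feedback.

    query        -- dict mapping term -> weight
    ranked_docs  -- list of (doc_id, term-weight dict) in rank order
    relevant_ids -- set of doc_ids judged relevant
    """
    new_query = dict(query)
    # Add every retrieved document that was judged relevant.
    for doc_id, vector in ranked_docs:
        if doc_id in relevant_ids:
            for term, weight in vector.items():
                new_query[term] = new_query.get(term, 0.0) + weight
    # Subtract only the highest-ranked non-relevant document.
    for doc_id, vector in ranked_docs:
        if doc_id not in relevant_ids:
            for term, weight in vector.items():
                new_query[term] = new_query.get(term, 0.0) - weight
            break
    return new_query

# Toy example (hypothetical data).
query = {"adaptive": 1.0}
ranked = [
    ("d1", {"adaptive": 1.0, "hypermedia": 1.0}),
    ("d2", {"adaptive": 1.0, "cooking": 1.0}),
    ("d3", {"cooking": 2.0}),
    ("d4", {"hypermedia": 1.0, "evaluation": 1.0}),
]
updated = ide_dec_hi(query, ranked, relevant_ids={"d1", "d4"})
print(updated)
```

This sketch leaves negative term weights in place; some implementations clip them to zero before re-running the query.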
Evaluating a General-Purpose Adaptive
Hypertext System, by Chris Staff
Abstract. We describe the evaluation process of
HyperContext, a framework for general-purpose adaptive and adaptable
hypertext. In particular, we are interested in users’ short-term,
transient, interests. We cannot make any prior assumptions about
a user’s interest or goal, as we do not have any prior knowledge
of the user. We conducted evaluations on two aspects of HyperContext.
One evaluation was completely automated, and the other involved
participants. However, the availability of a test collection with
value judgements would be a considerable asset for the independent
and automated evaluation of adaptive hypertext systems in terms
of cost, reliability of results, and repeatability of experiments.
Evaluation Experiments and Experience
from the Perspective of Interactive Information Retrieval,
by Ross Wilkinson & Mingfang Wu
Abstract. There has long been a tradition of evaluating
information retrieval systems with very simple user models and very
simple tasks: the task is to retrieve documents relevant to a user
need described by a query. TREC, the Text REtrieval Conference sponsored
by NIST, raised the bar by providing large scale collections, well
defined user needs, independently judged documents, and a specified
form of success. Groups from around the world all tackled this same
task, which allowed wide analysis of just what factors influenced
system performance. Yet there was concern, as system performance
improvement did not always lead to human performance improvement,
so a concerted effort to study how people interact with information
retrieval systems was undertaken in the Interactive Track of TREC.
This paper describes this track, some of the experiments that we
have undertaken in this track, and highlights some of the real problems
in such evaluation.
There are two key issues that we have often observed in interactive
information retrieval. The first issue is that human preference
is often not correlated with human performance. Consequently, evaluation
that relies solely on either measure is not reliable.
The second issue is that genuine improvements are very difficult
to validate, as system variation tends to be dominated by task variation
and user performance variation. Consequently, the statistical power
of these experiments, often already very expensive to conduct because
of user participation, can be quite low. Thus we argue for staged
experiments where only very “obvious” system performance gains are
explored.
In the end, simple performance measures have proved less helpful
than deeper analysis of just how people interact with their information
systems.
|