The goal of CLEF NewsREEL is to provide a vehicle for the IR/recommender system communities to move from conventional offline evaluation to online evaluation. We address the following information access challenge: whenever a visitor to an online news portal reads a news article on the site, the task is to recommend other news articles that the user might be interested in.

  • Task 1: NewsREEL Live: The first subtask implements the idea of living laboratories, i.e., researchers gain access to the resources of a company to evaluate different recommendation techniques using A/B testing.
  • Task 2: NewsREEL Replay: This subtask focuses on replaying a previously recorded data stream of user activities, allowing news recommendation algorithms to be benchmarked in a simulated environment.
  • Lab website:
  • Lab Coordination: Frank Hopfgartner (University of Glasgow, UK) Torben Brodt (plista GmbH, Berlin, DE)
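The replay idea behind Task 2 can be sketched as a small offline loop: recorded impressions and clicks are fed to a candidate recommender, and we count how often a clicked article was among its suggestions. The event format, the `recommend` callback, and the popularity baseline below are illustrative assumptions, not the lab's actual interface.

```python
from collections import Counter, deque

def replay_evaluate(events, recommend, k=3):
    """Replay a recorded impression/click stream and measure how often
    the recommender's top-k suggestions contain the clicked article.
    Each event is a dict with hypothetical fields:
    {"type": "impression"|"click", "user": ..., "item": ...}."""
    recent = deque(maxlen=100)   # recently seen articles form the candidate pool
    hits = total = 0
    for ev in events:
        if ev["type"] == "impression":
            recent.append(ev["item"])
        elif ev["type"] == "click":
            total += 1
            if ev["item"] in recommend(list(recent), k):
                hits += 1
    return hits / total if total else 0.0

def most_popular(candidates, k):
    """A trivial popularity baseline for illustration."""
    return [item for item, _ in Counter(candidates).most_common(k)]
```

Any recommender exposing the same `(candidates, k)` signature can be swapped in and compared on the identical recorded stream, which is the point of the simulated benchmark.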


The LifeCLEF lab aims at boosting research on the identification of living organisms and on the production of biodiversity data in general. Through its biodiversity informatics related challenges, LifeCLEF is intended to push the boundaries of the state-of-the-art in several research directions at the frontier of multimedia information retrieval, machine learning and knowledge engineering. More concretely, the lab is organized around three tasks (and one pilot task):

  • PlantCLEF: crowdsourced biodiversity monitoring from mobile photo streams. The task consists of recognizing a large set of plant species within a sample of Pl@ntNet mobile app query stream. As training data, we will provide a considerable extension of the collaborative data used within last year’s PlantCLEF challenges (10K species illustrated by more than 500k plant observations).
  • BirdCLEF: bird species identification from bird calls and songs. The task consists of detecting bird species in crowdsourced audio recordings and soundscapes (that can contain up to several tens of birds singing simultaneously). The training data contains around 1500 bird species with several tens of thousands of Xeno-canto recordings associated with various metadata such as the type of sound (call, song, alarm, flight, etc.), date, locality, comments, ratings, etc. Soundscapes in the test set are associated with time-coded annotations, which required a great human annotation effort.
  • SeaCLEF: Organisms Identification in Sea-Related Visual Data. The task is related to marine organism identification for ecological surveillance and biodiversity monitoring based on several modalities: stream from submarine cameras but also thermal and stereo images. It will involve around 200 marine animal species mainly from Taiwan and Caribbean coral reefs. All the task data will be provided with additional metadata describing water depth, GPS coordinates, etc.
  • HabitatCLEF (pilot task): observation-based habitat classification. Habitat categories are a way of reducing the complexity of the natural world to make it more understandable. A good classification helps interpret such data to produce information and add to our knowledge of the environment. Recognizing and monitoring habitats in crowdsourced streams such as Pl@ntNet and Xeno-canto should make it possible to focus analysis, in particular on regions where habitats are not well known today.
  • Lab website:
  • Lab Coordination: Alexis Joly (INRIA, Sophia-Antipolis, FR), Henning Müller (University of Applied Sciences Western Switzerland in Sierre, CH), Hervé Goëau (CIRAD, FR)


  • Author identification - This year, we will be dealing with author clustering and style breach detection. The author clustering task (given a set of single-author documents, group them by authorship) will focus on short documents of paragraph length. Style breach detection has the goal of identifying breaches of writing style in longer texts, which is a prerequisite to identifying changes of authorship within a document.
  • Author profiling - Gender and language variety identification in Twitter: demographic traits such as gender and language variety have so far been investigated separately. In this author profiling shared task we will provide participants with a Twitter corpus annotated with authors' gender and the specific variety (e.g. UK, US; Spain, Argentina; Brazil, Portugal) of their native language (e.g. English, Spanish, Portuguese).
  • Author obfuscation - This task works against identification and profiling by automatically paraphrasing a text to obfuscate its author's style. The tasks offered are author masking and obfuscation evaluation: the former asks participants to devise software that rewrites a given piece of text so as to maintain its meaning while altering or destroying its original author's style. In the latter task, participants take an active role within the former task's evaluation, experimenting with new ways of assessing obfuscation.
  • Lab website:
  • Lab Coordination: Martin Potthast, Benno Stein, Matthias Hagen (Bauhaus-Universität Weimar, DE), Paolo Rosso (Universitat Politècnica de València, SP), Francisco Rangel (Autoritas Consulting, SP), Efstathios Stamatatos (University of the Aegean, GR)

CLEF eHealth

CLEF eHealth 2017 Evaluation Lab: Laypeople find eHealth documents difficult to understand, and even clinicians and policy-makers have problems understanding the jargon of other professional groups. At the same time, authors of evidence-based practice guidelines, care documents, and consumer leaflets are overloaded with information and face many challenges in the timely and efficient generation, processing, and sharing of such information. Web search engines are commonly used as a means to access health information available online; however, the reliability, quality, and suitability of the information for the target audience vary greatly. Information seekers, on the other hand, also experience issues in expressing their information needs as search queries. CLEF eHealth aims to bring together researchers working on related information access topics and to provide them with datasets to work with and validate outcomes on. This, the sixth year of the lab, offers the following three tasks.

  • Task 1. Multilingual Information Extraction: We challenge the participants to extract the causes of death from death certificates authored by physicians in European languages. This can be seen as named entity recognition, normalization, and/or text classification. The task also has a replication track, where participants are given the option to have not only their automatically codified documents evaluated but also the system used to produce them.
  • Task 2. Technologically Assisted Reviews in Empirical Medicine: We challenge the participants to retrieve medical studies relevant to conduct a systematic review on a given topic. This can be seen as the total recall problem and is addressed by both query generation and document ranking. The task consists of two subtasks: interactive information retrieval; and query construction (optional task).
  • Task 3. Patient-centred Information Retrieval: We challenge the participants to retrieve web pages that fulfil a given patient’s personalised information need. This needs to fulfil the following criteria: information reliability, quality, and suitability, at both the individual query level and optimised over an entire search session. The task also has a multilingual track, which challenges participants to fulfil these information needs in a multilingual setting.
  • The tasks are open for everybody. We particularly welcome academic and industrial researchers, scientists, engineers and graduate students in natural language processing, machine learning and biomedical/health informatics to participate. We also encourage participation by multidisciplinary teams that combine technological skills with clinical expertise.
  • Lab website:
  • Lab coordination: Lorraine Goeuriot (Univ. J.Fourier, FR), Evangelos Kanoulas (Univ. of Amsterdam, NL), Liadh Kelly (Trinity College, IR), Aurélie Névéol (CNRS-LIMSI, FR), Joao Palotti (Vienna Univ., AU), Aude Robert (INSERM/CepiDC, FR), Rene Spijker (Julius Center, NL), Hanna Suominen (Australian National Univ., AUS), Guido Zuccon (Queensland Univ. of Technology, AUS)

CLEF Cultural Microblog Contextualization

This lab deals with how the cultural context of a microblog affects its social impact at large. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization. Regular lab participants have access to the private massive multilingual microblog stream of The festival galleries project. The microblog stream and the related URLs of festivals are appropriate for experimenting with advanced social media search and mining methods. There are three tasks:

  1. Content Analysis: Given a stream of microblogs, the task consists of filtering microblogs dealing with festivals, language identification, event localization, author categorization, DBpedia entity recognition, and automatic summarization of linked Wikipedia pages in four languages.
  2. MicroBlog Search: Given a cultural entity as a set of Wikipedia pages, this task involves two subtasks: Task 2a: retrieval of relevant microblogs for an entity; Task 2b: summarization of the most informative microblogs.
  3. Timeline Illustration: The goal of timeline illustration based on microblogs is to provide, for each event of a cultural festival, the most interesting tweets.

  • Lab Coordination: Liana Ermakova, Josiane Mothe (Institut de Recherche en Informatique de Toulouse, FR), Lorraine Goeuriot, Philippe Mulhem (Université Grenoble Alpes, FR), Fionn Murtagh (University of Derby & University of London, UK), Jian Yun Nie (Université de Montréal, CA), Eric SanJuan (Université d'Avignon, FR)
  • Lab website:


ImageCLEFlifelog: The availability of a large variety of personal devices, such as smartphones, video cameras, and wearable devices that allow capturing pictures, videos, and audio clips at every moment of our lives, is creating the need for systems that can automatically analyse the huge amounts of data stored every day in order to categorize and summarize them, and also to query them to retrieve the information the user may need. The task addresses the problems of lifelog data retrieval and summarization.

  • Organizers: Duc-Tien Dang-Nguyen, Luca Piras, Michael Riegler, Cathal Gurrin, Giulia Boato, Pål Halvorsen
  • ImageCLEFcaption: Interpreting and summarizing the insights gained from medical images such as radiology output is a time-consuming task that involves highly trained experts and often represents a bottleneck in clinical diagnosis pipelines. Consequently, there is a considerable need for automatic methods that can approximate this mapping from visual information to condensed textual descriptions. The task addresses the problem of bio-medical image caption prediction from large amounts of training data.
  • Organizers: Carsten Eickhoff, Immanuel Schwall, Henning Müller
  • ImageCLEFtuberculosis: The objective of the task is to determine TB subtypes and drug resistances as automatically as possible from the volumetric image information (mainly texture analysis) and from available clinical information such as age, gender, etc. Being able to extract the tuberculosis type and drug resistances from the image data alone can make it possible to limit lung washing and laboratory analyses for determining the TB type and drug resistances. This can lead to quicker decisions on the best treatment strategy, reduced use of antibiotics, and lower impact on the patient.
  • Organizers: Vassili Kovalev, Henning Müller, Alexander Kalinovsky
  • ImageCLEFremote (pilot task - FabSpace 2.0 Exploring Sentinel Copernicus Images): The objective of the task is to explore Earth observation data images (Sentinel Copernicus satellite images) in order to discover unknown information. Before engaging in any rescue operation or humanitarian action, NGOs need to evaluate the local population as accurately as possible. Current tools can only do a partial job, since additional data needs to be considered for predicting the population correctly. In this task, participants will be given various zones plus some contextual information. They will have to provide a prediction of the population, first as a number, then as a range (min, max).
  • Organizers: Josiane Mothe, Dimitrios Soudris, Bayzidul Islam
  • Lab website:
  • Lab Coordination: Bogdan Ionescu (University Politehnica of Bucharest, RO), Mauricio Villegas (Universitat Politècnica de València, SP), Henning Müller (University of Applied Sciences in Sierre, CH)


The primary aim of the PIR-CLEF laboratory is to provide a framework for evaluation of Personalised Information Retrieval (PIR). Current approaches to the evaluation of PIR are user-centered, i.e., they rely on experiments that involve real users in a supervised environment.

The pilot test collection will provide all the traditional components needed in a laboratory-based evaluation experiment plus a set of user-related information for modelling and introducing profiles in the evaluation experiment:

  • user personal information: including gender, age range, native language, and occupation.
  • search logs: which contain the history of the user’s interactions with a search engine.
  • the user’s documents of interest: provided as useful and raw sources to extract topical user preferences.
  • basic user profiles: representations in the form of bags of words will also be provided, with the aim of offering a basic model of the user's topical interests.
  • user satisfaction: a satisfaction grade decided by the user and providing a feedback on the ranking of documents.
  • We will use ClueWeb12, which contains over 730 million Web pages, as the basic repository to extract the above mentioned user-related information. Recognising that operating a ClueWeb12 service is a significant undertaking, API access to an existing ClueWeb12 search service will be made available to participants.
  • Pilot Task: A pilot PIR task intended to enable practical exploration of our proposed PIR evaluation methodology, with the intention of offering a fully tuned lab at CLEF 2018.
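As a rough illustration of how these profile components might be used, the sketch below builds a bag-of-words profile from a user's documents of interest and mixes profile overlap into a baseline retrieval score. The function names and the scoring formula are hypothetical assumptions, not part of the PIR-CLEF specification.

```python
import math
from collections import Counter

def build_profile(docs_of_interest):
    """Aggregate a user's documents of interest into a bag-of-words
    profile (term -> count). A minimal sketch; real PIR-CLEF user data
    also includes demographics, search logs, and satisfaction grades."""
    profile = Counter()
    for doc in docs_of_interest:
        profile.update(doc.lower().split())
    return profile

def personalised_score(query_score, doc_text, profile):
    """Combine a baseline retrieval score with the document's average
    term weight under the profile (an assumed, illustrative formula)."""
    terms = doc_text.lower().split()
    overlap = sum(profile[t] for t in terms) / (len(terms) or 1)
    return query_score + math.log1p(overlap)
```

Given two documents with equal baseline scores, the one sharing more vocabulary with the user's documents of interest is ranked higher, which is the basic personalisation effect the collection is meant to let participants evaluate.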

The workshop at CLEF 2017 will include a report on the PIR-CLEF pilot task, with short participant presentations, some invited presentations on the themes of benchmarking, personalization and adaptation, and a discussion of potential PIR tasks for CLEF 2018.

  • Workshop website:
  • Workshop Coordination: Gabriella Pasi, Stefania Marrara (University of Milano Bicocca, IT), Gareth Jones, Debasis Ganguly (Dublin City University, IR), Nicola Ferro, Maria Maistro (University of Padova, IT)

Early risk prediction on the Internet

The main purpose of eRisk 2017 is to explore issues of evaluation methodology, effectiveness metrics and other processes related to early risk detection. Early detection technologies can be employed in different areas, particularly those related to health and safety. For instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area that would be potentially applicable to a wide variety of situations and to many different personal profiles. Examples include potential paedophiles, stalkers, individuals who could fall into the hands of criminal organizations, people with suicidal inclinations, or people susceptible to depression.

eRisk 2017 has two possible ways to participate: research paper submission and participation in a pilot task on early detection of depression.

  1. Research Paper Submission: The workshop is open to the submission of papers describing test collections or data sets suitable for early risk prediction, early risk prediction challenges, tasks and evaluation metrics, or specific early risk detection solutions.
  2. Pilot Task: Early Detection of Depression: The second way of participating is an exploratory pilot task on early risk detection of depression. The challenge consists of sequentially processing pieces of evidence and detecting early traces of depression as soon as possible.
  • Workshop website:
  • Workshop Coordination: David E. Losada (University of Santiago de Compostela, SP), Fabio Crestani (University of Lugano, CH), Javier Parapar (University of A Coruña, SP)
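The sequential nature of the pilot task can be sketched as a loop that reads a user's posts in chronological order and raises an alert as soon as the accumulated evidence crosses a confidence threshold, trading accuracy against delay. The per-post scorer and the running-average rule below are placeholder assumptions; eRisk's actual evaluation uses its own error measure that penalizes late decisions.

```python
def early_detect(posts, score_post, threshold=0.5):
    """Process a user's posts in order and emit a positive decision as
    soon as the running average risk score exceeds `threshold`.
    Returns (decision, number_of_posts_read); reading fewer posts
    before a correct alert is better. `score_post` is a hypothetical
    per-post risk scorer returning a value in [0, 1]."""
    total = 0.0
    for i, post in enumerate(posts, start=1):
        total += score_post(post)
        if total / i > threshold:
            return True, i      # alert early, before the history ends
    return False, len(posts)    # no alert after the whole history
```

The returned post count is what distinguishes early risk detection from ordinary classification: two systems with the same decisions can still differ in how soon they committed to them.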

Dynamic Search for Complex Tasks

Information Retrieval research has traditionally focused on serving the best results for a single query — so-called ad hoc retrieval. However, users typically search iteratively, refining and reformulating their queries during a session. A key challenge in the study of this interaction is the creation of suitable evaluation resources to assess the effectiveness of IR systems over sessions. The goal of the CLEF Dynamic Search lab is to propose and standardize an evaluation methodology that can lead to reusable resources and evaluation metrics able to assess retrieval performance over an entire session, keeping the “user” in the loop.

The objective of the lab is threefold:

  • to produce the methodology and algorithms that will lead to a dynamic test collection by simulating the users;
  • to understand and quantify in terms of evaluation measures what constitutes a good ranking of documents at different stages of a session, and a good ranking for the overall session;
  • to develop algorithms that can provide an optimal ranking throughout a user's session.

There are two possible ways to participate:

  1. By submitting a scientific paper: The focus of this year's lab will be the evaluation of interactive information retrieval algorithms. We solicit the submission of two types of papers: (a) position papers, and (b) data papers. Position papers should focus on evaluation methodologies for assessing the quality of search algorithms with the user in the loop, under two constraints: any evaluation framework proposed should allow the (statistical) reproducibility of results, and lead to a reusable benchmark collection. Data Papers should focus on describing test collections or data sets suitable for guiding the construction of dynamic test collections, tasks and evaluation metrics.
  2. By submitting a run to a pilot task: The focus of this year's pilot task will be the simulation of users' query reformulations. This is the first and hardest step towards constructing dynamic test collections.

Ideally, one would like to be able to generate query reformulations. However, for the purposes of this workshop, the pilot task will focus on predicting which of a set of user queries on a given topic comes next.
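A naive baseline for this prediction step might score each candidate query by its term overlap with the queries already issued in the session. The Jaccard scoring below is an illustrative assumption, not the task's prescribed method.

```python
def predict_next_query(session_queries, candidates):
    """Pick the candidate most likely to be the user's next
    reformulation, scored by Jaccard overlap between the candidate's
    terms and all terms seen in the session so far. A naive baseline
    sketch under assumed whitespace tokenization."""
    seen = set()
    for q in session_queries:
        seen.update(q.lower().split())

    def overlap(candidate):
        terms = set(candidate.lower().split())
        return len(terms & seen) / len(terms | seen)

    return max(candidates, key=overlap)
```

A stronger system would model how real users drift across subtopics rather than assuming the next query repeats earlier vocabulary, which is exactly what makes the simulation step hard.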

Multimodal Spatial Role Labeling

CLEF 2017 Workshop on Multimodal Spatial Role Labeling: The main goal of multimodal Spatial Role Labeling (mSpRL) is to explore the extraction of spatial information from two information sources: images and text. This is important for various applications such as semantic search, question answering, geographical information systems, and even robotics, for machine understanding of navigational instructions or of instructions for grabbing and manipulating objects. It is also essential for specific tasks such as text-to-scene conversion (or vice versa) and scene understanding, as well as for general information retrieval tasks that use the huge amounts of multimodal data available from various resources. Moreover, there is increasing interest in the extraction of spatial information from medical images that are accompanied by natural language descriptions.

mSpRL 2017 has two possible ways to participate: research paper submission and participation in a pilot task on multimodal Spatial Role Labeling.

  • Workshop website:
  • Workshop Coordination: Parisa Kordjamshidi (Tulane University, USA), Taher Rahgooy (Bu-Ali Sina University, IR), Marie-Francine Moens (KULeuven, BE), James Pustejovsky (Brandeis University, USA), Kirk Roberts (University of Texas, USA), Oswaldo Ludwig (Zalando Research, DE)