RepLab 2013 – An evaluation campaign for Online Reputation Management Systems
1. About RepLab
RepLab is a competitive evaluation exercise for Online Reputation Management systems. Like the first RepLab, held in 2012, this second campaign will be organized as an activity of CLEF, and the results of the exercise will be discussed at the CLEF 2013 conference in Valencia, Spain, on September 23-26 (see http://clef2013.org for details).
RepLab 2013 will focus on the task of monitoring the reputation of entities (companies, organizations, celebrities, ...) on Twitter. For analysts, the monitoring task consists of searching the stream of tweets for potential mentions of the entity, filtering those that do refer to the entity, detecting topics (i.e., clustering tweets by subject) and ranking them based on the degree to which they signal reputation alerts (i.e., issues that may have a substantial impact on the reputation of the entity).
- Filtering. Systems will be asked to determine which tweets are related to the entity and which are not, for instance, distinguishing between tweets that contain the word "Stanford" referring to the University of Stanford and filtering out tweets about Stanford as a place. Manual annotations will be provided with two possible values: related/unrelated.
- Polarity for Reputation classification. The goal will be to decide if the tweet content has positive or negative implications for the company's reputation. Manual annotations are: positive/negative/neutral.
- Topic detection. Systems will be asked to cluster related tweets about the entity by topic, with the objective of grouping together tweets referring to the same subject.
- Assigning priority. The full task involves detecting the relative priority of topics. To evaluate priority independently of the clustering task, we will evaluate the subtask of predicting the priority of the cluster a tweet belongs to.
- First, when analyzing polarity for reputation, both facts and opinions have to be considered. For instance, "Barclays plans additional job cuts in the next two years" is a fact with negative implications for reputation. Therefore, systems will not be explicitly asked to classify tweets as factual vs. opinionated: the goal is to find polarity for reputation, that is, what implications a piece of information might have for the reputation of a given entity, regardless of whether the content is opinionated or not.
- Second, negative sentiments do not always imply negative polarity for reputation and vice versa. For instance, "R.I.P. Michael Jackson. We'll miss you" has a negative associated sentiment (sadness, deep sorrow), but a positive implication for the reputation of Michael Jackson.
- Polarity. Topics with polarity (and, in particular, with negative polarity, where action is needed) usually have higher priority.
- Centrality. A high priority topic is very likely to have the company as the main focus of the content.
- User's authority. A topic promoted by an influential user (for example, in terms of number of followers or expertise) has a better chance of receiving high priority.
- RELATED/UNRELATED: the tweet is/is not about the entity
- POSITIVE/NEUTRAL/NEGATIVE: the information contained in the tweet has positive/neutral/negative implications for the entity's reputation.
- Identifier of the topic (cluster) the tweet belongs to.
- ALERT/MILDLY_IMPORTANT/UNIMPORTANT: the priority of the topic (cluster) the tweet belongs to.
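As a rough illustration of how these four annotations fit together, the following Python sketch stores them in one record per tweet and copies each topic's priority onto the tweets in that topic, which is how the priority subtask is framed above. The class name, field names and helper function are hypothetical and are not part of the official data format; the authoritative label spellings and file layout are those of the evaluation package.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TweetAnnotation:
    """Hypothetical in-memory record for one annotated tweet."""
    tweet_id: str
    related: bool                    # RELATED / UNRELATED
    polarity: Optional[str] = None   # POSITIVE / NEUTRAL / NEGATIVE
    topic: Optional[str] = None      # identifier of the topic (cluster)
    priority: Optional[str] = None   # priority of that topic


def propagate_priority(
    annotations: List[TweetAnnotation],
    topic_priority: Dict[str, str],
) -> List[TweetAnnotation]:
    """Give every related tweet the priority of the topic it belongs to."""
    for ann in annotations:
        if ann.related and ann.topic is not None:
            ann.priority = topic_priority.get(ann.topic)
    return annotations


annotations = [
    TweetAnnotation("t1", related=True, polarity="NEGATIVE", topic="job_cuts"),
    TweetAnnotation("t2", related=True, polarity="NEUTRAL", topic="sponsorship"),
    TweetAnnotation("t3", related=False),
]
propagate_priority(annotations, {"job_cuts": "ALERT", "sponsorship": "UNIMPORTANT"})
```

Unrelated tweets keep no topic and no priority, since only tweets about the entity are clustered.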
4. Evaluation Measures
4.1. Full monitoring task: filtering + topic detection + topic priority.
4.2. Filtering subtask
4.3. Polarity for reputation subtask
4.4. Topic detection subtask
4.5. Priority assignment subtask
Here (baselinereplab2013.zip) you will find baseline outputs for all RepLab subtasks: filtering, polarity annotation, topic detection (clustering) and priority assignment.
The baseline is a simple (memory-based) supervised system that matches each tweet in the test set with the most similar tweet in the training set, and assumes that the annotations - for all subtasks - in the tweet from the training set are also valid for the tweet in the test set. Tweet similarity is computed using Jaccard distance and a straightforward bag-of-words representation of the tweets.
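The idea behind this baseline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers' code: the tokenization (lowercasing and splitting on whitespace), the dictionary layout of the training examples and all function names are hypothetical. The sketch maximizes Jaccard similarity, which is equivalent to minimizing Jaccard distance (distance = 1 - similarity).

```python
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Bag-of-words as a set of lowercased, whitespace-separated tokens (assumed tokenization)."""
    return set(text.lower().split())


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def baseline_annotate(test_text: str, training: List[Dict]) -> Dict[str, str]:
    """Copy the labels of the most similar training tweet onto the test tweet."""
    test_tokens = tokens(test_text)
    nearest = max(
        training,
        key=lambda ex: jaccard_similarity(test_tokens, tokens(ex["text"])),
    )
    return dict(nearest["labels"])


# Hypothetical training examples; the real data follows the evaluation package format.
training = [
    {"text": "Barclays plans additional job cuts",
     "labels": {"filtering": "RELATED", "polarity": "NEGATIVE"}},
    {"text": "walking along the river bank today",
     "labels": {"filtering": "UNRELATED", "polarity": "NEUTRAL"}},
]
predicted = baseline_annotate("Barclays announces job cuts", training)
```

In this sketch, ties are broken in favour of the first training tweet encountered; the source does not say how the actual baseline handles ties.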
You can use these baseline outputs to assemble a full RepLab 2013 system in combination with your own subtask systems.
- The filtering output contains annotations for all tweets.
- The polarity output also contains annotations for all tweets, given that annotating unrelated tweets is not penalized (they are simply ignored in the evaluation).
- For the remaining subtasks (topic detection and priority), only those tweets judged relevant by the baseline filtering are included in the baseline output.
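One practical consequence of the last point: if you pair your own filtering output with the baseline topic or priority output, any tweet your filter keeps but the baseline filter discarded will have no baseline topic or priority. The Python sketch below shows one way to combine per-subtask outputs and to list such gaps. It works on plain in-memory dictionaries keyed by tweet id; these structures and the function names are assumptions made for illustration and do not reflect the file format of the evaluation package.

```python
from typing import Dict, List, Optional


def assemble_full_run(
    filtering: Dict[str, str],
    topics: Dict[str, str],
    priorities: Dict[str, str],
) -> Dict[str, Dict[str, Optional[str]]]:
    """Keep only tweets marked RELATED and attach their topic and priority, if any."""
    run = {}
    for tweet_id, label in filtering.items():
        if label != "RELATED":
            continue
        run[tweet_id] = {
            "topic": topics.get(tweet_id),
            "priority": priorities.get(tweet_id),
        }
    return run


def missing_annotations(run: Dict[str, Dict[str, Optional[str]]]) -> List[str]:
    """Related tweets that still lack a topic or a priority."""
    return sorted(
        tweet_id
        for tweet_id, ann in run.items()
        if ann["topic"] is None or ann["priority"] is None
    )


# Own filtering output combined with (hypothetical) baseline topic and priority outputs.
filtering = {"t1": "RELATED", "t2": "UNRELATED", "t3": "RELATED"}
baseline_topics = {"t1": "topic_a"}
baseline_priorities = {"t1": "ALERT"}

run = assemble_full_run(filtering, baseline_topics, baseline_priorities)
gaps = missing_annotations(run)
```

Tweets reported in `gaps` would need a topic and priority from some other source before the run is complete.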
6. Important dates
- April 15: Release of training and test data
- June 3 (postponed from May 27): System results due
- June 12 (postponed from June 5): Official results released
- June 22 (postponed from June 15): Deadline for paper submission
- September 23-26: CLEF 2013 Conference in Valencia, Spain
7. How to submit runs?
Here are the instructions to submit your runs (please note that the deadline is June 3 and cannot be further postponed).
What is the number of runs allowed?
Each group is allowed to send up to 10 runs per subtask: filtering, polarity, topic_detection and priority_detection, plus 10 runs for the full_task.
How to format your submission?
- Each group must choose a group id (an alphanumeric string, preferably short).
- All runs must be packed in a directory named replab2013-<group_id>.
- Inside this directory, each run should be in a separate directory named <group_id>_<subtask>_<run_id>, where run_id is a number between 1 and 10. For instance: replab2013-UNED/UNED_full_task_2. The run directory will contain one file per subtask included in the run, with up to four files ("filtering", "polarity", "topic_detection", "priority_detection").
- Files must follow the specifications of the evaluation package distributed with the data.
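The naming rules above can be checked mechanically before packing a submission. The following Python sketch builds the path of a run directory and rejects values that break the stated constraints. The function and constant names are hypothetical, and the sketch only covers directory names, not the contents of the files, which are defined by the evaluation package.

```python
from pathlib import Path

# Subtask names as listed in the submission instructions.
SUBTASKS = ("filtering", "polarity", "topic_detection", "priority_detection", "full_task")


def run_directory(group_id: str, subtask: str, run_id: int) -> Path:
    """Return replab2013-<group_id>/<group_id>_<subtask>_<run_id> after basic validation."""
    if not group_id.isalnum():
        raise ValueError("group id must be an alphanumeric string")
    if subtask not in SUBTASKS:
        raise ValueError(f"unknown subtask: {subtask}")
    if not 1 <= run_id <= 10:
        raise ValueError("run id must be a number between 1 and 10")
    return Path(f"replab2013-{group_id}") / f"{group_id}_{subtask}_{run_id}"


example = run_directory("UNED", "full_task", 2)
# example.as_posix() == "replab2013-UNED/UNED_full_task_2"
```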
How to submit?
The compressed directory with your runs must be sent as a single file to email@example.com and firstname.lastname@example.org (preferably as a download URL), together with a separate file (see this MS Excel template) containing metadata about your runs.
How to prepare your paper for the workshop notes
Each group must prepare one paper describing all experiments in all subtasks, following the formatting guidelines on the CLEF 2013 website. If you feel that your work should be split into more than one report (in cases of disjoint experiments with disjoint authors, for instance), please ask the lab organizers (email to email@example.com).
- Adolfo Corujo (Llorente & Cuenca, Madrid), firstname.lastname@example.org, http://www.adolfocorujo.com
- Julio Gonzalo (UNED, Madrid), email@example.com, http://nlp.uned.es/~julio
- Edgar Meij (Yahoo! Research), firstname.lastname@example.org, http://edgar.meij.pro
- Maarten de Rijke (U. of Amsterdam), email@example.com, http://staff.science.uva.nl/~mdr
- Eugene Agichtein, Emory University, USA
- Alexandra Balahur, JRC, Italy
- Krisztian Balog, U. Stavanger, Norway
- Donna Harman, NIST, USA
- Eduard Hovy, ISI/USC, USA
- Radu Jurca, Google, Switzerland
- Jussi Karlgren, Gavagai/SICS, Sweden
- Mounia Lalmas, Yahoo! Research, Spain
- Jochen Leidner, Thomson Reuters, Switzerland
- Bing Liu, U. Illinois at Chicago, USA
- Alessandro Moschitti, U. Trento, Italy
- Miles Osborne, U. Edinburgh, UK
- Hans Uszkoreit, U. Saarbrücken, Germany
- James Shanahan, Boston U., USA
- Belle Tseng, Yahoo!, USA
- Julio Villena, Daedalus/U. Carlos III, Spain