~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CALAMITA – Challenge the Abilities of LAnguage Models in ITAlian
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Special event co-located with the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), Pisa, 4–6 December 2024 – https://clic2024.ilc.cnr.it/
Project Description
AILC, the Italian Association for Computational Linguistics, is launching a collaborative effort to develop a dynamic and growing benchmark for evaluating LLMs’ capabilities in Italian.
In the long term, we aim to establish a suite of tasks, in the form of a benchmark, that can be accessed through a shared platform and a live leaderboard. This would allow for ongoing evaluation of existing and newly developed Italian or multilingual LLMs.
In the short term, we aim to start building this benchmark through a series of challenges collaboratively constructed by the research community. Concretely, this happens through the present call for challenge contributions. In the style of standard Natural Language Processing shared tasks, participants are asked to contribute a task, together with the corresponding dataset, on which a set of LLMs will be challenged. Participants are expected to provide an explanation of and motivation for the task; a dataset that reflects the task, together with any relevant information about it (provenance, annotation, distribution of labels or phenomena, etc.); and a rationale for how the dataset was put together. Evaluation metrics and example prompts should also be provided. Existing relevant datasets are also very welcome, together with related publications if available. All proposed challenges, whether based on existing or new datasets, must follow the challenge template, which will be distributed in due time, as the basis for writing a challenge paper.
In this first phase, all prospective participants are asked to submit a pre-proposal.
The organizers will examine the submitted pre-proposals and select the challenges that comply with the template's requirements, with an eye to balancing different challenge types. The selected challenges will then be expanded with a full dataset, longer descriptions, etc., according to the aforementioned template, which will be distributed later. The final report of each accepted challenge must provide the evaluation code, with an example that runs smoothly on a pre-selected base LLM (most likely Llama 2), which the organizers will communicate in the second phase. All reports will be published in the CEUR proceedings related to the CALAMITA event. Subsequently, all challenge organizers who wish to be involved can participate in a broader follow-up paper, targeting a top venue, which will describe the whole benchmark, procedures, results, and analyses.
Once this first challenge set is put together, the CALAMITA organizers will run zero-shot or few-shot experiments with a selection of LLMs and write a final report. No tuning material or tuning experiments are expected at this stage of the project.
Paper and Data Submissions for Selected Challenges
Authors of selected challenges are requested to upload via Softconf at https://softconf.com/p/calamita2024/ the following items:
- A challenge paper describing the challenge and the proposed data. Please adhere to the prepared report template, available here: https://www.overleaf.com/read/sffbqqrcxnrj#7edb09. The papers will be reviewed and will be part of the CLiC-it proceedings, published by CEUR. To be published as part of the CEUR proceedings, your report must be at least 12,500 characters long (approximately 5 pages). A maximum of 10 pages is allowed.
- Data and code, following the specifications sent by email from the organizers to all authors of accepted challenges.
Deadlines (tentative)
- 17th May 2024: pre-proposal submission
- 27th May 2024: notification of pre-proposal acceptance
- End of May 2024: distribution of the challenge paper template and further instructions
- 16th September 2024: data and challenge paper submission
- 28th October 2024: benchmark ready, with reports for each challenge (after light review)
- October-November 2024: running selected models on the benchmark, with analyses (carried out by the CALAMITA team)
- 4th-6th December 2024: CLiC-it conference in Pisa, with a co-located half-day CALAMITA workshop scheduled for 6th December.
Website: https://clic2024.ilc.cnr.it/calamita
Mail: calamita.ailc@gmail.com
Organizers
Chairs
- Pierpaolo Basile (University of Bari Aldo Moro)
- Danilo Croce (University of Rome, Tor Vergata)
- Malvina Nissim (University of Groningen)
- Viviana Patti (University of Turin)
Data and Evaluation Team
- Giuseppe Attanasio (Instituto de Telecomunicações, Lisbon), Coordinator
- Federico Borazio (University of Rome, Tor Vergata)
- Maria Francis (University of Groningen & University of Trento)
- Jacopo Gili (University of Turin)
- Elio Musacchio (University of Bari Aldo Moro)
- Matteo Rinaldi (University of Turin)
- Daniel Scalena (University of Groningen & University of Milan Bicocca)