CLIPS, Corpus of Spoken Italian
The final release of a new corpus of spoken Italian (CLIPS), directed by Federico Albano Leoni (University La Sapienza of Rome), was announced today. The key features of the corpus are: free distribution of audio and transcriptions, explicit EAGLES-compliant documentation, and, above all, phonetic transcription of a section of the collected material. Here is the brief description of the corpus by Federico Albano Leoni:
"CLIPS, a corpus of spoken Italian, is freely available at www.clips.unina.it. The corpus (audio files, annotation and documentation) is fully downloadable from the website via FTP, free for research purposes.
CLIPS consists of about 100 hours of speech, equally divided between female and male voices. A section of the corpus is transcribed orthographically; a smaller section has been phonetically labeled. Recordings were made in 15 Italian cities, selected on the basis of linguistic and socio-economic principles of representativeness: Bari, Bergamo, Bologna, Cagliari, Catanzaro, Firenze, Genova, Lecce, Milano, Napoli, Palermo, Parma, Perugia, Roma, Venezia.
For each of the 15 cities, different text typologies have been included: a) radio and television broadcasts (news, interviews, talk shows); b) dialogue (240 dialogues collected using the map task procedure and the “spot the difference” game; of these, 30 dialogues are phonetically labeled and 90 orthographically transcribed); c) read speech from non-professional speakers (20 sentences each, covering medium-high frequency Italian words); d) speech over the telephone (conversations between 300 speakers and a simulated hotel desk service operator); e) read speech from 20 professional speakers (160 sentences, covering all phonotactic sequences and medium-high frequency Italian words) recorded in an anechoic chamber. Documentation, corpus collection and annotation follow the EAGLES guidelines."
Labels: corpora, Italian, linguistics, phonetics

