Brussels, 21/01/2008 (Agence Europe) - In the interests of transparency and in order to promote multilingualism, the European Commission announced, on Friday 18 January, the publication of a vast collection of linguistic data drawn from multilingual texts published by the European institutions. This database contains about one million sentences with their high-quality translations into 22 of the 23 official EU languages (Gaelic is not yet available). Although there are already many translations of English and French texts on the internet, resources are scarcer for languages such as Latvian or Romanian. This collection of language data will be a valuable aid, for example, to those designing machine translation software which “learns”, from texts translated by in-house translators, to translate words and expressions correctly in context. The data may also facilitate the development of other linguistic software tools such as grammar and spell checkers, on-line dictionaries and systems for categorising multilingual texts. According to the commissioner for multilingualism, Leonard Orban, this linguistic database will make computer-assisted translation easier, less costly and more accessible. Also, “citizens belonging to the smaller linguistic communities will have easier access to documents and web pages only available in the most used languages”, he said. The commissioner for science and research, Janez Potocnik, said that this unique collection of linguistic data “contributes to the creation of a new generation of software tools for human language processing and helps foster the competitiveness of the language industry”. The information and communication technologies section of the 7th Framework Programme for Research and Development supports research into automatic translation and other language-related technologies. The Commission has already opened public access to its documentary and terminology databases, Eur-lex and IATE.
The European Media Monitoring site, which allows users to search for press articles in 35 languages, is available at http://emm.jrc.it/overview.html. For further information on the translation data, see http://langtech.jrc.it/DGT-TM.html. (I.L.)