The 7th Conference
Human Language Technologies - the Baltic Perspective
October 6-7, 2016
Josef van Genabith
Statistical and Neural Machine Translation
Deep Neural Nets are strongly impacting many areas in Artificial Intelligence and Human Language Technologies: in some applications, DNNs even show "super-human" performance. In machine translation, neural approaches are beginning to outperform the best statistical approaches, a technology that has been optimised and honed for many years. In this talk I will trace the different ways in which we have approached machine translation, starting with rule-based and moving to statistical approaches, highlighting some of the advantages and drawbacks of these technologies. I will then show how neural approaches can address some of these shortcomings and introduce some of the "tricks of the trade" used by current neural approaches to outperform the competition.
Josef van Genabith is a Scientific Director at DFKI, the German Research Centre for Artificial Intelligence, where he heads the Multilingual Technologies Group and, jointly with Prof. Hans Uszkoreit, the Language Technology Lab. He is Professor for Translation-oriented Language Technologies at Saarland University, Germany. He was the founding Director of CNGL, the Centre for Next Generation Localisation (now ADAPT), in Dublin, Ireland, and a Professor in the School of Computing at Dublin City University.
Languages are Dialects with a Treebank and a Dependency Parser - Experiments with Cross-Lingual Parsing for Low-Resource Languages
Natural language processing (NLP) is becoming increasingly important in people's everyday lives, as shown, for example, by the success of word prediction, spelling correction and instant on-line translation. Building linguistic resources and tools, however, is expensive and time-consuming, and one of the great challenges in computational linguistics is to port existing models to new languages and domains. Modern NLP requires data, often annotated with explicit linguistic information, and tools that can learn from it. However, sufficient quantities of electronic data sources are available only for a handful of languages, whereas most other languages do not have the privilege of drawing on such resources. Speakers of low-density languages and the countries they live in are not able to invest in large data collection and time-consuming annotation efforts, and the goal of cross-lingual NLP is to share rich linguistic information with poorly supported languages, making it possible to build tools and resources without starting from scratch. In this talk I will look in particular at transfer models for statistical dependency parsing. In my experiments I test these approaches on the recently released data sets with cross-lingually harmonised dependency annotation, and I will show the potential of simple yet effective annotation and treebank translation techniques. I will also discuss shortcomings and open problems, and I welcome suggestions for future work.
Jörg Tiedemann has been a professor of language technology at the University of Helsinki since August 2015. He received his Ph.D. in computational linguistics from Uppsala University in 2003. His work focuses mainly on machine translation, question answering and data mining from multilingual resources. During his Ph.D. he spent a year at the University of Edinburgh, and he later worked as a post-doctoral researcher at the University of Groningen. Tiedemann is currently the director of BAULT, a research community on building and using language technology, and maintains OPUS, a large open collection of parallel corpora.