Transcription

Improvement of the Translation of Named Entities in Neural Machine Translation

Master's Thesis of Maciej Modrzejewski
at the Department of Informatics
Institute for Anthropomatics and Robotics (IAR)
Karlsruhe Institute of Technology (KIT)
Karlsruhe, Germany

Reviewer: Prof. Dr. Alexander Waibel
Second reviewer: Prof. Dr. Tamim Asfour
Advisors: Dr. Thanh-Le Ha, Dr. Miriam Exel (SAP SE), Dr. Bianka Buschbeck (SAP SE)

14. November 2019 – 26. June 2020

Karlsruher Institut für Technologie
Fakultät für Informatik
Postfach 6980
76128 Karlsruhe

I declare that I have developed and written the enclosed thesis completely by myself, and have not used sources or means without declaration in the text.

Karlsruhe, 26.06.2020
(Maciej Modrzejewski)

Abstract

Neural Machine Translation (NMT) systems have achieved better translation performance than Statistical Machine Translation (SMT) systems in recent years and are perceived as the state-of-the-art solution in Machine Translation (MT). Nevertheless, the translation quality of the NMT output is often poor for words that occur infrequently in the training corpus or for words with multiple meanings, due to their high ambiguity. One group of such words are named entities (NEs). Their correct translation poses a challenge for NMT systems. In general, conventional NMT systems are expected to translate named entities by learning complex linguistic aspects and ambiguous terms from the training corpus only. When faced with named entities, NMT systems are found to occasionally distort location, organization or person names and even to ignore low-frequency proper names altogether.

Recent approaches in NMT successfully enrich the source language sentences by adding linguistic features to the neural network input with the use of source factors. Word-level factors may carry linguistic information (part-of-speech tags, lemmas or morphosyntactic labels), yet may also be used to augment the source sentence with other types of information, e.g. to denote named entities. The incorporation of word features into the neural network input in the context of named entities is a promising approach. The tagging of NEs in the source sentence may support the network in capturing named entities better, decreasing their ambiguity and thus enhancing the learning process. This thesis aims at exploring this potential and studies methods that incorporate Named Entity Recognition (NER) into NMT with the aim of improving named entity translation. It proposes an annotation method that integrates named entity classes and inside-outside-beginning (IOB) tagging into the neural network input with the use of source factors.

In our experiments, we focus on three named entity classes: organization, location and person. We investigate how the granularity of named entity class labels influences named entity translation quality. Further, we carry out an extensive evaluation of the MT output, assessing the influence of our annotation method on named entity translation. Finally, we discuss our findings based on translation examples.

Our experiments on English-German and English-Chinese show that just by including different named entity classes and IOB tagging, we can increase the BLEU score by around 1 point on the standard test set from WMT2019 and achieve up to a 12% increase in NE translation rates over a strong baseline. Furthermore, we also show that our annotation technique does not degrade translation performance in the scenario where no named entities are present.

Zusammenfassung

Neural machine translation (NMT) systems have achieved better translation performance than the statistical approach in recent years and are perceived as the state-of-the-art solution in machine translation. Nevertheless, the quality of their output is often poor for words that occur rarely in the training corpus or for words with several meanings, due to their high ambiguity.

Named entities form one group of such words. Their correct translation poses a challenge for neural machine translation systems. In general, conventional neural machine translation systems are expected to translate named entities by learning complex linguistic aspects and ambiguous terms from the training data alone. When NMT systems encounter named entities, it turns out that they occasionally distort location, organization or person names and sometimes even ignore low-frequency proper names altogether.

Recently, approaches in the NMT field have focused on enriching the sentences of the source language by adding linguistic features to the input of a neural network with the help of source factors. Word-level factors may contain linguistic information (e.g. part-of-speech tags, lemmas or morphosyntactic labels), but may also be used to augment the source sentence with other types of information, e.g. to mark named entities. The incorporation of word features into the input of a neural network in the context of named entities is a promising approach. By tagging named entities in the source sentence, the network can be supported in capturing named entities better, reducing their ambiguity and thereby improving its learning process. This thesis sets out to explore this potential and investigates methods that incorporate named entity recognition with the aim of improving the translation of named entities. It proposes an annotation method that integrates named entities and inside-outside-beginning (IOB) tagging into the input of a neural network with the help of source factors.

In our experiments, we focus on three named entity classes: organization, location and person. We also investigate how the granularity of the named entity classes influences their translation quality. Furthermore, we carry out a comprehensive evaluation of the output of the translation system in order to assess the influence of our annotation method on the translation of named entities. Finally, we discuss our results on the basis of translation examples.

Our experiments on English-German and English-Chinese show that we can increase the BLEU score by around 1 point on the standard WMT2019 test set by combining different named entity classes and IOB tags.

We achieve up to a 12% improvement in named entity translation rates over a strong baseline model. Furthermore, we also show that our annotation method does not lead to any loss of translation quality in the scenario where no named entities occur.

Contents

Abstract
Zusammenfassung

1. Introduction
   1.1. Motivation
   1.2. Research Objective
   1.3. Thesis Outline

2. Theoretical Background
   2.1. Natural Language Processing
   2.2. Information Extraction
   2.3. Named-Entity Recognition
   2.4. Neural Machine Translation
        2.4.1. Word embeddings
        2.4.2. Encoder-Decoder Networks with Fixed Length Sentence Encodings
        2.4.3. Attention
        2.4.4. Training objective
        2.4.5. Neural Machine Translation Decoding
        2.4.6. Large Vocabularies: Byte Pair Encoding (BPE)
   2.5. Recurrent Neural Machine Translation
   2.6. Convolutional Neural Machine Translation
   2.7. Self-attention-based Neural Machine Translation
   2.8. Source Factors
   2.9. Machine translation evaluation

3. Related Work
   3.1. Early approaches to named entity translation
   3.2. Transliteration approaches to named entity translation
   3.3. Deep Learning approaches to named entity translation

4. System Description
   4.1. Research questions
        4.1.1. Influence of the granularity of named entity classes
        4.1.2. Inside-outside-beginning (IOB) tagging
        4.1.3. Inline Annotation
        4.1.4. Source factors combination methods
   4.2. Experiments

   4.3. Language pairs and training data
        4.3.1. Training and validation data
        4.3.2. Training data statistics
   4.4. Data pre-processing
        4.4.1. English-German
        4.4.2. English-Chinese
   4.5. Data annotation
        4.5.1. Process Overview
        4.5.2. Annotation of named entity classes with source factors
        4.5.3. Annotation of named entity boundaries with source factors
        4.5.4. Inline Annotation with XML markup
   4.6. Sockeye: The NMT toolkit
   4.7. NMT architecture

5. Selection of the Named Entity Recognition system
   5.1. NER system candidates
   5.2. Researched Named Entity Classes
   5.3. Quality Analysis
        5.3.1. Description of the test data sets
        5.3.2. Quality Analysis Description
        5.3.3. Evaluation
   5.4. Performance Analysis
        5.4.1. Experimental Setup
        5.4.2. Evaluation
   5.5. Conclusion

6. Evaluation
   6.1. Description of the test data set
   6.2. Evaluation of the general translation quality
        6.2.1. Evaluation of the BLEU scores on newstest2019
        6.2.2. Evaluation of the BLEU scores on a test set without named entities
   6.3. Automatic Named Entity Evaluation
        6.3.1. Process description
        6.3.2. Results
        6.3.3. Drawbacks of the automatic named entity evaluation
   6.4. Human Named Entity Evaluation
        6.4.1. Process Description
        6.4.2. The superiority of the human analysis
        6.4.3. Results
        6.4.4. The F1-score of the spaCy NER system on random300 data set
   6.5. Translation examples
   6.6. Estimation of the effect of the NER annotation quality

7. Conclusion and future work
   7.1. Conclusion

   7.2. Future work

List of Abbreviations

Bibliography

A. Appendix
   A.1. Validation data
   A.2. spaCy's NER classes

List of Figures

2.1. Word-level LSTM-based architecture for NER, from Yadav and Bethard (2018)
2.2. Character-level LSTM-based architecture for NER, from Yadav and Bethard (2018)
2.3. The encoder-decoder architecture of Sutskever et al. (2014). The color coding indicates weight sharing, from Stahlberg (2019)
2.4. Depiction of the attention paid to the relevant parts of the source sentence for each generated word of a translation example; dark shades of blue indicate high attention weights, from Ghader and Monz (2017)
2.5. Greedy search (highlighted in green) and beam search (highlighted in orange) with beam size K = 2, from Stahlberg (2019)
2.6. The architecture of the Recurrent Neural Network developed by Bahdanau et al. (2015). The model generates the word y_j given the input sequence x, from Stahlberg (2019)
2.7. The Transformer model architecture, from Vaswani et al. (2017)
2.8. (left) Scaled Dot-Product Attention. (right) Multi-Head Attention; h denotes the number of attention layers running in parallel, from Vaswani et al. (2017)
2.9. An example of an NMT architecture allowing the integration of additional linguistic information as word features directly into the input of the neural network, from Ha et al. (2017)
3.1. System architecture incorporating an external named entity translation component, from Li et al. (2018a)
3.2. Tag-replace training method, from Li et al. (2018a)
3.3. Inline annotation applied to a source sentence after tokenization and sub-word splitting, from Li et al. (2018b)
4.1. Categorization of named entities in training data sets: En-De and En-Zh, in %
6.1. Categorization of named entities in WMT2019 test data sets: En-De and En-Zh, in %
A.1. Categorization of named entities in development data sets: En-De and En-Zh, in %

List of Tables

2.1. Common named entity classes
4.1. Annotation to assess the influence of named entity labels' granularity; i. fine-grained: (0) for a regular sub-word (default), (1) for NE class Person, (2) for NE class Location, (3) for NE class Organization; ii. coarse-grained: (0) default, (1) to denote a NE
4.2. IOB annotation denoting compound named entities; (B) indicates the beginning, (I) the inside and (O) the outside of a NE (a regular word)
4.3. Inline annotation: XML markup shows the begin and the end of each named entity
4.4. Experimental setup – named entity annotation configurations
4.5. Number of named entities in WMT2019 training data sets: En-De and En-Zh
4.6. The architecture of the Transformer network used across all experiments
4.7. NMT Sockeye training parameters (used across all experiments)
4.8. NMT Sockeye training parameters for the annotated models
5.1. List of supported NE classes by examined NER systems
5.2. Quality analysis: Properties of the test sets
5.3. Results of the quality analysis: Precision, Recall and F1-Score of tested NER systems
5.4. Results of the performance analysis: Execution times required to annotate the test sets
6.1. Characteristics of the WMT2019 test data set
6.2. BLEU scores on newstest2019
6.3. BLEU scores on nonNE-newstest2019
6.4. Results of the automatic analysis on random300 data set for En–De with spaCy NER, NE match rate in %
6.5. Results of the automatic analysis on random300 data set for En–Zh with spaCy NER, NE match rate in %
6.6. Results of the automatic analysis on random300 data set for En–De with Stanford NER, NE match rate in %
6.7. Example of a transliteration from English to Chinese; occurring in the reference only
6.8. Example of a transliteration from English to Chinese; occurring in the hypothesis only
6.9. Deficiencies of the string-based search
6.10. An example of how the inaccuracies of the NER system influence the automatic analysis

6.11. Results of the human evaluation on random300 data set, NE match rate in %
6.12. Precision and recall values of the spaCy NER system, evaluated on the random300 data set, in %
6.13. Translation examples: The baseline model ignores the named entity (Alaska State Troopers) in the source sentence
6.14. Translation examples: Under- and over-translations produced by the baseline
6.15. Translation examples: superfluous translation enforcement
6.16. Results of estimation of the effect of the NER annotation quality on random300 with spaCy NER for En–De, NE match rate in %
A.1. Number of named entities in WMT2019 validation/development training data sets: En-De and En-Zh
A.2. Recognized NE classes by spaCy NER

1. Introduction

Thanks to the rise of World Wide Web-based technologies, our world is becoming more and more interconnected. With an increasing level of digitization, there is an explosion of information in the form of news, articles, social media posts, and other forms of communication. Overall, the last decade has witnessed a massive explosion of information in life science. The productivity of companies, regions and nations depends largely on their ability to create and process knowledge-based information effectively.

As information becomes an omnipresent element of our lives, there is a stronger need to communicate with one another across various nations. At the current time, there are approx. 6,500 developed languages, 91 of which have more than 10 million active speakers (Eberhard et al., 2020). A further intensification of international cooperation requires overcoming language barriers. Inevitably, this requires the ability to understand multiple languages. Regrettably, the human capability of learning multiple foreign languages is constrained. This drives the need for automated translation solutions.

Interestingly, the need for translation is not limited to face-to-face interactions; it also extends to unstructured documents such as governmental texts, product descriptions, business transaction texts and others. Reliable and on-the-fly translation is necessary to promote cultural exchange, collaborative intercultural research and globalized trade. As a result, there is a high demand for fast, economical and reliable machine translation systems to facilitate information exchange.

The research field of Machine Translation (MT) investigates approaches to translating text from one natural human language (the source language) into another (the target language) with no human translator involved. It may be categorized as a sub-field of computational linguistics that draws from a broad spectrum of other disciplines such as linguistics, computer science, information theory, artificial intelligence, and statistics.

In its early days, MT was criticized for bad quality: lack of fluency and intelligibility, low accuracy and inappropriate style. However, thanks to intensive research in this area, we have been witnessing great progress in MT quality. Its quality is still lower than that of human translation, but this does not imply that no good practical uses exist. On the contrary, MT is nowadays growing in popularity and its output is widely consumed by the translation industry: in informal settings (e.g. offered as a plugin on a website or in a messaging chat) and by professional translators. Depending on the scenario, a varying level of translation accuracy is expected. In informal scenarios, users wish to receive a fast, somewhat accurate translation. In professional scenarios, a very high quality is expected. Currently, MT systems do not provide such quality. Therefore, the output of an MT system is additionally revised by a human translator to ensure it. In some cases, with appropriate controls on the language and the domain of the input texts, machine translations can be produced that are of high quality and require no revision. The urge to deliver such solutions producing accurate and fluent translations drives the research in the area of MT.

1.1. Motivation

Neural Machine Translation (NMT) has recently shown promising results, replacing Statistical Machine Translation (SMT) as the state-of-the-art approach to machine translation. Technological advances such as sequence-to-sequence learning (Sutskever et al., 2014), the attention mechanism (Luong et al., 2015) and Transformer networks (Vaswani et al., 2017) have greatly contributed to improving the accuracy and fluency of machine translation. Within a few years of its first introduction, NMT is now used commercially in productive systems (e.g. Crego et al., 2016).

In order to achieve the goal of creating an NMT system which translates with the same quality as a human translator, ongoing research focuses on eliminating the existing deficiencies of NMT. Koehn and Knowles (2017) outline six main challenges of NMT. In their work, they state the following: "both SMT and NMT systems actually have their worst performance on words that were observed a single time in the training corpus, (...), even worse than for unobserved words. The most common categories across both are named entity (including entity and location names) and nouns". There is, in fact, intensive ongoing research which aims to improve the translation of named entities in the context of NMT (e.g. Li et al., 2018b; Ugawa et al., 2018; Li et al., 2018a; Yan et al., 2018, and others). Furthermore, there is a workshop series organized by the Association for Computational Linguistics (ACL) specifically dedicated to research in the area of named entity translation. In light of this deficiency of NMT, this work investigates methods which aim to improve the translation of named entities in NMT.

Named entities are the phrases in human languages that explicitly link notations in languages to the entities existing in the real world (Wu et al., 2008). The concept of a "named entity" was first introduced at the 6th Message Understanding Conference (MUC-6, Grishman and Sundheim (1996)). The aim of this conference was to recognize and subsequently classify named entities into a category (e.g. their type). This task is referred to as Named Entity Recognition (NER). NER is a research field that focuses on the automatic identification and classification of selected types of named entities in unstructured documents (Nadeau and Sekine, 2007). NER systems are often adopted as an early annotation step in many Natural Language Processing (NLP) pipelines for applications such as question answering, information retrieval and machine translation.

The translation of named entities is challenging because new phrases in the form of personal names, organizations, locations, product names, and monetary expressions appear on a daily basis, and many named entities are domain-specific and not to be found in bilingual dictionaries. Additionally, the lexical and syntactic ambiguity of named entities creates an obstacle during translation. For example, the word "France" (in English) may refer to the name of a person or the name of a country, and depending on the target language its translation necessitates a different inflection.

Improving named entity translation is important for a number of reasons. First, the correct translation of named entities constitutes a key element for the correct interpretation of scientific, corporate or governmental texts, where a homogeneous understanding of the handled material is required.
Moreover, translation systems and cross-language information retrieval applications depend on their correct translation, as a significant number of users' requests have been found to contain them (Jiang et al., 2007).

Furthermore, named entities carry more semantic information than regular content or function words and therefore have a higher information utility. As a result, their mistranslation leads to a higher information loss and impedes the correct understanding of translated texts more severely than the mistranslation of a regular word (Huang, 2005). Finally, a majority of out-of-vocabulary terms are named entities. Consequently, their incorrect or missing translation has a considerable impact on information retrieval effectiveness and machine translation quality (Wu et al., 2008).

Incorporation of word features into the source sentence. Neural Machine Translation is based on a sequence-to-sequence learning approach that interprets sentences as sequences of generic tokens. As such, it does not explicitly exploit external sources providing potentially beneficial linguistic information. Therefore, the question arises whether providing such information, e.g. in the form of word features, can help to enhance translation quality. We analyze this matter in the context of named entity translation. Conventional Neural Machine Translation systems (e.g. Yonghui et al., 2016; Zhou et al., 2016) are expected to translate named entities by learning complex linguistic aspects and ambiguous terms from the training corpus only. There is, however, no guarantee that an NMT system can capture this information and produce a proper translation in all cases, especially for those terms which do not occur very often in the training corpus or are ambiguous. When faced with named entities, NMT systems are found to occasionally distort location, organization or person names and even sometimes to ignore low-frequency proper names altogether (Koehn and Knowles, 2017).

Recently, Sennrich and Haddow (2016) successfully enriched the source language sentences by adding linguistic features to the neural network input. They find that adding morphological features, part-of-speech tags, and syntactic dependency labels as input features improves translation quality. Their main innovation over the standard encoder-decoder architecture is the ability to represent the encoder input as a combination of features (source factors) which are subsequently concatenated with or added to the embedding vector.

In general, a factor refers to "a type of additional word-level information" (Koehn and Hoang, 2007). We define source factors as any type of additional word-level information incorporated into the source sentence exclusively. Word-level factors may carry linguistic information, for instance part-of-speech tags, lemmas or morphosyntactic labels, as in the work of Sennrich and Haddow (2016). However, they may also be used to augment the source sentence with other types of information, e.g. to denote named entities. Generally speaking, factors could be any kind of automatically derivable information that is representable at the word level. External tools, such as a NER system, may be used to incorporate the annotations into the training corpus and at inference time.
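To illustrate the kind of output such an external tool provides, the following is a minimal sketch using the spaCy library (one of the NER systems examined later in this thesis). The model name and example sentence are illustrative assumptions, not the exact setup used in this work.

```python
# Minimal sketch: extracting named entities with spaCy as a possible external
# annotation source. Assumes the small English model has been installed via
# "python -m spacy download en_core_web_sm"; model choice and example sentence
# are illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Angela Merkel met representatives of Siemens in Berlin.")

for ent in doc.ents:
    # ent.label_ holds the NE class, e.g. PERSON, ORG or GPE (location-like)
    print(ent.text, ent.label_)
```

Such per-token or per-span labels are exactly the kind of automatically derivable, word-level information that can be turned into source factors.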

1.2. Research Objective

The incorporation of word features into the neural network input in the context of named entities is a promising approach. This thesis aims at exploring its potential and studies methods incorporating Named Entity Recognition (NER) into NMT with the aim of improving named entity translation. The NER system acts as an external source of information and its output is used to create word features.

This work explores an annotation method that integrates named entities and inside-outside-beginning (IOB) tagging (Ramshaw and Marcus, 1999) into the neural network input with the use of source factors. In our experiments, we focus on the three most common and well-researched named entity classes: Organization, Location and Person. We also investigate how the granularity of named entity class labels influences named entity translation quality. Further, we carry out an extensive evaluation of the MT output, assessing the influence of our annotation method on named entity translation. Finally, we discuss our findings based on translation examples.

Our experiments on English-German and English-Chinese show that by just including different named entity classes and IOB tagging, we can increase the BLEU score by around 1 point using the standard test set from WMT2019.
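A minimal sketch of how such token-level annotations could be derived and kept aligned with the source tokens is shown below. The mapping of spaCy labels to the three researched classes and the token-parallel output format are assumptions for illustration, not the exact implementation used in this thesis.

```python
# Minimal sketch: deriving two word-level source factors (NE class and IOB tag)
# for a source sentence with spaCy. The label mapping and the token-parallel
# output format are illustrative assumptions; factored NMT toolkits such as
# Sockeye expect factors aligned one-to-one with the source tokens.
import spacy

# Map spaCy labels to the three classes studied here; everything else is "O".
CLASS_MAP = {"PERSON": "PER", "ORG": "ORG", "GPE": "LOC", "LOC": "LOC"}

nlp = spacy.load("en_core_web_sm")

def annotate(sentence: str):
    doc = nlp(sentence)
    tokens, classes, iob = [], [], []
    for tok in doc:
        ne_class = CLASS_MAP.get(tok.ent_type_, "O")
        tokens.append(tok.text)
        classes.append(ne_class)
        # tok.ent_iob_ is "B", "I" or "O"; collapse tags of unused classes to "O"
        iob.append(tok.ent_iob_ if ne_class != "O" else "O")
    return tokens, classes, iob

tokens, classes, iob = annotate("Alaska State Troopers arrested a man in Anchorage.")
print(" ".join(tokens))
print(" ".join(classes))  # e.g. ORG ORG ORG O O O O LOC O
print(" ".join(iob))      # e.g. B I I O O O O B O
```

In a factored setup, these factor sequences would then have to stay aligned with the source tokens through any further pre-processing steps, such as sub-word splitting, as discussed in the data annotation chapter.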
