Download Comparable Corpora and Computer-assisted Translation by Estelle Maryline Delpech PDF

By Estelle Maryline Delpech

Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance, when the text belongs to an emerging field. To solve this issue, CAT research has looked into the leveraging of comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another.

This work had two primary objectives. The first is to evaluate the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify bilingual-lexicon-extraction methods which best match the translators’ needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations.

The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation.

The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).

Best software development books

Software Ecosystems: Understanding an Indispensable Technology and Industry

This text explains, from a number of perspectives, how software and the software industry differ from other industries technologically, organizationally, and socially.

Using Aspect-Oriented Programming for Trustworthy Software Development

Successfully implement trustworthy computing tasks using aspect-oriented programming. This landmark publication fills a gap in the literature by not only describing the basic concepts of trustworthy computing (TWC) and aspect-oriented programming (AOP), but also exploring their critical interrelationships.

Event Processing in Action

Opher and Peter,

Just got my copy of Event Processing in Action and read it over the weekend.
I would say that you and Peter produced a true magnum opus. It's great!

It should be read by:
A) every vendor that is building an EDA/CEP product to sell; and
B) every software engineer who is building an EDA/CEP application.

Your book is the event processing guide for many years to come.
Thank you and congratulations!

Magento Search Engine Optimization

Magento is a feature-rich, professional, open source e-commerce application that offers merchants complete flexibility and control over the look, content, and functionality of their online store. You may have the most attractive Magento store on the web with the most competitive prices, but without visitors, you will struggle to make significant sales.

Extra resources for Comparable Corpora and Computer-assisted Translation

Sample text

These translations were judged to be of a lesser quality than the translations generated by automatic systems, and this was based on measures such as BLEU and NIST. The authors use this experiment to remind us that these measures are not directly linked to the quality of the translations but that they only evaluate the resemblance to a reference dataset, which is moreover considered questionable, especially in translation.

2. Human MT evaluation

Human evaluation consists of presenting sentence translations to humans who must judge their quality.

These very good results can be explained by the nature of their data: the evaluation lexicon is made up of words whose number of occurrences is higher than 100 and the comparable corpus is composed of the unaligned parts belonging to a single-parallel corpus. 3. e. words used as trustworthy elements, for they are automatically identifiable, are not ambiguous and belong to the comparable corpus’ topic. The authors suggest giving them more weight than other elements in the context vectors due to their properties making them highly discriminating elements.

i.e. the word i of the English matrix is the translation of the word i of the German matrix. Then [RAP 95] randomly switches the order of the words in the matrices to misalign them. He then observes that the similarity of the source and target matrices decreases when the number of misaligned words increases. [FUN 97] goes further with [RAP 95]’s experiment and uses a bilingual lexicon, which she projects onto the source and target corpora, which enables her to obtain attested translation pairs in both corpora.
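
The misalignment experiment attributed to [RAP 95] above can be illustrated with a minimal Python sketch. It is not the author's or [RAP 95]'s implementation. The co-occurrence counts are invented, the source and target matrices are assumed to be perfectly parallel, and the sum of absolute differences is used as a stand-in dissimilarity measure (lower means more similar). All function names are hypothetical.

```python
import random


def toy_cooccurrence(n):
    """Symmetric toy co-occurrence counts; the values are invented for illustration."""
    return [[0 if i == j else (i + 1) * (j + 1) for j in range(n)] for i in range(n)]


def misalign(n, k, rng):
    """Return a word order in which exactly k of the n words are displaced."""
    perm = list(range(n))
    if k >= 2:
        chosen = rng.sample(range(n), k)
        # Rotate the chosen positions by one so that none of them stays in place.
        for pos, src in zip(chosen, chosen[1:] + chosen[:1]):
            perm[pos] = src
    return perm


def reorder(matrix, perm):
    """Apply the same word order to both the rows and the columns."""
    return [[matrix[a][b] for b in perm] for a in perm]


def distance(a, b):
    """Sum of absolute cell differences (assumed measure; 0 means identical)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def run_experiment(n=8, seed=0):
    """Map the number of misaligned words to the source/target distance."""
    rng = random.Random(seed)
    source = toy_cooccurrence(n)
    target = toy_cooccurrence(n)  # assumption: perfectly parallel toy data
    return {
        k: distance(source, reorder(target, misalign(n, k, rng)))
        for k in (0, 2, 4, 6, 8)
    }


if __name__ == "__main__":
    for k, d in run_experiment().items():
        print(f"{k} misaligned words -> distance {d}")
```

With these toy counts the distance is 0 when the matrices are aligned and positive as soon as any words are displaced. On real corpora the trend would be noisier, since two languages never have identical co-occurrence patterns.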
