By Estelle Maryline Delpech
Computer-assisted translation (CAT) has always used translation memories, which require the translator to have a corpus of previous translations that the CAT software can use to generate bilingual lexicons. This can be problematic when the translator does not have such a corpus, for instance when the text belongs to an emerging field. To solve this issue, CAT research has looked into leveraging comparable corpora, i.e. a set of texts, in two or more languages, which deal with the same topic but are not translations of one another.
This work had two primary objectives. The first is to evaluate the input of lexicons extracted from comparable corpora in the context of a specialized human translation task. The second objective is to identify the bilingual-lexicon-extraction methods which best match the translators' needs, determining the current limits of these techniques and suggesting improvements. The author focuses, in particular, on the identification of fertile translations, the management of multiple morphological structures, and the ranking of candidate translations.
The experiments are carried out on two language pairs (English–French and English–German) and on specialized texts dealing with breast cancer. This research puts significant emphasis on applicability: methodological choices are guided by the needs of the final users. This book is organized in two parts: the first part presents the applicative and scientific context of the research, and the second part is given over to efforts to improve compositional translation.
The research work presented in this book received the PhD Thesis award 2014 from the French association for natural language processing (ATALA).
Best software development books
This text explains, from multiple perspectives, how software and the software industry are different from other industries technologically, organizationally, and socially.
Effectively implement trustworthy computing tasks using aspect-oriented programming. This landmark publication fills a gap in the literature by not only describing the basic concepts of trustworthy computing (TWC) and aspect-oriented programming (AOP), but also exploring their critical interrelationships.
Opher and Peter,
Just got my copy of Event Processing in Action and read it over the weekend.
I would say that you and Peter produced a real magnum opus. It's great!
It should be read by:
A) every vendor that is developing an EDA/CEP to sell; and
B) every software engineer who is developing an EDA/CEP application.
Your book is the event processing guide for many years to come.
Thank you and congratulations!
Magento is a feature-rich, professional, open source e-commerce application that offers merchants complete flexibility and control over the look, content, and functionality of their online store. You may have the most attractive Magento store on the web with the most competitive prices, but without visitors, you will struggle to make significant sales.
Additional resources for Comparable Corpora and Computer-assisted Translation
These translations were judged to be of a lesser quality than the translations generated by automatic systems, and this was based on measures such as BLEU and NIST. The authors use this experiment to remind us that these measures are not directly linked to the quality of the translations but that they only evaluate the resemblance to a reference dataset, which is moreover considered questionable, especially in translation.
2. Human MT evaluation
Human evaluation consists of presenting sentence translations to humans who must judge their quality.
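To make concrete the remark that measures such as BLEU only evaluate resemblance to a reference, here is a minimal sketch of clipped n-gram precision, one ingredient of BLEU. It is not the full metric (there is no brevity penalty and no averaging over n-gram orders), and the function names and example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, reference, n=1):
    """Share of hypothesis n-grams also found in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference (the "clipping" used in BLEU). Whitespace tokenization is
    a simplifying assumption.
    """
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return clipped / total


# Hypothetical example sentences (not taken from the book).
REFERENCE = "the patient underwent surgery last week"
EXACT_COPY = "the patient underwent surgery last week"
PARAPHRASE = "the patient had an operation a week ago"

if __name__ == "__main__":
    print(modified_precision(EXACT_COPY, REFERENCE))
    print(modified_precision(PARAPHRASE, REFERENCE))
```

The paraphrase is an acceptable rendering of the same content, yet it scores far lower than the exact copy because it shares few words with the reference. This is the sense in which such scores measure resemblance rather than quality.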
These very good results can be explained by the nature of their data: the evaluation lexicon is made up of words whose number of occurrences is higher than 100, and the comparable corpus is composed of the unaligned parts of a single parallel corpus. [...] i.e. words used as trustworthy elements, because they are automatically identifiable, unambiguous and belong to the comparable corpus' topic. The authors suggest giving them more weight than other elements in the context vectors, since these properties make them highly discriminating.
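The idea of giving trustworthy elements more weight in the context vectors can be sketched as follows. This is an illustrative toy, not the cited authors' actual weighting scheme: the multiplicative boost, the cosine measure, the vocabulary and all counts are assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def boost(vector, anchors, factor=3.0):
    """Multiply the weight of anchor (trustworthy) words by `factor`.

    `factor` is a hypothetical parameter chosen for this example.
    """
    return {w: x * (factor if w in anchors else 1.0) for w, x in vector.items()}


# Invented data: the source word's context vector, already translated
# into the target language, and two candidate translations' vectors.
ANCHORS = {"mastectomy"}
SOURCE = {"mastectomy": 1.0, "patient": 3.0}
CANDIDATE_A = {"mastectomy": 2.0, "patient": 1.0}
CANDIDATE_B = {"patient": 3.0, "hospital": 1.0}


def rank(source, candidates, anchors=None):
    """Return candidate names sorted by decreasing similarity to the source."""
    if anchors:
        source = boost(source, anchors)
        candidates = {name: boost(vec, anchors) for name, vec in candidates.items()}
    return sorted(candidates, key=lambda name: cosine(source, candidates[name]), reverse=True)


if __name__ == "__main__":
    cands = {"A": CANDIDATE_A, "B": CANDIDATE_B}
    print(rank(SOURCE, cands))           # plain context vectors
    print(rank(SOURCE, cands, ANCHORS))  # anchor words weighted more heavily
```

With these invented counts, the candidate that shares the anchor word moves to the top of the ranking once anchors are boosted, which is the discriminating effect described above.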
[...] i.e. the word i of the English matrix is the translation of the word i of the German matrix. Then [RAP 95] randomly switches the order of the words in the matrices to misalign them. He then observes that the similarity of the source and target matrices decreases as the number of misaligned words increases. [FUN 97] extends [RAP 95]'s experiment and uses a bilingual lexicon, which she projects onto the source and target corpora, which enables her to obtain attested translation pairs in both corpora.
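The misalignment experiment described above can be imitated on toy data. The sketch below rests on strong assumptions: the source and target co-occurrence matrices are identical when the word order is aligned, the counts are invented, and similarity is measured inversely by a sum of absolute differences, which may differ from the measure used in [RAP 95]. All function names are hypothetical.

```python
import random


def toy_cooccurrence(n):
    """Symmetric toy co-occurrence counts (values invented for illustration)."""
    return [[0 if i == j else (i + 1) * (j + 1) for j in range(n)] for i in range(n)]


def reorder(matrix, order):
    """Apply the same word order to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]


def misaligned_order(n, k, rng):
    """Word order in which k randomly chosen positions are displaced.

    The chosen positions are rotated cyclically, so each one moves
    when k >= 2; k = 0 (or 1) leaves the order unchanged.
    """
    order = list(range(n))
    chosen = rng.sample(range(n), k)
    for idx, pos in enumerate(chosen):
        order[pos] = chosen[(idx + 1) % k]
    return order


def distance(a, b):
    """Sum of absolute differences: 0 for identical matrices, larger when less similar."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def run_experiment(n=6, ks=(0, 2, 4, 6), seed=0):
    """Distance between source and target matrices for each number of misaligned words."""
    rng = random.Random(seed)
    source = toy_cooccurrence(n)
    target = toy_cooccurrence(n)  # assumption: identical patterns when aligned
    return {k: distance(source, reorder(target, misaligned_order(n, k, rng))) for k in ks}


if __name__ == "__main__":
    for k, d in run_experiment().items():
        print(f"misaligned words: {k}  distance: {d}")
```

In this toy setting the distance is zero only when the two word orders are aligned, and any misalignment makes the matrices less similar. Real corpora would of course yield noisy matrices rather than identical ones.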