AEPC: Designing an Arabic/English parallel corpus

  • Hind M. Alotaibi
Keywords: Parallel corpus, translation, concordancer, computational linguistics, ESL

Abstract

Parallel corpora, collections of aligned translated texts in two or more languages, play a significant role in translation and contrastive studies. Despite the importance of such resources for the education and training of translators, few are available for Arabic. Although a limited number of free Arabic/English parallel corpora exist, a major drawback is that they are domain-restricted, which limits their usefulness for Arabic translation education. This paper describes an ongoing project to design and construct a balanced, representative, and free-to-use Arabic/English parallel corpus (AEPC). The project also involves the design and implementation of an Arabic/English concordance tool. The proposed corpus and its tool can be integrated into translator training institutions as an educational resource for translation studies and teaching, and can also be used to train and test Arabic/English machine translation systems. The first phase of the project involved compiling high-quality translated text samples, all produced by human translators. The corpus covers a wide range of text types and includes rich metadata. The target size is at least 10 million words, with the intention of increasing that figure in the future. After the texts were compiled, manual (i.e. human-aided) alignment was performed, which offers better accuracy than automated alignment. The second phase involved designing a web interface with a bilingual concordancer, through which users can explore the content of the AEPC in both English and Arabic.
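
As a rough illustration of what a bilingual concordancer over aligned text does, the following minimal Python sketch searches one side of a list of aligned sentence pairs and returns each hit in keyword-in-context form together with its aligned translation. It is not the AEPC tool: the function name `concordance`, its parameters, and the three sample sentence pairs are hypothetical and were made up for this example.

```python
import re
from typing import List, Tuple

# Hypothetical toy data (NOT taken from the AEPC): manually aligned
# (English, Arabic) sentence pairs.
ALIGNED_PAIRS: List[Tuple[str, str]] = [
    ("The book is on the table.", "الكتاب على الطاولة."),
    ("She read the book yesterday.", "قرأت الكتاب أمس."),
    ("The weather is nice today.", "الطقس جميل اليوم."),
]


def concordance(
    pairs: List[Tuple[str, str]],
    keyword: str,
    side: int = 0,
    width: int = 20,
) -> List[Tuple[str, str, str, str]]:
    """Return (left context, match, right context, aligned sentence) tuples.

    side=0 searches the English sentences, side=1 the Arabic ones.
    width is the number of context characters kept on each side.
    """
    # Whole-word, case-insensitive match; \b is Unicode-aware in Python 3.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for pair in pairs:
        searched, aligned = pair[side], pair[1 - side]
        for match in pattern.finditer(searched):
            left = searched[max(0, match.start() - width):match.start()]
            right = searched[match.end():match.end() + width]
            hits.append((left, match.group(0), right, aligned))
    return hits


if __name__ == "__main__":
    for left, word, right, aligned in concordance(ALIGNED_PAIRS, "book"):
        print(f"{left:>20}[{word}]{right:<20} | {aligned}")
```

Simple whole-word matching of this kind ignores Arabic clitics and inflection, so a real tool would likely need morphological processing on the Arabic side.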

Author Biography

Hind M. Alotaibi
King Saud University / Saudi Arabia
Published
2015-12-31
How to Cite
Alotaibi, H. M. (2015). AEPC: Designing an Arabic/English parallel corpus. Research in Corpus Linguistics, 4, 1-7. Retrieved from https://ricl.aelinco.es/index.php/ricl/article/view/36