AEPC: Designing an Arabic/English parallel corpus
Keywords:
Parallel corpus, translation, concordancer, computational linguistics, ESL
Abstract – Parallel corpora (collections of aligned translated texts in two or more languages) play a significant role in translation and contrastive studies. Despite the importance of such resources for the education and training of translators, few are available for Arabic. Although a limited number of free Arabic/English parallel corpora exist, a major drawback is that they are domain-restricted, which limits their usefulness for Arabic translation education. This paper describes an ongoing project to design and construct a balanced, representative, and free-to-use Arabic/English parallel corpus (AEPC). The project also involves the design and implementation of an Arabic/English concordance tool. The proposed parallel corpus and its tool can be integrated into translator training institutions as an educational resource for translation studies and teaching. The corpus can also be used for training and testing Arabic/English machine translation systems. The first phase of the project involved compiling high-quality translated text samples; all translations were produced by human translators. The corpus covers a wide range of text types and includes rich metadata. The target size of the corpus is at least 10 million words, with the intention of increasing that figure in the future. After the texts were compiled, manual (i.e. human-aided) alignment was performed, which offers better accuracy than automated alignment. The second phase of the project involved designing a web interface with a bilingual concordancer, where users can explore the content of the AEPC in both English and Arabic.
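To make the idea of a bilingual concordancer concrete, the following is a minimal sketch, not the AEPC tool itself. It assumes the corpus is available as sentence-aligned pairs; the `AlignedPair` class, its `genre` metadata field, the `concordance` function, and the two sample sentences are all hypothetical and serve only as illustration. A query in either language returns each hit with its surrounding context and the aligned translation.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedPair:
    """One manually aligned sentence pair (hypothetical structure)."""
    english: str
    arabic: str
    genre: str  # illustrative metadata field, not taken from the paper


def concordance(pairs, query, lang="english", width=30):
    """Return (left context, match, right context, aligned translation)
    for every whole-word occurrence of `query` in the chosen language."""
    # Lookarounds on \w give whole-word matching for both scripts,
    # since \w is Unicode-aware for str patterns in Python 3.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(query) + r"(?!\w)", re.IGNORECASE
    )
    hits = []
    for pair in pairs:
        if lang == "english":
            source, target = pair.english, pair.arabic
        else:
            source, target = pair.arabic, pair.english
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            hits.append((left, m.group(0), right, target))
    return hits


# Illustrative sample data only; not drawn from the AEPC.
SAMPLE_PAIRS = [
    AlignedPair("The book is on the table.", "الكتاب على الطاولة.", "general"),
    AlignedPair("I read a new book.", "قرأت كتابا جديدا.", "general"),
]

english_hits = concordance(SAMPLE_PAIRS, "book", lang="english")
arabic_hits = concordance(SAMPLE_PAIRS, "الكتاب", lang="arabic")
```

Storing the text as aligned pairs means a hit in one language leads directly to its translation, which is what allows a concordancer to display both sides together. A real tool would likely add tokenization suited to Arabic morphology and filtering by metadata.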
License
Submission of your paper to this journal implies that the paper is not under submission for publication elsewhere. Material that has been previously copyrighted, published, or accepted for publication will not be considered for publication in this journal. Submission of a manuscript is interpreted as a statement of certification that no part of the manuscript is copyrighted by any other publisher or under review by any other formal publication. By submitting your manuscript to us, you agree to these copyright guidelines. It is your responsibility to ensure that your manuscript does not cause any copyright infringement, defamation, or other problems.
Submitted papers are assumed to contain no proprietary material unprotected by patent or patent application; responsibility for technical content and for protection of proprietary material rests solely with the author(s) and their organizations and is not the responsibility of the journal or its editorial staff. The main author is responsible for ensuring that the article has been seen and approved by all the other authors. It is the responsibility of the author to obtain all necessary copyright release permissions for the use of any copyrighted materials in the manuscript prior to submission.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution (CC BY) License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Article submission implies author agreement with this policy.