Mining Parallel Documents across Web Sites

Authors:
Pham Ngoc Khanh, Ho Tu Bao


Journal:
Lecture Notes in Computer Science


Issue Date:
2010


Abstract:

Most methods for building parallel corpora start from large-scale bilingual websites, a resource that is not available for many language pairs. In this paper we present a novel method for mining parallel documents between English and less popular languages when the two versions are located on different websites. The method is motivated by the observation that many news articles in less popular languages are translated from popular English news websites. Given a news article in a less popular language, the method uses search engines to find its original English version on another website. Experiments with English-Vietnamese show that the method yields bilingual document pairs in the science domain with a precision of around 90%. Because its starting point is only a set of monolingual news articles, the method is more flexible and scalable than traditional approaches that collect parallel texts from multilingual websites. Furthermore, it can be applied to mine parallel documents between pairs of less popular languages with scarce resources.


Pages:
552-563
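
The abstract describes the search step only at a high level, so the following is a minimal sketch of one way such a step could look, not the authors' implementation. It assumes that numbers and capitalised ASCII names tend to survive translation unchanged and can serve as a search query. The helper names (anchor_tokens, build_query, overlap_score, find_english_candidates), the 0.5 threshold, and the shape of the search callable are all hypothetical. The search engine itself is passed in as a function, so no real search API is assumed.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical type: a search function that takes a query string and
# yields (url, page_text) pairs. Any real search engine would be wrapped
# to fit this shape; none is assumed here.
SearchFn = Callable[[str], Iterable[Tuple[str, str]]]


def anchor_tokens(text: str) -> Set[str]:
    """Collect tokens that often stay unchanged under translation:
    numbers and capitalised ASCII words of three or more letters."""
    numbers = re.findall(r"\d+(?:[.,]\d+)*", text)
    names = re.findall(r"\b[A-Z][A-Za-z]{2,}\b", text)
    return set(numbers) | set(names)


def build_query(article: str) -> str:
    """Join the anchor tokens into a deterministic query string."""
    return " ".join(sorted(anchor_tokens(article)))


def overlap_score(article: str, candidate: str) -> float:
    """Fraction of the article's anchor tokens also found in the candidate."""
    source = anchor_tokens(article)
    if not source:
        return 0.0
    return len(source & anchor_tokens(candidate)) / len(source)


def find_english_candidates(
    article: str, search: SearchFn, min_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Query the search function and keep candidates whose overlap with
    the article reaches min_score (an assumed threshold), best first."""
    scored = [
        (url, overlap_score(article, text))
        for url, text in search(build_query(article))
    ]
    kept = [(url, score) for url, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Taking the search engine as a parameter keeps the sketch testable with a stub that returns fixed pages. A realistic pipeline would likely need stronger matching than token overlap, which the abstract does not specify.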

