Researchers Use Bible to Perfect Translation Algorithms
Marie Donlon | October 24, 2018

To improve computer-based text translators, researchers from Dartmouth College have trained an algorithm on different versions of the Bible, a step toward converting written works into styles suited to a variety of audiences.
While internet tools exist to translate text between languages — for instance, English to Spanish — style translators, or tools that alter the style of the text instead of the language, have been slower to emerge.
The team from Dartmouth recognized in the Bible "a large, previously untapped dataset of aligned parallel text." Because each version contains more than 31,000 verses, the team was able to create roughly 1.5 million unique verse pairings from that content to build machine-learning training sets.
"The English-language Bible comes in many different written styles, making it the perfect source text to work with for style translation," said Keith Carlson, a Ph.D. student at Dartmouth and lead author of the research paper about the study.
Another benefit of using the Bible is that the text is thoroughly indexed by book, chapter and verse number. This organization minimizes the alignment errors that can arise when matching passages across different versions of the same text.
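To illustrate why that indexing helps, here is a minimal sketch of building aligned training pairs by matching verses on their (book, chapter, verse) reference. It is not the researchers' code: the function name `aligned_pairs`, the version labels and the placeholder verse texts are all hypothetical. It also assumes each ordered pair of versions (A to B and B to A) counts as a separate pairing; the article does not say how the 1.5 million pairings were counted.

```python
from itertools import permutations

# Hypothetical verse key: (book, chapter, verse), e.g. ("Gen", 1, 1).
VerseRef = tuple[str, int, int]


def aligned_pairs(
    versions: dict[str, dict[VerseRef, str]],
) -> list[tuple[str, str, str, str]]:
    """Return (source_version, target_version, source_text, target_text)
    for every ordered pair of versions that share a verse reference.

    Alignment is an exact key match, so no fuzzy alignment algorithm
    is needed (assumption: every version uses the same indexing)."""
    pairs = []
    for src, tgt in permutations(sorted(versions), 2):
        # Only verses present in both versions can be paired.
        shared = versions[src].keys() & versions[tgt].keys()
        for ref in sorted(shared):
            pairs.append((src, tgt, versions[src][ref], versions[tgt][ref]))
    return pairs


# Toy data with placeholder text; "style_c" is missing Gen 1:2.
versions = {
    "style_a": {("Gen", 1, 1): "a1", ("Gen", 1, 2): "a2"},
    "style_b": {("Gen", 1, 1): "b1", ("Gen", 1, 2): "b2"},
    "style_c": {("Gen", 1, 1): "c1"},
}
pairs = aligned_pairs(versions)
```

With three toy versions this yields 8 ordered pairs: two for each direction between `style_a` and `style_b`, and one for each direction involving `style_c`, because a verse missing from one version simply drops out rather than being misaligned.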
"The Bible is a 'divine' data set to work with to study this task," said Daniel Rockmore, a professor of computer science at Dartmouth and contributing author on the study. "Humans have been performing the task of organizing Bible texts for centuries, so we didn't have to put our faith into less reliable alignment algorithms."
To define style for the purpose of this study, the researchers concentrated on factors such as word choice, the use of passive and active voice, and sentence length.
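As a rough illustration of how such factors can be measured, the sketch below computes average sentence length, a type-token ratio as a crude proxy for variety in word choice, and the share of sentences containing a naive passive-voice cue. These metrics and the function name `style_features` are assumptions for illustration, not the study's own measures; in particular, the passive-voice regex ("a form of *to be* followed by a word ending in -ed or -en") is a deliberately simple heuristic that will miss some passives and flag some non-passives.

```python
import re

# Naive passive-voice cue (assumption): a form of "to be" followed by
# a word ending in -ed or -en, e.g. "was rolled", "been taken".
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
PASSIVE_CUE = re.compile(rf"\b{BE_FORMS}\s+\w+(?:ed|en)\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def style_features(text: str) -> dict[str, float]:
    """Compute simple, hypothetical style metrics for a passage."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.lower() for w in WORD.findall(text)]
    n = len(sentences)
    if n == 0 or not words:
        return {"avg_sentence_length": 0.0, "type_token_ratio": 0.0,
                "passive_ratio": 0.0}
    return {
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / n,
        # Distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Fraction of sentences with at least one passive cue.
        "passive_ratio": sum(bool(PASSIVE_CUE.search(s)) for s in sentences) / n,
    }


features = style_features("The stone was rolled away. They saw the tomb. It was empty.")
```

For the toy passage, the three sentences average 4 words each, 10 of the 12 words are distinct, and one sentence ("was rolled") matches the passive cue, so `passive_ratio` is one third. Comparing such numbers for the same verse across two versions gives a crude sense of how their styles differ.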
According to the study: "Different wording may convey different levels of politeness or familiarity with the reader, display different cultural information about the writer, be easier to understand for certain populations."
Although the systems were trained on different versions of the Bible, similar systems could one day translate the style of any written text for different audiences. For instance, a style translator could rewrite an English-language passage from "Moby Dick" to suit young readers, non-native English speakers or any other audience.
"Text simplification is only one specific type of style transfer. More broadly, our systems aim to produce text with the same meaning as the original, but do so with different words," said Carlson.
The study is published in Royal Society Open Science.