Project Proposal: Translation Example Search Engine

Post on 21-May-2015



I propose to use a local document fingerprinting algorithm, Winnowing, to find near matches of natural language translation samples.


Project Proposal

CSC 630, Fall 2013, University of Arizona
Sumin Byeon

Example-Based Machine Translation

• Translation example sets (S₁→T₁), (S₂→T₂), (S₃→T₃), ...

• Given a query text S, find the closest match S’ for which an example (S’→T’) exists

• T’ is accepted as the translation of S
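The lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the example pairs are made up, the names `EXAMPLES`, `similarity`, and `translate_by_example` are hypothetical, and `difflib.SequenceMatcher` stands in for the Winnowing-based matching that this proposal is actually about.

```python
from difflib import SequenceMatcher

# Hypothetical example set of (source, target) pairs, i.e. (S_i -> T_i).
EXAMPLES = [
    ("good morning", "buenos días"),
    ("good night", "buenas noches"),
    ("thank you very much", "muchas gracias"),
]


def similarity(a, b):
    """Stand-in similarity measure in [0, 1]; not the Winnowing method."""
    return SequenceMatcher(None, a, b).ratio()


def translate_by_example(query, examples):
    """Return (S', T') where S' is the stored source closest to the query."""
    return max(examples, key=lambda pair: similarity(query, pair[0]))


best_source, best_target = translate_by_example("good morning!", EXAMPLES)
print(best_source, "->", best_target)
```

A linear scan like this compares the query against every stored example, which is the cost a fingerprint index is meant to avoid.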


[Figure: a query S matched against the example pairs (S₁, T₁), (S₂, T₂), ..., (Sₙ, Tₙ); labels in the figure: h(S), h(Sσ), φ(S), Tᵢ]

Which hash function? Optimal value of k? Window size?
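To make the three open parameters concrete, here is a minimal sketch of Winnowing-style fingerprinting. Everything specific in it is an assumption chosen for illustration, not a decision made in this proposal: CRC32 (`zlib.crc32`) as the hash function, k = 5 characters, a window of w = 4 hashes, and a normalization step that lowercases the text and drops whitespace. The function names are hypothetical.

```python
import zlib


def normalize(text):
    """Assumed normalization: lowercase and remove all whitespace."""
    return "".join(text.lower().split())


def kgram_hashes(text, k):
    """Hash every k-character substring of the text, in order."""
    return [
        zlib.crc32(text[i:i + k].encode("utf-8"))
        for i in range(len(text) - k + 1)
    ]


def winnow(text, k=5, w=4):
    """Return a set of (hash, offset) fingerprints.

    Each window of w consecutive k-gram hashes contributes its minimum
    hash; ties are broken by taking the rightmost occurrence. Offsets
    refer to positions in the normalized text.
    """
    hashes = kgram_hashes(normalize(text), k)
    if not hashes:
        return set()
    fingerprints = set()
    # If there are fewer than w hashes, treat them as one short window.
    for start in range(max(len(hashes) - w + 1, 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        rightmost = max(i for i, h in enumerate(window) if h == smallest)
        fingerprints.add((smallest, start + rightmost))
    return fingerprints


a = winnow("the quick brown fox jumps")
b = winnow("xx quick brown fox yy")
shared_hashes = {h for h, _ in a} & {h for h, _ in b}
print(len(a), len(b), len(shared_hashes))
```

With these parameters, any shared substring of at least w + k - 1 = 8 normalized characters fills one whole window in both texts, so the two texts share at least one fingerprint hash. Smaller k and w catch shorter matches at the cost of more fingerprints and more accidental matches.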

Relationship with Content Addressability

• Content recognizability

• Hash - Winnowing

• Content recoverability

• By locating or reconstructing

• Unlike other projects like NDN or Receipt, mine is relatively straightforward

• Simple key-value storage

• Key: hash

• Value: (reference to original text, offset)
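The key-value layout above can be sketched with an in-memory dictionary. This is an assumption-laden illustration: the class name `FingerprintIndex`, its methods, the vote-counting ranking, and the integer hash values in the usage lines are all hypothetical, and a real deployment would likely use a persistent store.

```python
from collections import Counter, defaultdict


class FingerprintIndex:
    """Maps a fingerprint hash to (reference to original text, offset) pairs."""

    def __init__(self):
        self._table = defaultdict(list)  # hash -> [(doc_id, offset), ...]

    def add(self, doc_id, fingerprints):
        """Store every (hash, offset) fingerprint of one example text."""
        for h, offset in fingerprints:
            self._table[h].append((doc_id, offset))

    def lookup(self, h):
        """Return all (doc_id, offset) entries stored under one hash."""
        return list(self._table.get(h, []))

    def candidates(self, query_fingerprints):
        """Rank stored texts by how many query hashes they share."""
        votes = Counter()
        for h, _ in query_fingerprints:
            for doc_id in {d for d, _ in self._table.get(h, [])}:
                votes[doc_id] += 1
        return votes.most_common()


index = FingerprintIndex()
index.add("example-1", [(101, 0), (202, 4)])  # made-up hash values
index.add("example-2", [(202, 1), (303, 6)])
ranked = index.candidates([(202, 0), (303, 3)])
print(ranked)
```

Storing the offset alongside the reference is what allows the matching region to be located in the original text afterwards.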

Text Matching

• Full-text search may be an effective solution, but...

• Loses information regarding the ordering of the query words

• Limited support for phrase search

• Certain linguistic features will be ignored (e.g., “a”, “the”)

• Matching long enough partial text

• Longer text - lower probability of finding matches

• Shorter text - higher probability of ambiguity (e.g., homonyms, false cognates)
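The word-ordering point can be illustrated with a small comparison. The sketch assumes a plain bag-of-words model as a simplification of full-text search (real engines vary), and the helper names `bag_of_words` and `word_ngrams` are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts only; all ordering information is discarded."""
    return Counter(text.lower().split())


def word_ngrams(text, n=2):
    """Set of consecutive word n-grams; local ordering is preserved."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


a = "You did not know Google Translate"
b = "Google Translate you did not know"

same_bag = bag_of_words(a) == bag_of_words(b)
only_in_a = word_ngrams(a) - word_ngrams(b)
only_in_b = word_ngrams(b) - word_ngrams(a)
print(same_bag, only_in_a, only_in_b)
```

The two sentences are indistinguishable as bags of words, while their bigram sets differ, which is the kind of ordering information that k-gram fingerprints retain.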

Grand Plan

• Winnowing algorithm implementation

• Index a large number of samples (10,000+)

• Translation sample search engine with simple RESTful interface

• Integrate it with Better Translator

Better Translator

• Language translator exploiting an indirect translation trick

• e.g., (Korean)→(Japanese)→(English)

• A perfect platform to test the hypothesis

• Example input (Korean): 여러분이 몰랐던 구글 번역기 (“The Google Translate you didn’t know”)

• Google Translate: You did not know Google Translate

• Better Translator: Google Translate you did not know