Translation Management System - JBoss… · Working with version control systems ... Collections...

Post on 04-Jul-2020

Translation Management System

Carlos A Munoz – i18n Engineer – Red Hat

● Translation Management System

● Storage

● Versioning

● Context

● Analysis

● History

● Discard the Rest!

The Problem

Too much of translating is not translating

The Problem

[Figure: blocks of placeholder text in several different scripts]

The Problem – Translators

● Conflicting work and duplicated effort

● Numerous files with text and translations

● Working with version control systems

● Multiple file formats

● Work-flow (What happens when documents are updated after translations have begun?)

The Problem – Content Authors

● Numerous translation teams

● Ensure teams are working with the latest content

● Progress tracking

● Gather completed translations

Mostly Work-flow Problems!

THE SOLUTION

Develop a uniform tool to abstract the complexities of the content and simplify the translation process

Zanata

● Translation platform

● Versioning

● History

● Validation

● Basic Workflow Management

● Translation Reuse

● Integration

History

2008

Conceptualization

2009

Internal use by Red Hat engineers and translators

2010 2011

Public instance launched: www.zanata.org

Zanata - Technology

● JBoss EWP 5 / JBoss AS 5.x

● Seam 2.2

● JSF 1.2

● GWT

● Hibernate

● REST API (RESTEasy)

● MySQL

Zanata - Production

● 3 production instances

● Used by Fedora, JBoss and Red Hat users

● Over 200 projects created

● JBoss AS

● Fedora Documentation

● 2M translations

Architecture

[Architecture diagram. Labels: JBoss EWP 5.1; Zanata Web Application (Seam, Hibernate); REST API; Clients (Maven client, Python client, Java client); Web Browser (Translation Editor, App pages); JSF + RichFaces; GWT RPC; DBMS; Persistence Objects]

[Diagram labels: Text to Translate; Translated Text; New, Unchanged; 1.0, 2.1, master, etc.; Need Review; Approved; Broadest Container; org/zanata/DocumentToTranslate]

Layered Approach

[Diagram labels: Action Components; GWT RPC Handlers; REST Components; Business Logic Components; Persistence Objects]

Translation Editor

● Attempt to emulate a standalone application

● Keyboard shortcuts

● Rich interface

● IDE-like (all tools are a click away)

● Multi-user

Translation Editor

[Screenshot labels: Source Text; Translated Text; Tools; Translation Memory; Glossary]

Translation Editor Tools

● Chat with other users in the same workspace

● Validate text:

● Same number of lines

● XML tags

● Java variables

● Leading and trailing new line characters

● Navigation and UI options

● System messages
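
The validation checks listed on this slide can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `TranslationChecks`, the messages, and the regular expressions are hypothetical, and "Java variables" is assumed to mean MessageFormat-style placeholders such as {0}. It is not Zanata's actual validator code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validators; names and rules are illustrative only. */
final class TranslationChecks {
    // Very rough XML tag matcher: <b>, </b>, <br/>, <a href="...">.
    private static final Pattern XML_TAG = Pattern.compile("</?[A-Za-z][^<>]*>");
    // Assumption: "Java variables" are MessageFormat placeholders like {0} or {1,number}.
    private static final Pattern JAVA_VAR = Pattern.compile("\\{\\d+[^{}]*\\}");

    private TranslationChecks() {}

    /** Collects every match of the pattern, sorted so that order in the text does not matter. */
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        Collections.sort(found);
        return found;
    }

    /** Returns a list of problems; an empty list means the translation passed every check. */
    static List<String> validate(String source, String target) {
        List<String> problems = new ArrayList<>();
        // The -1 limit keeps trailing empty strings, so a final newline counts as a line break.
        if (source.split("\n", -1).length != target.split("\n", -1).length) {
            problems.add("line count differs");
        }
        if (!findAll(XML_TAG, source).equals(findAll(XML_TAG, target))) {
            problems.add("XML tags differ");
        }
        if (!findAll(JAVA_VAR, source).equals(findAll(JAVA_VAR, target))) {
            problems.add("Java variables differ");
        }
        if (source.startsWith("\n") != target.startsWith("\n")) {
            problems.add("leading newline differs");
        }
        if (source.endsWith("\n") != target.endsWith("\n")) {
            problems.add("trailing newline differs");
        }
        return problems;
    }
}
```

Calling `TranslationChecks.validate(source, target)` on every save would let the editor warn the translator without blocking the save, which is one possible design; the slides do not say whether Zanata's validators block or only warn.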

Concurrent User support

● Other translation management systems lock a document while it is being translated

● Zanata will notify if another user is focused on the same translation

● Zanata will notify a user if their translation has been overwritten

Translation History

● All changes in a translation over time

● Who?

● When?

● What?

● Compare any two versions of the translations.

Translation Reuse

● Copy Trans

● Search for exact source matches

● Data Mining on the SQL Database

● Translation Memory

● Search for “likely” matches

● Hibernate Search

● Extremely fast results

● Index can get out of sync with the database

● Slow re-sync

● Text match ratings (multiple algorithms)
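
As an illustration of a text match rating, here is a minimal sketch of one common algorithm, Levenshtein edit distance scaled to a percentage. The slides say only "multiple algorithms" without naming them, so treating Levenshtein as one of them is an assumption; the class and method names are hypothetical.

```java
/** Hypothetical match rating based on Levenshtein distance; not taken from Zanata. */
final class MatchRating {
    private MatchRating() {}

    /** Number of single-character inserts, deletes, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 100 means identical strings, 0 means nothing in common at the character level. */
    static int similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100;
        }
        return (int) Math.round(100.0 * (longest - levenshtein(a, b)) / longest);
    }
}
```

A translation memory search could use the full-text index to fetch candidates quickly and then rank them with a score like this one.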

Copy Trans

● Saves time when starting a new version of the same project by reusing already existing translations

● Time consuming

● Very accurate Results
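
The idea behind Copy Trans, reusing a translation only when the source text matches exactly, can be sketched with an in-memory map. The slides describe the real feature as data mining on the SQL database, so the map, the class name `CopyTransSketch`, and the method signature are all stand-in assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the exact-match reuse described above. */
final class CopyTransSketch {
    private CopyTransSketch() {}

    /**
     * For each source string in the new version, copies the translation from the
     * previous version when the source text is exactly the same. Strings with no
     * exact match are left out and still need a translator.
     */
    static Map<String, String> copyExactMatches(Map<String, String> previousTranslations,
                                                List<String> newSources) {
        Map<String, String> reused = new LinkedHashMap<>(); // keeps the order of newSources
        for (String source : newSources) {
            String translation = previousTranslations.get(source);
            if (translation != null) {
                reused.put(source, translation);
            }
        }
        return reused;
    }
}
```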

Translation Memory Merge

● Bulk reuse from Translation memory

● Saves time when actively working on a document

● Very fast

● Quality might not be the best

Hibernate Search

[Diagram labels: Hibernate Entity; Full Text Entity Manager; Lucene Index; DBMS]

On Entity Change:

● Break down entity into Search terms

● Write an index file with entity type, id and search terms

● Proceed to update entity in the DB
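
The indexing steps above can be illustrated with a toy inverted index in plain Java. This is not how Hibernate Search or Lucene is implemented; it only mirrors the steps on the slide (break the entity into terms, record entity type, id, and terms). The class `TinyIndex` and the "type#id" key format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index; a sketch of the idea, not of Lucene's file format. */
final class TinyIndex {
    // search term -> keys of the entities containing it, each key being "type#id"
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Breaks text into lower-case search terms, splitting on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Called whenever an entity is created or changed: drop its old terms, then index the new text. */
    void onEntityChange(String type, long id, String text) {
        String key = type + "#" + id;
        for (Set<String> keys : postings.values()) {
            keys.remove(key); // remove stale entries from the previous version of the entity
        }
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
        }
    }

    /** Returns the keys of all entities whose text contains the given term. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

Because the index is written separately from the database row, a failure between the two writes leaves them inconsistent, which is one way an index like this can drift from the database.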

Hibernate Search

● Annotate Entity with

@Indexed

● Annotate indexed fields with

@Field(index = Index.TOKENIZED)

● For custom indexing use

@FieldBridge(impl = BridgeImpl.class)

Hibernate Cache

● Helps with repeated loading of DB records.

● Queries can be cached too.

● Entities must be annotated with

@Cache

● Collections may also be annotated

● Very conservative

● Is transactional

Security

● JAAS

● Seam Security

● Drools

● 4 authentication mechanisms:

● Internal (DB Based)

● Kerberos

● OpenID

● External (Any JAAS)

Security - Drools

● Securing Java objects

@Restrict("#{identity.hasPermission(project, 'insert')}")

identity.hasPermission(project, "insert")

identity.checkPermission(project, "insert")

● Securing JSF pages

<s:loggedIn/>

<s:hasPermission/>

<h:commandButton ... rendered="#{s:hasRole('admin')}" />

Security - Drools

● Implementation (security.drl)

rule CreateProject
  no-loop
  activation-group "permissions"
when
  $project: HProject()
  $authenticatedPerson: HPerson()
  check: PermissionCheck(target == $project, action == "insert", granted == false)
then
  check.grant();
end

Load Management

● Quartz for timed processes

● @Asynchronous

● Nightly cleanup

● Pushing large files

● Internal framework for Process management

● Break transactions up when possible
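
One simple way to break a large job into several smaller transactions is to partition the work into fixed-size batches and commit each batch separately. The helper below shows only the partitioning step; the class name `Batches` and the choice of batch size are assumptions, and the slides do not describe how Zanata's internal framework splits its work.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batching helper; each returned batch would run in its own transaction. */
final class Batches {
    private Batches() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the original list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

The trade-off is that a failure part-way through leaves earlier batches committed, so the job has to be safe to resume or repeat.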

Clients

● Integration

● Push Source and Target files

● Secured (API keys)

● 3 supported clients

● Maven Client: for software projects

● Python Client: CLI, on its way out

● Java Client (zanataj): CLI

● Communicate via REST API

● Supported formats: properties, xliff, po

Clients

[Diagram labels: XML / JSON; ETag with XML / JSON]

Clients - ETags

[Diagram labels: ETag file (file timestamp, file MD5, ETag); ETag; 304]
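
The labels on this slide (ETag file, file MD5, ETag, 304) point at standard HTTP conditional requests: the client remembers the ETag it last received, sends it back, and the server answers 304 Not Modified when nothing has changed. The sketch below simulates only that decision in plain Java. Using an MD5 of the content as the ETag, and the names `ETagSketch`, `md5Hex`, and `conditionalGetStatus`, are assumptions for illustration, not Zanata's client or server code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical simulation of an ETag check; real ETags travel in HTTP headers. */
final class ETagSketch {
    private ETagSketch() {}

    /** Hex-encoded MD5 of the content, used here as a simple ETag value. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 support is required of every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Server-side decision for a conditional GET: 304 when the ETag sent by the
     * client (If-None-Match) still matches the current content, otherwise 200.
     */
    static int conditionalGetStatus(String ifNoneMatch, String currentContent) {
        return md5Hex(currentContent).equals(ifNoneMatch) ? 304 : 200;
    }
}
```

On a 304 the client can skip downloading and rewriting the file, which matters most for large documents.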

Clients - Problems

● File format support on the client side

● Large files and transactions

● Large files and HTTP timeouts

● Versioning – When to override?

● Concurrency – Editor vs REST

Clients - Solutions

● File format support on the client side

● Experimental raw file REST endpoint

● Large files and transactions

● Large files and HTTP timeouts

● “Push and Query” approach

● Versioning – When to override?

● ??? … Sync problem

● Concurrency – Editor vs REST
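
One reading of the "Push and Query" approach is that a push returns immediately with a process id, and the client then polls for status instead of holding one long HTTP request open. The slides do not spell this out, so the sketch below is an assumption about the pattern, with hypothetical names (`PushAndQuery`, `push`, `query`), and it runs in-process instead of over REST.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process model of "push now, ask for the result later". */
final class PushAndQuery {
    enum Status { RUNNING, DONE, FAILED, UNKNOWN }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, Future<?>> processes = new ConcurrentHashMap<>();

    /** Starts the work in the background and returns an id the caller can poll with. */
    String push(Runnable work) {
        String id = UUID.randomUUID().toString();
        processes.put(id, worker.submit(work));
        return id;
    }

    /** Reports the current state of a pushed job without blocking on unfinished work. */
    Status query(String id) {
        Future<?> future = processes.get(id);
        if (future == null) {
            return Status.UNKNOWN;
        }
        if (!future.isDone()) {
            return Status.RUNNING; // queued or still executing
        }
        try {
            future.get(); // returns at once because the job has finished
            return Status.DONE;
        } catch (ExecutionException | CancellationException e) {
            return Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.FAILED;
        }
    }

    /** Stops accepting work and waits for queued jobs; returns true if they all finished in time. */
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Each request stays short, so a large file no longer has to finish processing within a single HTTP timeout.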

Documentation and Testing

● Enunciate (REST API documentation)

● TestNG

● Seam Autowire (Arquillian?)

● Selenium

● Maven Cargo Plugin

The Future

● Move to JBoss AS 7, EAP 6

● Seam 3.1

● New Languages?

● Groovy

● Scala

● Enhanced User Experience

● Flexible Workflow Management

The Future

● Arquillian

● More projects → Larger DB

● Slower results

● Sharding

● Map-Reduce

Thank you