Translation Management System - JBoss… · Working with version control systems ... Collections...
Translation Management System
Carlos A Munoz – i18n Engineer – Red Hat
● Translation Management System
● Storage
● Versioning
● Context
● Analysis
● History
● Discard the Rest!
The Problem
Too much of translating is not translating
The Problem – Translators
● Conflicting work and duplicated effort
● Numerous files with text and translations
● Working with version control systems
● Multiple file formats
● Workflow (what happens when documents are updated after translation has begun?)
The Problem – Content Authors
● Numerous translation teams
● Ensure teams are working with the latest content
● Progress tracking
● Gather completed translations
Mostly workflow problems!
THE SOLUTION
Develop a uniform tool to abstract the complexities of the content and simplify the translation process
Zanata
● Translation platform
● Versioning
● History
● Validation
● Basic Workflow Management
● Translation Reuse
● Integration
History
2008
Conceptualization
2009
Internal use by Red Hat engineers and translators
2010 2011
Public instance launched: www.zanata.org
Zanata - Technology
● JBoss EWP 5 / JBoss AS 5.x
● Seam 2.2
● JSF 1.2
● GWT
● Hibernate
● REST API (RESTEasy)
● MySQL
Zanata - Production
● 3 production instances
● Used by Fedora, JBoss and Red Hat users
● Over 200 projects created
● JBoss AS
● Fedora Documentation
● 2M translations
Architecture
[Diagram labels: JBoss EWP 5.1; Zanata Web Application (Seam, Hibernate, REST API); Clients (Maven client, Python client, Java client); Web Browser (Translation Editor, App pages; JSF + RichFaces, GWT RPC); DBMS; Persistence Objects]
[Diagram labels: Text to Translate; Translated Text; New, Unchanged; 1.0, 2.1, master, etc.; Need Review; Approved; Broadest Container; org/zanata/DocumentToTranslate]
Layered Approach
[Diagram layers: Action Components, GWT RPC Handlers and REST Components; Business Logic Components; Persistence Objects]
Translation Editor
● Attempt to emulate a standalone application
● Keyboard shortcuts
● Rich interface
● IDE-like (all tools are a click away)
● Multi-user
Translation Editor
[Screenshot panels: Source Text, Translated Text, Tools, Translation Memory, Glossary]
Translation Editor Tools
● Chat with other users in the same workspace
● Validate text:
● Same number of lines
● XML tags
● Java variables
● Leading and trailing new line characters
● Navigation and UI options
● System messages
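The text validations listed above can be sketched in plain Java. This is an illustration only, not Zanata's validator code: the class name `TranslationChecks`, its method names and both regular expressions are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; names and patterns are illustrative assumptions. */
final class TranslationChecks {
    // Assumed pattern for XML-style tags such as <b>, </b> or <br/>.
    private static final Pattern XML_TAG = Pattern.compile("</?[A-Za-z][^<>]*>");
    // Assumed pattern for MessageFormat-style Java variables such as {0}.
    private static final Pattern JAVA_VAR = Pattern.compile("\\{\\d+[^{}]*\\}");

    private TranslationChecks() {}

    /** Source and translation must contain the same number of lines. */
    static boolean sameLineCount(String source, String target) {
        return source.split("\n", -1).length == target.split("\n", -1).length;
    }

    /** A leading or trailing newline in the source must be kept in the translation. */
    static boolean sameLeadingAndTrailingNewline(String source, String target) {
        return source.startsWith("\n") == target.startsWith("\n")
            && source.endsWith("\n") == target.endsWith("\n");
    }

    /** Both texts must contain the same tags, in any order. */
    static boolean sameXmlTags(String source, String target) {
        return sortedMatches(XML_TAG, source).equals(sortedMatches(XML_TAG, target));
    }

    /** Both texts must contain the same variables, in any order. */
    static boolean sameJavaVariables(String source, String target) {
        return sortedMatches(JAVA_VAR, source).equals(sortedMatches(JAVA_VAR, target));
    }

    // Collects every match of the pattern and sorts the result so order is ignored.
    private static List<String> sortedMatches(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        Collections.sort(found);
        return found;
    }
}
```

Comparing sorted match lists lets a translator reorder tags and variables, which word order in the target language often requires, while still catching one that was dropped or mistyped.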
Concurrent User support
● Other Translation Management Systems lock documents while they are being translated
● Zanata will notify if another user is focused on the same translation
● Zanata will notify a user if their translation has been overwritten
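A notify-instead-of-lock policy like the one described above might look as follows. This is a toy in-memory model under stated assumptions: the `Workspace` class, its methods and the message wording are all hypothetical and are not taken from Zanata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a shared workspace that never blocks an edit. */
final class Workspace {
    // translation id -> user who saved it last
    private final Map<Long, String> lastEditor = new HashMap<>();
    // translation id -> users currently focused on it
    private final Map<Long, List<String>> focused = new HashMap<>();
    private final List<String> notifications = new ArrayList<>();

    /** Records that a user opened a translation and tells them who else is on it. */
    void focus(String user, long translationId) {
        List<String> users = focused.computeIfAbsent(translationId, k -> new ArrayList<>());
        for (String other : users) {
            notifications.add(user + ": " + other + " is also editing " + translationId);
        }
        if (!users.contains(user)) {
            users.add(user);
        }
    }

    /** Saves without locking; the previous editor is told their work was replaced. */
    void save(String user, long translationId) {
        String previous = lastEditor.put(translationId, user);
        if (previous != null && !previous.equals(user)) {
            notifications.add(previous + ": your translation " + translationId
                + " was overwritten by " + user);
        }
    }

    List<String> notifications() {
        return notifications;
    }
}
```

Nobody is ever refused a save in this model; the trade-off is that the last writer wins and the earlier editor only finds out afterwards.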
Translation History
● All changes in a translation over time
● Who?
● When?
● What?
● Compare any two versions of the translations.
Translation Reuse
● Copy Trans
● Search for exact source matches
● Data Mining on the SQL Database
● Translation Memory
● Search for "likely" matches
● Hibernate Search
● Extremely fast results
● Index can get out of sync with the database
● Re-syncing is slow
● Text match ratings (multiple algorithms)
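The slides do not say which algorithms produce the text match ratings. As one possible example, and purely as an assumption, a rating can be derived from Levenshtein edit distance. The class `MatchRating` and its methods are hypothetical.

```java
/** Hypothetical rating helper; Levenshtein distance is an assumed choice of algorithm. */
final class MatchRating {
    private MatchRating() {}

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity as a percentage; 100 means the two source texts are identical. */
    static int similarityPercent(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100;
        }
        return (int) Math.round(100.0 * (longest - editDistance(a, b)) / longest);
    }
}
```

An exact-match search such as Copy Trans only accepts the 100% case, while a translation memory lookup would rank candidates by a score of this kind and show the best few.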
Copy Trans
● Saves time when starting a new version of the same project by reusing already existing translations
● Time consuming
● Very accurate Results
Translation Memory Merge
● Bulk reuse from Translation memory
● Saves time when actively working on a document
● Very fast
● Quality might not be the best
Hibernate Search
[Diagram labels: Hibernate Entity; Full Text Entity Manager; Lucene Index; DBMS]
On Entity Change:
● Break down entity into Search terms
● Write an index file with entity type, id and search terms
● Proceed to update entity in the DB
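The indexing steps above can be pictured with a tiny in-memory inverted index. This is not Hibernate Search or Lucene code; the class `TinyIndex`, the `EntityType#id` key format and the tokenizing rule are assumptions chosen to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for a full-text index keyed by entity type and id. */
final class TinyIndex {
    // search term -> keys of the form "EntityType#id"
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Called whenever an entity changes, before the database update proceeds. */
    void onEntityChange(String entityType, long id, String text) {
        String key = entityType + "#" + id;
        // Drop stale entries so terms from the old text no longer match.
        for (Set<String> keys : postings.values()) {
            keys.remove(key);
        }
        // Break the entity text down into lower-case search terms.
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
            }
        }
    }

    /** Returns the keys of all entities whose text contains the term. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }
}
```

Because the index is maintained separately from the database rows, any write that bypasses `onEntityChange` leaves the two out of step, which is one way an index of this kind can drift.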
Hibernate Search
● Annotate the entity with
@Indexed
● Annotate indexed fields with
@Field(index = Index.TOKENIZED)
● For custom indexing use
@FieldBridge(impl = BridgeImpl.class)
Hibernate Cache
● Helps with repeated loading of DB records.
● Queries can be cached too.
● Entities must be annotated with
@Cache
● Collections may also be annotated
● Very conservative
● Is transactional
Security
● JAAS
● Seam Security
● Drools
● 4 authentication mechanisms:
● Internal (DB Based)
● Kerberos
● OpenID
● External (Any JAAS)
Security - Drools
● Securing Java objects
@Restrict("#{identity.hasPermission(project, 'insert')}")
identity.hasPermission(project, "insert")
identity.checkPermission(project, "insert")
● Securing JSF pages
<s:loggedIn/>
<s:hasPermission/>
<h:commandButton ... rendered="#{s:hasRole('admin')}" />
Security - Drools
● Implementation (security.drl)
rule CreateProject
  no-loop
  activation-group "permissions"
when
  $project: HProject()
  $authenticatedPerson: HPerson()
  check: PermissionCheck(target == $project, action == "insert", granted == false)
then
  check.grant();
end
Load Management
● Quartz for timed processes
● @Asynchronous
● Nightly cleanup
● Pushing large files
● Internal framework for Process management
● Break transactions up when possible
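Breaking one large transaction into several smaller ones usually comes down to processing the input in batches. The sketch below shows only the batching; the class `BatchRunner` and its methods are hypothetical, and opening and committing a transaction per batch is left to the callback because the slides do not show how Zanata does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper that feeds work to a callback one batch at a time. */
final class BatchRunner {
    private BatchRunner() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Calls work once per batch and returns the number of batches.
     * The callback is assumed to begin and commit its own transaction,
     * so a failure only rolls back the batch in progress.
     */
    static <T> int runInBatches(List<T> items, int batchSize, Consumer<List<T>> work) {
        List<List<T>> batches = partition(items, batchSize);
        for (List<T> batch : batches) {
            work.accept(batch);
        }
        return batches.size();
    }
}
```

Smaller batches keep locks short and memory use bounded when pushing large files, at the price of losing all-or-nothing behaviour for the whole push.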
Clients
● Integration
● Push Source and Target files
● Secured (API keys)
● 3 supported clients
● Maven Client: for software projects
● Python Client: CLI, on its way out
● Java Client (zanataj): CLI
● Communicate via REST API
● Supported formats: properties, xliff, po
Clients
[Diagram labels: XML / JSON; ETag]
Clients - ETags
[Diagram labels: ETag file (file timestamp, file MD5, ETag); ETag; 304]
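The ETag diagram survives only as labels (a file timestamp, a file MD5, an ETag and a 304 status), so the exact client flow is not recoverable. The sketch below shows the general idea of a conditional request under stated assumptions: the ETag is taken to be an MD5 hex digest of the content, and the class `EtagCheck` and its methods are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; deriving the ETag from an MD5 digest is an assumption. */
final class EtagCheck {
    private EtagCheck() {}

    /** Returns the MD5 digest of the content as lower-case hex. */
    static String md5Etag(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Status a server could return for a conditional GET:
     * 304 (Not Modified) when the client's ETag still matches, otherwise 200.
     */
    static int statusFor(String clientEtag, byte[] currentContent) {
        return md5Etag(currentContent).equals(clientEtag) ? 304 : 200;
    }
}
```

A client that stores the ETag from its last pull can send it back on the next request; a 304 answer means the document body does not need to be transferred again.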
Clients - Problems
● File format support on the client side
● Large files and transactions
● Large files and HTTP timeouts
● Versioning – When to override?
● Concurrency – Editor vs REST
Clients - Solutions
● File format support on the client side
● Experimental raw file REST endpoint
● Large files and transactions
● Large files and HTTP timeouts
● "Push and Query" approach
● Versioning – When to override?
● ??? … Sync problem
● Concurrency – Editor vs REST
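The slides name the "Push and Query" approach without describing it. One common reading, assumed here, is that the server accepts the upload, returns an id straight away, and lets the client poll for the outcome instead of holding one long HTTP request open. The class `PushAndQuery`, its methods and the status names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical server-side model of an asynchronous push with status polling. */
final class PushAndQuery {
    enum Status { RUNNING, FINISHED, FAILED, UNKNOWN }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<Long, Status> statuses = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** "Push": accept the job and return at once with an id to query later. */
    long push(Runnable importJob) {
        long id = nextId.getAndIncrement();
        statuses.put(id, Status.RUNNING);
        worker.submit(() -> {
            try {
                importJob.run();
                statuses.put(id, Status.FINISHED);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    /** "Query": a cheap call the client repeats until the job stops RUNNING. */
    Status query(long id) {
        return statuses.getOrDefault(id, Status.UNKNOWN);
    }

    /** Stops the background worker once no more jobs will be pushed. */
    void shutdown() {
        worker.shutdown();
    }
}
```

Each individual HTTP call stays short, so a large file no longer has to finish importing within a single request timeout.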
Documentation and Testing
● Enunciate (REST API documentation)
● TestNG
● Seam Autowire (Arquillian?)
● Selenium
● Maven Cargo Plugin
The Future
● Move to JBoss AS 7, EAP 6
● Seam 3.1
● New languages?
● Groovy
● Scala
● Enhanced User Experience
● Flexible Workflow Management
The Future
● Arquillian
● More projects → Larger DB
● Slower results
● Sharding
● Map-Reduce
Thank you