Groningen nl pgroep

PoliticalMashup: Connecting promises and actions of politicians and how society reacts to them. Maarten Marx, Universiteit van Amsterdam. Groningen, α-informatica, 2011-03-11

Transcript of Groningen nl pgroep

  • 1. PoliticalMashup: Connecting promises and actions of politicians and how society reacts to them. Maarten Marx, Universiteit van Amsterdam. Groningen, α-informatica, 2011-03-11

  • 2. Content: Overview of the PoliticalMashup project. Zooming in on one cultural heritage dataset. A few example applications. Research ideas for NLP scientists.
  • 3. Who am I? Political scientist turned computer scientist. My field: Theory of XML, Database Systems, Semi-Structured Information Retrieval. Cooperation with Tweede Kamer, Koninklijke Bibliotheek, historians at NIOD, DNPP.
  • 4. PoliticalMashup project: Large-scale data integration project. Two-year NWO-funded infrastructure project, 2010-2012. Partners: U. Amsterdam, Groningen and Tilburg. Ongoing with irregular funding since 2008.
  • 5. Goal of PoliticalMashup: Making huge amounts of textual data available for large-scale automatic quantitative data and content analysis by scientists from the humanities and social sciences.
  • 6. Mashup of what and how? 4 data sources: promises and actions of politicians; reactions to those in the media and the general public. Connect data on: political entities, time, topics.
  • 7. Data sources. Promises: election manifestos (mostly scans, DNPP); party websites and blogs (Archipol); Twitter of politicians. Actions: parliamentary proceedings (mostly scans, KB). Reactions: news media; user-generated content (fora, blogs, comments on news, Twitter).
  • 8. Used techniques: Text analytics and XML DB and IR technology. Named entity recognition and normalization. Data mining, machine learning, hand-crafted rules. Natural language processing, language models. Make implicit structure and information explicit.
  • 9. Zoom in on one data corpus
  • 10. Longitudinal data: weekly measurement for over 150 years; very stable measurement procedure and data model.
  • 11. Data about human behaviour
  • 12. Often rather boring
  • 13. But sometimes full of drama and excitement
  • 14. Loads of measurement points: 24,000 days, 450,000 topics, 7.5 million speeches.
  • 15. Digitally available
  • 16. De Handelingen der Staten Generaal (Dutch Hansards)
  • 17. About this collection: very sparse available metadata; very rich metadata sits hidden inside the raw data. Rich data model: Meeting (1 day), Topic, Stage direction, Scene, Stage direction, Speech, Paragraph.
  • 18. Same data, different views: raw data in PDF; XML styled with a stylesheet; machine-readable XML format.
  • 19. Some applications of this
  • 20. Content and structure search: Combine IR-style keyword search with restrictions on structure. E.g., return speeches by Wilders about Islam.
  • 21. Exhaustive data collection: Example query for NIOD historians. Search for paragraphs about fascisme OR nazisme OR dictatuur OR (nazi AND dictatuur) OR . . . Return a TSV file with, for each hit: date, speakername, speakerid, speaker-party, . . . NIOD query.
  • 22. Link the proceedings to entities: Who is speaking? Who says what to whom? Applications: summary of one speaker. On old OCRed data: linking and resolving entities.
  • 23. Application: Interruption graph (Attackogram). MP A interrupts B: A speaks during the block of B.
  • 24. NLP research topics
  • 25. 0) Topics: Common European thesaurus, http://eurovoc.europa.eu. Detection and classification (sentence, paragraph, speech level).
  • 26. 1) Populist language in parliament. PhD thesis, Jan Jagers (2006).
  • 27. 2) Automatically detecting promises (toezegging) by ministers in Parliament. https://zoek.officielebekendmakingen.nl/kst-103196.pdf (page 56). Eerste Kamer has a nice database online: http://www.eerstekamer.nl/toezeggingen_2
  • 28. Example. The chair: I note that we have almost reached the end of this meeting. We still have time to go over the commitments (toezeggingen). I ask everyone to check that nothing has been overlooked. I will do this quickly, and after that we will briefly discuss the follow-up. The commitments. After the summer, the bill will be before the House. There will be a letter informing the House how the loss of expertise will be prevented. Minister Van Bijsterveldt-Vliegenthart: I did not promise that. Definitely not. No, I will not do that, because I did not promise it.
  • 29. 3) Opinion detection: Detect opinions expressed about entities and topics (the speaker is known). Detect reported speech.
  • 30. 4) Detect type of speech: interruption, attack, answer, speech (betoog), stage direction, ... http://data.politicalmashup.nl/debates/nl/h-ek-19961997-37-58.1-tijdslijn.html
  • 31. 5) Detect bullshit: Tautologies . . . "Regels zijn regels" (rules are rules), "Op is op" (gone is gone), "het is wat het is" (it is what it is).
  • 32. 6) Spelling normalization: Dutch had many spelling reforms. This leads to lower recall. Search in the new spelling, return results in old spellings.
  • 33. Lots of data available: happy to share. Now: 15 years of Dutch parliamentary proceedings in rich XML. Now: 200 years more in poorer XML, slowly getting richer. Parliamentary proceedings from EU (15y), UK (75y), Spain (40y), Scandinavian countries, . . . Election manifestos (provincial elections 2007 and 2011). All tweets, blogs, Flickr and YouTube content of all Dutch national politicians from the past 1.5 years.
  • 34. [email protected]
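The content-and-structure search of slides 20 and 21 could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the element and attribute names (`meeting`, `speech`, `p`, `speaker`, `speakerid`, `party`) and the sample data are invented for the example, and only a plain OR over keywords is handled, not the nested AND clause of the NIOD query.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Column names follow the TSV fields listed on slide 21, plus the hit text.
FIELDS = ["date", "speakername", "speakerid", "speaker-party", "text"]

# Invented sample; the real PoliticalMashup XML schema is not shown in the slides.
SAMPLE = """
<meeting date="1900-01-01">
  <speech speaker="Speaker A" speakerid="id-001" party="Party X">
    <p>De dictatuur baart ons zorgen.</p>
    <p>De begroting is sluitend.</p>
  </speech>
  <speech speaker="Speaker B" speakerid="id-002" party="Party Y">
    <p>Het fascisme moet worden bestreden.</p>
  </speech>
</meeting>
"""

def search_paragraphs(xml_text, keywords, speaker=None):
    """Return paragraphs containing any keyword, optionally restricted to one speaker."""
    root = ET.fromstring(xml_text.strip())
    wanted = {k.lower() for k in keywords}
    hits = []
    for speech in root.iter("speech"):
        # Structural restriction: skip speeches by other speakers.
        if speaker is not None and speech.get("speaker") != speaker:
            continue
        for par in speech.iter("p"):
            text = (par.text or "").strip()
            tokens = set(re.findall(r"\w+", text.lower()))
            # Content restriction: OR over the keyword set.
            if tokens & wanted:
                hits.append({
                    "date": root.get("date"),
                    "speakername": speech.get("speaker"),
                    "speakerid": speech.get("speakerid"),
                    "speaker-party": speech.get("party"),
                    "text": text,
                })
    return hits

def to_tsv(hits):
    """Serialize hits as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(hits)
    return buf.getvalue()

hits = search_paragraphs(SAMPLE, ["fascisme", "nazisme", "dictatuur"])
print(to_tsv(hits))
```

Passing `speaker="Speaker B"` narrows the same keyword query to one speaker, which is the "speeches by Wilders about Islam" style of restriction.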
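The interruption graph of slide 23 can be sketched under one simplifying assumption: each block is already available as an owner plus the ordered list of speakers inside it. The input format and the function name are hypothetical; the slides only give the rule "A interrupts B if A speaks during the block of B".

```python
from collections import Counter

def interruption_graph(blocks):
    """Count directed edges (interrupter, block owner).

    blocks: iterable of (owner, speakers), where speakers lists, in order,
    everyone who speaks during the owner's block (hypothetical input format).
    """
    edges = Counter()
    for owner, speakers in blocks:
        for speaker in speakers:
            # Anyone other than the owner speaking inside the block interrupts.
            if speaker != owner:
                edges[(speaker, owner)] += 1
    return edges

blocks = [
    ("B", ["B", "A", "B", "A", "B"]),
    ("A", ["A", "C", "A"]),
]
graph = interruption_graph(blocks)
print(graph)
```

In real proceedings the chair also speaks inside blocks, so a practical version would probably need to filter out procedural turns before counting.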
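For the tautology examples on slide 31, a crude pattern-based detector might look like the sketch below. The two regular expressions are assumptions chosen to cover exactly the three quoted phrases ("X is/zijn X" and "X is wat X is"); they are not a method proposed in the talk and will miss most real cases.

```python
import re

# Pattern 1: "regels zijn regels", "op is op" (one or two words repeated around is/zijn).
# Pattern 2: "het is wat het is" (clause repeated around "wat").
PATTERNS = [
    re.compile(r"\b(\w+(?:\s+\w+)?)\s+(?:is|zijn)\s+\1\b"),
    re.compile(r"\b(\w+\s+(?:is|zijn))\s+wat\s+\1\b"),
]

def find_tautologies(text):
    """Return lowercased tautological phrases found in text."""
    lowered = text.lower()  # lowercase once so the backreference compares equal strings
    found = []
    for pattern in PATTERNS:
        found.extend(m.group(0) for m in pattern.finditer(lowered))
    return found

for sentence in ["Regels zijn regels, zei de minister.", "Op is op.", "Het is wat het is."]:
    print(find_tautologies(sentence))
```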
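The idea on slide 32, searching in the new spelling and returning results in old spellings, amounts to query expansion. A minimal sketch follows, assuming a tiny hand-made variant table; the table and function name are hypothetical, and a real system would need rules or a lexicon covering each spelling reform.

```python
# Hypothetical, hand-made table: modern spelling -> older spelling variants.
OLD_VARIANTS = {
    "mens": ["mensch"],
    "vis": ["visch"],
    "zo": ["zoo"],
}

def expand_query(terms):
    """Add known older spellings to each modern query term."""
    expanded = []
    for term in terms:
        key = term.lower()
        expanded.append(key)
        # Terms without a known variant pass through unchanged.
        expanded.extend(OLD_VARIANTS.get(key, []))
    return expanded

print(expand_query(["Mens", "recht"]))
```

The expanded term list can then be used as an OR query, so documents in either spelling are retrieved.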