Data & Open Technologies

A Perfect Combination Introduction Lecture

Καουκάκης Σταύρος, Information Systems Analyst – Programmer, M.Sc.

Board Member, Postgraduate Alumni Association, University of Crete @kaukakis

11 June 2016

Contents

Main Topics

Data Sources (Who Produces Data?)

Open Source & Free Software

Some (future) stats

We Have Data. So, What Do We Need?

Open (Source) Software Tools & Platforms

Some Examples & Case Studies

Discussion

1st Data Driven World 11 June 2016 2

Some Topics

Who Produces Data?

All of us. Everybody!

Environment

When?

All the Time!

Who Collects Data?

Government

Companies

Users

Who Owns Data?

Why Does Everyone Collect Data?

Take Advantage of Data!

Some Big Data Numbers

Over 90% of all data was created in the last 2 years

Every 2 days, as much data is stored as existed digitally up to 2003

By 2020 the volume of data will grow tenfold (~40 zettabytes)

Every year, the amount of data almost doubles

Devices connected to the internet: 13 billion

By 2020 they are expected to reach 50 billion

Over 3 billion users
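
The growth figures above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the data volume doubles exactly once per year (the slide only says "almost") and taking the ~40 zettabyte figure for 2020 as given. The function names are illustrative, not from the lecture.

```python
import math


def years_to_grow(factor: float, doubling_period_years: float = 1.0) -> float:
    """Years needed to grow by `factor` if the volume doubles every period."""
    return math.log2(factor) * doubling_period_years


def volume_after(start_zb: float, years: float,
                 doubling_period_years: float = 1.0) -> float:
    """Projected volume in zettabytes after `years` of steady doubling."""
    return start_zb * 2 ** (years / doubling_period_years)


# Assumption: exact annual doubling.
# Tenfold growth then takes log2(10) years, a little over three years.
years_for_tenfold = years_to_grow(10)

# Tenfold growth that ends at ~40 ZB implies a starting volume of ~4 ZB.
baseline_zb = 40 / 10
```

Under that assumption, "doubles every year" and "tenfold by 2020" describe roughly the same growth rate over a three-to-four-year horizon.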

DVDs Stack to the Moon!!! (And Back) Ben Golub @golubbe

…Every 60 Seconds!

2015 Report Source: qmee.com

Source: wikimedia.org

(Big – Linked) Data & Software

Software & Tools Needed

Open Source Software

Open Hardware

Open Technologies

Open Data Platforms

Why Open Source?

Customizability

Flexibility – Agility

Interoperability

Big Communities

Freedom

Try Before You Buy

Low Cost

Security

Online community and public directory of free and open source software

https://www.openhub.net/

Tools & Software for Data…

Storage

Analysis

Cleaning

Mining

Visualization

Integration

Publishing

Automation

Programming Languages

… Open Technology is everywhere!

CKAN (Data Publishing)

CKAN is a powerful data management system

Publishing

Sharing

Using Data

Web: ckan.org

Case Study: http://www.data.gov.gr/
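
Data published on a CKAN portal can also be read programmatically through CKAN's Action API. The sketch below builds a `package_search` request URL and extracts dataset titles from the JSON response. The base URL and the sample response are hypothetical placeholders, and whether a particular portal exposes this endpoint publicly is an assumption to verify against that portal.

```python
import json
from typing import List
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build a CKAN Action API URL that searches datasets by keyword."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> List[str]:
    """Return dataset titles (falling back to names) from a package_search reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [d.get("title") or d["name"] for d in payload["result"]["results"]]


# Hypothetical response, shaped like a package_search reply (not real portal data).
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "air-quality-2015", "title": "Air Quality 2015"},
            {"name": "untitled-set", "title": ""},
        ],
    },
})

# Placeholder host; substitute the portal you actually want to query.
url = package_search_url("https://ckan.example.org/", "air quality")
titles = dataset_titles(SAMPLE_RESPONSE)
```

To query a live portal, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `dataset_titles`.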

OpenRefine (Data Cleaning)

A free, open source, powerful tool for working with messy data

Cleaning

Transforming from one format into another

Extending

Web: openrefine.org

An Example
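
One typical cleaning task is merging spelling variants of the same value. The sketch below is a simplified Python approximation of the key-collision ("fingerprint") idea that OpenRefine offers for clustering. It is not OpenRefine's own code, it skips steps such as accent normalization, and the sample values are hypothetical.

```python
import string
from collections import defaultdict
from typing import Dict, Iterable, List

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def fingerprint(value: str) -> str:
    """Normalize a value: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower().translate(_STRIP_PUNCT)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: Iterable[str]) -> List[List[str]]:
    """Group values sharing a fingerprint; keep groups with more than one spelling."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical messy column values.
names = ["University of Crete", "university of crete ",
         "Crete, University of", "Heraklion"]
clusters = cluster(names)
```

Each returned cluster is a candidate for merging into a single canonical spelling, which a person would normally review before applying.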

Datawrapper (Data Visualization – Web App)

Datawrapper is like having an amazing graphic designer at the tip of your fingers

Brings Data to Life

Interactive Charts

No Coding Skills Needed

Limitations in the free edition (export to PNG files)

Web: datawrapper.de

Examples: https://datawrapper.de/gallery

Data-Driven Documents (for Programmers)

D3.js is a JavaScript library for manipulating documents based on data

Brings Data to Life

Compatible with Modern Browsers

Data-driven approach

Web: d3js.org

Examples: github.com/d3/d3/wiki/Gallery

Also have a look at Google Charts

Lumify (Analysis and Visualization)

Lumify is an open source big data analysis and visualization platform

Analyze relationships

Geographical view

Share your work in real time

Web: lumify.io

Examples: http://lumify.io/

R Language - Environment

R is a language and environment for statistical computing and graphics

Statistical & Graphical techniques

Linear and nonlinear modeling

Classification, Clustering

Web: r-project.org

Examples: http://www.rexamples.com/

Data Storage - Management & More Open…

Hadoop (hadoop.apache.org)

MongoDB (mongodb.com)

Talend (talend.com)

Rapidminer (rapidminer.com)

Elodina Platform (elodina.net)

RDBMS, such as MySQL and PostgreSQL
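
As a small illustration of what a relational store contributes, the sketch below loads a few records into a table and aggregates them with SQL. It uses Python's built-in `sqlite3` module purely as a stand-in, because it needs no server. With MySQL or PostgreSQL the SQL would look much the same, but the driver and connection setup would differ. The table, columns, and rows are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def datasets_per_publisher(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Store (publisher, format) records and count datasets per publisher."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE datasets (publisher TEXT, format TEXT)")
        conn.executemany("INSERT INTO datasets VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT publisher, COUNT(*) FROM datasets "
            "GROUP BY publisher ORDER BY publisher"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical records, for illustration only.
records = [("ministry-a", "csv"), ("ministry-a", "json"), ("city-b", "csv")]
counts = datasets_per_publisher(records)
```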

Online community and public directory of free and open source software

https://www.openhub.net/

https://opensource.org/

Thank You,

Questions?

Καουκάκης Σταύρος stavroskaukakis@gmail.com

@kaukakis