Statistics in the Web
![Page 1: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/1.jpg)
Statistics in the Web
I. Antoniou, P. Moissiadis, M. Vafopoulos
Aristotle University, Department of Mathematics Master in Web Science
supported by Municipality of Veria
![Page 2: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/2.jpg)
23rd ESI Conference - Veroia 2
Contents
• What is the Web?
• Web milestones
• Why is it so successful?
• We knew the Web was big...
• Web generations
• Studying the Web
• Web Data and Structure
• Web Function and Evolution
• Web policy
April 8, 2010
![Page 3: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/3.jpg)
What is the Web?
A system of interlinked hypertext documents (HTML) with unique addresses (URI), accessed via the Internet (HTTP).
![Page 4: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/4.jpg)
Web milestones
1992: Tim Berners-Lee (TBL) presents the idea at CERN
1993: Dertouzos (MIT) and Metakides (EU) create the W3C, appointing TBL as director
Two Greeks at the Web’s birth. How many at Web science’s?
![Page 5: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/5.jpg)
Why is it so successful?
It is based on an architecture (HTTP, URI, HTML) which is:
• simple, free or cheap, open source, extensible
• tolerant
• networked
• fun & powerful
• universal (regardless of hardware platform, software platform, application software, network access, public, group, or personal scope, language and culture, operating system and ability)
![Page 6: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/6.jpg)
Why is it so successful?
• A new experience of exploring and editing a huge amount of information, people and abilities, anytime, from anywhere
• The biggest human system with no central authority or control, but with log data (Yotta* Bytes/sec)
• It has not yet revealed its full potential…
*1 yottabyte = 1024^8 bytes
![Page 7: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/7.jpg)
We knew the Web was big...
• 1 trillion unique URIs (Google blog 7/25/2008)
• 2 billion users
• Google: 300 million searches/day
• US: 15 billion searches/month
• 72% of the Web population are active on at least 1 social network …
Source: blog.usaseopros.com/2009/04/15/google-searches-per-day-reaches-293-million-in-march-2009/
![Page 8: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/8.jpg)
Web: the new continent
• Facebook: 400 million active users
  – 50% of active users log on to Facebook on any given day
  – 35 million users update their status each day
  – 60 million status updates posted each day
  – 3 billion photos uploaded to the site each month
• Twitter: 75 million active users
  – 141 employees
• YouTube: 350 million daily visitors
• Flickr: 35 million daily visitors
![Page 9: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/9.jpg)
Web: the new continent
• Online advertising spending in the UK has overtaken television expenditure for the first time [4 billion euros/year] (30/9/2009, BBC)
• In the US, spending on digital marketing will overtake that of print for the first time in 2010
• Amazon.com: 50 million daily visitors
  – 60 billion dollars market capitalization
  – 24,000 employees
![Page 10: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/10.jpg)
Web generations

| Generation | Era | Description | Basic value source |
|---|---|---|---|
| Pre Web | 1980’s: calculate | The desktop is the platform | Computations [no network effect] |
| Web 1.0 | 90’s: read | Surfing Web: The browser is the platform | Hyper-linking of documents |
| Web 2.0 | 00’s: write | Social Web: The Web is the platform | Social dimension of linkage properties |
| Web 3.0 | 10’s: discover | Semantic Web: The Graph is the platform | URI-based semantic linkages |
| Web 4.0 | 20’s: execute | Metacomputing: The network is the platform. Web of things (embedded systems, RFID) | Connection & production in a global computing system for everything |
| Web 2w | Combine all | Almost everything is (or could be) a Web service | New inter-creativity |
![Page 11: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/11.jpg)
New questions for the Web
• Safe surfing
• Find credible information
• Create successful e-business
• Reduce tax evasion
• Enable local economic development
• Communicate with potential voters
• Find existing research effort in a subject
How will we answer these questions?
![Page 12: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/12.jpg)
Studying the Web
The Web is the largest human information construct in history. The Web is transforming society…
It is time to study it systematically as a stand-alone socio-technical artifact.
![Page 13: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/13.jpg)
Web science timeline
2005: The Web Science Workshop, London
• Chairs: Tim Berners-Lee, Wendy Hall
• Organizing Committee: J. Hendler, N. Shadbolt, D. Weitzner
11/2006: Web Science Research Initiative is established
2007: “A Framework for Web Science” is published
2007: the book is translated to Greek/introduced in Univ.
4/2008: EU FET workshop in Web science
4/2008: 2nd Web Science Workshop, China
7/2008: Summer Doctoral Program, Oxford
9/2008: Web science curriculum workshop, UK
9/2008: establishment of W3F
2009: 1st World Conference in Web science, 18-20/3/2009, Athens, Greece, www.websci09.org
10/2009: master in Web science, Greece, UK
3/2010: UK gov. invests 40 million euros in WS institute
4/2010: Rensselaer Polytechnic Institute (41st ranked in US) announces an undergraduate program in Web Science
![Page 14: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/14.jpg)
The Web Science framework
The basis:
• Data Analysis Statistics
• Mathematical Models
• The “Econometrics” paradigm
• Statistics in Economics
  – Initially, not accepted by economists
  – Commerce and Accounting become Economics
  – Now, the base of Economics
  – Evaluation of theories/models about function, structure & evolution of economic phenomena
  – Public policy and business strategy
![Page 15: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/15.jpg)
Web Data and Structure
![Page 16: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/16.jpg)
What kind of data do we have from networks?
• Enumerated data. Such data are collected exhaustively from the full population, i.e. from all the nodes of the network.
  – For instance, in some social network studies, such as those involving the graduates of a school or a university, it is quite easy to collect data uploaded by the members involved.
  – The same is true for networks of collaborations between researchers or between scientific journals, for which there exist databases containing citation indexes and other parameters over a long window of time.
![Page 17: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/17.jpg)
What kind of data do we have from networks?
• Partial data. Such data are collected from a full enumeration of only a subset of the population.
  – For example, in order to study the network between users of Aristotle University of Thessaloniki (AUTh), we must collect information for all the node-users of AUTh. These data can help researchers find a number of characteristics of the network, but they cannot capture others that depend on interaction with other networks. For instance, the traffic collected from this network cannot say anything about the probability of the network crashing, because all the traffic is needed, not only the traffic between the members of AUTh.
![Page 18: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/18.jpg)
What kind of data do we have from networks?
• Sampled data. They are produced by first selecting a sample of the units (nodes) using some random technique. They are not only a subset of all possible data, but they also do not give an exhaustive view of any sub-population. Unless the graph is random, the nodes are not independent, and their meaning varies.
  – For example, let us take a random sample of a doctors’ network, where a link means that two doctors have common patients. The result will be different if some of the most famous doctors of this network are included in the sample than if none of them is selected.
![Page 19: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/19.jpg)
Drawing a network
• The statistical analysis of a network is affected even by the way the network is drawn. The graph may be seen as a “geometric representation of relations between the nodes”. When there are only a few nodes, it is possible to construct the graph by hand successfully, and one can appreciate the importance of a good design. For instance, the three drawings below represent the same graph, but the impression they produce is different.
![Page 20: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/20.jpg)
Drawing a network
• From Kolaczyk’s book [1] we have
• 3 views of Zachary’s “karate club” network
It is centered on the actors a1 and a34. The yellow links connect actors from different groups.
Two ego-centric views of the same network. The upper one is viewed from a1 and the lower one from a34.
Easy Community Detection
![Page 21: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/21.jpg)
Drawing a network
• A number of algorithms have been developed for drawing graphs and networks in such a way that the graphs reveal the relevant information in an aesthetically pleasing way.
• Well-known packages such as:
  – Mathematica, UCINET, SNAP, TouchGraph, igraph (for R), NodeXL (for Excel) and many others
have incorporated such algorithms to achieve optimal drawing of graphs. In most of them the user can change the algorithm or move some nodes in order to make the graph more readable. As Kolaczyk points out, graph drawing involves not only “science” but also some “art”.
![Page 22: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/22.jpg)
Drawing a network
• For some networks, some statistical analysis is needed before the drawing.
  – Suppose that in a biological study we have N genes {1, 2, …, N} and that for each gene we observe its performance under m separate experimental conditions. This gives rise to an m×1 vector $x_i = (x_{i1}, x_{i2}, \dots, x_{im})'$ for every gene i.
  – A usual simple measure of association between two genes i and j is obtained by comparing the corresponding vectors $x_i$ and $x_j$, for example through the correlation coefficient $\rho_{ij}$ of these two vectors. If this coefficient is large enough, the two genes are considered to be associated. So, in the graph whose nodes are the genes, we add an edge joining each pair of associated genes, sequentially constructing the set of edges E.
  – To decide when the coefficient is large enough, we must perform a hypothesis test with a suitable threshold.
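The association-network construction above can be sketched in a few lines. This is only an illustration under stated assumptions: the expression profiles are made-up toy data, the test is the Fisher z-test for zero correlation (one common choice, not necessarily the one the authors used), and the names `pearson`, `association_edges` and `z_critical` are hypothetical. No correction for multiple testing is applied.

```python
import math

def pearson(x, y):
    """Sample correlation coefficient of two equally long profiles."""
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_edges(profiles, z_critical=1.96):
    """Add edge {i, j} when the Fisher z-statistic rejects rho_ij = 0.

    Assumes every profile has the same length m > 3.
    z_critical = 1.96 corresponds to a two-sided 5% test (an assumption).
    """
    genes = sorted(profiles)
    edges = []
    for idx, i in enumerate(genes):
        for j in genes[idx + 1:]:
            m = len(profiles[i])
            r = pearson(profiles[i], profiles[j])
            r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
            z = math.atanh(r) * math.sqrt(m - 3)
            if abs(z) > z_critical:
                edges.append((i, j))
    return edges

# Toy data (hypothetical): three genes observed under m = 8 conditions.
profiles = {
    "g1": [1, 2, 3, 4, 5, 6, 7, 8],
    "g2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],
    "g3": [5, 1, 4, 2, 6, 3, 5, 2],
}
edges = association_edges(profiles)
```

Only the strongly correlated pair g1, g2 passes the test here; g3 is essentially uncorrelated with both.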
![Page 23: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/23.jpg)
Drawing a network
• Regression models can also be used for network drawing.
  – Let us consider a social network G(V,E), where V is the set of individuals constituting the nodes of the network.
  – If the links in this network (friendship, collaboration, shared origin, etc.) are not known but can be estimated from some observable variables such as age, sex or speciality, then we represent the link by Y (i.e. Y=1 if the link exists, Y=0 if it does not) and the vector of predictors by X.
  – We then estimate the probability $P(Y_{ij}=1 \mid X_i=x_i, X_j=x_j)$ and, if it exceeds some limit, we add the edge ij to E, in this way sequentially constructing the whole set of edges E.
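As a rough sketch of the regression-based construction, the snippet below uses a logistic model for the link probability. The coefficients, the two pair-level predictors (age gap, same speciality) and the 0.5 threshold are all hypothetical placeholders; in practice the coefficients would be fitted on pairs whose link status is known.

```python
import math
from itertools import combinations

# Hypothetical coefficients of a logistic model, for illustration only.
INTERCEPT = 1.0
COEF_AGE_GAP = -0.2
COEF_SAME_SPECIALITY = 1.5

def link_probability(person_i, person_j):
    """Estimate P(Y_ij = 1 | X_i, X_j) with a logistic function."""
    age_gap = abs(person_i["age"] - person_j["age"])
    same = 1.0 if person_i["speciality"] == person_j["speciality"] else 0.0
    score = INTERCEPT + COEF_AGE_GAP * age_gap + COEF_SAME_SPECIALITY * same
    return 1.0 / (1.0 + math.exp(-score))

def build_edges(people, threshold=0.5):
    """Add edge ij to E whenever the estimated probability exceeds the limit."""
    return [
        (i, j)
        for i, j in combinations(sorted(people), 2)
        if link_probability(people[i], people[j]) > threshold
    ]

# Made-up individuals.
people = {
    "ann": {"age": 30, "speciality": "cardiology"},
    "bob": {"age": 32, "speciality": "cardiology"},
    "eve": {"age": 60, "speciality": "surgery"},
}
edges = build_edges(people)
```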
![Page 24: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/24.jpg)
![Page 25: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/25.jpg)
![Page 26: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/26.jpg)
Cyberspace
![Page 27: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/27.jpg)
Cyberspace
![Page 28: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/28.jpg)
Node Degrees
$|V| = n, \quad |E| = q$
[Figure: an undirected graph on five nodes and a directed graph on four nodes; the edge labels are not recoverable]
Undirected graph: d(1)=2, d(2)=4, d(3)=2, d(4)=3, d(5)=1
$$\sum_{i=1}^{p} d(i) = 12 = 2q$$
Directed graph: din(1)=1, dout(1)=1; din(2)=3, dout(2)=1; din(3)=1, dout(3)=2; din(4)=1, dout(4)=2
$$\sum_{i=1}^{p} d_{in}(i) = 6 = q, \qquad \sum_{i=1}^{p} d_{out}(i) = 6 = q$$
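The degree identities on this slide can be checked with a short sketch. The slide's actual edges are not recoverable, so the edge list below is hypothetical, chosen only so that its in- and out-degrees match the quoted values din = (1, 3, 1, 1) and dout = (1, 1, 2, 2).

```python
from collections import Counter

# Hypothetical directed edges (u, v) meaning u -> v; they reproduce the
# in/out degrees quoted on the slide but are not the slide's own edges.
edges = [(1, 2), (2, 1), (3, 2), (3, 4), (4, 2), (4, 3)]

def degree_tables(edge_list):
    """Return (in-degree, out-degree) counters for a directed edge list."""
    out_deg = Counter(u for u, _ in edge_list)
    in_deg = Counter(v for _, v in edge_list)
    return in_deg, out_deg

in_deg, out_deg = degree_tables(edges)
q = len(edges)

# Each edge contributes one unit of in-degree and one of out-degree,
# so both sums equal q; ignoring directions, total degree is 2q.
total_degree = sum(in_deg.values()) + sum(out_deg.values())
```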
![Page 29: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/29.jpg)
The degree distribution
P(k) = P(D ≤ k) is the distribution function of the random variable D that counts the degree of a randomly chosen node.
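A minimal sketch of the empirical version of this distribution function, using the five undirected degrees quoted on the Node Degrees slide as input. The function name `degree_cdf` is hypothetical. Note that here P(k) is cumulative, whereas later slides use P(k) for the probability of degree exactly k.

```python
from collections import Counter

def degree_cdf(degrees):
    """Empirical P(D <= k) for k = 0 .. max degree."""
    n = len(degrees)
    counts = Counter(degrees)
    cdf, running = {}, 0
    for k in range(max(degrees) + 1):
        running += counts.get(k, 0)
        cdf[k] = running / n
    return cdf

degrees = [2, 4, 2, 3, 1]  # d(1)..d(5) from the undirected example
cdf = degree_cdf(degrees)
```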
![Page 30: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/30.jpg)
Distances, Eccentricity, Cliques…
• We estimate the distribution of distances, or of eccentricities, or of other graph characteristics.
• We use different statistics, such as the mean distance
$$L = \frac{1}{n(n-1)} \sum_{u,v \in V} d(u,v)$$
or the mean connected distance, obtained by dividing the sum of distances by the number m of edges instead of n(n−1).
• We estimate the clustering coefficient $c_v = q_v / (k_v(k_v-1)/2)$, where $k_v$ is the number of neighbors of node v and $q_v$ is the number of links between the neighbors of node v ($0 \le q_v \le k_v(k_v-1)/2$), or the global clustering coefficient $c = c(p) = \sum_v c_v / n$.
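The mean distance L can be computed with one breadth-first search per node. The sketch below assumes an unweighted, connected, undirected graph given as an adjacency dictionary; the function names and the four-node path used as input are illustrative only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def mean_distance(adj):
    """L = sum over ordered pairs of d(u, v), divided by n(n-1).

    Assumes the graph is connected, so every pair has a finite distance.
    """
    n = len(adj)
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total / (n * (n - 1))

# Path 1 - 2 - 3 - 4: ordered-pair distances sum to 20, so L = 20/12.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
L = mean_distance(path)
```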
![Page 31: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/31.jpg)
Example of clustering coefficient

| graph | a | b | c |
|---|---|---|---|
| $q_i$ | 10 | 4 | 0 |
| $k_v(k_v-1)/2$ | 10 | 10 | 10 |
| $c_i = q_i / (k_v(k_v-1)/2)$ | 1 | 0.4 | 0 |
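The three values in this example can be reproduced with a small sketch. The figures themselves are not available, so the graphs below are an assumption: a centre node 0 with five neighbours (hence 10 possible neighbour pairs) and 10, 4 or 0 links among those neighbours. Function names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, v):
    """c_v = q_v / (k_v (k_v - 1) / 2) for node v in an undirected graph."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    q = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return q / (k * (k - 1) / 2)

def star_with_links(extra_links):
    """Centre node 0 joined to nodes 1..5, plus the given neighbour links."""
    adj = {i: set() for i in range(6)}
    for i in range(1, 6):
        adj[0].add(i)
        adj[i].add(0)
    for a, b in extra_links:
        adj[a].add(b)
        adj[b].add(a)
    return adj

all_pairs = list(combinations(range(1, 6), 2))  # the 10 possible links
graph_a = star_with_links(all_pairs)            # q = 10
graph_b = star_with_links(all_pairs[:4])        # q = 4
graph_c = star_with_links([])                   # q = 0
coefficients = [local_clustering(g, 0) for g in (graph_a, graph_b, graph_c)]
```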
![Page 32: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/32.jpg)
Degree Distribution of random graphs
P(k): the probability that a node has k links
A random graph from G(n, p) has on average $\binom{n}{2} p$ edges. The distribution of the degree of any particular vertex is binomial:
$$P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$$
For large n, P(k) can be replaced by a Poisson distribution.
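The quality of the Poisson approximation can be checked numerically. In this sketch the values n = 1000 and p = 0.005 are arbitrary illustrative choices, and the Poisson mean is taken as λ = (n−1)p, the expected degree of a vertex.

```python
import math

def binomial_degree_pmf(k, n, p):
    """P(k) = C(n-1, k) p^k (1-p)^(n-1-k) for a vertex of G(n, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

def poisson_pmf(k, lam):
    """Poisson probability of k with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.005          # illustrative values, not from the slides
lam = (n - 1) * p           # expected degree
max_gap = max(
    abs(binomial_degree_pmf(k, n, p) - poisson_pmf(k, lam))
    for k in range(31)
)
```

For sparse graphs of this size the two distributions are close at every k; the gap grows as p increases with n fixed.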
![Page 33: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/33.jpg)
Degree distribution of the small-world (SW) model
The degree distribution of a random graph with the same parameters is plotted with filled symbols.
![Page 34: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/34.jpg)
Self-Similar = Scale-free Networks
• The degree distribution follows a power law, at least asymptotically. That is:
$$P(k) \sim k^{-\gamma}$$
where γ is a constant whose value is typically in the range 2 < γ < 3, although occasionally it may lie outside these bounds.
• The clustering coefficient decreases as the node degree increases. This distribution also follows a power law.
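One way to estimate the exponent γ from data is maximum likelihood. The sketch below is an assumption-laden illustration, not a method taken from the slides: it treats the variable as continuous with density proportional to $x^{-\gamma}$ for $x \ge x_{min}$ (real degrees are discrete), draws synthetic samples by inverse-transform sampling, and applies the estimator $\hat\gamma = 1 + n / \sum_i \ln(x_i / x_{min})$. All names are hypothetical.

```python
import math
import random

def sample_power_law(gamma, x_min, size, rng):
    """Inverse-transform samples from a density ~ x^(-gamma), x >= x_min."""
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
        for _ in range(size)
    ]

def estimate_gamma(samples, x_min):
    """Maximum-likelihood exponent for the continuous power-law tail."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(0)                      # fixed seed for repeatability
samples = sample_power_law(2.5, 1.0, 20000, rng)
gamma_hat = estimate_gamma(samples, 1.0)    # should be close to 2.5
```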
![Page 35: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/35.jpg)
Distribution of links on the World-Wide Web, $P(k) \sim k^{-\gamma}$ (power law). a, Outgoing links (URLs found on an HTML document); b, incoming links; c, average of the shortest path between two documents as a function of system size [Barabási et al., 1999]
![Page 36: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/36.jpg)
In-degree and out-degree distributions follow a power law. The power law also holds if only off-site (or "remote-only") edges are considered.
![Page 37: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/37.jpg)
Example
• For a graph G, let $s(G) = \sum_{(i,j) \in E} d_i d_j$ and $S(G) = \dfrac{s(G)}{s_{\max}}$, where $d_i$ is the degree of node i and $s_{\max}$ is the maximum value of s(H) over all graphs H with the same degree sequence as G.
• This gives a metric between 0 and 1, such that graphs with low S(G) are "scale-rich", and graphs with S(G) close to 1 are "scale-free". This definition includes the notion of self-similarity implied in the name "scale-free".
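The numerator s(G) is straightforward to compute from an edge list. The sketch below covers only s(G); obtaining $s_{\max}$ requires a search over graphs with the same degree sequence and is not attempted here. The function name and the two toy graphs are illustrative.

```python
def s_metric(edges):
    """s(G) = sum over edges (i, j) of d_i * d_j for an undirected graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)

path = [(1, 2), (2, 3), (3, 4)]   # degrees 1, 2, 2, 1
star = [(0, 1), (0, 2), (0, 3)]   # hub of degree 3, three leaves
s_path = s_metric(path)           # 1*2 + 2*2 + 2*1
s_star = s_metric(star)           # 3 * (3*1)
```

High-degree nodes attached to other high-degree nodes push s(G) up, which is why it is normalised by $s_{\max}$ for a fixed degree sequence.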
![Page 38: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/38.jpg)
Sampling in networks
• Sampling is necessary when the enumeration of data for the whole network is impossible. Kolaczyk’s example:
• Consider a network G=(V,E), with Nv nodes and Ne edges. Suppose that we have measurements from a subset V* of V and from a subset E* of E, which define the pair (V*,E*). The pair G*=(V*,E*) may be a subgraph of G, but this is not always the case.
Should G*=(V*,E*) be a subgraph for best statistical estimations?
![Page 39: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/39.jpg)
Sampling in networks
Estimation of the Average Degree of the nodes of G:
![Page 40: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/40.jpg)
Sampling in networks
• To test the estimation method, 1500 nodes were selected at random to form the subset V*, while two design methods were applied for the edges.
  – Design 1: For every node i of V* we observe all edges {i, j} ∈ E involving i; each such edge becomes an element of E*.
  – Design 2: For each pair {i, j} ⊆ V*, we observe whether or not {i, j} ∈ E; if so, that edge becomes an element of E*.
• After 10000 selections, the average degree was estimated under the two design methods and the histogram of the estimated values was formed.
![Page 41: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/41.jpg)
Sampling in networks
The blue histogram shows the estimated average degrees under Design 1, while the red one shows those under Design 2. It is clear from the figure that Design 1 gives better estimates. In fact, the estimate under Design 1 was 12.117 with s.e. 0.3797, while under Design 2 it was 3.528 with s.e. 0.2260. It is notable that in Design 1 the node degrees are the ones in graph G, but the pair (V*, E*) does not form a graph. Design 2, on the other hand, forms a subgraph (the induced subgraph), but the average degree is under-estimated, being scaled by a factor of approximately n/Nv, where n is the number of sampled nodes.
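The contrast between the two designs can be reproduced on synthetic data. The sketch below does not use the network from Kolaczyk's example; it assumes a G(N, p) random graph with made-up sizes (400 nodes, 100 sampled, 200 repetitions) purely to show the effect, and all function names are hypothetical.

```python
import random

def random_graph(num_nodes, p, rng):
    """Undirected G(N, p) random graph as a dict of neighbour sets."""
    adj = {v: set() for v in range(num_nodes)}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def design1_estimate(adj, sample):
    """Design 1: every sampled node reports its full degree in G."""
    return sum(len(adj[v]) for v in sample) / len(sample)

def design2_estimate(adj, sample):
    """Design 2: only edges with both ends sampled (induced subgraph)."""
    chosen = set(sample)
    return sum(len(adj[v] & chosen) for v in sample) / len(sample)

def simulate(num_nodes=400, p=0.03, sample_size=100, repetitions=200, seed=1):
    rng = random.Random(seed)
    adj = random_graph(num_nodes, p, rng)
    true_mean = sum(len(nb) for nb in adj.values()) / num_nodes
    est1, est2 = [], []
    for _ in range(repetitions):
        sample = rng.sample(range(num_nodes), sample_size)
        est1.append(design1_estimate(adj, sample))
        est2.append(design2_estimate(adj, sample))
    return true_mean, sum(est1) / repetitions, sum(est2) / repetitions

true_mean, mean1, mean2 = simulate()
```

Under these assumptions the Design 1 average stays near the true mean degree, while the Design 2 average is close to the true mean multiplied by (n−1)/(Nv−1), which is approximately n/Nv.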
![Page 42: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/42.jpg)
Best statistical estimations are obtained when G*=(V*,E*) is not a subgraph
• Why? A crucial point for web statistics!
![Page 43: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/43.jpg)
Network Link Estimation
• If we know the nodes but have limited information about the links,
• how can we estimate the unknown links?
![Page 44: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/44.jpg)
Node type Estimation
Example:
– Can we estimate the gender of persons (being nodes in a network of friends) from some knowledge of the network?
A strategy for the estimation:
• Consider each node in turn as missing
• Compute the probability that it has more links with friends of the gender of interest
• Compare with the known situation
• One may form ROC curves.

[1] Kolaczyk, Eric. Statistical Analysis of Network Data: Methods and Models, Springer 2009.
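The leave-one-out strategy listed above can be sketched as a simple neighbour-vote classifier. This is one possible reading of the strategy, not the authors' exact procedure: the score is the fraction of a node's friends carrying the label of interest, the 0.5 threshold is an assumption, and the six-person network is made up. Sweeping the threshold over [0, 1] would give the points of a ROC curve.

```python
def neighbor_score(adj, labels, node, target):
    """Fraction of node's neighbours whose label equals target.

    The node's own label is never used, so each node is treated as missing.
    """
    known = [labels[w] for w in adj[node] if w in labels]
    if not known:
        return 0.5  # no information: neutral score (an assumption)
    return sum(1 for label in known if label == target) / len(known)

def leave_one_out(adj, labels, target, threshold=0.5):
    """Predict each node from its neighbours and compare with the truth."""
    predictions = {
        node: neighbor_score(adj, labels, node, target) > threshold
        for node in adj
    }
    hits = sum(
        1 for node in adj if predictions[node] == (labels[node] == target)
    )
    return predictions, hits / len(adj)

# Made-up friendship network: two triangles joined by the edge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = {0: "F", 1: "F", 2: "F", 3: "M", 4: "M", 5: "M"}
predictions, accuracy = leave_one_out(adj, labels, target="F")
```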
![Page 45: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/45.jpg)
Web Function and Evolution
Traffic on the Internet [Ivanov, Antoniou]
Prigogine Model
Log-Normal, Power Law
Web Traffic
![Page 46: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/46.jpg)
• Google PageRank Algorithm
• Hyperlink Matrix
• Web Traffic not included initially
• Random surfer assumption
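A minimal sketch of how the hyperlink structure and the random surfer assumption combine in PageRank, computed by power iteration. The damping factor 0.85 is the conventional value and is an assumption here, as are the four-page toy Web and the uniform handling of pages without outgoing links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank.

    links maps each page to the list of pages it links to; every target
    must itself be a key. With probability `damping` the random surfer
    follows a link, otherwise jumps to a uniformly chosen page.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page]
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Toy Web (hypothetical): D has no incoming links, C has the most.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

Only the link structure enters this computation, which matches the slide's remark that Web traffic was not included initially.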
Web Function and Evolution
![Page 47: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/47.jpg)
Web as a Communication Channel
Web
Users
![Page 48: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/48.jpg)
Web
Users
Queries Topics
Papadimitriou et al.; Amarantidis, Antoniou, Vafopoulos
![Page 49: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/49.jpg)
Web
Users
Queries Topics
Social networks
![Page 50: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/50.jpg)
Statistics and the Web
• Games: Utility, Auctions
• Webmetrics: statistical models for the Web Structure, Function and Evolution in order to evaluate individual, business and public policies
![Page 51: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/51.jpg)
Web assessment, mathematical modeling and operation
combined with
business applications and societal transformations in the knowledge society.
Aristotle University, Department of Mathematics
Master in Web Science
supported by Municipality of Veria
www.Webscience.gr
![Page 52: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/52.jpg)
Master in Web science

| Winter | Spring |
|---|---|
| Web science | Economics and Business in the Web |
| Web Technologies | Knowledge Processing in the Web |
| Networks and Discrete Mathematics | Statistical Analysis of Networks |
| Information Processing and Networks | Mathematical Modeling of the Web |
![Page 53: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/53.jpg)
Information about Information now!
![Page 54: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/54.jpg)
Computational social science
• The capacity to collect and analyze massive amounts of data has transformed such fields as physics (e.g. the CERN experiments) and biology (semantic search, ontologies, systems biology)
• This is not the case for “computational social science” (i.e. economics, sociology, and political science)
• Computational social science is a reality in Web business (e.g. Google) and governments (e.g. CIA)
• How will it be practiced in the open academic environment?
![Page 55: Statistics in the Web](https://reader035.fdocument.org/reader035/viewer/2022062501/5681685e550346895ddea1ff/html5/thumbnails/55.jpg)
Review
• What is the Web?
• Web milestones
• Why is it so successful?
• We knew the Web was big...
• Web generations
• Studying the Web
• Web Data and Structure
• Web Function and Evolution
• Web policy