
Page 1: Entropy based measures for graphs

Entropy based measures for graphs

Master in Web Science, Department of Mathematics, Aristotle University of Thessaloniki

Giorgos Bamparopoulos, Nikos Papathanasiou

Page 2: Entropy based measures for graphs

Introduction

In many scientific areas, e.g. sociology, psychology, physics, chemistry, and biology, systems can be described as interaction networks, where the elements correspond to vertices and the interactions to edges. A variety of problems in those fields involve comparing and characterizing networks.

Comparing networks is the task of measuring their structural similarity; characterizing networks means searching for network characteristics which capture structural information. To analyze complex networks, several methods can be combined, such as graph theory, information theory, and statistics.

In this project, in order to characterize and compare network structures, we describe methods for measuring Shannon’s entropy of graphs.

To introduce entropy based measures for graphs, we first mention some basic graph-theoretical preliminaries:

Page 3: Entropy based measures for graphs

Preliminaries

A graph is a non-empty finite set V of elements called vertices together with a possibly empty set E of pairs of vertices called edges. It is denoted G = (V, E).

The degree of a vertex u is the number of links plus twice the number of loops incident with u. It is denoted d(u). The degree distribution is the relative frequency of vertices with different degrees. (P(k) = fraction of vertices with degree k)

The j-sphere of a vertex v_i in G is defined as the set S_j(v_i, G) = {v ∈ V : d(v_i, v) = j}, i.e. the set of vertices at distance exactly j from v_i.

Page 4: Entropy based measures for graphs

Preliminaries

A path of length n, n ≥ 1, from vertex a to vertex b is a sequence of directed edges e_1, e_2, ..., e_n such that the initial vertex of e_1 is a, the terminal vertex of e_n is b, and for i = 2, ..., n the initial vertex of e_i is the terminal vertex of e_{i-1}. When it is more convenient, the vertices can be listed in order (rather than the edges) to represent the path.

If the first and last vertices of a path coincide, it is called a cycle.

d(u, v) denotes the distance between u ∈ V and v ∈ V, expressed as the minimum length of a path between u and v.

Page 5: Entropy based measures for graphs

Preliminaries

A coloring of a graph G(V, E) is an assignment of colors to the vertices such that no two adjacent vertices have the same color.

An n-coloring of a graph G = (V, E) is a coloring with n colors. More precisely, it is a mapping f of V onto the set {1, 2, ..., n} such that whenever [u, v] ∈ E, f(u) ≠ f(v).

The chromatic number χ(G) of a graph G is the minimum number of colors needed to color the vertices of G so that no two adjacent vertices have the same color.

An n-coloring is complete if, for all i, j with i ≠ j, there exist adjacent vertices u and v such that f(u) = i and f(v) = j. The sets of vertices with the same color are called color classes.

Page 6: Entropy based measures for graphs

Preliminaries

We call the quantity σ(v) = max_{u ∈ V} d(u, v) the eccentricity of v ∈ V. The diameter ρ of a connected graph G is the maximum eccentricity among all vertices of G.

For a given path P = {u_0, u_1, . . . , u_n}, a downstream edge of a vertex u_t ∈ P is any edge (u_t, w) ∈ E with w ≠ u_s for s ≤ t.

A downstream vertex of u_t is any vertex w such that (u_t, w) is a downstream edge.

The downstream degree, D(u_t), of a vertex is the number of downstream edges it has.

Page 7: Entropy based measures for graphs

Basic concepts

Information content of graphs

Entropy of degree distribution

Entropy as a measure of centrality

Entropy as a measure of connectivity

Topology entropy of a network

Page 8: Entropy based measures for graphs

Information content of graphs

Rashevsky, Trucco, and Mowshowitz were the first researchers to define and investigate the Shannon entropy of graphs.

In 1955, Rashevsky defined the topological information content of a graph while dealing with the complexity of organic molecules, where vertices represent physically indistinguishable atoms and edges represent chemical bonds.

His definition is based on partitioning the n vertices of a given graph into k classes of equivalent vertices, according to their degree dependence. Each class is then assigned a probability: the number of vertices in the class divided by the total number of vertices, p_i = n_i / n. The topological information content is the entropy of this distribution, I(G) = -Σ_{i=1}^{k} p_i log2 p_i.

Page 9: Entropy based measures for graphs

Example: the graph has three vertices of degree two (1, 3, and 4) and two vertices (2 and 5) of degree three. However, vertex 1 is topologically different from vertices 3 and 4: vertex 1 is adjacent to two vertices of degree three, while each of the vertices 3 and 4 is adjacent to one vertex of degree three and one of degree two. Hence we have three classes of vertices: {1} with probability 1/5, {2, 5} with probability 2/5, and {3, 4} with probability 2/5. The information content of the graph is I = -(1/5) log2(1/5) - 2 · (2/5) log2(2/5) ≈ 1.522.

By adding vertex 6, vertices 2 and 5 come to have different degrees. Moreover, vertex 3 is adjacent to one vertex of degree four and one of degree two, while vertex 4 is adjacent to one vertex of degree three and one of degree two. So we have six distinct classes, each with probability 1/6. Hence, the information content of the new graph is I = -6 · (1/6) log2(1/6) = log2 6 ≈ 2.585.
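These two computations can be checked with a minimal Python sketch (the graphs themselves are figures on the slides, so only the class sizes appear here):

```python
import math

def partition_entropy(class_sizes):
    """Shannon entropy (in bits) of a partition of the vertices."""
    n = sum(class_sizes)
    return -sum(s / n * math.log2(s / n) for s in class_sizes)

# First graph: classes {1}, {2,5}, {3,4} out of 5 vertices.
print(partition_entropy([1, 2, 2]))   # ~1.522 bits

# After adding vertex 6, every vertex is in its own class.
print(partition_entropy([1] * 6))     # log2(6) ~ 2.585 bits
```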

Page 10: Entropy based measures for graphs

In 1956, Trucco made this definition more precise: two vertices are considered equivalent if they belong to the same orbit of the automorphism group of the graph, i.e., if they can be interchanged while preserving the adjacency structure of the graph.

We denote the group of all automorphisms of a graph X by G(X).

Let X be a graph with vertex set {x_1, ..., x_n}. Each f ∈ G(X) permutes the vertices x_i, 1 ≤ i ≤ n; hence, to each f ∈ G(X) there corresponds a unique permutation of the elements 1, 2, ..., n. We can regard G(X) as a subgroup of S_n, the symmetric group of degree n.

Suppose K is a subgroup of S_n and i ∈ {1, 2, ..., n}. Then K(i) = {f(i) : f ∈ K} is called an orbit of K. Let K(i_1), ..., K(i_m) be the distinct orbits of K. Then K(i_j) ∩ K(i_l) = ∅ if j ≠ l, and the union of the orbits is {1, 2, ..., n}; hence the orbits form a partition of the set {1, 2, ..., n}. We assign the probability |K(i_j)| / n to each orbit, where |K(i_j)| is the cardinality of K(i_j).

Page 11: Entropy based measures for graphs

Examples:

G(X) = {e, (12)}; orbits of G(X): {1, 2}, {3}, {4}. Hence I = -(2/4) log2(2/4) - 2 · (1/4) log2(1/4) = 1.5.

G(X) = {e, (23), (24), (34), (234), (243)}; orbits of G(X): {1}, {2, 3, 4}. Hence I = -(1/4) log2(1/4) - (3/4) log2(3/4) ≈ 0.811.

Page 12: Entropy based measures for graphs

G(X) = {e, (13), (24), (13)(24)}; orbits of G(X): {1, 3}, {2, 4}. Hence I = -2 · (2/4) log2(2/4) = 1.

G(X) = {e, (1234), (13), (24), (13)(24), (12)(34), (14)(23), (1432)}; orbits of G(X): {1, 2, 3, 4}, a single orbit, so I = 0.
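The slides' graphs are given as figures, but the orbit computation itself can be sketched by brute force over all vertex permutations; the star graph below is our own example, and it reproduces the orbits {1}, {2, 3, 4} of the second case:

```python
import math
from itertools import permutations

def orbits(n, edges):
    """Orbits of the automorphism group of a graph on vertices 1..n,
    found by brute force (fine for small n)."""
    edge_set = {frozenset(e) for e in edges}
    autos = [p for p in permutations(range(1, n + 1))
             if all(frozenset((p[u - 1], p[v - 1])) in edge_set
                    for u, v in edge_set)]
    # The orbit of v is the set of images of v under all automorphisms.
    return {frozenset(p[v - 1] for p in autos) for v in range(1, n + 1)}

n = 4
orbs = orbits(n, [(1, 2), (1, 3), (1, 4)])   # star: center 1, leaves 2, 3, 4
print(orbs)                                   # orbits {1} and {2, 3, 4}
print(-sum(len(o) / n * math.log2(len(o) / n) for o in orbs))   # ~0.811 bits
```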

Page 13: Entropy based measures for graphs

In 1968, Mowshowitz studied the information content of graphs in detail and formulated an analogous measure on the basis of the chromatic properties of graphs.

A decomposition {V_1, ..., V_n} of the set of vertices V is called a chromatic decomposition of G if u, v ∈ V_i implies [u, v] ∉ E. If f is an n-coloring, the collection of its color classes forms a chromatic decomposition; conversely, a chromatic decomposition determines an n-coloring f.

Page 14: Entropy based measures for graphs

For the graph G(V, E) shown on the slide, χ(G) = 3. Chromatic decompositions include: A: {1,3}, {2,5}, {4,6}; B: {2}, {1,3}, {4,5,6}; C: {2}, {3}, {1,4,5,6}.
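Mowshowitz's chromatic information content is the minimum partition entropy taken over chromatic decompositions with χ(G) classes. A short sketch for the three decompositions above (only the class sizes matter):

```python
import math

def H(sizes):
    """Entropy (bits) of a decomposition with the given class sizes."""
    n = sum(sizes)
    return -sum(s / n * math.log2(s / n) for s in sizes)

decomps = {"A": [2, 2, 2], "B": [1, 2, 3], "C": [1, 1, 4]}
for name, sizes in decomps.items():
    print(name, round(H(sizes), 3))          # A: 1.585, B: 1.459, C: 1.252
print(min(H(s) for s in decomps.values()))   # chromatic information ~1.252
```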

Page 15: Entropy based measures for graphs

Information content based on functionals

M. Dehmer and F. Emmert-Streib (2007) presented a method for determining the structural information content which is not based on the problem of finding partitions of the vertices, and whose overall time complexity is polynomial.

They assigned a probability to each vertex as

p(v_i) = f(v_i) / Σ_{j=1}^{|V|} f(v_j),

where f represents an arbitrary information functional; the graph entropy is then I_f(G) = -Σ_i p(v_i) log2 p(v_i).

One such functional is based on the j-spheres: f(v_i) = α^(c_1 |S_1(v_i, G)| + c_2 |S_2(v_i, G)| + ... + c_ρ |S_ρ(v_i, G)|). In this case the c_k > 0, 1 ≤ k ≤ ρ, and α > 0 are real positive coefficients, chosen in such a way as to emphasize certain structural characteristics, e.g. high vertex degrees.
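A sketch of this computation, assuming the j-sphere functional above (the path graph and the coefficient values are arbitrary choices for illustration):

```python
import math
from collections import deque

def spheres(adj, v):
    """|S_j(v, G)| for j = 1..ecc(v), via breadth-first search."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    ecc = max(dist.values())
    return [sum(1 for d in dist.values() if d == j) for j in range(1, ecc + 1)]

def dehmer_entropy(adj, c, alpha=2.0):
    """Graph entropy from f(v) = alpha ** sum_k c_k * |S_k(v, G)|."""
    f = {v: alpha ** sum(ck * sk for ck, sk in zip(c, spheres(adj, v)))
         for v in adj}
    total = sum(f.values())
    return -sum(f[v] / total * math.log2(f[v] / total) for v in adj)

# Path graph 1-2-3-4 (diameter rho = 3), with c_1 > c_2 > c_3.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(dehmer_entropy(adj, c=[3, 2, 1]))   # ~1.722 bits
```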

Page 16: Entropy based measures for graphs

The process of defining vertex probabilities using graph-theoretical quantities is not unique; each such quantity captures different structural information of a graph. In that way, M. Dehmer introduced another functional. The j-sphere of a vertex has associated paths, leading from v_i to the vertices at distance j, together with their edge sets.

Page 17: Entropy based measures for graphs

For each vertex v ∈ V he determined the local information graph, which is induced by those paths and their vertex and edge sets.

The new information functional is f(v_i) = α^(c_1 l_1(v_i) + ... + c_ρ l_ρ(v_i)), with c_k > 0 for 1 ≤ k ≤ ρ and α > 0, where l_j(v_i) expresses the sum of the path lengths in the local information graph of v_i at distance j.

Page 18: Entropy based measures for graphs

Example: ρ = 4; the graph and the choice of coefficients are shown in the figure.

Page 19: Entropy based measures for graphs
Page 20: Entropy based measures for graphs

The degree distribution network entropy

The entropy of the degree distribution is defined as H(p) = -Σ_{k=0}^{N-1} p(k) log2 p(k), where p(k) is the probability that a node has degree k and N is the number of nodes. The maximum value of the entropy is obtained for a uniform degree distribution, and the minimum value (zero) is achieved when all vertices have the same degree.

Entropy provides an average measure of heterogeneity, since it measures the diversity of the link distribution. Heterogeneity is related to the network's resilience to attacks.

B. Wang, H. Tang, C. Guo, and Z. Xiu studied the robustness of scale-free networks to random failures using the entropy of the degree distribution. By maximizing the entropy of the degree distribution, one obtains an optimal design of a scale-free network's robustness to random failures.
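A quick illustration of the two extremes mentioned above (a sketch; the star and the cycle are our own examples):

```python
import math
from collections import Counter

def degree_entropy(degrees):
    """Entropy (bits) of the empirical degree distribution p(k)."""
    n = len(degrees)
    return -sum(c / n * math.log2(c / n) for c in Counter(degrees).values())

# Star K_{1,4}: one hub of degree 4, four leaves of degree 1 -> heterogeneous.
print(degree_entropy([4, 1, 1, 1, 1]))   # ~0.722 bits
# Cycle C_5: every vertex has degree 2 -> minimum entropy.
print(degree_entropy([2, 2, 2, 2, 2]))   # 0.0
```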

Page 21: Entropy based measures for graphs

Solé and Valverde used the entropy of the remaining degree. The remaining degree of a vertex at one end of an edge is the number of edges connected to that vertex, not counting the original edge. The remaining degree distribution q(k) is obtained from q(k) = (k + 1) p(k + 1) / ⟨k⟩, where ⟨k⟩ = Σ_k k p(k) is the average degree.

Fig.: Two connected nodes with degrees k_i and k_j; since we are interested in the remaining degrees, the values k_i - 1 and k_j - 1 are considered instead.
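A sketch of the remaining-degree entropy under the definition above (the star degree sequence is again an arbitrary illustration):

```python
import math
from collections import Counter

def remaining_degree_entropy(degrees):
    """Entropy (bits) of q(k) = (k + 1) p(k + 1) / <k>."""
    n = len(degrees)
    p = {k: c / n for k, c in Counter(degrees).items()}
    mean_k = sum(k * pk for k, pk in p.items())
    q = {k - 1: k * pk / mean_k for k, pk in p.items()}
    return -sum(qk * math.log2(qk) for qk in q.values())

# Star K_{1,4}: q(0) = q(3) = 1/2, so H(q) = 1 bit.
print(remaining_degree_entropy([4, 1, 1, 1, 1]))
```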

Page 22: Entropy based measures for graphs

Entropy as a measure of centrality in networks

Frank Tutzauer (2007) proposed a measure of centrality for networks characterized by path-based transfer flows.

To model a path-transfer process, it is helpful to think of a specific object being passed from one node to another. The current node chooses among itself and its neighbors: if the node itself is chosen, the flow stops; otherwise, the object passes to the chosen node.

The next node then randomly chooses from among its (not previously visited) neighbors, and again the flow either stops or continues. The object thus traverses a path in the network, traveling along links and stopping when a loop is chosen. Each choice is assumed to be made with equal likelihood.

Page 23: Entropy based measures for graphs

What is the probability that a flow beginning at vertex 5 ends at vertex 2 in the network below?

There is a 1/4 probability that vertex 5 passes control to vertex 3 (because 5 chooses with equal likelihood from among nodes 3, 4, 5, and 6). Once node 3 receives control, the flow will not pass back to node 5, so there is a 1/2 probability that node 3 stops the flow, and a 1/2 probability that control passes to node 2. Likewise, vertex 2 chooses between stopping the flow and continuing it to vertex 1. As a result, the probability that a flow beginning at vertex 5 ends at vertex 2 is (1/4)(1/2)(1/2) = 1/16.

The transfer and stopping probabilities of a vertex u_t on a path are given in terms of its downstream degree: each downstream vertex is chosen with probability 1/(D(u_t) + 1), and the flow stops at u_t with probability 1/(D(u_t) + 1).

Page 24: Entropy based measures for graphs

To obtain the single path probability, i.e., the likelihood that a flow beginning at i ends at j by traveling along the path P = {u_0, . . . , u_n} with u_0 = i and u_n = j, simply multiply the transfer probabilities of the first vertices u_0, . . ., u_{n-1} by the stopping probability of the last vertex in the path.

Then the overall probability that a flow starting at i ends at j is given by the combined path probability p_ij, which is simply the single path probabilities summed across the K(i, j) paths from i to j.

The path-transfer centrality of vertex i is then given by the entropy C_H(i) = -Σ_{j=1}^{N} p_ij log2 p_ij.
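A brute-force sketch of this procedure (the adjacency below is a guess at the relevant fragment of the slide's figure, chosen to reproduce the 1/16 example; every node is assumed to carry a loop, i.e. it may always stop):

```python
import math
from collections import defaultdict

def flow_probabilities(adj, start):
    """p[j] = probability that a flow starting at `start` stops at j.
    Each node chooses uniformly among stopping (its loop) and passing
    to a neighbor not already on the path."""
    p = defaultdict(float)

    def walk(node, visited, prob):
        downstream = [w for w in adj[node] if w not in visited]
        share = prob / (len(downstream) + 1)
        p[node] += share                   # the loop is chosen: flow stops here
        for w in downstream:
            walk(w, visited | {w}, share)  # transfer along one downstream edge

    walk(start, {start}, 1.0)
    return dict(p)

def path_transfer_centrality(adj, i):
    """Tutzauer's entropy centrality C_H(i) = -sum_j p_ij log2 p_ij."""
    return -sum(q * math.log2(q) for q in flow_probabilities(adj, i).values())

# Chain 1-2-3-5 plus 5's extra neighbors 4 and 6 (loops are implicit).
adj = {1: {2}, 2: {1, 3}, 3: {2, 5}, 4: {5}, 5: {3, 4, 6}, 6: {5}}
print(flow_probabilities(adj, 5)[2])      # 1/16 = 0.0625, as on the slide
print(path_transfer_centrality(adj, 5))
```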

Page 25: Entropy based measures for graphs

Entropy as a connectivity measure

Entropy as a connectivity measure of a graph is defined as H(G) = -Σ_{v ∈ V} p(v) log2 p(v), where p(v) = deg(v) / (2|E|).

To obtain a measure with a [0, 1] range, we divide H(G) by the maximum entropy, given by ld |V| (ld denotes the binary logarithm).

High entropy indicates that many vertices are equally important, whereas low entropy indicates that only a few vertices are relevant.
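A minimal sketch, assuming p(v) = deg(v) / 2|E| as defined above:

```python
import math

def connectivity_entropy(degrees, normalize=False):
    """H(G) = -sum_v p(v) log2 p(v) with p(v) = deg(v) / 2|E|."""
    total = sum(degrees)          # equals 2|E| when loops count twice
    h = -sum(d / total * math.log2(d / total) for d in degrees if d > 0)
    return h / math.log2(len(degrees)) if normalize else h

# Star K_{1,4}: the hub dominates, so the normalized entropy stays below 1.
print(connectivity_entropy([4, 1, 1, 1, 1], normalize=True))   # ~0.861
```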

Page 26: Entropy based measures for graphs

Topology entropy of a network

In order to describe the uncertainty of complex networks, Li et al. proposed another network entropy concept, based on the topology configuration of the network. For a complex network generated according to certain rules, with given parameters, each trial generates a specific network configuration; over repeated trials, a wide range of configurations is produced.

The topology entropy of a network is defined as S = -Σ_{i=1}^{Ω} p_i log p_i, where Ω is the number of all possible configurations and p_i is the probability of configuration i.

We give an example for the ER network model G(3, 0.2).

Page 27: Entropy based measures for graphs

The number of all possible edges is M = 3 · 2 / 2 = 3. The 3 possible edges can be treated as 3 independent random events.

The actual number of edges m is a random variable which follows the binomial distribution B(M, p) with p = 0.2. The number of possible configurations is 2^M = 8.

A configuration with m edges appears with probability p^m (1 - p)^(M - m): the realization of a G(3, 0.2) random network with m given edges can be regarded as a random event A_i with occurrence probability p_i = p^m (1 - p)^(M - m), where i = 1, ..., 8.

Page 28: Entropy based measures for graphs

As the probabilities of all configurations with a given number of edges m are the same, the entropy can be calculated for each m separately, so S can be written as S = -Σ_{m=0}^{M} C(M, m) p^m (1 - p)^(M - m) log(p^m (1 - p)^(M - m)).
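A short sketch grouping the configurations by edge count, as in the formula above (we use base-2 logarithms; the base is a convention):

```python
import math

def topology_entropy(n, p):
    """Topology entropy of the ER model G(n, p): all 2^M configurations,
    grouped by their edge count m."""
    M = n * (n - 1) // 2                        # number of possible edges
    S = 0.0
    for m in range(M + 1):
        p_conf = p ** m * (1 - p) ** (M - m)    # probability of one configuration
        S -= math.comb(M, m) * p_conf * math.log2(p_conf)
    return S

print(topology_entropy(3, 0.2))   # M = 3, 8 configurations, ~2.166 bits
```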

Page 29: Entropy based measures for graphs

Example

For this network, we compute entropy as a connectivity and centrality measure, as well as the entropy of the degree and remaining degree distributions. Then we compare them with some existing measures.

Page 30: Entropy based measures for graphs

Centrality

The table below depicts entropy-based centrality, degree centrality, closeness centrality, and betweenness centrality (ranks in parentheses).

Node  Entropy  Degree     Closeness      Betweenness
11    3.407    2 (9-13)   0.319 (3)      50 (3-4)
7     3.239    3 (4-8)    0.341 (2)      54 (2)
10    3.211    4 (1-3)    0.349 (1)      57.5 (1)
6     3.131    2 (9-13)   0.313 (4)      50 (3-4)
8     3.061    3 (4-8)    0.306 (5)      3.5 (10)
12    3.003    4 (1-3)    0.283 (6)      44 (6)
5     2.944    3 (4-8)    0.278 (7)      47 (5)
14    2.870    4 (1-3)    0.238 (9)      14 (8-9)
9     2.868    2 (9-13)   0.268 (8)      0 (11-16)
13    2.861    3 (4-8)    0.234 (10-12)  0 (11-16)
15    2.787    3 (4-8)    0.234 (10-12)  0 (11-16)
3     2.700    2 (9-13)   0.234 (10-12)  26 (7)
4     2.421    1 (14-16)  0.221 (13)     0 (11-16)
16    2.343    1 (14-16)  0.195 (15)     0 (11-16)
2     2.308    2 (9-13)   0.197 (14)     14 (8-9)
1     2.043    1 (14-16)  0.167 (16)     0 (11-16)

Page 31: Entropy based measures for graphs

The table below portrays Spearman's rank-order correlations between the measures.

If we are interested in centrality as a score, rather than simply the ranking, Pearson's correlation coefficient is more appropriate. The second table therefore illustrates the Pearson's r correlations.

Page 32: Entropy based measures for graphs

Entropy, betweenness, and closeness all agree on the top four nodes, though they rank them differently.

The measures are highly, but not perfectly, correlated. Among degree, betweenness, and closeness centrality, degree centrality shows the lowest correlation with entropy-based centrality.

In contrast, closeness produces the rankings most similar to those of the entropy centrality.

Page 33: Entropy based measures for graphs

Scatter plots and linear regression

Entropy-Closeness

Page 34: Entropy based measures for graphs

Scatter plots and linear regression

Entropy-Betweenness

Page 35: Entropy based measures for graphs

Scatter plots and linear regression

Entropy-Degree

Page 36: Entropy based measures for graphs

Another visualization of the network:

The size of the nodes depends on the value of the entropy.

Page 37: Entropy based measures for graphs

Connectivity

For the graph with loops, the number of edges is |E| = 36. We obtain H(G) = 3.967. To normalize H(G), we divide it by the maximum entropy, given by ld(|V|) = ld(16) = 4. So we obtain 0.991.

If we discard the loops, the number of edges is |E| = 20. Thus H(G) = 3.877; normalized: H(G) / ld(|V|) = 3.877 / 4 = 0.969.
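Both figures can be re-derived from the degree sequence read off the centrality table (a sketch; we assume one loop per node, which matches 36 - 20 = 16 loops and adds 2 to each degree):

```python
import math

def H(degrees):
    """Connectivity entropy with p(v) = deg(v) / (sum of degrees)."""
    total = sum(degrees)
    return -sum(d / total * math.log2(d / total) for d in degrees)

base = [1] * 3 + [2] * 5 + [3] * 5 + [4] * 3   # degrees from the table on page 30
loops = [d + 2 for d in base]                  # a loop adds 2 to a node's degree

print(round(H(loops), 3), round(H(loops) / 4, 3))   # ~3.964, 0.991 (slide: 3.967)
print(round(H(base), 3), round(H(base) / 4, 3))     # ~3.878, 0.969 (slide: 3.877)
```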

Page 38: Entropy based measures for graphs

Degree distribution

Entropy of the degree distribution: H(p) = 1.9544

Entropy of the remaining degree distribution: H(q) = 1.832

Remaining degree k   q(k)
0                    0.075
1                    0.25
2                    0.375
3                    0.3
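Both entropies and the q(k) table follow from the degree sequence of the loop-free graph (a sketch using the degrees from the table on page 30):

```python
import math
from collections import Counter

degrees = [1] * 3 + [2] * 5 + [3] * 5 + [4] * 3
n = len(degrees)
p = {k: c / n for k, c in Counter(degrees).items()}
print(-sum(pk * math.log2(pk) for pk in p.values()))    # H(p) ~ 1.9544

mean_k = sum(k * pk for k, pk in p.items())              # <k> = 2.5
q = {k - 1: k * pk / mean_k for k, pk in p.items()}
print({k: round(v, 3) for k, v in sorted(q.items())})    # 0.075, 0.25, 0.375, 0.3
print(-sum(qk * math.log2(qk) for qk in q.values()))    # H(q) ~ 1.832
```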

Page 39: Entropy based measures for graphs

Summary and conclusion

This project has attempted to demonstrate a variety of methods for measuring the entropy of graphs. We started with a review of classical measures for determining the structural information content of graphs.

Furthermore, we presented some other approaches characterized by information functionals. Then we presented the topological network entropy. Moreover, we presented entropy as a centrality and connectivity measure, and the entropy of the degree and remaining degree distributions. Finally, we gave an example of a graph and computed the respective measures. Nowadays, graph-based models are applied in a wide range of disciplines. Shannon's entropy, as a measure of structural characteristics, has proven very useful. Further development of the theory and more efficient algorithms for computing entropy are needed.

Page 40: Entropy based measures for graphs

References

• N. Rashevsky, Life, information theory, and topology, Bulletin of Mathematical Biophysics 17 (1955) 229–235.
• E. Trucco, A note on the information content of graphs, Bulletin of Mathematical Biophysics 18 (2) (1956) 129–135.
• A. Mowshowitz, Entropy and the complexity of graphs I: An index of the relative complexity of a graph, Bulletin of Mathematical Biophysics 30 (1968) 175–204.
• A. Mowshowitz, Entropy and the complexity of graphs IV: Entropy measures and graphical structure, Bulletin of Mathematical Biophysics 30 (1968) 533–546.
• A. Mowshowitz, V. Mitsou, Entropy, orbits and spectra of graphs, in: Analysis of Complex Networks: From Biology to Linguistics, 2009, Ch. 1.
• M. Dehmer, Information-theoretic concepts for the analysis of complex networks, Applied Artificial Intelligence 22 (7–8) (2008) 684–706.
• M. Dehmer, A novel method for measuring the structural information content of networks, Cybernetics and Systems 39 (2008) 825–843.
• M. Dehmer, F. Emmert-Streib, Structural information content of networks: Graph entropy based on local vertex functionals, Computational Biology and Chemistry 32 (2008) 131–138.
• M. Dehmer, Information processing in complex networks: Graph entropy and information functionals, Applied Mathematics and Computation 201 (2008) 82–94.
• M. Dehmer, A. Mowshowitz, A history of graph entropy measures, Information Sciences 181 (2011) 57–78.
• F. Tutzauer, Entropy as a measure of centrality in networks characterized by path-transfer flow, Social Networks 29 (2007) 249–265.
• R.V. Solé, S. Valverde, Information theory of complex networks: On evolution and architectural constraints, Lecture Notes in Physics, vol. 650, 2004, pp. 189–207.
• B. Wang, H. Tang, C. Guo, Z. Xiu, Entropy optimization of scale-free networks' robustness to random failures, 2005.
• J. Li et al., Network entropy based on topology configuration and its computation to random networks, Chinese Physics Letters 25 (2008) 4177–4180.
• R. Navigli, M. Lapata, Graph connectivity measures for unsupervised word sense disambiguation.
• L. da F. Costa, F. A. Rodrigues, G. Travieso, P. R. Villas Boas, Characterization of complex networks: A survey of measurements, 2008.

Page 41: Entropy based measures for graphs

Questions?

Thank you for your attention!!!