Survey of measurements in complex networks analysis

May 25, 2009

Costa et al. [1] provide possibly the most comprehensive survey of measurements used in complex network analysis (also available on arXiv). Starting from well-known measures such as degree distribution, centrality and modularity, they move on to spectral measures and fractal dimensionality. (I am not sure what some of these measures are!) They finally discuss statistical methods for measurement selection, including Principal Component Analysis. I believe they use a PCA-like analysis to select “features” of networks, based on which the networks can be classified. If you are doing network analysis, I suggest you keep this survey handy.

[1] Luciano da F. Costa, Francisco A. Rodrigues, Gonzalo Travieso, P. R. Villas Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, Volume 56, Issue 1, pages 167–242 (2007).


Reboot and a paper

May 25, 2009

Over the last few days I received several spam comments on this blog (written in the Cyrillic alphabet, incidentally). That made me visit this blog for the first time in several months, to turn off comments on old posts. Inadvertently, that also made me consider rebooting it. (Spam can be useful!) At the least, I can use this as a space where I put down comments about interesting papers I read. I might also write about some of the things I am working on.
***

Let me begin by discussing a paper of mine that just got accepted at SMC 2009. The title of the paper is ‘Classes of Optimal Network Topologies under Multiple Efficiency and Robustness Constraints’. In this paper, we look at the problem of designing optimal network topologies under efficiency, robustness and cost trade-offs. Efficiency, robustness and cost are defined in terms of structural properties: diameter, average path length and closeness centrality (efficiency); degree centrality, node betweenness and connectivity (robustness); number of edges (cost). A slider $\alpha$ (on a scale of [0, 1]) indicates the emphasis on robustness, and thus decides the trade-off between efficiency and robustness. Another parameter $\beta$ (in [0, 1]) decides how cheap or expensive edges are. For example, if $\beta$ is 0, we might make use of the full budget allocated, whereas if $\beta$ is 1, we want to “squeeze out” the most cost-effective topology by removing superfluous edges. Finally, there is also a constraint on the maximum permissible degree (indegree and outdegree, in the case of digraphs) of a node in the resulting graph. Maximum degree can be thought of as a constraint on the “bookkeeping” a node has to do (for example, the size of DHT finger tables), or as a constraint on the maximum inflow and outflow through a node.

Different efficiency and robustness metrics are useful in different application contexts. For example, in traffic flow networks, congestion is a major challenge; hence node betweenness is a useful robustness metric there, along with diameter as the efficiency metric, which relates to upper bounds on the communication delay. In a Network Centric Warfare (NCW) scenario, it might be important to have alternate communication paths in case targeted attacks occur on nodes or links; connectivity is a useful robustness metric in that setting. And so on. Thus, what we do next is an ‘exploration of the parameter space’ involving the efficiency and robustness metrics, $\alpha$, $\beta$, cost and maximum degree. We conduct genetic-algorithm-based experiments to evolve the “fittest” topologies under the given trade-offs; a sketch of the kind of fitness function involved appears below.
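To make that concrete, here is a minimal sketch, in Python with networkx, of the sort of fitness function such a GA might optimise. This is not the code from the paper; the particular metric choices (diameter for efficiency, node connectivity for robustness) and the normalisations are illustrative assumptions of mine.

```python
import networkx as nx

def fitness(G, alpha, beta, max_edges):
    """Toy fitness: alpha trades robustness against efficiency,
    beta decides how heavily edge cost is penalised."""
    if not nx.is_connected(G):
        return 0.0                                  # disconnected topologies are unfit
    n = G.number_of_nodes()
    efficiency = 1.0 / nx.diameter(G)               # smaller diameter -> more efficient
    robustness = nx.node_connectivity(G) / (n - 1)  # normalised to [0, 1]
    cost = G.number_of_edges() / max_edges          # fraction of the edge budget used
    return alpha * robustness + (1 - alpha) * efficiency - beta * cost
```

A GA would then mutate and recombine candidate topologies (subject to the maximum-degree constraint) and select for high fitness.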

The main results in the paper are the following:

  • Two prominent classes of topologies emerge as optimal: (1) star-like or scale-free networks, with small diameters, low cost, high resilience to random failures and very low resilience to targeted attacks; (2) circular skip lists (CSLs), which are highly resilient to both random failures and targeted attacks and have small diameters at moderate cost.
  • Further, we observe that star-like networks are optimal only when both of the following design requirements are satisfied: (1) very low emphasis on robustness (or very high emphasis on efficiency) and (2) severe restrictions on cost. In all other cases, CSLs are the optimal topologies in terms of balancing efficiency, robustness and cost.
    Thus, we observe a sharp transition from star-like networks to CSLs as the emphasis on robustness ($\alpha$) increases or the cost restrictions become less severe.
  • Circular skip lists are highly homogeneous with respect to many structural properties, such as degree, betweenness, closeness, pairwise connectivity and pairwise path lengths. Further, they have structural features such as Hamiltonicity, (near-)optimal connectivity and low diameter, which are desirable under varied application requirements. We argue that CSLs are potential underpinnings for optimal network design in terms of balancing efficiency, robustness and cost.

Graph Powers

August 7, 2008

Earlier I had talked about multiplying the adjacency matrix of a graph by itself, and how that leads to some interesting results. Multiplying the adjacency matrix by itself essentially corresponds to raising the graph to an integer power while maintaining the same set of nodes.

The powers of graphs are interesting structures. The $p$-th power $G^p$ of a graph $G$ is obtained by adding edges between nodes in $G$ that are separated by a path length (greater than 1 and) at most $p$. Specifically, the square of a graph is the graph resulting from joining all nodes that are separated by a path of length 2. The cube of a graph is obtained by short-circuiting all paths of length 2 and 3 in the graph.

An important consequence of raising a graph to a power is diameter reduction: if $G$ has diameter $d$, then $G^p$ has diameter $\lceil d/p \rceil$. There are several other interesting properties of graph powers that we can observe, which are important in the design of optimal topologies. I’d be interested in knowing your ideas.
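Here is a quick sketch of the construction in Python with networkx (nx.power implements essentially the same thing), together with a check of the diameter property on an 8-cycle, whose diameter is 4:

```python
import networkx as nx

def graph_power(G, p):
    """Return G^p: join every pair of nodes at distance between 2 and p."""
    Gp = G.copy()
    for u, dists in nx.all_pairs_shortest_path_length(G, cutoff=p):
        for v, d in dists.items():
            if 2 <= d <= p:
                Gp.add_edge(u, v)
    return Gp

C8 = nx.cycle_graph(8)                    # diameter 4
print(nx.diameter(graph_power(C8, 2)))    # 2 = ceil(4/2)
print(nx.diameter(graph_power(C8, 3)))    # 2 = ceil(4/3)
```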


A note on Hamiltonian circuits (and Uncle Paul)

June 22, 2008

Sometime back I read a short paper by Chvátal and Erdős that presents a couple of sufficiency conditions for hamiltonicity.

One of the results there is: if $G$ is an $s$-connected graph with at least 3 vertices, and its independence number is at most $s$, then $G$ has a hamiltonian circuit.

It is an interesting result connecting robustness and hamiltonicity. A graph is $s$-connected if there are $s$ vertex-independent (internally disjoint) paths between any two nodes in the graph. Equivalently, by Menger’s theorem, the connectivity is the size of the minimum vertex cut, i.e. the minimum number of vertices whose deletion disconnects the graph.

The independence number of a graph is the size of its biggest independent set. An independent set is a set of vertices of the graph such that no two vertices in the set are adjacent; in other words, there is no edge between any pair of vertices in the independent set. The bigger the independence number, the easier it is to fragment the network.

As you can see, both these concepts can be used to measure the robustness of a graph. Also, it is not unnatural that connectivity and hamiltonicity are related. Intuitively, the more independent paths there are, the greater the chances of a graph being hamiltonian. And a hamiltonian graph is at least 2-connected: the cycle is the simplest hamiltonian graph, and it is exactly 2-connected.
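As a toy illustration, here is a brute-force check of the condition on a few small graphs; this is a sketch of mine in Python with networkx, usable only for tiny graphs. The Petersen graph is 3-connected with independence number 4, so the theorem does not apply to it, and indeed it is not hamiltonian:

```python
import itertools
import networkx as nx

def independence_number(G):
    # a maximum independent set of G is a maximum clique of its complement
    return max(len(c) for c in nx.find_cliques(nx.complement(G)))

def is_hamiltonian(G):
    # brute force: try every cyclic ordering of the vertices
    nodes = list(G.nodes)
    for perm in itertools.permutations(nodes[1:]):
        cycle = [nodes[0], *perm]
        if all(G.has_edge(cycle[i], cycle[(i + 1) % len(cycle)])
               for i in range(len(cycle))):
            return True
    return False

for G in (nx.cycle_graph(5), nx.complete_bipartite_graph(3, 3),
          nx.petersen_graph()):
    s = nx.node_connectivity(G)
    a = independence_number(G)
    print(f"connectivity={s} independence={a} "
          f"condition={'holds' if a <= s else 'fails'} "
          f"hamiltonian={is_hamiltonian(G)}")
```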

***

Well, but the point of this post is something else, really. The paper I referred to in the beginning is just a 3-page note, and it has an interesting footnote on the first page. The footnote says: “This note was written in Professor Richard K. Guy’s car on the way from Pullman to Spokane, Wash. The authors wish to express their gratitude to Mrs. Guy for smooth driving.”


Goodstein’s Theorem

May 20, 2008

In the previous post, I had talked about a special kind of sequence. Such sequences are called Goodstein sequences, and Goodstein’s theorem states that all Goodstein sequences eventually terminate at 0. Now, this is a rather counter-intuitive result given the way the sequence grows. Remember, in every iteration you are increasing the base (and the exponents) of the terms in the sum by 1, whereas you are subtracting just 1 from the whole sum. Surely the number is increasing very rapidly notwithstanding the subtractions? For example, even a small first number, say 4, increases in this manner: 4, 26, 41, 60, 83, 109, 139, 173, …, 3319048, …, 93850207, …

How can this sum start reducing and converge to 0? One has to look more carefully to see what is happening. Let us start with a smaller number, say 3.

Iteration 1 (hereditary base-2):
(1) 3 = 2 + 1
(2) 3 + 1
(3) 3 + 1 – 1 = 3
Iteration 2 (hereditary base-3):
(1) 3 = 3
(2) 4
(3) 4 – 1 = 3
Iteration 3 (hereditary base-4):
(1) 3 = 3
(2) 3
(3) 3 – 1 = 2

See what’s happening? Firstly, note that in step (2) of iteration 1, the last term, which is 1, does not get increased, because it is less than the current base. Say the base is $n$. Then increasing the base does not change the numbers 1 through $n - 1$. For example, 2 in base 4 is $4^0 + 4^0$, and 2 in base 5 is $5^0 + 5^0$. Correct? So, in this case, the last term is not increased, and there is a subtraction following that.

What happens is that, even while you are increasing the bases of the terms, the repeated subtraction of 1 is slowly “eating” into the last term of the sum. This process may be arbitrarily slow, but eventually the power of the last term “falls off”, the last term becomes smaller than the ongoing base, and it hence ceases to increase. The subtraction can then eat away the last term easily, before it starts attacking the next term (which will now be the last term).

So you can now see that the sum does in fact start reducing, albeit after numerous steps, even for small numbers. In fact, you can only check this by hand for the numbers 1, 2 and 3. You can write a program to check bigger numbers, though it might take extremely long.
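Here is what such a program might look like: a minimal Python sketch (the function names are my own). The key step is rewriting a number in hereditary base-$b$ and replacing every $b$ with $b + 1$, which the recursion below does digit by digit:

```python
def bump_base(n, b):
    """Value of n after writing it in hereditary base-b notation
    and replacing every occurrence of b with b + 1."""
    result, power = 0, 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            # the exponents are themselves in hereditary base-b: bump them too
            result += digit * (b + 1) ** bump_base(power, b)
        power += 1
    return result

def goodstein(m, max_steps=100):
    """The first few terms of the Goodstein sequence starting at m."""
    seq, base = [m], 2
    while m > 0 and len(seq) <= max_steps:
        m = bump_base(m, base) - 1
        base += 1
        seq.append(m)
    return seq

print(goodstein(3))      # [3, 3, 3, 2, 1, 0] -- terminates quickly
print(goodstein(4, 8))   # [4, 26, 41, 60, 83, 109, 139, 173, 211]
```

For a starting value of 4, though, do not wait for the zero: the number of steps to termination is astronomically large.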

***

Now, another important thing about Goodstein’s theorem is that it is a Gödel-type theorem: the truth of Goodstein’s theorem cannot be proved using the axioms of first-order arithmetic (Peano arithmetic). There is a proof of this unprovability, as well as a proof of Goodstein’s theorem itself using techniques from outside Peano arithmetic.


An interesting sequence

May 16, 2008

Mandar has an interesting set of posts on the convergence and divergence of infinite series. It made me recall a fascinating result I had come across once. It is not about an infinite series, though, but about a very interesting sequence.

The sequence needs some explanation of a notation called hereditary base-$n$ notation. Let me explain this with an example. Let us write the number 26 in its hereditary base-2 notation. First we start by writing 26 as the sum of powers of 2.
$26 = 2^4 + 2^3 + 2^1$
Next, the powers (exponents) are written as sums of powers of 2 as well.
So, $26 = 2^{2^2} + 2^{2+1} + 2^1$

Similarly, the hereditary base-3 notation of 1000 is the following.

$1000 = 3^6 + 3^5 + 3^3 + 1 = 3^{3+3} + 3^{3+2} + 3^3 + 1$

Note that, after the rewriting, no base or exponent bigger than $n$ appears. Also, we can write the terms as a power of the base multiplied by a coefficient smaller than $n$. For example, 26 can be written in hereditary base-3 notation as $2 \cdot 3^2 + 2 \cdot 3 + 2$.

Now, let us take a number, say 26, and do the following:

  1. Take the number. Start with base $n = 2$.
  2. Express the number in hereditary base-$n$ notation.
  3. Form the next number by changing all the $n$’s to $(n + 1)$’s. That is, “increase” the base of the representation by 1. [$n = n + 1$]
  4. Subtract 1 from the above number and go to step 2. [number = number – 1]

Let’s take an example. Let’s start with 4.

$4 = 2^2$ (step 2)

$3^3$ (step 3)

$3^3 - 1 = 26$ (step 4) — Iteration 1

$26 = 2 \cdot 3^2 + 2 \cdot 3 + 2$ (step 2)

$2 \cdot 4^2 + 2 \cdot 4 + 2$ (step 3)

$2 \cdot 4^2 + 2 \cdot 4 + 2 - 1 = 41$ (step 4) — Iteration 2

$41 = 2 \cdot 4^2 + 2 \cdot 4 + 1$ (step 2)

$2 \cdot 5^2 + 2 \cdot 5 + 1$ (step 3)

$2 \cdot 5^2 + 2 \cdot 5 + 1 - 1 = 60$ (step 4) — Iteration 3

$60 = 2 \cdot 5^2 + 2 \cdot 5$ (step 2)

$2 \cdot 6^2 + 2 \cdot 6$ (step 3)

$2 \cdot 6^2 + 2 \cdot 6 - 1 = 83$ (step 4) — Iteration 4

$83 = 2 \cdot 6^2 + 6 + 5$ (step 2)

$2 \cdot 7^2 + 7 + 5$ (step 3)

$2 \cdot 7^2 + 7 + 5 - 1 = 109$ (step 4) — Iteration 5

And so on. Now the question is, does this sequence converge (or terminate)? If it converges, what does it converge to? Or if you think it does not converge, explain why.

As always, people who already know this result may please defer commenting.


Seminar on Complex Networks

May 14, 2008

A series of seminars on Complex Systems has been scheduled through the summer over here, and I had been asked to give the first one. I gave the talk yesterday: an overview of the philosophy, design and analysis of complex networks. Among other things, I talked about how several graph-theoretic measures can be used in both the design and the analysis of complex networks. There was a lot of lively discussion; the seminar spanned about 2.5 hours in all, which probably means it went off quite well. In case you are interested, here are the slides I used.


Everybody stand back…

April 21, 2008

… I have arrived!

After numerous false starts, I have finally learnt (just) enough Perl to do the essential Unix-like file-processing operations like head, tail, grep etc. on Windows. I am in my “Windows usage phase”, and there is next to nothing you can do through its command line. (And Windows Vista sucks even otherwise, which means soon I will be back to my “Ubuntu phase”.) I do use Cygwin, but Perl is fun.


Question

April 13, 2008

I have a question that has been bothering me rather more over the last few days than ever before. What are the core computer science areas? What are the core skills that computer scientists have that researchers in other disciplines don’t have (or in which computer scientists have an edge over others)?


Some interesting properties of adjacency matrices

March 30, 2008

An adjacency matrix is a Boolean square matrix that represents the adjacency relationships in a graph. Given a graph with $n$ nodes, the $n \times n$ adjacency matrix $A$ has entries $a_{ij} = 1$ if there is an edge from $i$ to $j$, and 0 otherwise. If the graph is undirected, an edge from $i$ to $j$ implies the existence of an edge from $j$ to $i$. Generally, we talk about simple graphs with no self-loops, so $a_{ii} = 0$; we don’t allow multi-edges either. We can see that the diagonal entries are all 0’s. Further, in the case of an undirected graph, the adjacency matrix is symmetric; this need not be so for directed graphs.

Now, it is evident that the adjacency matrix $A$ also represents all the paths of length 1. Each entry indicates whether there is a 1-length path between the corresponding nodes or not. It also tells us how many 1-length paths there are between the two nodes. (Of course, it is either 0 or 1.)

Interesting things happen when we multiply the adjacency matrix by itself. Let’s take some examples to see what happens.

We take the graph on the left and multiply its adjacency matrix by itself. The results are on the right. (Sorry about the bad formatting; could not figure out an easy way to align the figures properly.) The matrix ‘mat2’ is the matrix $A^2$. The entries $a_{ij}$ show the number of 2-length paths between the nodes $i$ and $j$. Why this happens is easy to see: if there is an edge $ij$ and an edge $jk$, then there will be a path $ik$ through $j$. The diagonal entries $a_{ii}$ are the degrees of the nodes $i$.
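Here is a quick numpy check of this, on a toy graph I am assuming for illustration (a triangle on nodes 0, 1, 2, with a pendant vertex 3 attached to node 2):

```python
import numpy as np

# adjacency matrix of a triangle (0, 1, 2) plus the pendant edge 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

A2 = A @ A
print(A2)               # off-diagonal entry (i, j): number of 2-length paths i -> j
print(np.diag(A2))      # diagonal: the degrees, [2 2 3 1]
print(A.sum(axis=1))    # row sums agree: [2 2 3 1]
```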

What happens if we compute A3? Let’s hold it for now and see an example directed graph.

Here, again, the entries in mat2 show the number of 2-length paths. The diagonal entries are 0’s, unlike the case of undirected graphs, where they show the degrees. Next, if we continue this process, the next set of entries shows the number of 3-length paths (strictly speaking, walks, since nodes may repeat), and so on. By repeated multiplication we can count all walks of lengths up to $n - 1$. If there are some non-diagonal entries that have not taken a value greater than 0 even once in the entire process, the graph is not (strongly, in the case of digraphs) connected.

***

Now consider mat3 in either of the above cases, which is the matrix $A^3$. The trace of this matrix shows an important structural property. The trace of a matrix is the sum of its diagonal entries: $\mathrm{trace}(A) = \sum_i a_{ii}$. The trace of $A^3$ has a relationship with the number of triangles in the graph: each triangle is counted once for each of its three vertices and, in the undirected case, once in each direction.

In the case of undirected graphs, $\mathrm{trace}(A^3) = 6 \times$ the number of triangles ($K_3$’s) in the graph.
In the case of directed graphs, $\mathrm{trace}(A^3) = 3 \times$ the number of (directed) triangles in the graph.

Below are two more examples to illustrate the above point.
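And here is a sanity check in code, on the same assumed toy graph as before (it contains exactly one triangle):

```python
import numpy as np

# the triangle-plus-pendant graph again: exactly one triangle
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])

A3 = A @ A @ A
print(np.trace(A3) // 6)   # 1, since trace(A^3) = 6 * number of triangles
```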

***

We can also note that the above procedure can be used to find the diameter of a graph. We have to find the minimum number of times the adjacency matrix must be multiplied by itself so that each entry has taken a value greater than 0 at least once. The maximum is, of course, $n - 1$. Now, the complexity of this procedure is $O(n \cdot n^3) = O(n^4)$. This is an order bigger than finding the diameter by first finding the all-pairs shortest paths. However, in the average case, the former fares better. Also, if we use methods of fast matrix multiplication, the complexity improves further.
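A sketch of that procedure (my own, in Python with numpy):

```python
import numpy as np

def diameter_by_powers(A):
    """Diameter via matrix powers: the smallest k for which every pair
    of nodes is joined by a walk of length at most k."""
    n = A.shape[0]
    power = A > 0                             # walks of length exactly 1
    reach = power | np.eye(n, dtype=bool)     # walks of length <= 1
    k = 1
    while not reach.all():
        if k >= n:                            # n - 1 powers were not enough,
            return None                       # so the graph is disconnected
        power = (power.astype(int) @ A) > 0   # walks of length exactly k + 1
        reach |= power
        k += 1
    return k

# path graph 0-1-2 has diameter 2
P3 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
print(diameter_by_powers(P3))   # 2
```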

***

Are there more interesting properties of adjacency matrices? I think so. It would be a good exercise to explore.