Some interesting properties of adjacency matrices

March 30, 2008

An adjacency matrix is a boolean square matrix that represents the adjacency relationships in a graph. Given a graph with n nodes, the adjacency matrix A nxn has entries aij = 1, if there if j is adjacent to i, and 0 otherwise (or if there is an edge from i to j). In the graph is undirected, an edge from i to j implies the existence of an edge from j to i. Generally, we talk about simple graphs with no self loops. So, aii is 0. We don’t allow multi-edges either. We can see that, the diagonal entries are all 0’s. Further, in case of an undirected graph, the adjacency matrix is symmetric; this need not be so for directed graphs.

Now, it is evident that the adjacency matrix A also represents all the paths of length 1. Each entry indicates whether there is a 1-length path between the corresponding nodes or not. It also tells us how many 1-length paths are there between the two nodes. (Of course, it is either 0 or 1.)

Interesting things happen when we multiply the adjacency matrix by itself. Let’s take some examples to see what happens.

We take the graph on the left and multiply its adjacency matrix by itself. The results are on the right. (Sorry about the bad formatting; could not figure out an easy way to align the figures properly.) The matrix ‘mat2’ is the matrix A2 . The entries aii show the number of 2-length paths between the nodes i and j. Why this happens is easy to see: if there is an edge ij and an edge jk, then there will be a path ik through j. The entries ii are the degrees of the nodes i.

What happens if we compute A3? Let’s hold it for now and see an example directed graph.

Here, again the entries in mat2 show the number of 2-length paths. The diagonal entries are 0’s unlike the case of undirected graphs where they show the degrees. Next, if we continue this process, the next set of entries show the number of 3-length paths. In the case of digraphs, this can be generalised. By repeated multiplications we can all paths up to lengths n – 1. If there are some non-diagonal entries that have not taken a value greater than 0 even once in the entire process, the graph is not strongly connected.

***

Now consider mat3 in either of the above cases, which is the matrix A3. The trace of this matrix shows an important structural property. The trace of a matrix is the sum of the diagonal entries. Trace = sum(aii). The trace of A3 has a relationship with the number of triangles in the graph.

In case of undirected graphs, Trace = 6 * no. of triangles in the graph = 6 * no of K3‘s
In case of directed graphs, Trace = 3 * no. of triangles in the graph

Below are two more examples to illustrate the above point.

***

We can also note that the above procedure can be used to find the diameter of graphs. We have to find the minimum number of times the adjacency matrix has to be multiplied by itself so that each entry has taken a value greater than 0 at least once. The maximum is, of course, n – 1. Now, the complexity of this procedure is O(n * n3). This is an order bigger than finding the diameter by first finding the all pairs shortest paths. However, in the average case, the former fares better. Also, if we can use methods of fast matrix multiplication, it further improves the complexity.

***

Are there more interesting properties of adjacency matrices? I think so. It would be a good exercise to explore.


Coincidence. Or is it?

March 29, 2008

I attended an interesting talk by Prof. Persi Diaconis, Professor of Statistics and Mathematics at Stanford, called ‘On Coincidences’. It was a lecture targeted towards general scientific audience wherein he explained how simple statistical tools can be used to easily explain coincidences that a lot of people are all too willing to deem paranormal. You should read about Prof. Diaconis here. Read it to find out what a strikingly unconventional career path he has taken.

He started with a story recorded by Carl Jung, the psychoanalyst: on April Fish day in 1949, Jung makes note of a picture that was half human and half fish, then eats fish for lunch, meets a patient that talks about dreaming about a big fish, another shows him pictures of fish; months later when he was writing about these things, he found a dead fish.

Now the question is, is this really as weird or even surprising as Jung or anybody else thinks? Prof. Diaconis argued that it is not. By modeling day to day incidents using simple statistics, we can think about these occurrences in a quantitative way and make sense out of them. Let’s say, we can model these independently normal, but together weird looking incidents as generated by a Poisson process. By modeling the fish story in this way, assuming that we hear about ‘fish’ once a day, Prof. Diaconis found that, there is a 22% chance of the above story occurring, which renders the occurrence not very surprising.

The other important thing he discussed was how the simple Birthday problem can be used as a tool to quantify coincidences. In the birthday problem we have 365 categories (or events). Now, what is the sample size to find at least one pair of matching birthdays in a population, with a probability of 50%? It is given by the simple formula: N = 1.2 * sqrt(C), where C is the number of categories. For a 95% chance of a match, N = 2.5 * sqrt(C). By taking C = 365, we get 23 and 48 as the sample sizes, respectively.

In this paper, he explains the above and gives similar general formulas for other cases: when you have find a match in any of several categories, or finding a close match, and so on. It’s a very simple yet useful way of thinking.

***

This talk was yesterday. Incidentally, someone was giving a seminar in the lab here today on risks in nuclear plants. While talking about the Three Mile Island reactor accident, he mentioned that the failure started at 4:00 AM sharp on March 28th, 1979, on the day of their first anniversary. It was, in fact, the first anniversary down to the minute. There were a couple of gentle giggles until after a couple of seconds, one of the fellows (who had attended yesterday’s talk) said, “That is today!” The speaker (who hadn’t attended the talk) said, “Yeah, coincidence.” The prof (who had attended the talk) said, “Or is it?” People were all giggling about the good joke. All, but one. Because through his extrasensory perceptions he had come to terms with a greater truth that no one else understood; only I had noticed that the time then was exactly 4:00:00 AM!

OK, I made up the last bit about the time being 4 AM. The world still isn’t as crazy to have started talking about nuclear reactor failures at 4 AM. The talk started around 3:40 PM, so there is more than a good chance that the time at that moment was 4 PM. But then, we all like to make a good story better, don’t we?


Graph invariants, “identifiers” and reconstruction

March 25, 2008

Graph invariants are properties associated with a graph that do not change across all possible isomorphisms of the graph. In other words, if a set of graphs are isomorphic, then they all have the same values for certain properties which are called invariants. Examples of graph invariants are the following:

  • no of nodes
  • no of edges
  • the degree distribution (or degree sequence)
  • vertex (edge) chromatic number: the minimum no. of colours needed to colour all vertices (edges) such that adjacent vertices (edges) do not have the same colour
  • vertex (edge) covering number: the minimum no. of vertices (edges) needed to cover all edges (vertices)

As you can see, the invariants are numbers. A graph invariant describes a graph in terms of a simple number. Given a graph, it has a unique number of nodes, number of edges, degree sequence etc.. They find application in chemistry where large chemical compounds are indexed based on these numbers.

It should be noted that the incidence and adjacency matrices are not graph invariants. This is because, when isomorphism is computed, if for every pair of adjacent nodes in graph-1, you can find a pair of adjacent nodes in graph-2, and vice versa, the two graphs are isomorphic; labels are not important.

Also, for a given graph, there are unique invariants. However, the converse need not hold. For example, consider the invariant degree sequence. Shown below are two graphs that have the same degree sequence, but are non-isomorphic.

***

What I am interested in finding what can be called graph “identifiers”. Graph identifiers are properties or metrics that uniquely identify a graph. Examples of identifiers are adjacency and incidence matrices. Given an adjacency matrix, there is only one graph corresponding to it. Similarly, are there more identifiers? It need not be one property; it can be a combination of a set of properties like the no. of edges, degree sequence, diameter, centrality sequence and so on. What is perhaps also interesting is to measure the “uniqueness” of sets of properties. Uniqueness indicates in percentage terms “the extent” to which a property or a set of properties identify a graph. For instance, the adjacency matrix has a uniqueness value of 1. Again, this can be measured in terms of, say edit distances.

I am not sure there is much work in this direction. Do you think this is an interesting problem?

A consequence of this is graph reconstruction. If we have a set of properties that identify a graph, it is possible to construct an isomorphic graph using those properties. It can be used to recover networks that are subject to disruptions. (My usage of reconstruction is similar to that in literature, but slightly different.)


Quiz question

March 21, 2008

Connect the following –

  1.  
  2. Christian Goldbach, Emmanuel Kant, David Hilbert

Of Gas Bills and Modulo Arithmetic

March 17, 2008

Alright, the title is meant to be catchy; there is no real connection there. Except that it is in the context of paying my apartment gas bill I found out about a simple application of modulo arithmetic in electronic transactions in the US. (I am not talking about cryptography. Nowhere near that.) While trying to pay the gas bill through an e-cheque, I found that I had to fill in something called a “routing number” before submitting the e-cheque. Having never heard of the term before, I searched a bit and found out that the number serves chiefly as an identifier of a bank (or other financial institutions) in electronic transactions. It is a 9 digit number which identifies the location of a bank and the bank. (I think a bank can have multiple routing numbers.)

The modulo arithmetic part comes in in generating and validating a routing number. To validate a routing number the following checksum is used:

[3(d1 + d4 + d7) + 7(d2 + d5 + d8) + 1(d3 + d6 + d9)] mod 10 = 0

I don’t know why this particular checksum. And I think, the the first 8 digits are known and the last digit is generated using the above formula. Not sure who does this.

[Source: wikipedia and official ABA policy (pdf link)]

***

Incidentally, Gauss was responsible for modulo arithmetic (or clock arithmetic). The book The Music of the Primes has a nice description of this and many other beautiful things that Gauss developed. On this note, I also recall a quote in the brilliant book called Concrete Mathematics. In that book, the authors have included what are called “marginal notes” — they are actually the notes/comments written by the first set of students who got to read the draft of the book. The quote is – “It seems a lot of stuff is attributed to Gauss; either he was really smart or he had a great press agent.” On the same page, there is another marginal comment – “Maybe he just had a magnetic personality.”


Graph Toughness

March 17, 2008

A small quiz – do you know what toughness of a graph is? Hint: what does it mean to say a circle topology is 1-tough?

[If you don’t know it already, just guess; should be simple to guess. If you know it already, don’t answer for now. Don’t point to the wikipedia (or any other) page in any case :p]


“Physics works”

March 16, 2008

Online video lectures is the greatest thing to have happened for me in the last few months. There are a lot of them available through MIT OCW; Yale has released some; IITs have put up very good lectures on youtube; and so on. This is a great hub that points to a lot of them. I have some pointers here. Many of them can be downloaded as well.

It’s always a great thing when an expert explains things in a clear manner. It’s also much more convenient than reading up and understanding. And it takes much less effort. Someone like me gets to learn and/or revise things by going through them. If I have to study on my own, I would keep postponing it forever!

For example, I understood how simple Newtonian physics is, if presented well, like Prof. Ramamurti Shankar from Yale does; or how intriguing the things that we simply assume are, like I felt when a prof pondered over the question of ‘what are real numbers at all’, in a lecture from IIT. It is also amazing the extent to which people can go to ensure that what they are presenting is absolutely clear. To experience this amazement, one has to watch Prof. Walter Lewin teach physics. He doesn’t just teach physics, he makes physics happen. He brings all kind of stuff to his lectures. Balls, springs, ropes, what have you. In a particular lecture on springs, SHM etc., he first records the period of a 5m long pendulum that has a 15kg ball. Then to show that the period of a pendulum is independent of the mass, he places himself on the metal ball, and swings, keeping himself as horizontal as possible. Then he notes the time, rushes to the board to write it below the one for the ball alone. He then yells, “Physics works!”


More on Regularity Preservation

March 16, 2008

As sri rightly pointed out in the comments section of the previous post, clique insertion is just one way to extend an r-regular graph into a bigger r-regular graph. We should be able to extend the graph by inserting any (r – 2)-regular graph (provided the insertion is possible — refer sri’s comment). In fact, clique insertion is the minimal case: the smallest connected (r-2)-regular graph has r -1 nodes.

There are interesting questions here. Can we insert any (r – 2)-regular graph? What is the procedure to do so? Are there some (r – 2)-regular graphs that cannot be inserted?

Let’s use some notations for ease of reference. Let Ro be the r-regular graph we start with, Ri be the (r – 2)-regular graph we want to insert and Rn be the graph resultant of the insertion.

A simple but important thing to be noted here is that, pairs of nodes that are adjacent in Ri, cannot be inserted in a single edge of Ro because that will result in an overlapping edge (we don’t allow multiedges between nodes); adjacent nodes in Ri have to be inserted in different edges. Even for small values of r, this becomes pretty complicated to visualize. I tried to insertions for small graphs, and it was possible to do. But I don’t know if we can always avoid creating a multigraph.

But then in the above discussion we have assumed Ri to be a connected graph. What if Ri is disconnected? If Ri is disconnected we can easily prove that there is no bound on the size of Ri that can be inserted. The simplest way to show this is by saying that our Ri has an arbitrarily large number of (r – 2)-cliques as disconnected components (disconnected edges, disconnected cycles, disconnected cubic graphs etc.). Now, you can either imagine this as adding an arbitrary number of cliques (like in the previous post) or as adding one large disconnected graph.

Of course, cliques are just one way to insert. We can simply insert nodes and add edges between the new nodes in such a way that all the new nodes have a degree of r.

Note that the resulting graph Rn is, of course, a connected graph.

***

Now, since we are talking about p2p networks, what is more important is the minimum number of nodes and edges that we need to add when nodes join or leave. Or more generally, the minimum change in topology that is required. When a new node comes in, let’s say we need to accommodate it without disrupting symmetry; so we may need to add dummy nodes and edges. The thing is – in a p2p network, most of the time, unrelated nodes join or leave unrelated parts of the network. So, essentially, we are dealing with disconnected components.


Regular Graphs: “Clique Insertion”

March 12, 2008

After talking about the basics of regular graphs, let me now introduce something more interesting. When we make changes to a regular graph topology, like addition or removal of edges or nodes, the resulting topology might not remain regular. An interesting question then is the following: are there ways to modify a given regular graph topology, while maintaining its regularity?

Here I introduce a simple operation called “clique insertion” that “extends” a graph (or makes a graph bigger by adding more nodes) while preserving the regularity of the graph. The basic idea of this is due to sri. Later on I developed it a bit more and now it looks like it is a useful result.

Given an r-regular graph of n nodes, the graph can be extended to an r-regular graph of n + (r – 1) nodes by a process of “node insertions” (or “clique insertion”).

Explanation: Let us start with the minimally connected non-trivial regular graph: a circle with n nodes. We can see that by “inserting” a node between any two nodes, we can extend the graph into a bigger graph while preserving the regularity. In other words, cut an arbitrary edge into two, and make their loose ends incident on a newly added node. We can go on inserting any number of nodes in this fashion. We just get bigger circles, all of which have a regularity of 2.

 

It is pretty simple to see in a circle. Now, let’s go to a slightly more complicated topology. Consider a 3-regular graph. We can get a bigger 3-regular graph by inserting 2 nodes (as explained above), and adding another edge between the 2 nodes.

 

It is equivalent to inserting a clique of exactly r – 1 (where r is the regularity) nodes in the existing graph (refer the figure below). For an arbitrary n and arbitrary r, we can construct an n + (r – 1) node graph by inserting exactly r – 1 nodes, and adding edges among them to form a r – 1 node clique.

 

                                                               

In the figure, we have a 4-regular graph (incidentally, it is the complete graph K5). We now do a “clique insertion” operation on the graph to get a new 4-regular graph of 8 nodes. The nodes marked in brown form the clique.

 

The above result can be extended thus: given an r-regular graph of n nodes, we can construct a graph of n + m * (r – 1), where m is a positive integer. So, we can easily construct arbitrarily large regular graphs of a given regularity r by inserting any number of cliques of size exactly r – 1.

 

 

 

Usefulness: This has an important connotation in distributed systems, where nodes joining and failures are frequent. Several distributed hash tables (DHTs) require a symmetric degree distribution. They also need to accommodate for node joining and leaving.

  • For one, this result can be developed to be a measure the amount of disruption in symmetry that can be expected when changes in topology occur.
  • Also, it can be used to design topology changes so as to preserve symmetry. Perhaps, by using dummy nodes.

Question: Is there a similar way to make a graph smaller while preserving the regularity?

That was one example of preserving a structural property of a graph. It is an interesting to find out if we can have a whole set of such operators. We can look at several other properties as well: diameter, average path length, centrality distributions, connectivity etc.. I shall define this problem more formally in another post later on.


Regular Graphs: Bounds

March 12, 2008

In this post, I discuss a few results about connected regular graphs that are fairly obvious. However, to the best of my knowledge they have not been discussed explicitly in literature.

  • If a connected graph of n (n > 2) nodes is regular, then it is at least 2-regular. In other words, the lower bound on the regularity r of a connected graph of more than 2 nodes, is 2.

Proof: Let us start with a minimally connected graph of n > 2 nodes. It is a straight line graph with n−1 edges, in which every node, except the two end nodes of the straight line, is connected to a “predecessor” and a “successor”. To make this regular, we add one more edge between the two end nodes of the straight line. The graph now is a circle, with every node connected to 2 other nodes, thus making it the minimal connected regular graph.

  • The upper bound on the regularity r of a n node (n > 0) graph is n − 1.

Proof: In a graph of n nodes, every node can connect to a maximum of n − 1 nodes, forming a complete graph. Hence the above result follows.

  • As a corollary of the above result, we have: the lower bound on the number of nodes n required to construct a r regular graph is r + 1.