## Everybody stand back…

April 21, 2008

… I have arrived!

After numerous false starts, I have finally learnt (just) enough Perl to do the essential Unix-like file processing operations like head, tail, grep etc. on Windows. I am in my “Windows usage phase”, and there is absolutely nothing you can do through the command line. (And Windows Vista sucks even otherwise, which means soon I will be back to my “Ubuntu phase”.) I do use cygwin, but Perl is fun.

## Question

April 13, 2008

I have a question that has been bothering me a bit more over the last few days than ever. What are the core computer science areas? What are the core skills that computer scientists have that researchers in other disciplines don’t have (or computer scientist have an edge over others in them)?

## Some interesting properties of adjacency matrices

March 30, 2008

An adjacency matrix is a boolean square matrix that represents the adjacency relationships in a graph. Given a graph with n nodes, the adjacency matrix A nxn has entries aij = 1, if there if j is adjacent to i, and 0 otherwise (or if there is an edge from i to j). In the graph is undirected, an edge from i to j implies the existence of an edge from j to i. Generally, we talk about simple graphs with no self loops. So, aii is 0. We don’t allow multi-edges either. We can see that, the diagonal entries are all 0’s. Further, in case of an undirected graph, the adjacency matrix is symmetric; this need not be so for directed graphs.

Now, it is evident that the adjacency matrix A also represents all the paths of length 1. Each entry indicates whether there is a 1-length path between the corresponding nodes or not. It also tells us how many 1-length paths are there between the two nodes. (Of course, it is either 0 or 1.)

Interesting things happen when we multiply the adjacency matrix by itself. Let’s take some examples to see what happens.

We take the graph on the left and multiply its adjacency matrix by itself. The results are on the right. (Sorry about the bad formatting; could not figure out an easy way to align the figures properly.) The matrix ‘mat2’ is the matrix A2 . The entries aii show the number of 2-length paths between the nodes i and j. Why this happens is easy to see: if there is an edge ij and an edge jk, then there will be a path ik through j. The entries ii are the degrees of the nodes i.

What happens if we compute A3? Let’s hold it for now and see an example directed graph.

Here, again the entries in mat2 show the number of 2-length paths. The diagonal entries are 0’s unlike the case of undirected graphs where they show the degrees. Next, if we continue this process, the next set of entries show the number of 3-length paths. In the case of digraphs, this can be generalised. By repeated multiplications we can all paths up to lengths n – 1. If there are some non-diagonal entries that have not taken a value greater than 0 even once in the entire process, the graph is not strongly connected.

***

Now consider mat3 in either of the above cases, which is the matrix A3. The trace of this matrix shows an important structural property. The trace of a matrix is the sum of the diagonal entries. Trace = sum(aii). The trace of A3 has a relationship with the number of triangles in the graph.

In case of undirected graphs, Trace = 6 * no. of triangles in the graph = 6 * no of K3‘s
In case of directed graphs, Trace = 3 * no. of triangles in the graph

Below are two more examples to illustrate the above point.

***

We can also note that the above procedure can be used to find the diameter of graphs. We have to find the minimum number of times the adjacency matrix has to be multiplied by itself so that each entry has taken a value greater than 0 at least once. The maximum is, of course, n – 1. Now, the complexity of this procedure is O(n * n3). This is an order bigger than finding the diameter by first finding the all pairs shortest paths. However, in the average case, the former fares better. Also, if we can use methods of fast matrix multiplication, it further improves the complexity.

***

Are there more interesting properties of adjacency matrices? I think so. It would be a good exercise to explore.

## Coincidence. Or is it?

March 29, 2008

I attended an interesting talk by Prof. Persi Diaconis, Professor of Statistics and Mathematics at Stanford, called ‘On Coincidences’. It was a lecture targeted towards general scientific audience wherein he explained how simple statistical tools can be used to easily explain coincidences that a lot of people are all too willing to deem paranormal. You should read about Prof. Diaconis here. Read it to find out what a strikingly unconventional career path he has taken.

He started with a story recorded by Carl Jung, the psychoanalyst: on April Fish day in 1949, Jung makes note of a picture that was half human and half fish, then eats fish for lunch, meets a patient that talks about dreaming about a big fish, another shows him pictures of fish; months later when he was writing about these things, he found a dead fish.

Now the question is, is this really as weird or even surprising as Jung or anybody else thinks? Prof. Diaconis argued that it is not. By modeling day to day incidents using simple statistics, we can think about these occurrences in a quantitative way and make sense out of them. Let’s say, we can model these independently normal, but together weird looking incidents as generated by a Poisson process. By modeling the fish story in this way, assuming that we hear about ‘fish’ once a day, Prof. Diaconis found that, there is a 22% chance of the above story occurring, which renders the occurrence not very surprising.

The other important thing he discussed was how the simple Birthday problem can be used as a tool to quantify coincidences. In the birthday problem we have 365 categories (or events). Now, what is the sample size to find at least one pair of matching birthdays in a population, with a probability of 50%? It is given by the simple formula: N = 1.2 * sqrt(C), where C is the number of categories. For a 95% chance of a match, N = 2.5 * sqrt(C). By taking C = 365, we get 23 and 48 as the sample sizes, respectively.

In this paper, he explains the above and gives similar general formulas for other cases: when you have find a match in any of several categories, or finding a close match, and so on. It’s a very simple yet useful way of thinking.

***

This talk was yesterday. Incidentally, someone was giving a seminar in the lab here today on risks in nuclear plants. While talking about the Three Mile Island reactor accident, he mentioned that the failure started at 4:00 AM sharp on March 28th, 1979, on the day of their first anniversary. It was, in fact, the first anniversary down to the minute. There were a couple of gentle giggles until after a couple of seconds, one of the fellows (who had attended yesterday’s talk) said, “That is today!” The speaker (who hadn’t attended the talk) said, “Yeah, coincidence.” The prof (who had attended the talk) said, “Or is it?” People were all giggling about the good joke. All, but one. Because through his extrasensory perceptions he had come to terms with a greater truth that no one else understood; only I had noticed that the time then was exactly 4:00:00 AM!

OK, I made up the last bit about the time being 4 AM. The world still isn’t as crazy to have started talking about nuclear reactor failures at 4 AM. The talk started around 3:40 PM, so there is more than a good chance that the time at that moment was 4 PM. But then, we all like to make a good story better, don’t we?

## Graph invariants, “identifiers” and reconstruction

March 25, 2008

Graph invariants are properties associated with a graph that do not change across all possible isomorphisms of the graph. In other words, if a set of graphs are isomorphic, then they all have the same values for certain properties which are called invariants. Examples of graph invariants are the following:

• no of nodes
• no of edges
• the degree distribution (or degree sequence)
• vertex (edge) chromatic number: the minimum no. of colours needed to colour all vertices (edges) such that adjacent vertices (edges) do not have the same colour
• vertex (edge) covering number: the minimum no. of vertices (edges) needed to cover all edges (vertices)

As you can see, the invariants are numbers. A graph invariant describes a graph in terms of a simple number. Given a graph, it has a unique number of nodes, number of edges, degree sequence etc.. They find application in chemistry where large chemical compounds are indexed based on these numbers.

It should be noted that the incidence and adjacency matrices are not graph invariants. This is because, when isomorphism is computed, if for every pair of adjacent nodes in graph-1, you can find a pair of adjacent nodes in graph-2, and vice versa, the two graphs are isomorphic; labels are not important.

Also, for a given graph, there are unique invariants. However, the converse need not hold. For example, consider the invariant degree sequence. Shown below are two graphs that have the same degree sequence, but are non-isomorphic.

***

What I am interested in finding what can be called graph “identifiers”. Graph identifiers are properties or metrics that uniquely identify a graph. Examples of identifiers are adjacency and incidence matrices. Given an adjacency matrix, there is only one graph corresponding to it. Similarly, are there more identifiers? It need not be one property; it can be a combination of a set of properties like the no. of edges, degree sequence, diameter, centrality sequence and so on. What is perhaps also interesting is to measure the “uniqueness” of sets of properties. Uniqueness indicates in percentage terms “the extent” to which a property or a set of properties identify a graph. For instance, the adjacency matrix has a uniqueness value of 1. Again, this can be measured in terms of, say edit distances.

I am not sure there is much work in this direction. Do you think this is an interesting problem?

A consequence of this is graph reconstruction. If we have a set of properties that identify a graph, it is possible to construct an isomorphic graph using those properties. It can be used to recover networks that are subject to disruptions. (My usage of reconstruction is similar to that in literature, but slightly different.)

## Quiz question

March 21, 2008

Connect the following –

1.
2. Christian Goldbach, Emmanuel Kant, David Hilbert

## Of Gas Bills and Modulo Arithmetic

March 17, 2008

Alright, the title is meant to be catchy; there is no real connection there. Except that it is in the context of paying my apartment gas bill I found out about a simple application of modulo arithmetic in electronic transactions in the US. (I am not talking about cryptography. Nowhere near that.) While trying to pay the gas bill through an e-cheque, I found that I had to fill in something called a “routing number” before submitting the e-cheque. Having never heard of the term before, I searched a bit and found out that the number serves chiefly as an identifier of a bank (or other financial institutions) in electronic transactions. It is a 9 digit number which identifies the location of a bank and the bank. (I think a bank can have multiple routing numbers.)

The modulo arithmetic part comes in in generating and validating a routing number. To validate a routing number the following checksum is used:

[3(d1 + d4 + d7) + 7(d2 + d5 + d8) + 1(d3 + d6 + d9)] mod 10 = 0

I don’t know why this particular checksum. And I think, the the first 8 digits are known and the last digit is generated using the above formula. Not sure who does this.

[Source: wikipedia and official ABA policy (pdf link)]

***

Incidentally, Gauss was responsible for modulo arithmetic (or clock arithmetic). The book The Music of the Primes has a nice description of this and many other beautiful things that Gauss developed. On this note, I also recall a quote in the brilliant book called Concrete Mathematics. In that book, the authors have included what are called “marginal notes” — they are actually the notes/comments written by the first set of students who got to read the draft of the book. The quote is – “It seems a lot of stuff is attributed to Gauss; either he was really smart or he had a great press agent.” On the same page, there is another marginal comment – “Maybe he just had a magnetic personality.”