First Principles

June 22, 2008

A note on Hamiltonian circuits (and Uncle Paul)

Sometime back I read a short paper by Chvatal and Erdos that presents a couple of sufficiency conditions for hamiltonicity.

One of the results there is - If G is an s-connected graph with at least 3 vertices, and has a maximum independence number of s, then G has a hamiltonian circuit.

It is an interesting result connecting robustness and hamiltonicity. A graph is s-connected if there are s edge independent paths between any two nodes in the graph. In other words, it is the size of the minimum edge cut i.e. the minimum number of edges whose deletion increases the number of components in the graph.

The maximum independence number of a graph is the size of the biggest independent set. An independent set is a set of vertices of the graph such that no two vertices in the set are adjacent. In other words, there is no edge between any pair of vertices in the independent set. The bigger the independence number, the easier it is to fragment the network.

As you can see both these concepts can be used to measure the robustness of a graph. Also, it is not unnatural that connectivity and hamiltonicity are related. Intuitively, the more independent paths, the greater the chances of a graph being hamiltonian. And, a hamiltonian circuit is at least 2-connected: a circle is the simplest hamiltonian circuit and it is 2-connected.

***

Well, but the point of this post is something else really. The paper I referred to in the beginning is just a 3-page note. And it has an interesting footnote on the 1st page. The footnote says: This note was written in Professor Richard K. Guy’s car on the way from Pullman to Spokane, Wash. The authors wish to express their gratitude to Mrs. Guy for smooth driving.

May 20, 2008

Goodstein’s Theorem

Filed under: generic, maths — sanket @ 11:16 am
Tags: ,

In the previous post, I had talked about a special kind of sequence. Such sequences are called Goodstein’s sequences. And Goodsten’s theorem states that all Goodstein sequences converge to 0. Now, this is a rather counter-intuitive result given the way the series expands. Remember, in every iteration, you are increasing the base and the exponent of the terms in a sum by 1, whereas you are subtracting the whole sum by just 1. Surely, the number is increasing very rapidly notwithstanding the subtractions? For example, even a small first number, say 4, increases in this manner: 4, 26, 41, 60, 83, 109, 139, 173,…, 3319048, …, 93850207,…

How can this sum start reducing and converge to 0? One has to look more carefully to see what is happening. Let us start with a smaller number, say 3.

Iteration 1 (hereditary base-2):
(1) 3 = 2 + 1
(2) 3 + 1
(3) 3 + 1 - 1 = 3
Iteration 2 (hereditary base-3):
(1) 3 = 3
(2) 4
(3) 4 + 1 - 1 = 3
Iteration 3 (hereditary base-4)
(1) 3 = 3
(2) 3
(3) 3 - 1 = 2

See what’s happening? Firstly, note that in step(2) of iteration 1, the last term, which is 1, does not get increased, because the number is less than the current base. Say the base is n. Then increasing the base of numbers 1 through (n - 1), does not change them. For example, 2 is base 4 is 40 + 40. And 2 in base 5 is, 50 + 50. Correct? So, in this case, the last term is not increased and there is a subtraction following that.

What happens is, even when you are increasing the bases of the terms, the subtraction by 1 operation is slowly “eating” into the last term of the sum. This process may be arbitrarily slow, but eventually, it so happens that, the power of the last term in the sum “falls off”, and the last term falls into a lower base than the ongoing one, hence ceasing to increase. So then, the subtraction can now eat away the last term easily, before it starts attacking the next term (which will now be the last term).

So, you can now see that, that the sum does in fact start reducing, albeit after numerous steps, even for small numbers. In fact, you can only check this for the numbers 1, 2 and 3 by hand. You can probably write a program to check this for bigger numbers, though it might take extremely long.

***

Now, another important thing about Goodstein’s theorem is that it is one of Godel type theorems. The truth of Goodstein’s theorem cannot be proved using the axioms of first order arithmetic (Peano arithmetic). There is a proof for this, as well as a proof for Goodsteins’s theorem using techniques that are outside of Peano arithmetic.

May 16, 2008

An interesting sequence

Filed under: generic, maths — sanket @ 11:41 pm
Tags: ,

Mandar has an interesting set of posts on convergence and divergence of infinite series. It made me recall a fascinating result I had come across once. It is not about an infinite series, though. But it is a very interesting sequence.

The sequence needs some explanation of a notation called hereditary base-n notation. Let me explain this with an example. Let us write the number 26 in its hereditary base-2 notation. First we start by writing 26 as the sum of powers of 2.
26 = 24 + 23 + 21
Next, the powers are written as sums of powers of 2 as well.
So, 26 = 222 + 221 + 1 + 21

Similarly, the hereditary base-3 notation of 1000 is the following.

1000 = 36 + 34 + 33 + 1 = 33 + 3 + 33 + 1 + 33 + 1

Note that the bases and the powers cannot be bigger than n. Also, we can write the terms as a product of a base power n and a number smaller than n. For example - 26 can be written in hereditary base-3 notation as, 2.32 + 2.3 + 2.

Now, let us take a number, say 26, and do the following:

  1. Take the number. Start with base, n = 2.
  2. Express number in hereditary-n notation
  3. The next number is formed by changing all the n’s to n+1’s. That is, “increase” the base of the sequence by 1. [n = n + 1]
  4. Subtract 1 from the above number and goto 2. [number = number - 1]

Let’s take an example. Let’s start with 4.

4 = 22 (step 2)

33 (step 3)

33 - 1 = 26 (step 4) — Iteration 1

26 = 2.32 + 2.3 + 2 (step 2)

2.42 + 2.4 + 2 (step 3)

2.42 + 2.4 + 2 - 1 = 41 (step 4) — Iteration 2

41 = 2.42 + 2.4 + 1 (step 2)

2.52 + 2.5 + 1 (step 3)

2.52 + 2.5 + 1 - 1 = 60 (step 4) — Iteration 3

60 = 2.52 + 2.5 (step 2)

2.62 + 2.6 (step 3)

2.62 + 2.6 - 1 = 83 (step 4) — Iteration 4

83 = 2.62 + 6 + 5 (step 2)

2.72 + 7 + 5 (step 3)

2.72 + 7 + 5 - 1 = 109 (step 4) — Iteration 5

And so on. Now the question is, does this sequence converge (or terminate)? If it converges, what does it converge to? Or if you think it does not converge, explain why.

As always, people who already know this result may please defer commenting.

May 14, 2008

Seminar on Complex Networks

Filed under: complex networks, computer science, distributed systems — sanket @ 2:35 am
Tags:

A series of seminars on Complex Systems has been scheduled through the summer over here. I had been asked to give the first seminar. I gave a talk yesterday. I gave an overview of the philosophy, design and analyses of  complex networks. Among other things, I talked about how several graph theoretic measures can be used in both design and analyses of complex networks. There was a lot of lively discussion. The total seminar spanned about 2.5 hours, which probably means the seminar went off quite well. In case you are interested, here are the slides I had used.

April 21, 2008

Everybody stand back…

Filed under: generic — sanket @ 4:39 pm

… I have arrived!

After numerous false starts, I have finally learnt (just) enough Perl to do the essential Unix-like file processing operations like head, tail, grep etc. on Windows. I am in my “Windows usage phase”, and there is absolutely nothing you can do through the command line. (And Windows Vista sucks even otherwise, which means soon I will be back to my “Ubuntu phase”.) I do use cygwin, but Perl is fun.

April 13, 2008

Question

Filed under: computer science, generic — sanket @ 8:12 pm

I have a question that has been bothering me a bit more over the last few days than ever. What are the core computer science areas? What are the core skills that computer scientists have that researchers in other disciplines don’t have (or computer scientist have an edge over others in them)?

March 30, 2008

Some interesting properties of adjacency matrices

Filed under: graph theory — sanket @ 5:59 am

An adjacency matrix is a boolean square matrix that represents the adjacency relationships in a graph. Given a graph with n nodes, the adjacency matrix A nxn has entries aij = 1, if there if j is adjacent to i, and 0 otherwise (or if there is an edge from i to j). In the graph is undirected, an edge from i to j implies the existence of an edge from j to i. Generally, we talk about simple graphs with no self loops. So, aii is 0. We don’t allow multi-edges either. We can see that, the diagonal entries are all 0’s. Further, in case of an undirected graph, the adjacency matrix is symmetric; this need not be so for directed graphs.

Now, it is evident that the adjacency matrix A also represents all the paths of length 1. Each entry indicates whether there is a 1-length path between the corresponding nodes or not. It also tells us how many 1-length paths are there between the two nodes. (Of course, it is either 0 or 1.)

Interesting things happen when we multiply the adjacency matrix by itself. Let’s take some examples to see what happens.

We take the graph on the left and multiply its adjacency matrix by itself. The results are on the right. (Sorry about the bad formatting; could not figure out an easy way to align the figures properly.) The matrix ‘mat2′ is the matrix A2 . The entries aii show the number of 2-length paths between the nodes i and j. Why this happens is easy to see: if there is an edge ij and an edge jk, then there will be a path ik through j. The entries ii are the degrees of the nodes i.

What happens if we compute A3? Let’s hold it for now and see an example directed graph.

Here, again the entries in mat2 show the number of 2-length paths. The diagonal entries are 0’s unlike the case of undirected graphs where they show the degrees. Next, if we continue this process, the next set of entries show the number of 3-length paths. In the case of digraphs, this can be generalised. By repeated multiplications we can all paths up to lengths n - 1. If there are some non-diagonal entries that have not taken a value greater than 0 even once in the entire process, the graph is not strongly connected.

***

Now consider mat3 in either of the above cases, which is the matrix A3. The trace of this matrix shows an important structural property. The trace of a matrix is the sum of the diagonal entries. Trace = sum(aii). The trace of A3 has a relationship with the number of triangles in the graph.

In case of undirected graphs, Trace = 6 * no. of triangles in the graph = 6 * no of K3’s
In case of directed graphs, Trace = 3 * no. of triangles in the graph

Below are two more examples to illustrate the above point.

***

We can also note that the above procedure can be used to find the diameter of graphs. We have to find the minimum number of times the adjacency matrix has to be multiplied by itself so that each entry has taken a value greater than 0 at least once. The maximum is, of course, n - 1. Now, the complexity of this procedure is O(n * n3). This is an order bigger than finding the diameter by first finding the all pairs shortest paths. However, in the average case, the former fares better. Also, if we can use methods of fast matrix multiplication, it further improves the complexity.

***

Are there more interesting properties of adjacency matrices? I think so. It would be a good exercise to explore.

March 29, 2008

Coincidence. Or is it?

Filed under: probability and statistics — sanket @ 3:45 am
Tags:

I attended an interesting talk by Prof. Persi Diaconis, Professor of Statistics and Mathematics at Stanford, called ‘On Coincidences’. It was a lecture targeted towards general scientific audience wherein he explained how simple statistical tools can be used to easily explain coincidences that a lot of people are all too willing to deem paranormal. You should read about Prof. Diaconis here. Read it to find out what a strikingly unconventional career path he has taken.

He started with a story recorded by Carl Jung, the psychoanalyst: on April Fish day in 1949, Jung makes note of a picture that was half human and half fish, then eats fish for lunch, meets a patient that talks about dreaming about a big fish, another shows him pictures of fish; months later when he was writing about these things, he found a dead fish.

Now the question is, is this really as weird or even surprising as Jung or anybody else thinks? Prof. Diaconis argued that it is not. By modeling day to day incidents using simple statistics, we can think about these occurrences in a quantitative way and make sense out of them. Let’s say, we can model these independently normal, but together weird looking incidents as generated by a Poisson process. By modeling the fish story in this way, assuming that we hear about ‘fish’ once a day, Prof. Diaconis found that, there is a 22% chance of the above story occurring, which renders the occurrence not very surprising.

The other important thing he discussed was how the simple Birthday problem can be used as a tool to quantify coincidences. In the birthday problem we have 365 categories (or events). Now, what is the sample size to find at least one pair of matching birthdays in a population, with a probability of 50%? It is given by the simple formula: N = 1.2 * sqrt(C), where C is the number of categories. For a 95% chance of a match, N = 2.5 * sqrt(C). By taking C = 365, we get 23 and 48 as the sample sizes, respectively.

In this paper, he explains the above and gives similar general formulas for other cases: when you have find a match in any of several categories, or finding a close match, and so on. It’s a very simple yet useful way of thinking.

***

This talk was yesterday. Incidentally, someone was giving a seminar in the lab here today on risks in nuclear plants. While talking about the Three Mile Island reactor accident, he mentioned that the failure started at 4:00 AM sharp on March 28th, 1979, on the day of their first anniversary. It was, in fact, the first anniversary down to the minute. There were a couple of gentle giggles until after a couple of seconds, one of the fellows (who had attended yesterday’s talk) said, “That is today!” The speaker (who hadn’t attended the talk) said, “Yeah, coincidence.” The prof (who had attended the talk) said, “Or is it?” People were all giggling about the good joke. All, but one. Because through his extrasensory perceptions he had come to terms with a greater truth that no one else understood; only I had noticed that the time then was exactly 4:00:00 AM!

OK, I made up the last bit about the time being 4 AM. The world still isn’t as crazy to have started talking about nuclear reactor failures at 4 AM. The talk started around 3:40 PM, so there is more than a good chance that the time at that moment was 4 PM. But then, we all like to make a good story better, don’t we?

March 25, 2008

Graph invariants, “identifiers” and reconstruction

Filed under: distributed systems, graph theory — sanket @ 8:38 pm

Graph invariants are properties associated with a graph that do not change across all possible isomorphisms of the graph. In other words, if a set of graphs are isomorphic, then they all have the same values for certain properties which are called invariants. Examples of graph invariants are the following:

  • no of nodes
  • no of edges
  • the degree distribution (or degree sequence)
  • vertex (edge) chromatic number: the minimum no. of colours needed to colour all vertices (edges) such that adjacent vertices (edges) do not have the same colour
  • vertex (edge) covering number: the minimum no. of vertices (edges) needed to cover all edges (vertices)

As you can see, the invariants are numbers. A graph invariant describes a graph in terms of a simple number. Given a graph, it has a unique number of nodes, number of edges, degree sequence etc.. They find application in chemistry where large chemical compounds are indexed based on these numbers.

It should be noted that the incidence and adjacency matrices are not graph invariants. This is because, when isomorphism is computed, if for every pair of adjacent nodes in graph-1, you can find a pair of adjacent nodes in graph-2, and vice versa, the two graphs are isomorphic; labels are not important.

Also, for a given graph, there are unique invariants. However, the converse need not hold. For example, consider the invariant degree sequence. Shown below are two graphs that have the same degree sequence, but are non-isomorphic.

***

What I am interested in finding what can be called graph “identifiers”. Graph identifiers are properties or metrics that uniquely identify a graph. Examples of identifiers are adjacency and incidence matrices. Given an adjacency matrix, there is only one graph corresponding to it. Similarly, are there more identifiers? It need not be one property; it can be a combination of a set of properties like the no. of edges, degree sequence, diameter, centrality sequence and so on. What is perhaps also interesting is to measure the “uniqueness” of sets of properties. Uniqueness indicates in percentage terms “the extent” to which a property or a set of properties identify a graph. For instance, the adjacency matrix has a uniqueness value of 1. Again, this can be measured in terms of, say edit distances.

I am not sure there is much work in this direction. Do you think this is an interesting problem?

A consequence of this is graph reconstruction. If we have a set of properties that identify a graph, it is possible to construct an isomorphic graph using those properties. It can be used to recover networks that are subject to disruptions. (My usage of reconstruction is similar to that in literature, but slightly different.)

March 21, 2008

Quiz question

Filed under: generic — sanket @ 9:07 pm
Tags: ,

Connect the following -

  1.  
  2. Christian Goldbach, Emmanuel Kant, David Hilbert
Next Page »

Blog at WordPress.com.