Reboot and a paper

May 25, 2009

Over the last few days I received several spam comments on this blog (written using Cyrillic alphabet, incidentally). That made me visit this blog for the first time in several months, to turn off comments on old posts. Inadvertently, that also made me consider rebooting this. (Spam can be useful!) At the least, I can use this as a space where I put down comments about interesting papers I read. I might also write about some of the things I am working on.
***

Let me begin by discussing a paper of mine that just got accepted at SMC 2009. The title of the paper is ‘Classes of Optimal Network Topologies under Multiple Efficiency and Robustness Constraints’. In this paper, we have looked at the problem of designing optimal network topologies under efficiency, robustness and cost trade-offs. Efficiency, robustness and cost are defined in terms of structural properties: diameter, average path length, closeness centrality (efficiency); degree centrality, node betweenness, connectivity (robustness); number of edges (cost). A slider $\alpha$ is used to indicate the emphasis on robustness (on a scale of [0, 1]), and thus decides the trade-off between efficiency and robustness. Another parameter $\beta$ (in [0, 1]) decides how cheap or expensive edges are. For example, if $\beta$ is 0, we might make use of the full budget allocated, whereas if $\beta$ is 1, we want to “squeeze out” the most cost-effective topology by removing superfluous edges. Finally, there is also a constraint on the maximum permissible degree (indegree and outdegree, in case of digraphs) on a node in the resulting graph. Maximum degree can be thought of as a constraint on the “bookkeeping” (for example, size of DHT finger tables) a node has to do, or as constraints on maximum inflow and outflow through a node.

Different efficiency and robustness metrics are useful in different application contexts. For example, in case of traffic flow networks, congestion is a major challenge. Hence, node betweenness is a useful robustness metric, along with diameter as the efficiency metric which relates to upper bounds on the communication delay. In a Network Centric Warfare (NCW) scenario, it might be important to have alternate communication paths in case targeted attacks occur on nodes or links. Connectivity is a useful robustness metric there. And so on. Thus, what we do next is an ‘exploration of the parameter space’ involving efficiency and robustness metrics, $\alpha$, $\beta$, cost and maximum degree. We conduct genetic algorithm based experiments to evolve the “fittest” or the “most optimal” topologies under given trade-offs.

The main results in the paper are the following:

  • Two prominent classes of topologies that emerge as optimal: (1) star-like or scale-free networks with small diameters, low cost, high resilience to random failures and very low resilience to targeted attacks (2) Circular skip lists (CSL) which are highly resilient in the face of both random failures and targeted attacks, have small diameters under moderate costs.
  • Further, we observe that star-like networks are optimal only when both the following design requirements are satisfied: (1) very low emphasis on robustness (or very high emphasis on efficiency) and (2) severe restrictions on cost. In all other cases, CSL are optimal topologies in terms of balancing efficiency, robustness and cost.
    Thus, we observe a sharp transition from star-like networks to CSL as emphasis on robustness ($\alpha$) increases or cost restrictions become less severe.
  • Circular skip lists are highly homogeneous wrt many structural properties such as degree, betweenness, closeness, pairwise connectivity, pairwise path lengths. Further, they have structural motifs such as Hamiltonicity, (near) optimal connectivity, low diameter, which are optimal under varied application requirements. We argue that CSLs are potential underpinnings for optimal network design in terms of balancing efficiency, robustness and cost.
Advertisements

Graph Powers

August 7, 2008

Earlier I had talked about multiplying the adjacency matrix of a graph by itself and how it leads to some interesting results. Multiplying the adjacency matrix by itself essentially corresponds to raising the graph by some integer exponent while maintaining the same number of nodes.

The powers of graphs are interesting structures. The p^th power, G^p of a graph G is obtained by adding edges between nodes in G that are separated by a pathlength (greater than 1 and) less than or equal to p. Specifically, the square of a graph is the graph resulting from joining all nodes that are separated by a pathlength 2. A cube of a graph is obtained by short-circuiting all 2 length and 3 length paths in the graph.

An important consequence of raising the graph to a power is diameter reduction. If G has diameter d, then graph G^p has diameter \ceil{d/p}. There are several other interesting properties of graph powers that we can observe, which are important in the design of optimal topologies. I’d be interested in knowing your ideas.


A note on Hamiltonian circuits (and Uncle Paul)

June 22, 2008

Sometime back I read a short paper by Chvatal and Erdos that presents a couple of sufficiency conditions for hamiltonicity.

One of the results there is – If G is an s-connected graph with at least 3 vertices, and has a maximum independence number of s, then G has a hamiltonian circuit.

It is an interesting result connecting robustness and hamiltonicity. A graph is s-connected if there are s edge independent paths between any two nodes in the graph. In other words, it is the size of the minimum edge cut i.e. the minimum number of edges whose deletion increases the number of components in the graph.

The maximum independence number of a graph is the size of the biggest independent set. An independent set is a set of vertices of the graph such that no two vertices in the set are adjacent. In other words, there is no edge between any pair of vertices in the independent set. The bigger the independence number, the easier it is to fragment the network.

As you can see both these concepts can be used to measure the robustness of a graph. Also, it is not unnatural that connectivity and hamiltonicity are related. Intuitively, the more independent paths, the greater the chances of a graph being hamiltonian. And, a hamiltonian circuit is at least 2-connected: a circle is the simplest hamiltonian circuit and it is 2-connected.

***

Well, but the point of this post is something else really. The paper I referred to in the beginning is just a 3-page note. And it has an interesting footnote on the 1st page. The footnote says: This note was written in Professor Richard K. Guy’s car on the way from Pullman to Spokane, Wash. The authors wish to express their gratitude to Mrs. Guy for smooth driving.


Some interesting properties of adjacency matrices

March 30, 2008

An adjacency matrix is a boolean square matrix that represents the adjacency relationships in a graph. Given a graph with n nodes, the adjacency matrix A nxn has entries aij = 1, if there if j is adjacent to i, and 0 otherwise (or if there is an edge from i to j). In the graph is undirected, an edge from i to j implies the existence of an edge from j to i. Generally, we talk about simple graphs with no self loops. So, aii is 0. We don’t allow multi-edges either. We can see that, the diagonal entries are all 0’s. Further, in case of an undirected graph, the adjacency matrix is symmetric; this need not be so for directed graphs.

Now, it is evident that the adjacency matrix A also represents all the paths of length 1. Each entry indicates whether there is a 1-length path between the corresponding nodes or not. It also tells us how many 1-length paths are there between the two nodes. (Of course, it is either 0 or 1.)

Interesting things happen when we multiply the adjacency matrix by itself. Let’s take some examples to see what happens.

We take the graph on the left and multiply its adjacency matrix by itself. The results are on the right. (Sorry about the bad formatting; could not figure out an easy way to align the figures properly.) The matrix ‘mat2’ is the matrix A2 . The entries aii show the number of 2-length paths between the nodes i and j. Why this happens is easy to see: if there is an edge ij and an edge jk, then there will be a path ik through j. The entries ii are the degrees of the nodes i.

What happens if we compute A3? Let’s hold it for now and see an example directed graph.

Here, again the entries in mat2 show the number of 2-length paths. The diagonal entries are 0’s unlike the case of undirected graphs where they show the degrees. Next, if we continue this process, the next set of entries show the number of 3-length paths. In the case of digraphs, this can be generalised. By repeated multiplications we can all paths up to lengths n – 1. If there are some non-diagonal entries that have not taken a value greater than 0 even once in the entire process, the graph is not strongly connected.

***

Now consider mat3 in either of the above cases, which is the matrix A3. The trace of this matrix shows an important structural property. The trace of a matrix is the sum of the diagonal entries. Trace = sum(aii). The trace of A3 has a relationship with the number of triangles in the graph.

In case of undirected graphs, Trace = 6 * no. of triangles in the graph = 6 * no of K3‘s
In case of directed graphs, Trace = 3 * no. of triangles in the graph

Below are two more examples to illustrate the above point.

***

We can also note that the above procedure can be used to find the diameter of graphs. We have to find the minimum number of times the adjacency matrix has to be multiplied by itself so that each entry has taken a value greater than 0 at least once. The maximum is, of course, n – 1. Now, the complexity of this procedure is O(n * n3). This is an order bigger than finding the diameter by first finding the all pairs shortest paths. However, in the average case, the former fares better. Also, if we can use methods of fast matrix multiplication, it further improves the complexity.

***

Are there more interesting properties of adjacency matrices? I think so. It would be a good exercise to explore.


Graph invariants, “identifiers” and reconstruction

March 25, 2008

Graph invariants are properties associated with a graph that do not change across all possible isomorphisms of the graph. In other words, if a set of graphs are isomorphic, then they all have the same values for certain properties which are called invariants. Examples of graph invariants are the following:

  • no of nodes
  • no of edges
  • the degree distribution (or degree sequence)
  • vertex (edge) chromatic number: the minimum no. of colours needed to colour all vertices (edges) such that adjacent vertices (edges) do not have the same colour
  • vertex (edge) covering number: the minimum no. of vertices (edges) needed to cover all edges (vertices)

As you can see, the invariants are numbers. A graph invariant describes a graph in terms of a simple number. Given a graph, it has a unique number of nodes, number of edges, degree sequence etc.. They find application in chemistry where large chemical compounds are indexed based on these numbers.

It should be noted that the incidence and adjacency matrices are not graph invariants. This is because, when isomorphism is computed, if for every pair of adjacent nodes in graph-1, you can find a pair of adjacent nodes in graph-2, and vice versa, the two graphs are isomorphic; labels are not important.

Also, for a given graph, there are unique invariants. However, the converse need not hold. For example, consider the invariant degree sequence. Shown below are two graphs that have the same degree sequence, but are non-isomorphic.

***

What I am interested in finding what can be called graph “identifiers”. Graph identifiers are properties or metrics that uniquely identify a graph. Examples of identifiers are adjacency and incidence matrices. Given an adjacency matrix, there is only one graph corresponding to it. Similarly, are there more identifiers? It need not be one property; it can be a combination of a set of properties like the no. of edges, degree sequence, diameter, centrality sequence and so on. What is perhaps also interesting is to measure the “uniqueness” of sets of properties. Uniqueness indicates in percentage terms “the extent” to which a property or a set of properties identify a graph. For instance, the adjacency matrix has a uniqueness value of 1. Again, this can be measured in terms of, say edit distances.

I am not sure there is much work in this direction. Do you think this is an interesting problem?

A consequence of this is graph reconstruction. If we have a set of properties that identify a graph, it is possible to construct an isomorphic graph using those properties. It can be used to recover networks that are subject to disruptions. (My usage of reconstruction is similar to that in literature, but slightly different.)


Graph Toughness

March 17, 2008

A small quiz – do you know what toughness of a graph is? Hint: what does it mean to say a circle topology is 1-tough?

[If you don’t know it already, just guess; should be simple to guess. If you know it already, don’t answer for now. Don’t point to the wikipedia (or any other) page in any case :p]


More on Regularity Preservation

March 16, 2008

As sri rightly pointed out in the comments section of the previous post, clique insertion is just one way to extend an r-regular graph into a bigger r-regular graph. We should be able to extend the graph by inserting any (r – 2)-regular graph (provided the insertion is possible — refer sri’s comment). In fact, clique insertion is the minimal case: the smallest connected (r-2)-regular graph has r -1 nodes.

There are interesting questions here. Can we insert any (r – 2)-regular graph? What is the procedure to do so? Are there some (r – 2)-regular graphs that cannot be inserted?

Let’s use some notations for ease of reference. Let Ro be the r-regular graph we start with, Ri be the (r – 2)-regular graph we want to insert and Rn be the graph resultant of the insertion.

A simple but important thing to be noted here is that, pairs of nodes that are adjacent in Ri, cannot be inserted in a single edge of Ro because that will result in an overlapping edge (we don’t allow multiedges between nodes); adjacent nodes in Ri have to be inserted in different edges. Even for small values of r, this becomes pretty complicated to visualize. I tried to insertions for small graphs, and it was possible to do. But I don’t know if we can always avoid creating a multigraph.

But then in the above discussion we have assumed Ri to be a connected graph. What if Ri is disconnected? If Ri is disconnected we can easily prove that there is no bound on the size of Ri that can be inserted. The simplest way to show this is by saying that our Ri has an arbitrarily large number of (r – 2)-cliques as disconnected components (disconnected edges, disconnected cycles, disconnected cubic graphs etc.). Now, you can either imagine this as adding an arbitrary number of cliques (like in the previous post) or as adding one large disconnected graph.

Of course, cliques are just one way to insert. We can simply insert nodes and add edges between the new nodes in such a way that all the new nodes have a degree of r.

Note that the resulting graph Rn is, of course, a connected graph.

***

Now, since we are talking about p2p networks, what is more important is the minimum number of nodes and edges that we need to add when nodes join or leave. Or more generally, the minimum change in topology that is required. When a new node comes in, let’s say we need to accommodate it without disrupting symmetry; so we may need to add dummy nodes and edges. The thing is – in a p2p network, most of the time, unrelated nodes join or leave unrelated parts of the network. So, essentially, we are dealing with disconnected components.