In a previous post we introduced the maximum clique problem (MCP) and reported performance of the efficient bit-parallel exact solver BBMC, implemented with BITSCAN and GRAPH libraries available in the Biicode repositories pablodev/bitscan and pablodev/graph respectively. BBMC is concerned with small and middle size graphs.
Also very challenging are real networks that arise from different fields such as social networks, infrastructure networks, scientific networks and so on. These graphs tend to have many thousands—even millions— of nodes but are usually very sparse —the Network Data Repository hosts more than 500 of such graphs—. An open problem is finding the clique number (the cardinality of a maximum clique) of such networks to get an insight into their structure. Unfortunately the problem is NP-hard and no polynomial algorithm is expected to be found; tailoring and exact solver for this task is as yet an open question .
The maximum clique problem (MCP) consists in finding in a graph a clique—i.e., a complete subgraph, one such all vertices are pairwise adjacent— of maximum cardinality. The term *clique* has its roots in the social sciences; in a social network, a clique refers to a group of individuals who all know—or are linked together by some relation such as friendship— each other. The first algorithm for the MCP can be traced back to Harary & Ross in the late 50s. The first modern enumerative algorithm is most probably the one from Bron & Kerbosch in the early 70s.
Figure 1. A retweet network
The vertex coloring problem (VCP) is an NP-hard classical problem in graph theory which can be traced back to a letter written to W.R. Hamilton by A. de Morgan in 1852 in which the famous Four Color Theorem has its roots. Besides its obvious theoretical relevance, it has found practical applications connected to scheduling and allocation of resources (i.e. memory for different processes, frequencies for WLANs etc.).
A (proper) vertex coloring of a simple undirected graph G=(V, E) is an assignment of color numbers to all vertices such that pairwise adjacent vertices have different colors. The size of the coloring is the number of different colors employed. The chromatic number of a graph χ(G) is the minimum number of colors required to color G, i.e. the size of its optimum coloring. The VCP can be formulated as finding a minimum coloring for a given graph.
Compared with other related graph optimization problems such as the maximum clique problem (i.e. finding the largest possible subgraph in a given graph), VCP is considerably more challenging; for example it is possible to compute a maximum clique exactly in massive sparse graphs with millions of vertices, whereas fast exact coloring of a random graph with 80 vertices and 0.5 edge density already requires efficient algorithms and a powerful CPU.
A comprehensive view of BITSCAN
BITSCAN is a C++ library dedicated to efficient processing of bit strings. In programming, a bit string is a data structure that stores collections of bits (ones and zeros). It gets interesting when these bits have semantics, i.e. refer to a Boolean property of a group, so that each element is identified by a bit in the chain. In previous posts I have repeatedly stated that BITSCAN “is a useful library to manage bit strings”, and that was that. Readers of our blog have questioned the usefulness of bit strings in practice so I will start this post explaining some common situations in which bit strings may be an alternative to more common data structures.
Sparse bitsets in C++
Sparsity when referring to systems indicates that they are loosely coupled. Thus, a sparse matrix is a matrix in which most of its elements are zero, a sparse graph has very few adjacent vertices (its adjacency matrix is also sparse) etc. Opposite to sparsity is density, and dense systems are those that are strongly coupled.
In a recent post we proposed BITSCAN a recent C++ library to manipulate bit strings. A comparative survey with other state of the art implementations (such as bitset (STL), or dynamic_biset(BOOST)) may be found here. This post brifely describes how BITSCAN operates with sparse bitsets.
What is k-cores analysis?
A major concern of social network analysis is to determine subgroups of actors which cooperate together within a network, in other words from cohesive subsets. To this avail a number of notions were introduced in the past such as cliques, k-plexes, lambda sets, k-cores etc. Most of these notions are difficult to compute (non linear and many in NP), but computing cores is the exception because linear algorithms are known to exist1 .This makes k-cores analysis an important source of information for real networks with hundreds of thousands of nodes, such as road graphs, internet graphs etc.
k-core decomposition (0-core, 1-core, 2-core, 3-core)