The excellent performance reported by BBMCSP—an exact maximum clique algorithm tailored for massive real networks— in a previous post has raised a number of comments, some even questioning either the report itself or the problem’s complexity. This post gives an insight on how BBMCSP works. In the process, and similar to what happens when magicians explain their tricks, we are aware that some of the magic will be lost.
In a previous post we introduced the maximum clique problem (MCP) and reported performance of the efficient bit-parallel exact solver BBMC, implemented with BITSCAN and GRAPH libraries available in the Biicode repositories pablodev/bitscan and pablodev/graph respectively. BBMC is concerned with small and middle size graphs.
Also very challenging are real networks that arise from different fields such as social networks, infrastructure networks, scientific networks and so on. These graphs tend to have many thousands—even millions— of nodes but are usually very sparse —the Network Data Repository hosts more than 500 of such graphs—. An open problem is finding the clique number (the cardinality of a maximum clique) of such networks to get an insight into their structure. Unfortunately the problem is NP-hard and no polynomial algorithm is expected to be found; tailoring and exact solver for this task is as yet an open question .
The maximum clique problem (MCP) consists in finding in a graph a clique—i.e., a complete subgraph, one such all vertices are pairwise adjacent— of maximum cardinality. The term *clique* has its roots in the social sciences; in a social network, a clique refers to a group of individuals who all know—or are linked together by some relation such as friendship— each other. The first algorithm for the MCP can be traced back to Harary & Ross in the late 50s. The first modern enumerative algorithm is most probably the one from Bron & Kerbosch in the early 70s.
The vertex coloring problem (VCP) is an NP-hard classical problem in graph theory which can be traced back to a letter written to W.R. Hamilton by A. de Morgan in 1852 in which the famous Four Color Theorem has its roots. Besides its obvious theoretical relevance, it has found practical applications connected to scheduling and allocation of resources (i.e. memory for different processes, frequencies for WLANs etc.).
A (proper) vertex coloring of a simple undirected graph G=(V, E) is an assignment of color numbers to all vertices such that pairwise adjacent vertices have different colors. The size of the coloring is the number of different colors employed. The chromatic number of a graph χ(G) is the minimum number of colors required to color G, i.e. the size of its optimum coloring. The VCP can be formulated as finding a minimum coloring for a given graph.
Compared with other related graph optimization problems such as the maximum clique problem (i.e. finding the largest possible subgraph in a given graph), VCP is considerably more challenging; for example it is possible to compute a maximum clique exactly in massive sparse graphs with millions of vertices, whereas fast exact coloring of a random graph with 80 vertices and 0.5 edge density already requires efficient algorithms and a powerful CPU.
A comprehensive view of BITSCAN
BITSCAN is a C++ library dedicated to efficient processing of bit strings. In programming, a bit string is a data structure that stores collections of bits (ones and zeros). It gets interesting when these bits have semantics, i.e. refer to a Boolean property of a group, so that each element is identified by a bit in the chain. In previous posts I have repeatedly stated that BITSCAN “is a useful library to manage bit strings”, and that was that. Readers of our blog have questioned the usefulness of bit strings in practice so I will start this post explaining some common situations in which bit strings may be an alternative to more common data structures.
Sparse bitsets in C++
Sparsity when referring to systems indicates that they are loosely coupled. Thus, a sparse matrix is a matrix in which most of its elements are zero, a sparse graph has very few adjacent vertices (its adjacency matrix is also sparse) etc. Opposite to sparsity is density, and dense systems are those that are strongly coupled.
In a recent post we proposed BITSCAN a recent C++ library to manipulate bit strings. A comparative survey with other state of the art implementations (such as bitset (STL), or dynamic_biset(BOOST)) may be found here. This post brifely describes how BITSCAN operates with sparse bitsets.
What is k-cores analysis?
A major concern of social network analysis is to determine subgroups of actors which cooperate together within a network, in other words from cohesive subsets. To this avail a number of notions were introduced in the past such as cliques, k-plexes, lambda sets, k-cores etc. Most of these notions are difficult to compute (non linear and many in NP), but computing cores is the exception because linear algorithms are known to exist1 .This makes k-cores analysis an important source of information for real networks with hundreds of thousands of nodes, such as road graphs, internet graphs etc.
Today, by the hand of Pablo San Segundo, we present graph and ugraph, two simple, easy-to-use, C++ wrappers for unweighted graphs encoded as bit strings.
One of the most interesting and versatile applications of bit strings is to encode simple unweighted graphs dynamic in the number of edges (i.e. there is an efficient way to add/remove edges but not so for the vertices). Graphs encoded as bit strings have recently attracted the attention of researchers in relation to well known NP-hard problems such as vertex coloring or clique. The main reason is that efficient algorithms that exploit bit-parallelism at CPU level have been found for such problems.
This is is a small explanation about BITSCAN, the C++ library exclusively developed by Pablo San Segundo.
BITSCAN is dedicated to the efficient processing of bit strings. In programming, a bit string is a data structure that stores collections of 1s and 0s. It gets interesting when those 1s and 0s refer to a Boolean property of a group, so that each element of this group is identified by a bit of that chain.
Time to move your pawns forward
A good example of this is the game of chess. In a chessboard each position is formed by six different types of pieces with two possible colours. A string of 64 bits can encode the position of all pieces of the same type and color on the board (e.g. “white pawns”) by referring each bit to a square with the semantics of a value to 1 (TRUE) if the square is occupied by a piece of the chosen type and 0 (FASLE) otherwise. Continue reading