Given a graph G = (V, E) consists of a set of vertices V and a set of edges
E⊆V(2), graph layout p:V→
is a function that assigns to each vertex a position in the d-dimensional space
where d is the number of dimensions and
is the set of real numbers. For graph drawing, nodes are usually represented
as disks or boxes while the edges are always represented as straight lines or
Graph layout is an assistant tool for exploratory observing the data inherent
characteristics in many fields, such as social network (Ghoniem
et al., 2005), sensor nets (Wei et al.,
2010a, b; Wei and Zhou, 2012)
web macrostructure, bioinformatics (Battista et al.,
1994), etc. Additionally, it also can serve as a knowledge map to support
Information visualization. There is not absolute criterion to measure the quality
of layout. It is expected to reveal clusters in web macrostructure mining, while
for navigation it is more important to prevent overlapping of nodes. The proposed
algorithm will hold three properties:
||Nodes are non-overlapping such that the contents have unobstructed
view for user
||Can reveal the clustered structure and render them using different
||Can highlight center vertices
There have been many graph layout algorithms available and the most popular
are energy-based algorithms. These algorithms treat graph layout as energy system
and computes positions through energy minimization. The vertex-repulsion LinLog
energy model (Noack, 2004) was the first energy model
to compute clustering layouts for graphs with uniform edge degree, and the edge-repulsion
LinLog model can tackle this limitation. Hierarchical clustering layout algorithm
(Clemencon et al., 2011) reveals cluster structure
through hierarchical maximal modularity clustering.
Motivated by modular decomposition-based layout algorithm (MDLayout) (Charis
and Constantinos, 2006), an improved modular decomposition-based layout
algorithm (IMDLayout) is designed. It first decomposes the graph into the modular
decomposition tree (MD-tree) and then computes the positions hieratically according
the tree. IMDLayout has more artistic sense and control ability for different
applications. In IMDLayout P-nodes and S-nodes are drawn using modified circle
layout algorithm which has lower complexity and more artistic than the grid.
Fatherly, P- and S-node layout are optimized when it has only one adjacent to
reflect core role of the adjacency node and improve aesthetic quality. One parameter
λ is introduced for extending traditional center distance to prevent overlapping.
A module is a set of vertices that have the same neighborhood A module does
not overlap any other is called strong module. In the following, the model is
considered with respect to the strong module if not specify particular. The
family of modules can be represented by a tree which is called modular decomposition
tree (MD-tree) (Gallai, 1967) for the graph. The nodes
are divided into three types:
||Parallel node: If the graph is disconnected the root
is called parallel node
||Serial node: If the graphs complement is disconnected,
the root is called serial node
||Prime node: In all other cases the root is called prime
As one example, One graph G and its MD-tree T(G) are displayed in Fig.
1. Modular decomposition is very important for graph theory. The first Modular
decomposition algorithm is due to Cowan, James and Stanton (Battista
et al., 1999) and runs in O(n4). One cubic time algorithm
is proposed by Habib and Maurer (1979). The first linear
time algorithms appeared in 1994 (Cournier and Habib, 1994).
Since then a series of simplified algorithms has been published. Some of them
running in linear time, some others in almost linear time (Dahlhaus
et al., 2001; McConnell and Spinrad, 2000)
. In this study, the linear time algorithm proposed by Tedder
et al. (2008) was adopted. Graph layout based on MD-tree has the
following structure (Charis and Constantinos, 2006):
||Compute the MD tree T(G)of the input graph G using one of
||In a bottom-up fashion, traverse the MD-tree T(G) and calculate
the relative drawing for every internal node tεT(G)
||Set the position of the root and traverse the MD-tree T(G)
from top to bottom. For every internal node t, the displacement dis(t) is
added to its children tiεch(t) and obtains the final position
THE PROPOSED IMDLAYOUT ALGORITHM
To make the node non-overlapping and the content tag on it has clear view for
user, the radius and the contents length of node are both considered in
IMDLayout algorithm. As shown in Fig. 2, b(t) = (w(t), h(t))ε2
is the size of the box of t, where, w(t) depends on the content s length
while h(t) = 2*r(t). The edges are drawn using straight-line while its width
is proportional to the closeness degree among vertices.
||Graph and its modular decomposition tree (a) Graph (b) MD
||Description of node and edge (a) Charis Papadopoulos, etc.
(b) the proposed algorithm
The nodes in cluster are rendered using the same color. p(t) =(x(t), y(t))ε
represents the coordinates of. If one vertice is focused, it is surrounded by
three orange circles with increasing radius.
The main algorithm: The MD-tree T(G) are first obtained by one of the
known linear-time algorithms (Tedder et al., 2008).
Then in bottom-up fashion, the relative drawing (b(t), r(t), dis(ti),pos(ti))
is computed for every internal node tεT(G) where, b(t) is box size of node
t, r(t) is the radius of node t, dis(ti) dis(t) is displacement from
tiεchild(t) to t and pos(ti) is relative position
of tiεchild(t). The position p(t), displacement dis(t), box
size b(t) and radius r(t) are first initialized to zero for each internal node
tεT(G). For every leaf nodes, the center position p(t), displacement disp(t)
is initialized to zero while the box size b(t) and radius r(t) are computed
by predefined size since they are the vertices of G.
In bottom-up fashion, the relative coordinates p(ti) for every node tiεchild(t) are computed taking into account the dimensions of the box b(ti). Then compute the boundaries to decide box size b(t) and position p(t) in proportion to the frame boundaries of the relative drawing. The relative displacements disp(ti) from ti to the parent can be obtained. For P-, S-node, circle layout is applied while for N-node energy model is adopted, respectively.
In up-bottom fashion, root node tεL0 position is first set
as b(t)/2, The MD-tree T(G) is traversed in a top-down fashion and for every
internal node tεT(G), the displacement dis(ti) is added to the
position of tεT(G). In this way, all the vertices of G(M(t)) obtain the
right coordinates relative to the center of their ancestor node t. The IMDLayout
algorithm can be summarized as following:
|| IMDLayout Algorithm
The combine and divide: The Fig. 3 and 4 is the layout result of before and after combine. The Q node is surrounded by children of P- or S-node and it is obvious that Q is their center. Moreover, the adjusted layout has more aesthetic feeling than that of layout before adjustment. This adjustment is implemented by two operations: combine and divide. In order to improve efficiency, the adjustment operation is embedded into circle layout algorithm and prime layout, i.e., before circle layout and prime layout, these node pairs that need combine are firstly found and combined them. After layout, they will be divided again and compute the layout for them respectively. Obviously, the first work is to find which nodes should be combined.
||Appearance of P node combined or not (a) Charis Papadopoulos,
etc. (b) the proposed algorithm
|| Appearance of S node combined or not
As shown in Fig. 5, The parent may be S-, P- and N-node. Since there are no edges among children of P-node, P cant adjoin with Q, so they dont need to combine. When the parent is S-node, any two different nodes are all connected and thus the case need to combine only occurs when the sum of nodes in the serial subgraph is two. If the parent is N-node, P is checked whether has only one adjacent. From the above analysis, the combining condition can be summarized as: 1) the parent is S-node and the number of children is two; or 2) the parent is N-node and S-node or P-node exist who have only adjacent.
Combine operation mainly includes: Firstly, combine the children of P and the node Q to produce a new node S. In order to divide them in the divide step, the P, Q and P, S are stored into a map. Secondly, like circle algorithm, children of P are set around the Q node. The difference is that the radius cant be smaller than Q-node;s radius, i.e., if the obtained radius is smaller than radius sum, then it should be extended. Finally, compute the displacement and the box size for P and his children. Because the children of P are often outsiders of S, the position of P is same to S node. Therefore, the displacement from P to S is zero and the b(p) is same to b(s). The displacement of the children is pos(ti). For Q node, the position is (0, 0) and therefore its displacement is dis(ti). Its box doesnt change. The P and Q are removed from children set of P-node and the new node S was inserted. The combine operation can be summarized as followings.
|| Six cases in modular decomposition tree
After get the processed children set, graph layout was carried according to
their type through the corresponding algorithm. The displacement of P is dis(ti).
The divide operation can be summarized as follows:
Circle layout for P- and S-node: Circular drawing satisfies the aesthetic
criterion of symmetry and is the usual way of representing complete graphs and
edgeless graph. The vertices are placed in an arbitrary order on equal arcs,
on the circumference of a cycle centered at (0, 0). For each node tiεchild,
two radiuses ppre and paft that define two concentric
circles centered at (0, 0) are computed. In the circle defined by ppre,
vertex ti and its previous ti-1 are placed on a α
degrees arc, so that the minimum distance from their boundaries, is k. The radius
paft defines a circle where ti and ti+1are
placed in the same way. To prevent overlaps, p(ti) is set to be the
maximum of these two radiuses. In the special case where the box b(ti)
of ti has the same dimensions with any of the two adjacent vertices
in the circle, is set to be the radius defined by the equal sized box adjacent
vertex. Finally, every vertex ti are drew on the circle defined by
radius p(ti), with angle α. Obviously, for a complete graph
with uniform nodes the drawing is a perfect circle. The formal description of
circle layout is given in details as.
Modified energy based layout for N-node: Energy model are popular for creating straight-line drawings of general undirected graphs. For a given layout p and two vertices u and v, the term ||pu-pv|| denotes the Euclidean distance of the two vertices. Generic energy model including three parameters to strengthen its adaptive capacity for different applications :
where, α is exponent to the distance in the attraction term; r applies value r as exponent to the distance in the repulsion term if r≠0 and applies the logarithm to the distance in the repulsion term if r = 0; e eliminates the edge-weight factor in the repulsion term by e setting to value 0. When α = 1, r = 0 and e = 1, i.e., the energy model is the weighted edge-repulsion energy model. The (3,0,0)-energy model has been proposed by Fruchterman Reingold, (1,0,0)-energy is known as Vertex-repulsion LinLog and (1,0,1)-energy is known as Edge-repulsion LinLog.
In the experiments on generic model, overlapping phenomenon always occurs especially
when the radius of nodes are very nonuniform. One of the important reasons lies
in that center distance without regulating the impact of radius is applied.
In some literature, center distance is replaced with boundary distance to prevent
node overlapping. In the implements, cotree structure are introduced to accelerate
computation and dont use the distance between nodes directly, so boundary
distance isnt improper here. To address this problem, a parameter λ
is introduced to regulate the center distance, i.e. regulated distance d' =
(d/λ). Thus the model can be modified as:
The strengths of the forces are often chosen to be proportional to some power of the distance. Formally, for a layout p and two vertices u, v with u≠v, the attractive force exerted on u by v is:
and the repulsive force exerted on u by v is:
where, ||pu-pv|| is the distance between u and v,
is the unit-length vector pointing from u to v and a and r are real constants
The condition a>r ensures that the attractive force between connected vertices grows faster than the repulsive force. For most practical force models holds a≥0 and r≤0, i.e., the attractive force is non-decreasing and the repulsive is non-increasing with growing distance.
As shown in Fig. 6, the system will stop at equilibrium point (d'0) after iterative learning. Because d' = (d/λ), the center distance between two nodes is d0 = λ d'0. Through adjusting λ, proper distance that prevents overlapping can be obtained. When λ>1, the system will reach equilibrium status in advance and prevent overlapping.
For energy minimization, the hierarchical energy minimization algorithm of
Barnes and Hut (Barnes and Hut 1986) is introduced
to graph drawing. Its runtime is in O(|E|+|v|log|v|) per iteration. The overall
runtime grows somewhat faster because the number of iterations needed for convergence
tends to grow with. Some other efficient minimization algorithms are not expected
to find good energy minima for clustering energy models like LinLog and for
graphs with small diameter (Hachul and Junger, 2005).
|| Force equilibrium
Cluster structure in graph: A clusteringof a graph partitions the vertex set into disjoint subsets called clusters and thereby maps each vertex v to a cluster c(v). Proposals of quality measures for clustering are numerous and scattered over the literature of diverse research fields. One of the most widely used was introduced by Newman and Girvan and is called modularity. The modularity of a clustering is:
where, V is the set of all vertices in the graph and c(V) is the set of clusters,
w(c,c) is the total edge weight within the cluster c and w(c)
is the total weight of the vertices in c. Intuitively, the first term of the
modularity measure is the actual fraction of intra-cluster edge weight. In itself,
it is not a good measure of clustering quality, because it takes the maximum
value for the trivial clustering where one cluster contains all vertices. This
is corrected by subtracting the second term, which specifies the expected fraction
of intra-cluster edge weight in a network with uniform density.
MDLayout, IMDLayout and Linloglayout are tested on Karate, Hitech and DBLP dataset. Hitech data describes the 147 friends= relationship data between 33 employees. Karate data describes the multiple links between karate club members (for example, relations between the university students to participate in the same stage game, etc.), the data set includes 34 members and 156 relational data. DBLP indexes more than 1.8 million articles and contains many coauthor sip relations of computer scientists. It is impossible visual all of them and 3 cases are considered: one author with his coauthors, a research community and multi research community.
The results on Karate are shown in Fig. 7. The optimized parameter values for IMDLayout are (λ = 0.55, a = -0.25, r = 3.0, e = 0.15), (λ = 0.6, a = -0.25, r = 3.0, e = 0.15) for MDLayout and (λ = 1, a = -0.25, r = 3.0, e = 0.15) for LinlogLayout. It can be found that the three algorithms can reveal clusters and prevent overlapping effectively through adjusting parameter values.
The result on Hitech is demonstrated on Fig. 8. The optimized
parameter values for IMDLayout are (λ = 0.55, a = -0.25, r = 3.0, e = 0.15),
(λ = 0.6, a = -0.25, r = 3.0, e = 0.15) for MDLayout and (λ = 1, a
= -0.25, r = 3.0, e = 0.15) for LinLogLayout, respectively.
||(a)Improved modular decomposition-based layout(0.55, -0.25,
3.0, 0.15), (b) Modular decomposition-based layout(0. 6, -0.25, 3.0, 0.15)
and (c) Linlog Layout (λ,-0.25, 3.0, 0.15)
||(a) IMD layout for Hitech dataset(0.55, -0.25, 3.0, 0.15)
(b) MD layout for Hitech dataset(0..6, -0.25, 3.0, 0.15) (c) Linlog layout
for Hitech dataset (λ,-0.25, 3.0, 0.15)
||(a) Improved modular decomposition-based layout on coauthor
ship data(λ, -0.25, 3.0, 0.15) three nodes, (b) Modular decomposition-based
layout on coauthor ship data (λ,-0.25, 3.0, 0.15) and (c) Linlog Layout
on coauthor ship data(λ, -0.25, 3.0, 0.15)
It can be found that IMDLayout can reveal clusters clearly.
The results on DBLP are demonstrated in Fig. 9, It can be
found that the proposed algorithm can put the acroteric nodes around the center.
This screw style can make the layout more aesthetic and prevent overlapping.
Additionally, in order to make the content tag have clearly view for users,
horizontal nodes are far away as possible and thus for two nodes, they are arranged
from π/2 as shown in Fig. 10a, while for three nodes
or more nodes, will be arranged from 0 as shown in Fig. 10b
and c. From Fig. 9. It can be found that
IMDLayout can show modular structure more artistic than other two algorithms.
In order to make it more clearly one author and his coauthors are displayed since they are typical circle data as shown in Fig. 11. It can be found that IMDLayout is like Atrumpet shell and more pretty than other algorithms. The nodes are sorted ascendant firstly such that the layout is more compact and regularly.
From Fig. 11a, it can be observed the radius are not sorted
like mentioned before, the reason lies that the sorted radius considering the
content tags length. The content tag are longer than in the front of serial.
||(a) IMD Layout on coauthor ship data (λ, -0.25, 3.0,
0.15) (b) MD Layout on coauthor ship data (λ,-0.25, 3.0, 0.15) and
(c) Linlog Layout on coauthor ship data (λ, -0.25, 3.0, 0.15)
||(a) Improved modular decomposition-based layout on coauthor
data (b) Modular decomposition-based layout on coauthor data and (c) Linlog
Layout on coauthor data (λ, -0.25, 3.0, 0.15)
||Improved modular decomposition-based layout for 4 different
distance parameter values (a) λ = 1, (b) λ = 0.7, (c) λ =
0.4 and (d) λ = 0.3
For example AIvor Wai-Hung Tsang, AKwan Lawrence Yeung
these long content node are put large radius circle to make it can show without
In order to observe distance parameters affect on the layout, four results are listed for λ = 1, λ = 0.7, λ = 0.4 and λ = 0.3 as shown in Fig. 12. When λ = 1, the distance is unadjusted center distance which is used popularly, one module is attracted strongly by another such that it enter inside the another module. But with the change of λ, i.e., from 0.7 to 0.3, the module gradually away and when λ = 0.3 there isnt overlapping.
CONCLUSION AND FUTURE WORK
Through introducing distance parameter λ, the energy model is generalized
to make it has more control ability for obtain non-overlapping result. Through
modular decomposition, the S-type and P-type nodes (or subgraphs) are recognized
which can be layouted by regular circle are participate in energy model layout
and avoid complexity. Because the energy model can obtain real optimized value,
the more nodes can increase the probability of overlapping. Although, through
parameter λ, node with large radius overlapping can be prevent, nodes with
small radius will be very far away from each other. Additionally, Because different
graph need different parameter value, thus our future work is to propose one
adaptive method to decide parameter value. Local Zoom like Google map to provide
multi granularities displacement is our another future work.
The study was partially supported by the National Science Foundation of China under Grant Nos. 61100166 and 61202184, the Science Research Plan of Shaanxi under Grant No. 11JK1035 and No.2013JK1139, and the Key Projects in the National Science under Grant No. 2009BAH51B00, China CNGI project under Grant No. CNGI-09-01-13, doctoral fund of ministry of education of china under Grant No. 20090201110060, the Fundamental Research Funds for the Central Universities under Grant No. xjj20100057.