INTRODUCTION
The mobile computing boom is in full swing. Online marketplaces for mobile applications,
e.g., the Google Android Market and the Apple App Store, have been launched so that
users can meet their needs for all kinds of mobile applications. These markets
have undergone tremendous expansion in recent years. Take the Google Android Market
as an example: since its announcement on August 28, 2008, it has grown explosively,
both in the number of applications and in the popularity
of publishers. It recently passed the milestone of 700,000 active
applications (Cnet, 2013).
Such rapidly growing online application markets and their increasing number
of applications not only offer opportunities but also call for a better understanding
of mobile apps, user behaviors and market ecosystems. Recently, Girardello
and Michahelles (2010), Yan and Chen (2011) and
Lim et al. (2011) have proposed recommendation
solutions to alleviate the application discovery problem in online markets.
Holzer and Ondrus (2011a) and Henze
and Boll (2011) took a developer's perspective to explore the trends
that will affect the development of markets and to investigate the proper release
time for applications. Muller et al. (2011)
made a comprehensive comparison of the business models of seven popular mobile
app stores. Campbell and Ahmed (2011) presented an
assessment of the mobile OS-centric ecosystems. Moreover, these marketplaces
have gathered large-scale user data for researchers (Cramer
et al., 2011; Henze and Boll, 2010, 2011;
Yan and Chen, 2011). However, a systematic
and large-scale study of such markets is still lacking, especially measurements
of the statistical features they exhibit and of how these features affect
application discovery by mobile users.
For the above reason, this study examines online markets in terms of
both the relationships among applications and their navigation effects on
users. The main efforts and contributions of this study are as follows:
• It crawls real data from the Google Play market
  and constructs three complex networks based on the data set. These networks
  capture the relationships among mobile applications as
  well as the navigation effects that the online market has on users
• It conducts multilevel measurements of the three networks.
  These measurements reveal their characteristics as complex networks, including
  the scale-free degree distribution, the clustering tendency, the small average
  shortest path length and the community structure
• It analyzes the application market based on
  its network measurements, which helps to gain a better understanding of both
  the relationships among applications and their possible effects on the online
  behaviors of users
The measurements, observations and analysis in this study provide original insights
into the inner relationships among applications, the marketing features of the Android
ecosystem and the influence of the Android market on user behaviors.
Such findings are expected to guide the understanding, design, development
and evolution of the Android market. To the best of our knowledge, this study
is the first to take a complex network approach to systematically measure and
analyze the Android market. It is worth noting that although this study focuses
on the Google Android Market, the methodologies developed can be easily applied
to other online mobile app markets.
The remainder of this study first presents the related work. It then describes
the data set crawled for this study. Following that, it presents the statistics
and analysis of the app relationship networks and the user navigation network,
respectively. Finally, conclusions are drawn.
RELATED WORK
Complex network research can be traced back to the pioneering works of Flory
(1941), Rapoport (1951, 1954,
1957) and Erdos and Renyi (1959,
1960, 1961). It has been particularly
advanced by the small-world concept (Watts and Strogatz,
1998), scale-free models (Barabasi and Albert, 1999)
and community identification methods (Girvan and Newman,
2002). Afterward, the structure and dynamics of various natural and human-made
networks have been studied (Albert and Barabasi, 2002;
Rubinov and Sporns, 2010; Newman,
2003b; Boccaletti et al., 2006) along with
the development of complex network theory (Lancichinetti
and Fortunato, 2009; Shanker, 2010). However,
online marketplaces with huge numbers of applications have not been characterized
in this way before. Therefore, this study not only constructs complex networks of mobile
apps but also characterizes and measures them at multiple levels
and scopes.
Prior studies of mobile application marketplaces mostly focused on their
business models from a variety of perspectives (Muller
et al., 2011; Tuunainen et al., 2011;
Holzer and Ondrus, 2011b). Recently, these marketplaces have attracted interest
from the research community because of their explosive growth and their use as
distribution channels for gathering large-scale user feedback (Cramer
et al., 2011; Henze et al., 2011).
Meanwhile, some work has addressed application recommendation (Girardello
and Michahelles, 2010; Lim et al., 2011;
Yan and Chen, 2011) and promotion (Henze
et al., 2010). Little effort, however, has been devoted to the systematic
observation and analysis of online markets, let alone to measurements
based on large-scale real data. Furthermore, little of the literature
has applied complex network methods to the online markets of
mobile applications, a gap that this study addresses.
DATA SET
To support this study, the web pages of 104303 applications were crawled from
the website of the Google Play market. The starting points of the crawl include
popular applications, the latest applications and randomly selected applications.
These applications also serve as the starting points from which most users access
the online market. The crawl stops when following the application links on the
web pages yields no new applications. This process is designed so that
the data set covers the applications accessed by most users most of the time. The
results of the measurements are thus expected to be reliable.
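The crawling procedure described above amounts to a breadth-first traversal that stops when no unseen application is found. The following is a minimal sketch under stated assumptions: `fetch_links` is a hypothetical callable standing in for downloading and parsing a market page, and the toy page map is illustrative data, not part of the study.

```python
from collections import deque


def crawl(seed_ids, fetch_links):
    """Breadth-first crawl that follows application links until no new
    application appears.

    fetch_links is a hypothetical callable (an assumption of this sketch)
    that returns the application ids linked from one application's page.
    """
    seen = set()
    queue = deque()
    for seed in seed_ids:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    links = {}
    while queue:
        app_id = queue.popleft()
        targets = list(fetch_links(app_id))
        links[app_id] = targets
        for target in targets:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return links


# Toy stand-in for the market pages (illustrative only).
toy_pages = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": [], "e": ["a"]}
result = crawl(["a"], lambda app_id: toy_pages.get(app_id, []))
```

In this toy run, application "e" is never collected because no crawled page links to it, which illustrates why the choice of starting points matters for coverage.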
Each web page crawled for an application contains a wide range of information,
e.g., descriptions, reviews and permissions. To conduct the measurements,
specific app features are extracted from the web pages to characterize each
application; they are listed in Table 1 along with their descriptions.
In particular, the web page of each application carries several alsoview and
alsoinstall links, which serve as selected application
recommendations. Two applications are defined to have an alsoview or an alsoinstall
relationship when one of them is recommended on the page of the other
through an alsoview or an alsoinstall link, respectively.
Table 1: Application characterized features
APP RELATIONSHIP NETWORK
This section measures the relationships among applications. To this
end, two kinds of application relationship networks, i.e., the alsoview
network and the alsoinstall network, are constructed. These two networks
are derived from the relationships among applications that the market presents as
"users who viewed this application also viewed" and "users who installed
this application also installed", respectively.
CONSTRUCTION OF NETWORKS
The details of Google's recommendation methods for the alsoview and alsoinstall
applications are not public. Based on their semantics, both the alsoview
and the alsoinstall relationships are assumed to be symmetric.
Thus, both networks are constructed as undirected graphs. In these
two networks, each node denotes an application and each edge denotes an alsoview
or alsoinstall link crawled from the website.
Following the common definitions of complex networks, the alsoview and the
alsoinstall networks are denoted by $G_v$ and $G_i$, respectively.
Their nodes and edges are defined by the sets $N(G_v)$, $E(G_v)$
and $N(G_i)$, $E(G_i)$, in which each node $v_v \in N(G_v)$
or $v_i \in N(G_i)$ is an application identified by its application
id and each edge $e_v \in E(G_v)$ or $e_i \in E(G_i)$ is an alsoview or
an alsoinstall relationship,
respectively. The sizes of the two networks are given by the numbers
of their nodes and edges, as listed in Table 2.
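As an illustration of this construction, the sketch below builds an undirected network from crawled recommendation links and reports its size. It assumes the links are available as a mapping from an application id to the ids recommended on its page; the function names and the toy data are hypothetical.

```python
def build_undirected(links):
    """Build an undirected adjacency map from recommendation links.

    links maps an application id to the ids recommended on its page
    (an assumed input format). A link in either direction yields one
    undirected edge; self-links are ignored.
    """
    adjacency = {}
    for source, targets in links.items():
        adjacency.setdefault(source, set())  # keep applications without links
        for target in targets:
            if target == source:
                continue
            adjacency[source].add(target)
            adjacency.setdefault(target, set()).add(source)
    return adjacency


def network_size(adjacency):
    """Return (number of nodes, number of undirected edges)."""
    nodes = len(adjacency)
    edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return nodes, edges


# Toy alsoview links (illustrative only).
toy_alsoview = {"a": ["b", "c"], "b": ["a"], "c": [], "d": []}
toy_graph = build_undirected(toy_alsoview)
toy_size = network_size(toy_graph)
```

Storing neighbours in sets means that a pair of applications linking to each other contributes a single undirected edge, which matches the symmetric reading of the relationships.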
General statistics: To systematically reveal the relationships among
applications, this section measures the two networks from the node level up to
the network level. It uses several mainstream metrics, both fine-grained
and coarse-grained. Based on these quantitative indicators, the investigation
further discusses the possible causes behind them.
Node level: For a single application, a natural question is how many
applications are connected to it, i.e., how many applications are related
to it in terms of user behaviors. To answer it, this study
uses the node degree and lists the results in Table
2. To gain a global view of this metric, it further measures its distribution
across the whole network, which is illustrated in Fig. 1.
The node degree distribution determines an important feature of complex
networks, the scale-free property.
From Fig. 1 and Table 2, it can be seen
that although the Android online market provides only four related applications
per application for each relationship, a few nodes still have an extremely
large degree while the majority of nodes have a small one.
Table 2: Statistics and measurements of app relationship networks
Fig. 1: Degree distributions in app relationship networks
Fig. 2: Distributions of shortest path length between node pairs in app relationship networks
That is, some popular applications are connected to a large number
of other applications through user behaviors across the whole market.
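The node-level measurement can be sketched as follows: compute the degree of every node and the fraction of nodes having each degree. The adjacency-map input format and the toy star-shaped network are assumptions made for illustration.

```python
from collections import Counter


def degree_distribution(adjacency):
    """Return {degree: fraction of nodes with that degree}.

    adjacency maps each node to the set of its neighbours in an
    undirected network (an assumed input format).
    """
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    counts = Counter(degrees)
    total = len(degrees)
    return {degree: counts[degree] / total for degree in sorted(counts)}


# Toy network: one hub linked to four low-degree nodes (illustrative only).
toy_star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}
toy_distribution = degree_distribution(toy_star)
```

A scale-free network would show such a distribution falling roughly along a straight line when plotted on log-log axes; the toy example only reproduces the qualitative picture of one high-degree node among many low-degree ones.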
Dyad level: The bilateral relationship between two nodes in a network primarily
depends on how they are connected, which can be characterized by the paths
between them. Hence, the measurements in this section are based on the
paths between two nodes. Specifically, this section investigates the length
of the shortest path between each pair of nodes, which indicates how close the
two applications are to each other.
Since some nodes in the networks may be isolated from others, the shortest
path length is examined only for node pairs that have at least one path between
them. The results are illustrated in Fig. 2. They show
that the maximum and average distances between applications are much smaller
in the alsoview network than in the alsoinstall network. This suggests that the
viewing behaviors of mobile users are more concentrated than their installing
behaviors. A possible reason is that viewing is effortless whereas installing takes
more effort, so users are more cautious about installing than about viewing.
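A minimal sketch of this dyad-level measurement is given below: a breadth-first search from every node collects the shortest path lengths of all reachable pairs, from which the distribution, the average and the maximum follow. The input format and toy data are assumptions, and running all-pairs searches on a network of about 100,000 nodes is costly, so a real measurement might sample source nodes instead.

```python
from collections import Counter, deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every node reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def path_length_histogram(adjacency):
    """Count shortest path lengths over ordered pairs of connected nodes.

    Unreachable pairs are skipped. Each unordered pair is counted twice,
    which leaves the shape of the distribution unchanged.
    """
    histogram = Counter()
    for source in adjacency:
        for target, hops in bfs_distances(adjacency, source).items():
            if target != source:
                histogram[hops] += 1
    return histogram


def average_and_maximum(histogram):
    """Average and maximum shortest path length of connected pairs."""
    pairs = sum(histogram.values())
    average = sum(hops * count for hops, count in histogram.items()) / pairs
    return average, max(histogram)


# Toy network: a path a-b-c plus an isolated node d (illustrative only).
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
toy_histogram = path_length_histogram(toy_path)
toy_average, toy_maximum = average_and_maximum(toy_histogram)
```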
Triple level: At the level of node triples, this study investigates the
clustering coefficient of the network, which measures the degree to which the
neighbors of a node tend to connect to each other. This tendency has been observed
in most real-world networks, especially in social networks (Watts
and Strogatz, 1998). Therefore, this part examines whether such a feature also exists
in the application relationship networks.
The measurements take the clustering coefficient of a node
v to be the number of triangles in which the node v participates,
normalized by the maximum possible number of such triangles:
$$c_v = \frac{2T_v}{d_v(d_v - 1)}$$
where $T_v$ is the number of triangles through the node v and
$d_v$ is the degree of the node.
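The definition above, triangles through a node divided by the maximum possible number of triangles among its neighbours, can be sketched as follows. Treating nodes of degree below 2 as having coefficient 0 is a common convention assumed here, not something stated in the study; the toy network is illustrative.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Triangles through node, normalized by d * (d - 1) / 2.

    Nodes with degree below 2 get 0.0 (a convention assumed here).
    """
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        return 0.0
    triangles = sum(
        1 for first, second in combinations(neighbours, 2)
        if second in adjacency[first]
    )
    return 2.0 * triangles / (degree * (degree - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    total = sum(local_clustering(adjacency, node) for node in adjacency)
    return total / len(adjacency)


# Toy network: triangle a-b-c with a pendant node d on a (illustrative only).
toy_triangle = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_local_a = local_clustering(toy_triangle, "a")
toy_average_clustering = average_clustering(toy_triangle)
```

In the toy network, node a has three neighbours but only one connected neighbour pair, giving 1/3, while b and c each score 1 and d scores 0.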
It can be seen from the average clustering coefficient in Table
2 that a weak clustering tendency exists in both app relationship networks,
which may be attributed to the intrinsic similarity between applications related
to the same neighbor. At the same time, the weakness of this tendency suggests
that users can find more novel applications.
Community level: One of the most important features of complex networks
is community structure, which has been found empirically in many real technological,
biological and social networks. Furthermore, its emergence seems to be at the
heart of the network formation process.
A community is a group of nodes with many edges inside the group and few edges
to other groups. In particular, when no edge exists between communities, the network
breaks up into components. This study uses the community detection method
of Blondel et al. (2008), which generates communities
that maximize the modularity using the Louvain heuristics. As shown in Table
2, isolated nodes account for a large proportion of the communities. Hence, only
the largest components are reported in Fig. 3.
It shows that fewer than 10 communities are not isolated nodes
in $G_v$ and $G_i$. More precisely, the alsoinstall network
has only 4 such communities while the alsoview network has 9.
Furthermore, almost all nodes belong to the largest community, which covers 103094
out of 103324 nodes in the alsoview network and 103130 out of 103318 nodes
in the alsoinstall network. This observation indicates that almost all applications
are related in terms of viewing and installing behaviors. This may result
from the large population of online users, which makes the experience of other
users valuable for application discovery.
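The component sizes reported in Fig. 3 can be obtained with a plain graph traversal, as in the sketch below. It computes connected components only and does not reproduce the Louvain modularity optimization of Blondel et al. (2008); the input format and toy data are assumptions.

```python
def component_sizes(adjacency):
    """Sizes of the connected components, largest first.

    adjacency maps each node to the set of its neighbours in an
    undirected network (an assumed input format).
    """
    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy network: components {a, b, c}, {d, e} and the isolated node f.
toy_components = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
toy_sizes = component_sizes(toy_components)
toy_largest_share = toy_sizes[0] / sum(toy_sizes)
```

Applied to the figures reported above, the share of nodes in the largest group is 103094/103324 in the alsoview network and 103130/103318 in the alsoinstall network, about 99.8% in both cases.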
Network level: At the network level, this study considers two metrics
that play important roles in characterizing the internal structure of a network,
especially the separation between two nodes: the network
diameter and the average shortest path length.
Fig. 3(a-b): Size of top 10 components in app relationship networks (a) Alsoview and (b) Alsoinstall
The network diameter is the largest length among the shortest paths
of all node pairs. The average shortest path length is the average length of
the shortest paths between all node pairs.
The Android market does not provide related applications for some applications.
Thus, not all nodes in the relationship networks are connected to each other.
This study defines the network diameter as the diameter of the largest connected
component and the average shortest path length as the average of all distances
between connected nodes. As shown in Table 2, the average
shortest path length of the alsoview network is much smaller than that of the
alsoinstall network. These results show that, in the app relationship
networks, applications are connected more closely by viewing behavior
and more loosely by installing behavior. This may also be because
installing takes more effort than viewing.
Advanced measurements: This part takes the analysis further
to better understand users' preferences and consumption
habits, which may influence the development and marketing decisions for applications.
It examines questions such as which kinds of applications are popular, how the
price of an application affects users' installations and why applications
are related to each other in terms of users' viewing or installing behavior.
The measurements and analysis are conducted from two perspectives, i.e., the
measurement of applications and that of relationships.
Measurement of applications: For applications, this section investigates
the price distribution of applications, their popularity across different categories
and the relationship among application installations, prices and ratings. Among
them, the price distribution serves as an indicator of the marketing features of
the Android ecosystem and the popularity across categories reveals
the tastes of users. Furthermore, the relationship among the installations,
prices and ratings of applications exposes how user acceptance compares with the
expectations of developers.
Price distribution: As shown in Fig. 4, the price
distribution indicates that the Android market has far more free applications
than paid applications and that the prices of paid applications are concentrated
in a narrow range.
Fig. 4(a-b): Comparison of (a) Free and paid applications and (b) Price distribution of paid applications
Fig. 5: Installation distribution across prices
A possible reason is shown in Fig. 5, which suggests that
most consumers tend to install low-priced applications. This may serve as
an important indicator for the design and development of online app markets.
Installations across categories: To find which categories of applications
are the most popular, the top 5 categories with the most installations
are identified: (1) Tools, (2) Arcade and action, (3) Communication, (4) Entertainment
and (5) Travel and local, as shown in Fig. 6.
Fig. 6: Installation distribution across categories
Price, rating and installations: To understand the relationship among
the price, rating and installations of applications, this study formulates a
multivariate regression problem. That is, the number of installations of an
application, $y_i$, is a variable dependent on the price $x_p$ and the rating $x_r$
of the application:
$$y_i = \alpha_p x_p + \alpha_r x_r + \beta$$
Given all the data collected, the model is denoted as:
$$Y = X^T A + \beta$$
where $Y = (y_{i1}, y_{i2}, \ldots, y_{in})^T$,
$A = (\alpha_p, \alpha_r)^T$, $X = (X_p, X_r)^T$, in which
$X_p = (x_{p1}, x_{p2}, \ldots, x_{pn})^T$,
$X_r = (x_{r1}, x_{r2}, \ldots, x_{rn})^T$, the intercept $\beta$ is added to
every component and n denotes the dimension of the vectors,
i.e., the number of applications.
The study extracts the values of these variables from the data set and constructs
vectors for the installations, prices and ratings of applications with 100,000+
dimensions. The multivariate regression model is then solved by least squares,
which gives the following coefficients:
$\alpha_p = -4378.49$, $\alpha_r = 41855.03$, $\beta = -85808.38$.
The results indicate that user installations follow an intuitive pattern:
the lower the price and the higher the rating, the more installations.
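A least-squares fit of installations on price and rating with an intercept can be sketched with the normal equations, as below. This is a minimal illustration in pure Python, not the solver used in the study; the function names are hypothetical and the toy data are generated exactly from installations = -2 * price + 3 * rating + 5 so that the fit can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    augmented = [list(row) + [rhs[index]] for index, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(augmented[r][col]))
        augmented[col], augmented[pivot] = augmented[pivot], augmented[col]
        for row in range(col + 1, size):
            factor = augmented[row][col] / augmented[col][col]
            for k in range(col, size + 1):
                augmented[row][k] -= factor * augmented[col][k]
    solution = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(augmented[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (augmented[row][size] - tail) / augmented[row][row]
    return solution


def fit_installations(prices, ratings, installs):
    """Least-squares fit of installs = a_p * price + a_r * rating + b.

    Solves the normal equations (D^T D) w = D^T y, where each row of the
    design matrix D is (price, rating, 1). Returns [a_p, a_r, b].
    """
    rows = [[price, rating, 1.0] for price, rating in zip(prices, ratings)]
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
            for i in range(3)]
    moment = [sum(row[i] * y for row, y in zip(rows, installs))
              for i in range(3)]
    return solve_linear_system(gram, moment)


# Toy data (illustrative only): installs = -2 * price + 3 * rating + 5.
toy_prices = [0.0, 1.0, 2.0, 0.0, 3.0]
toy_ratings = [4.0, 3.0, 5.0, 2.0, 4.0]
toy_installs = [17.0, 12.0, 16.0, 11.0, 11.0]
alpha_p, alpha_r, beta = fit_installations(toy_prices, toy_ratings, toy_installs)
```

As in the reported result, a negative price coefficient together with a positive rating coefficient means that fitted installations fall with price and rise with rating.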
Measurement of relationship: This section goes further and examines the
correlation between connected applications in terms of their
properties. Specifically, it investigates the assortativity in node
degree and the similarity in application category, price and popularity.
Degree assortativity: Assortativity in a complex network
characterizes the tendency of nodes to connect to others that are similar or
different in specific ways. Although there are various measures of node similarity,
researchers often examine assortativity using the node
degree, because correlations between nodes of similar degree are observed in
many real networks.
The degree assortativity coefficient is defined following Newman
(2003a):
$$r = \frac{\sum_{ij} ij\,(e_{ij} - q_i q_j)}{\sigma_q^2}$$
where $q_i$ denotes the distribution of the remaining degree, $e_{ij}$
is the joint probability distribution of the remaining degrees of the two
nodes at either end of an edge and $\sigma_q$ is the standard deviation of the
distribution $q_i$.
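For an undirected network, this coefficient is the Pearson correlation between the degrees found at the two ends of an edge; using the degree instead of the remaining degree shifts both variables by one and leaves the correlation unchanged. The sketch below follows that reading, with an assumed adjacency-map input and an illustrative toy network.

```python
from math import sqrt


def degree_assortativity(adjacency):
    """Pearson correlation of the degrees at the two ends of each edge.

    Every undirected edge is visited in both directions so that the
    measure is symmetric. Returns NaN when all degrees are equal.
    """
    first_ends = []
    second_ends = []
    for node, neighbours in adjacency.items():
        for neighbour in neighbours:
            first_ends.append(len(adjacency[node]))
            second_ends.append(len(adjacency[neighbour]))
    count = len(first_ends)
    mean_first = sum(first_ends) / count
    mean_second = sum(second_ends) / count
    covariance = sum(
        (x - mean_first) * (y - mean_second)
        for x, y in zip(first_ends, second_ends)
    ) / count
    variance_first = sum((x - mean_first) ** 2 for x in first_ends) / count
    variance_second = sum((y - mean_second) ** 2 for y in second_ends) / count
    if variance_first == 0 or variance_second == 0:
        return float("nan")
    return covariance / sqrt(variance_first * variance_second)


# Toy star network: a hub with four leaves (illustrative only).
toy_hub = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
toy_r = degree_assortativity(toy_hub)
```

The star is the extreme disassortative case: every edge joins the highest-degree node to a lowest-degree node, so the coefficient reaches -1.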
Table 3: Degree and property assortativity in app relationship networks
The empirical results listed in Table 3 show that both
application relationship networks exhibit disassortative mixing, or disassortativity, as
applications with high degree tend to attach to low-degree applications. That
is, popular applications do not tend to connect with other popular applications.
This would facilitate application discovery, since users are not
confined to the circle of popular applications.
Property assortativity: The property assortativity indicates the similarity
between connected applications from the viewpoint of application properties,
including the price, rating, installations and category. The assortativity
coefficient is defined as in Newman (2003b):
$$r = \frac{\operatorname{Tr}\,\varepsilon - \lVert \varepsilon^2 \rVert}{1 - \lVert \varepsilon^2 \rVert}$$
where ε is the matrix with elements $e_{ij}$ and $\lVert \varepsilon^2 \rVert$
is the sum of all elements of $\varepsilon^2$. Here, $e_{ij}$
denotes the fraction of edges connecting a node with property i to a node
with property j in the network.
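For a categorical property such as the application category, this coefficient can be sketched as follows. The sum of all elements of the squared mixing matrix equals the sum over properties of the product of the corresponding row sum and column sum, which is what the code uses. The edge-list input, the attribute map and the toy labels are assumptions for illustration.

```python
def attribute_assortativity(edges, attribute):
    """Assortativity of a categorical node property on an undirected network.

    edges is a list of (u, v) pairs and attribute maps each node to its
    label (assumed input formats). Each edge is counted in both directions
    so that the mixing matrix is symmetric.
    """
    labels = sorted({attribute[node] for edge in edges for node in edge})
    index = {label: position for position, label in enumerate(labels)}
    size = len(labels)
    mixing = [[0.0] * size for _ in range(size)]
    for u, v in edges:
        i, j = index[attribute[u]], index[attribute[v]]
        mixing[i][j] += 1.0
        mixing[j][i] += 1.0
    total = 2.0 * len(edges)
    mixing = [[value / total for value in row] for row in mixing]

    trace = sum(mixing[i][i] for i in range(size))
    row_sums = [sum(row) for row in mixing]
    col_sums = [sum(mixing[i][j] for i in range(size)) for j in range(size)]
    # Sum of all elements of the squared matrix.
    squared_sum = sum(row_sums[i] * col_sums[i] for i in range(size))
    if squared_sum == 1.0:
        return float("nan")  # all edge ends share one label
    return (trace - squared_sum) / (1.0 - squared_sum)


# Toy labels (illustrative only).
toy_category = {"a": "tools", "b": "tools", "c": "games", "d": "games"}
same_category_r = attribute_assortativity([("a", "b"), ("c", "d")], toy_category)
cross_category_r = attribute_assortativity([("a", "c"), ("b", "d")], toy_category)
```

Edges that only join applications of the same category give a coefficient of 1, while edges that only join different categories give -1 in this two-category toy case.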
The results listed in Table 3 indicate
that, in both app relationship networks, connections between applications mostly
depend on their categories and do not correlate much with their ratings, while
the installations of connected applications behave in opposite ways in
the two networks. That is, users tend to view or install applications of the
same category.
USER NAVIGATION NETWORK
This section studies the possible influence of the Android market
on the navigating behaviors of mobile users. To this end, it constructs a user
navigation network based on the major relationships among applications on the
website. The statistics and analysis in this section reveal the characteristics
of the user navigation network as a complex network.
Table 4: Statistics and measurements of user navigation network
Network construction: Although each of the alsoview and alsoinstall
relationships is treated as symmetric on semantic grounds,
their links on the website are not all reciprocal. This section therefore
constructs the user navigation network as a directed graph, since the links on the
web pages are directed. Since users may follow both kinds of links, it is assumed
that alsoview and alsoinstall links do not differ much in how
they navigate users to discover applications. Hence, in the user navigation
network, each node denotes an application and each directed edge denotes a
navigation link on the website between two applications. The size of the user
navigation network is given by its numbers of nodes and edges, as shown in Table
4.
Statistics and analysis: This section investigates several statistical metrics
of the user navigation network. They are expected to reveal the influence
of the online market on the behaviors of mobile users. The empirical results
and their analysis are presented at both coarse and fine granularity.
In-degree distribution: In the user navigation network, the out-degree
of a node is determined and limited by the website. Therefore, this part
focuses on how many applications link to each application,
which is the in-degree of each node.
The scale-free in-degree distribution, shown in Fig. 7,
demonstrates that some applications are referred to by many other applications
in the user navigation network, with a maximum in-degree of 491, even though
the out-degree of each application is limited to less than 8. This
suggests that users tend to flock to a few popular applications when browsing
the app market.
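The directed navigation network and its in-degrees can be sketched as follows: the two kinds of links are merged into one set of directed edges and the incoming links of each application are counted. The input format, function names and toy links are assumptions for illustration.

```python
from collections import Counter


def navigation_edges(alsoview_links, alsoinstall_links):
    """Merge both kinds of page links into one set of directed edges.

    Each argument maps an application id to the ids linked from its page
    (an assumed input format). A pair linked by both kinds of links
    yields a single directed edge; self-links are ignored.
    """
    edges = set()
    for links in (alsoview_links, alsoinstall_links):
        for source, targets in links.items():
            for target in targets:
                if target != source:
                    edges.add((source, target))
    return edges


def in_degrees(edges):
    """Number of incoming links for every node that has at least one."""
    return Counter(target for _, target in edges)


# Toy links (illustrative only): application p is recommended by many pages.
toy_view_links = {"a": ["p"], "b": ["p"], "c": ["p", "a"]}
toy_install_links = {"a": ["p"], "p": ["a"]}
toy_edges = navigation_edges(toy_view_links, toy_install_links)
toy_in_degrees = in_degrees(toy_edges)
```

While the number of outgoing links per page is capped by the website, nothing caps the number of pages that can point to one application, which is why the in-degree can become very large.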
Reciprocity links: The reciprocity of the network is further investigated.
It measures the tendency of node pairs to form mutual connections. Any pair
of applications joined by a reciprocal link point to each other in the market, so
users can jump back and forth between them.
Fig. 7: In-degree distribution in user navigation network
Such links make browsing more convenient for users but take up
opportunities for other applications to be discovered. The link reciprocity
is defined as the ratio of the number of links pointing in both directions,
$\varepsilon^{\leftrightarrow}$, to the total number of links, $\varepsilon$:
$$r = \frac{\varepsilon^{\leftrightarrow}}{\varepsilon}$$
Table 4 shows that more than 30% of the edges are mutual,
which is higher than the reciprocity of e-mail address book networks (Garlaschelli
and Loffredo, 2004). Therefore, when users want to jump back, there is a
probability of 0.308 that the current page links to the previous application,
whereas the probability of discovering a new application by following
a link is only 0.692.
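The link reciprocity, the share of directed links whose reverse link also exists, can be computed as in this minimal sketch. The edge-list input and toy data are assumptions.

```python
def link_reciprocity(edges):
    """Fraction of directed links whose reverse link also exists.

    edges is an iterable of (source, target) pairs (an assumed input
    format). Duplicate links are collapsed and self-links are not
    counted as mutual.
    """
    edge_set = set(edges)
    mutual = sum(
        1 for source, target in edge_set
        if source != target and (target, source) in edge_set
    )
    return mutual / len(edge_set)


# Toy links (illustrative only): a and b link to each other.
toy_links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
toy_reciprocity = link_reciprocity(toy_links)
```

With the reported reciprocity of 0.308, the remaining share of links, 1 - 0.308 = 0.692, points to applications that do not link back.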
Strongly connected components: Components in a directed graph are maximal
connected subgraphs whose nodes are weakly or strongly connected. Nodes in a
strongly connected component are mutually reachable along directed paths, while
nodes in a weakly connected component are mutually reachable in the corresponding
undirected graph.
Measurements of components in the user navigation network indicate
the range of application discovery in the market. A node in a strongly
connected component can be reached from any other node of that component that a
user is browsing on the market website. Since weakly connected components, i.e., the
components of the undirected version of the user navigation network, have
already been studied in the app relationship networks, this part focuses on
the strongly connected components.
Figure 8 illustrates the sizes of all components and the
distribution of component sizes. Almost all nodes, 103028
out of 103348, are in the largest component. Most
strongly connected components, 157 out of
176, contain only a single node. That is, most applications in the user
navigation network can reach each other, while a few do not have paths in both
directions to other applications. Thus, users need no extra search to reach most
other applications from a given one in the market.
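Strongly connected component sizes can be computed with Kosaraju's two-pass algorithm, sketched below with iterative traversals so that large networks do not exhaust the recursion limit. This is one standard technique, not necessarily the one used in the study; the edge-list input and toy data are assumptions, and nodes without any link do not appear in an edge list.

```python
def strongly_connected_component_sizes(edges):
    """Sizes of the strongly connected components, largest first."""
    forward, backward = {}, {}
    for source, target in edges:
        forward.setdefault(source, []).append(target)
        forward.setdefault(target, [])
        backward.setdefault(target, []).append(source)
        backward.setdefault(source, [])

    # Pass 1: record nodes in order of completion on the forward graph.
    visited, order = set(), []
    for start in forward:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, successors = stack[-1]
            advanced = False
            for successor in successors:
                if successor not in visited:
                    visited.add(successor)
                    stack.append((successor, iter(forward[successor])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing completion order;
    # each traversal collects exactly one strongly connected component.
    assigned, sizes = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for predecessor in backward[node]:
                if predecessor not in assigned:
                    assigned.add(predecessor)
                    stack.append(predecessor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy links (illustrative only): cycle a->b->c->a, plus c->d and e->a.
toy_directed = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "a")]
toy_scc_sizes = strongly_connected_component_sizes(toy_directed)
```

In the toy network, d can be reached from the cycle but cannot return to it and e can enter the cycle but cannot be reached from it, so both form single-node components, the same pattern as the 157 single-node components reported above.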
Average path length: The path length in the user navigation network
is the number of hops users need to make to reach one application from
another. To some extent, this metric determines the efficiency of application
discovery in the market.
Since not all nodes in the user navigation network are connected, the average
path length is defined as the average over all connected node pairs. The numerical
results listed in Table 4 show that although the average path
length of 8.117 is not large, the shortest path lengths span
a wide range, with a maximum of 31, which is the network diameter. That
is, the distance from one application to another is generally short, but the
distances between some applications are considerable.
Fig. 8(a-b): (a) Size of components and (b) Distribution of component sizes
CONCLUSION
Online marketplaces for mobile applications have undergone rapid growth.
To better understand and utilize them, this study takes a novel perspective
on the Android market by adopting a complex network view. It
builds the app relationship networks and the user navigation network from
a data set crawled from the Google Android Market. Based on these networks, the
study measures and analyzes their multilevel characteristics using the concepts
of complex networks to reveal the application relationships and their effects on
application discovery. The findings of this study are original in the measurement
of online mobile app markets compared with the results in the literature. They
are expected to provide guidance for the understanding, design,
development and evolution of the Android market. Moreover, the methodologies developed
in this study can be easily applied to other online markets of mobile applications.
ACKNOWLEDGMENTS
We would like to thank the anonymous reviewers for their valuable comments.
This project is supported by the National Natural Science Foundation of China
projects (Program No. 61202486, No. 61170260, No. 61070201 and No. 61172018).
This study is also supported by the Scientific Research Program funded by the Shaanxi
Provincial Education Department (Program No. 2013JK1139).