Achieving Utilization of LBS Anonymity Datasets for Third Party by Mining Spatial Association Rules

Information Technology Journal

Year: 2014 | Volume: 13 | Issue: 5 | Page No.: 801-815
DOI: 10.3923/itj.2014.801.815

Achieving Utilization of LBS Anonymity Datasets for Third Party by Mining Spatial Association Rules

Zhang Hai-Tao, Huang Hui-Hui, Xu Liang and Zhang Bo-Bo

Abstract: With the development of Location-Based Services (LBS), privacy protection associated with LBS is becoming a challenge in terms of research and industrial concerns. Among the available privacy protection techniques for LBS users, spatial-temporal K-anonymity has become a prominent method because of its easy implementation and broad applicability. However, this method and its variants suffer from a common drawback: They decrease the use of LBS anonymity datasets due to a reduction of the spatial and temporal resolution of an LBS query. Today, the utilization of LBS anonymity datasets is very important for LBS providers because the datasets benefit many LBS applications. In this study, we propose a method for the implementation of LBS anonymity using mining spatial association rules among the topological relationships between LBS anonymity datasets and the related geographic background datasets. First, we define the basic concepts of the proposed approach, subsequently, we present the corresponding implementation. In phase 1, the LBS anonymity datasets are preprocessed by spatially comparing them to the related geographic background datasets to obtain a spatial transaction database. In phase 2, the traditional Boolean association rules for mining algorithms are adopted to obtain spatial association rules from the spatial transaction database generated in the phase 1. Finally, we present an experimental application of the proposed approach to large-scale LBS anonymity datasets which were simulated from real GPS trajectories. The experimental results indicate that based on the analysis of the spatial association rules generated by the approach, LBS anonymity datasets can be utilized by LBS providers and other third parties.

Fulltext PDF Fulltext HTML

How to cite this article

Zhang Hai-Tao, Huang Hui-Hui, Xu Liang and Zhang Bo-Bo, 2014. Achieving Utilization of LBS Anonymity Datasets for Third Party by Mining Spatial Association Rules. Information Technology Journal, 13: 801-815.

Keywords: spatial association rules, LBS privacy protection, spatial-temporal K-anonymity and utilization

INTRODUCTION

With the rapid advancement in mobile communication and proliferation of location tracking devices (e.g., GPS), Location-Based Services (LBS) are widely recognized as an important feature of the future computing environment (Ali et al., 2006; Baldauf et al., 2007). However, without strict safeguards, the deployment of LBS poses a severe threat to user privacy (e.g., with untrustworthy servers, location-based services may lead to several privacy threats ranging from worries over employers snooping on their workers’ whereabouts to fears of tracking by potential stalkers (Chow and Mokbel, 2007) which has gained attention from academia and industry (Kulik, 2009; Krumm, 2009).

There are mainly four primitives that make up user privacy in LBS applications: identity, issuer, location and time (Baugh and Gao, 2006; Bettini et al., 2007). Identity is the unique information about an individual that would allow the individual to be recognized as a separate entity from a set of individuals (i.e., security identifier: SID). Issuer is the identity of the user that issues the query request. Location deals specifically with the whereabouts of an individual or group which may include coordinates, landmarks, or other information. Time points out when the query is requested. Additionally, based on the composite of the three primitives, attackers can reach a greater threat to user privacy.

Initial research on the privacy protection LBS users mainly focused on establishing specifications and laws. However, this research lacked flexibility and lagged behind attack technologies. Several algorithms for protecting LBS user privacy have been proposed, such as the use of dummies (Kido et al., 2005), spatial transformation based on the Hilbert curve (Um et al., 2010) and Private Information Retrieval (PIR) protocols (Shang et al., 2010). Spatial-temporal K-anonymity (Gruteser and Grunwald, 2003) has become a prominent privacy protection technique for LBS users because it is easy to implement and widely applicable.


Fig. 1:	Centralized cloaking architecture

Spatial-temporal K-anonymity is an extension of K-anonymity and is based on the use of obfuscation. Specifically, a query request submitted to an LBS server does not consist of the requestor’s identifier and accurate location but rather the pseudonyms of the requestor, at least K-1 others nearby and a cloaking area that encloses at least K-1 locations of others. If the resolution of the cloaking area is too coarse for service quality, temporal cloaking is applied (i.e., the user’s service request is delayed). Finally, at least K pseudonyms, a cloaking area and temporal cloaking comprise an LBS anonymity dataset (hereafter referred to as an anonymity dataset) corresponding to the query request that is processed by spatial-temporal K-anonymity. Currently, the general architecture for the implementation of spatial-temporal K-anonymity consists of one trusted server (Um et al., 2009) (Fig. 1).

The components of the centralized cloaking system can be broadly classified into four subsystems: Mobile clients, base stations, an anonymizer and an LBS server. In this architecture, mobile clients act as query users and send requests to an anonymizer via base stations. According to the requests of the mobile clients, the anonymizer forms an anonymity dataset and sends it to the LBS server. The LBS server processes the anonymity dataset and returns a candidate result to the anonymizer. The anonymizer then creates an actual result from the candidate result by filtering out undesired information and returns the actual result to the mobile clients.

As mentioned above, by using pseudonyms instead of identities of users, the identity privacy is protected. Furthermore, because any pseudonym within the anonymity dataset may be requesting the service, the issuer privacy is protected from disclosure with a certain level of assurance. Likewise, the cloaking area and temporal cloaking of the anonymity dataset reduce the accuracy of the requester’s location and time information in the spatial and temporal dimensions. Specifically, the requestor’s location and time information are indistinguishable from those of at least k-1 other subjects, thus, the location and time privacy is protected to a certain extent. Generally, spatial-temporal K-anonymity can guarantee the protection of the four primitives of LBS users.

The basic concept of spatial-temporal K-anonymity has inspired a series of variants in research publications. These variants are mainly applied in LBS snapshot queries, such as “Where is the nearest gas station?” and “What restaurants are within one mile of my location?” and in continuous queries, such as “Continuously report the nearest police car” and “Continuously report the gas stations within one mile of my car”. Previously developed variant methods include enhanced privacy protection of query identification (Gedik and Liu, 2005), privacy protection in multi-mode query (Mokbel et al., 2006), flexibility in setting the privacy protection level (Xu and Cai, 2009), privacy protection within the spatial network (Ku et al., 2007) and the use of a distributed sensing network (Chow et al., 2006). Later variant methods included the robust spatial cloaking technique (Chow and Mokbel, 2007) and the prediction method based on movement characters (e.g., speed or direction) (Pan et al., 2009).

However, spatial-temporal K-anonymity and its variants suffer from a common drawback: They reduce the resolution of location information. Hence, the use of anonymity datasets may decrease or even cease in future applications. To our knowledge, studies on the successful use of LBS anonymity datasets have not been published, although this information is important for LBS providers (Zhang et al., 2013). In this study, we attempt to solve this problem by introducing an approach that successfully implements an anonymity dataset. The principle of our approach is to mine spatial association rules among spatial relationships between LBS anonymity datasets and related geographic background datasets.

PRELIMINARIES

Here, the typical method used to implement spatial-temporal K-anonymity in LBS snapshot queries is introduced.


Fig. 2:	Example of adaptive-interval cloaking algorithm

The basic units of an anonymity dataset are then presented together with a hierarchy expressing their relationships. Finally, the basic concepts of spatial association rules are described.

Spatial-temporal K-anonymity: The representative implementation of spatial-temporal K-anonymity in LBS snapshot queries is based on the adaptive-interval cloaking algorithm introduced by Gruteser and Grunwald (2003). This algorithm uses grid-based schemes which recursively partition the spatial domain into grid cells in a quad-tree style. The partitioning stops when the size of any grid cell becomes less than the threshold set by the users. All of the grid cells generated during the partitioning form a pyramid structure which is depicted in Fig. 2.

The top level of the pyramid (level 0) has only one grid cell that covers the entire spatial domain. Each grid cell, except those at the bottom level, is composed of four grid cells at the next lower level (referred to as “child grid cells”). After a query is received from user U, it traverses the quad-tree (top down) until it finds the grid cell that contains U and fewer than K-1 users. Then, the query returns the parent of the grid cell as the cloaking region. For example, U₁ issues a query where K = 2, grid cells <(0, 2), (1, 3)> contain only U₁ and thus, the parent grid cells <(0, 2), (2, 4)> are the cloaking region.

The result using the adaptive-interval cloaking algorithm consists of users’ pseudonyms (UPs), a Cloaking Region (CR) and Temporal Cloaking (TC). If we assume that the LBS server knows the spatial and temporal resolution of the anonymity datasets, i.e., the grid cell (Cell_i) and Time Interval (TI_i) of the adaptive-interval cloaking, then the anonymity dataset can be formalized as:

AS = <UPs, CR, TC>, UPs = <U₁, U₂,..., U_k>,
CR = <Cell₁, Cell₂,... Cell_m>, TC = <TI₁, TI₂,..., TI_n>

The three-level hierarchy that expresses the relationship of the units of an anonymity dataset is presented in Fig. 3 (Zhang et al., 2013).


Fig. 3:	Three-level hierarchy expressing the relationship of the units of an anonymity dataset

In this study, we focus on the spatial properties of LBS anonymity datasets and the temporary equivalence of the anonymity dataset (i.e., AS = CR = <Cell₁, Cell₂,..., Cell_k>).

Spatial association rules: Inspired by the approach of mining association rules from the transactional dataset introduced by Koperski and Han (1995) first investigated spatial association rules among a set of spatial and (possibly) non-spatial predicates. Specifically, P₁∧...P_m⇒Q₁∧Q_n.(S%, C%), where (P₁,..., P_m, Q₁,..., Q_n) are predicates, (P₁,..., P_m) are antecedent predicates, (Q₁,..., Q_n) are consequent predicates and at least one of the predicates (P₁,..., P_m, Q₁,..., Q_n) is a spatial predicate (Koperski and Han, 1995).

In fact, spatial predicates (P₁,..., P_m, Q₁,..., Q_n) describe the spatial relationships of objects in spatial datasets. Different from the traditional relationships in DBMS, spatial relationships are implicit and methods for materializing this implicit information are specifically required. Three spatial relationships are considered: The direction, distance and topology (Guting, 1994). Due to the space limitation of this study, we primarily consider topological relationships to be the spatial predicates.

Topological relationships characterize the type of intersection between two spatial objects. The formal definition of topological relationships is usually given by the nine-intersection model. In this model, each spatial object is represented as a set of points in the Euclidean plane R². For a given point set A, it is distinguished between its interior A₀, its boundary δA and its exterior (or complement)_A-1. The point sets of two spatial objects A and B are then interrelated by building all nine intersections between these point sets to determine how they are topologically related. The model is usually represented by the following nine-intersection matrix:

(1)

Each intersection described by these matrix elements is distinguished between the empty set ø and the nonempty set ¬ø which results in 2⁹ = 512 possible combinations. In this study, we focus on the topological relationships between anonymity datasets and related geographic background datasets which are composed of two-dimensional, coherent and simple spatial objects (i.e., a point, line and polygon), therefore, the exterior does not need to be considered. The result is a four-intersection matrix:

(2)

This matrix reduces the possible combinations to 2⁴ = 16. Some combinations are not logical because they do not describe any possible topological relationship. Furthermore, the geometry types of one or more cells of anonymity datasets are polygons. Removing the contradictory combinations and constraining the topological relationships between polygons and other geometric types (point, line and polygon) leads to the topological relationships illustrated in Fig. 4.

S% and C% are the support and confidence, respectively, of the rule P₁∧P_m⇒Q₁∧Q_n.(S%, C%). For example, let D = <T₁, T₂,..., T_n> be a spatial transaction database and I = <I₁, I₂,..., I_m> is the spatial itemsets (another term for a spatial relationship or a spatial predicate) of all of the spatial transactions in D. Any subset X of I is included in D. If XεT_i, then T_i contains X. The count transactions which are in D and contain X, are defined as the support counts of X and denoted as ∂x. Accordingly, the support of X is denoted as support(X) and formulated as follows:

(3)

If support(X) is greater than the threshold specified by the users, then X is called a frequent spatial itemset.

For any spatial itemsets X and Y, if X∩Y = ø, then the equation X→Y is called a spatial association rule of X (antecedent) and Y (consequent). The support of X→Y is that of X∪Y which is denoted as support(X→Y) and formulated as follows:

Support(X→Y) = support(X∪Y)

(4)

The confidence degree of X∪Y is denoted as confidence(X∪Y) and formulated as follows:

(5)

If support(X→Y)≥sup_minand confidence(X→Y)≥ conf_min, X→Y is called an association rule.

APPROACH OF MINING SPATIA ASSOCIATION RULES AMONG TOPOLOGICAL RELATIONSHIPS

An approach used to mine spatial association rules is applied to determine the topological relationships between anonymity datasets and related geographic background datasets.


Fig. 4:	Topological relationships between simple spatial objects in Euclidean plane R²

First, we format the basic definitions of the spatial association mining rules for the anonymity datasets and we then design the implementation process.

Basic definitions
Definition 1: L_tar = <O_tar1, O_tar2,..., O_tarm> represents a target layer, where O_tar1, O_tar2,..., O_tarm are instances in the target layer L_tar and each instance holds a Cloaking Region (CR) of an anonymity dataset.

Definition 2: L_rel = <O_rel1, O_rel2, O_relm> represents a relevant layer, where O_rel1, O_rel2,..., O_relm are instances in the relevant layer L_rel and L_rels = <L_rel1, L_rel2,..., L_rels> represents a set of relevant layers.

Definition 3: R = (O_tar1-O_rel1), O_tar1εL_tar, O_rel1εL_rel, L_relεL_rels represents a topological relationship between the instance O_tar1 in the target layer L_tar and the instance O_rel1 in the relevant layer L_rel, where L_rel is a member of L_rels.

Definition 4: Rr_I = <O_tar1, R = (O_tar1-O_rel1), O_rel1> and Rr_L = <O_tar1, R = (O_tar1-O_rel1), O_rel1>, where O_tar1εL_tar, O_reliεL_rel, L_relεL_rels represent topological relationships stored in a relational table at the instances level and at the relevant layers level, respectively.

Definition 5: Rt_I = <O_tar1, (O_tar1-O_rel1)_O_rel1> and Rt_L = <O_tar1, (O_tar1-O_rel1)_L_rel>, where O_tar1εL_tar, O_rel1εL_rel, L_relεL_rels represent topological relationships stored in a transactional table at the instances level and relevant layers level, respectively.

Definition 6: Let SD = <Rt₁, Rt₂,..., Rt_n> be a spatial transaction database, where Rt₁, Rt₂,..., Rt_n are a set of topological relationships that are stored in transactional tables at the instances level or relevant layers level.

Preprocessing anonymity datasets: Three basic steps are used to preprocess the anonymity datasets:

•

Selection of a target layer and a set of relevant layers: A series of anonymity datasets and a set of geographic background datasets are organized and chosen as the target layer and the relevant layers, respectively. An example is illustrated in Fig. 5. The target layer includes five anonymity datasets: AS1 = <Cell₁(2x2), Cell₂(2x3), Cell₃(3x2)>, AS2 = <Cell₁(3x5), Cell₂(4x4), Cell₃(4x5), Cell₄(4x6)>, AS3 = <Cell₁(1x9), Cell₂(2x8), Cell₃(2x9), Cell₄(3x8)>, AS4 = <Cell₁(4x9), Cell₂(5x8), Cell₃(5x9), Cell₄(6x8)>, AS5 = <Cell₁(8x7), Cell₂(8x8), Cell₃(8x9), Cell₄(8x10), Cell₅(9x7), Cell₆(9x8), Cell₇(9x9), Cell₈(10x8), Cell₉(10x9)>, AS6 = <Cell₁(7x2), Cell₂(7x3), Cell₃(7x4), Cell₄(8x3), Cell₅(9x3), Cell₆(10x2), Cell₇(10x3)>. The relevant layers include 4 layers: “Main residential area”, “Bridge”, “First class road” and “Road networks”. The geometry type of the instances of the relevant layer “Main residential area” is a polygon, whereas the geometry type of the instances of the other relevant layers is a line

•

Computation of the topological relationships and storage in a relational table: The relevant layer “Main residential area” includes three instances (Fig. 5). According to the computation method for topological relationships mentioned in preliminaries section and definition 3 in section, we can obtain three topological relationships: “WITHIN” between the anonymity dataset AS1 and instance 1, “OVERLAPS” between the anonymity dataset AS2 and instance 2 and “OVERLAPS” between the anonymity dataset AS5 and instance 3


Fig. 5:	Illustration of target layer and relevant layers

Applying the same logic, other topological relationships can be defined. According to definition 4 in basic definition section, all topological relationships between anonymity datasets and geographic background datasets illustrated in Fig. 5 can be organized at the instances level in a relational table (Table 1), where ID1 and ID2 are unique identifiers of the instances in the target layer and the relevant layer, respectively.

However, to improve the efficiency of the subsequent algorithms, it is best to store the topological relationships at the relevant layers level (Bogorny et al., 2005). This method is used here (i.e., ID2 attributes in Table 1 are discarded).

Table 1:	An example of topological relationships stored in a relational table

Transforming the topological relationships to a transactions table: For the purpose of adopting traditional Boolean association rules while mining algorithms to obtain spatial association rules, the topological relationships generated in the previous process should be transformed into a transactional table. The principle of the transformation is that all of the topological relationships of tuples with the same ID1 attribute are combined to form a transaction, as for the tuples with ID1 = AS5 and ID1 = AS6. However, the same topological relationships are not allowed in a transaction, for example, instance ID1 = AS5 has the same topological relationship “CROSSES” with the relevant layers “first class road” and “road networks”. The suffix “name of relevant layers” of the tuples is added to the topological relationships to form new types, such as “CROSSES_first class road” and “CROSSES_road networks”. The final results are shown in Table 2.

Mining spatial association rules from spatial transactions database: The typical mining spatial data association rules are generally classified into three types of methods: Methods based on clustering over layers, methods based on spatial transactions and methods without spatial transactions. Among these methods, the third type is the most popular (Zhang et al., 2007) and adopts traditional Boolean association rules mining algorithms. The process can be split into two phases: The extraction of the frequent spatial itemsets (spatial relationships or spatial predicates) and the generation of the spatial association rules from the frequent spatial itemsets. The extraction of the frequent spatial itemsets phase follows the paradigm of candidate generation-and-test to combine frequent K-1 spatial itemsets to obtain candidate K spatial itemsets which iteratively meet the sup_min threshold. The candidates are pruned according to the Apriori pruning principle: if there are any spatial itemsets that are infrequent, then their super spatial itemsets should not be generated or tested.

The generation of spatial association rules can be performed by partitioning the frequent spatial itemsets X into two non-empty subsets, X and Y-X, such that (X→Y-X) satisfies the conf_min threshold.

Table 2:	An example of topological relationships stored in a transactional table

Being limited by the manuscript length, we do not describe the details of the algorithm. For more details, see the literature by Koperski and Han (1995).

EXPERIMENT AND RESULTS ANALYSIS

Experiment datasets: In the experiment, we adopt the adaptive-interval cloaking algorithm for GPS trajectories to simulate the anonymity datasets. The GPS trajectories were collected from 2,612 taxis in Nanjing city on July 15, 2007. An outline of this process is as follows.

Preprocessing of the GPS trajectories:

•	Based on the spatial-temporal distribution of the 2,612 taxi GPS trajectories, we configure the spatial and temporal resolution parameters which are shown in Table 3. We partition the spatial-temporal domains into 12 periods and 10,000 (100x100 m) spatial-temporal grids for every period. The discrete spatial-temporal grids are illustrated in Fig. 6a
•	For each period, we randomly sample 500 spatial-temporal grids which are simulated as the location and time of each anonymous request. The sampled spatial-temporal grids are illustrated in Fig. 6b

Table 3:	Parameters of adaptive-interval cloaking algorithm

The parameters in the table can be adjusted according to the privacy protection level required by users. For example, if some users argue that the value of the “Grid Cell” parameter is set too small, not enough to protect their location privacy. There are two methods to adjust the parameters. The first method is to only set a larger value (e.g., 5 or 10 km) for the “Grid Cell” parameter, and the second method is to set both the “K” parameter and “Spatial delay” to larger values. Because the “Grid Cell” parameter is the basic unit of our mining association rules algorithm, the larger value of the “Grid Cell” parameter will result in the spatiotemporal distribution which is reflected by the mined association rules, more rough. On the contrary, the second method does not involve the “Grid Cell” parameter. Therefore, we recommend using the second method

Generation of the anonymity datasets:

•	For each sampled spatial-temporal grid, we adopt the adaptive-interval cloaking algorithm to generate an anonymous dataset. The parameters of the algorithm are shown in Table 3 and the anonymity datasets generated for one period are illustrated in Fig. 6c

All of the anonymity datasets generated from the 12 periods are translated to a layer as the target layer which stores the Cloaking Region (CR) as the geometry attribute and Users’ Pseudonyms (UPs) and Temporal Cloaking (TC) as the categorical attributes. The generated layers are illustrated in Fig. 6d.

We select 10 geographic background dataset as the relevant layers which are shown in Table 4. The overlying mapping of the target layer and the relevant layers are shown as Fig. 7.

As observed in Fig. 7, the target layer is mainly distributed in the Gulou district in Nanjing which is located west of the Qinhuai River, east of Xuanwu Lake, south of Xinjiekou and north of Xinmofan Road, the area is one of the busiest traffic areas in Nanjing city. In addition, the basic space-time variance distribution of the target layer can be observed based on the colors of the anonymity datasets which change gradually over time, however, the interpretation is not straightforward or accurate because of the tremendous amount of overlap within the cloaking region of the anonymity datasets. The outcome is that the patterns of the space-time variance distribution are too rough to aid city traffic guidance.

By preprocessing the target layer and the relevant layers according to steps 2 and 3 of preprocessing anonymity dataset section, we obtain a spatial transactions database with 8,873 records.

Table 4:	Basic information of geographic background datasets


Fig. 6(a-d):	Workflow of generating anonymity datasets from GPS trajectories



Fig. 7(a-l):	Overlaying mapping of the target layer and the relevant layers (a) 0:00~2:00, (b) 2:00~4:00, (c) 4:00~6:00, (d) 6:00~8:00, (e) 8:00~10:00, (f) 10:00~12:00, (g) 12:00~14:00, (h) 14:00~16:00, (i) 16:00~18:00, (j) 18:00~20:00, (k) 20:00~22:00 and (l) 22:00~24:00

Of those records, 16 new topological relationships are defined (Table 5). The “Count” attribute represents the count of records supporting those topological relationships.

Experiment result and analysis: We adopt two algorithms that are implemented in SPMF which is an open-source data mining framework written in Java, to mine the spatial association rules. The algorithms are Eq. 1 mining all association rules and Eq. 2 mining perfectly sporadic association rules:

•

Mining all association rules: In this algorithm, the thresholds for support (S%) and confidence (C%) of the rules to be mined (Sup_min and conf_min) are set at 65 and 80%, respectively. The mined results include six rules listed in Table 6. These rules include three topological relationships, “CROSSES_road networks”, “CROSSES_first class road” and “OVERLAPS_main residential area”, between the target layer and the three relevant layers (i.e., “road networks”, “first class road” and “main residential area”)

It is well known that Sup_min represents the minimum amount of evidence and that conf_min specifies the strength of a spatial association rule’s implications.

Table 5:	Information of coding of topological relationships

Here, the confidences of those six rules have large values that are equal to or greater than 80%. The associations reflected by the confidence values are very strong and can be visually expressed by overlaying the anonymous datasets and the geographic background datasets, among which the topological relationships constitute the transaction supporting those six rules (Fig. 8).

Subsequently, those six rules can be interpreted as conditional expressions, as shown in Fig. 8. For example, rule 5 in Table 6 can be expressed as a conditional expression with an anonymity dataset that overlaps the main residential areas and crosses road networks has (with 99% confidence) a high percentage of crossing first class roads. Likewise, other rules can be interpreted as similar conditional expressions.

Based on the visual conditional expressions of the mined rules, patterns of popular public activities can be predicted because the rules also have a large support value (equal to 68.1%). For example, based on the rules in Table 6, we can infer that most users will prefer to anonymously request LBS services in areas where the road networks and first class roads are crossed and the main residential area is overlapped. Based on this prediction, the department of urban traffic management can make decisions regarding the urban traffic flow guidance system.

Mining perfectly sporadic association rules: Sporadic rules represent rare cases that are scattered sporadically through the database but with a high confidence of occurring together, this can indicate a rare but fatal pattern. To identify the sporadic rules based on the anonymity datasets, the support (C%) threshold is constricted to an interval range (i.e., [Sup_min, Sup_max]). In this experiment, we set the interval range to [0.10, 0.20%]. Similar to previous rules, the confidence threshold conf_min of the rules to be mined is set to a large value of 60%.

Based on the results in Table 7, we observe that “CONTAINS” between the anonymity datasets and the relevant layers of “vegetation” and “main residential area” exists in the mined rules. Similarly, according to Fig. 9 which visualizes the topological relationship associations related to the rules, we can make predictions using the conditional expressions of the mined rules.

Table 6:	Basic information of mined spatial association rules

Table 7:	Basic information of mining sporadic spatial association rules



Fig. 8(a-l):	Mapping display of spatial association rules (a) 0:00~2:00, (b) 2:00~4:00, (c) 4:00~6:00, (d) 6:00~8:00, (e) 8:00~10:00, (f) 10:00~12:00, (g) 12:00~14:00, (h) 14:00~16:00, (i) 16:00~18:00, (j) 18:00~20:00, (k) 20:00~22:00 and (l) 22:00~24:00


Fig. 9:	Mapping display of mining sporadic spatial association rules

According to rule 1, a condition states that the anonymity datasets spatially containing “vegetation” will have (with 63% confidence) a high percentage of space containing “main residential area”. In contrast, according to rule 2, if the places where the anonymous requests spatially occur contain “main residential area”, then they will spatially contain “vegetation” (with 100% confidence).

However, the meanings of the prediction based on the sporadic spatial association rules are different from those based on the previous rule. The reason for this difference is that the support threshold of the mined spatial association rules is very small, the patterns implied by the spatial association rules may belong to a few specific users (Fig. 9). For example, a few specific users habituated to anonymously request LBS services near “zifeng plaza”, a landmark in Nanjing City that lies in a main residential area and a vegetated region, can be ascertained. With an accurate prediction from the pattern, the LBS server can provide its users with personalized services in an efficient manner.

CONCLUSION

In this study, we first suggested that spatial-temporal K-anonymity and its variants suffer from a common drawback in that they severely decrease the utilization of anonymity datasets. Then, we proposed a method to solve this problem by mining spatial association rules among the topological relationships between LBS anonymity datasets and related geographic background datasets. Finally, we conducted experiments by applying the proposed approach to large-scale LBS anonymity datasets to mine association rules and the experimental results demonstrate that the spatiotemporal distribution of LBS anonymity datasets can be clearly expressed and be utilized by LBS providers and third parties.

Many remaining issues must be addressed to successfully implement LBS anonymity datasets, including issues regarding classification, cluster analysis and outlier analysis. Alternative data mining technologies should also be explored. Furthermore, the implementation of LBS anonymity datasets may pose a challenge for protecting the privacy of anonymous users.

ACKNOWLEDGMENTS

This research is supported by Jiangsu Government Scholarship for Overseas Studies, the grants from the Natural Science Foundation of China under Grant No. (41201465) and the grants from the Natural Science Foundation of Jiangsu province under Grant No. (BK2012439). The authors thank the Institute of Cartography and Geoinformatics, Leibniz University Hannover for providing us with a good work environment.

REFERENCES

Ali, S., T. Torabi and H. Ali, 2006. Location aware business process deployment. Proceedings of the International Conference on Computational Science and its Applications, May 8-11, 2006, Glasgow, UK., pp: 217-225.

Baldauf, M., S. Dustdar and F. Rosenberg, 2007. A survey on context-aware systems. Int. J. Ad Hoc Ubiquitous Comput., 2: 263-277.
Direct Link

Chow, C.Y. and M.F. Mokbel, 2007. Enabling private continuous queries for revealed user locations. Proceedings of the 10th International Conference on Advances in Spatial and Temporal Databases, July 16-18, 2007, Boston, MA,USA., pp: 258-275.

Kulik, L., 2009. Privacy for Real-time Location-based services. SIGSPATIAL Special, 1: 9-14.
CrossRef Direct Link

Krumm, J., 2009. A survey of computational location privacy. Pers. Ubiquit. Comput., 13: 391-399.
CrossRef Direct Link

Baugh, J.P. and J. Guo, 2006. Location privacy in mobile computing environments. Proceedings of the 3rd International Conference on Ubiquitous Intelligence and Computing, Septemper 3-6, 2006, Wuhan, China, pp: 936-945.

Bettini, C., S. Mascetti, X.S. Wang and S. Jajodia, 2007. Anonymity in location-based services: Towards a general framework. Proceedings of the IEEE International Conference on Mobile Data Management, May 1-1, 2007, Mannheim, Germany, pp: 69-76.

Kido, H.,Y. Yanagisawa and T. Satoh, 2005. Protection of location privacy using dummies for Location-based services. Proceedings of the 21st International Conference on Data Engineering Workshops, April 5-8, 2005, Kyushu, Japan, pp: 1248-1248.

Um, J.H., H.D. Kim and J.W. Chang, 2010. An advanced cloaking algorithm using Hilbert curves for anonymous location based service. Proceedings of the 2nd IEEE International Conference on Social Computing, August 20-22, 2010, Minneapolis, MN., pp: 1093-1098.

Shang, N., G. Ghinita, Y. Zhou and E. Bertino, 2010. Controlling data disclosure in computational PIR protocols. Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, April 13-16, 2010, Beijing, China, pp: 310-313.

Gruteser, M. and D. Grunwald, 2003. Anonymous usage of Location-based services through spatial and temporal cloaking. Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, May 5-8, 2003, San Francisco, California, pp: 31-42.

Um, J.H., M.Y. Jang, K.J. Jo and J.W. Chan, 2009. A new cloaking method supporting both K-anonymity and L-diversity for privacy protection in Location-based service. Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, August 10-12, 2009, Chendu, China, pp: 79-85.

Gedik, B. and L. Liu, 2005. Location privacy in mobile systems: A personalized anonymization model. Proceedings of the 25th IEEE International Conference on Distributed Computing Systems, June 10-10, 2005, Columbus, OH., pp: 620-629.

Mokbel, M.F., C.Y. Chow and W.G. Aref, 2006. The new casper: Query processing for location services without compromising privacy. Proceedings of the 32nd International Conference on Very Large Data Bases, September 12-15, 2006, Seoul, Korea, pp: 763-774.

Xu, T. and Y. Cai, 2009. Feeling-based location privacy protection for Location-based services. Proceedings of the 16th ACM Conference on Computer and Communications Security, November 9-13, 2009, Chicago, USA., pp: 348-357.

Ku, W.S., R. Zimmermann, W.C. Peng and S. Shroff, 2007. Privacy protected query processing on spatial networks. Proceedings of the 23rd IEEE International Conference on Data Engineering Workshop, April 17-20, 2007, Istanbul, Turkey, pp: 215-220.

Chow, C.Y., M.F. Mokbel and X. Liu, 2006. A Peer-to-peer spatial cloaking algorithm for anonymous Location-based service. Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems, November 5-11, 2006, Arlington, Virginia, USA., pp: 171-178.

Pan, X., X. Meng and J. Xu, 2009. Distortion-based anonymity for continuous queries in Location-based mobile services. Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 4-6, 2009, Seattle, WA., USA., pp: 256-265.

Zhang, H., L. Xu, H. Huang and S. Gao, 2013. Mining spatial association rules from LBS anonymity dataset for improving utilization. Proceedings of the 21st International Conference on Geoinformatics, June 20-22, 2013, Kaifeng, China, pp: 1-6.

Koperski, K. and J. Han, 1995. Discovery of spatial association rules in geographic information databases. Proceedings of the 4th International Symposium on Advances in Spatial Databases, August 6-9, 1995, Portland, ME., USA., pp: 47-66.

Guting, R.H., 1994. An introduction to spatial database systems. VLDB J., 3: 357-399.
Direct Link

Bogorny, V., P.M. Engel and L.O. Alvares, 2005. A Reuse-based spatial data preparation framework for data mining. Proceedings of the 17th International Conference on Software Engineering and Knowledge Engineering, July 14-16, 2005, Taipei, Taiwan, pp: 649-652.

Zhang, X., F. Su, Y. S. Shi and D. Zhang, 2007. Research on progress of spatial association rule mining. Prog. Geography, 26: 119-128.

HOME JOURNALS CONTACT

Information Technology Journal

Year: 2014 | Volume: 13 | Issue: 5 | Page No.: 801-815 DOI: 10.3923/itj.2014.801.815

Achieving Utilization of LBS Anonymity Datasets for Third Party by Mining Spatial Association Rules

Zhang Hai-Tao, Huang Hui-Hui, Xu Liang and Zhang Bo-Bo

How to cite this article

Zhang Hai-Tao, Huang Hui-Hui, Xu Liang and Zhang Bo-Bo, 2014. Achieving Utilization of LBS Anonymity Datasets for Third Party by Mining Spatial Association Rules. Information Technology Journal, 13: 801-815.

Keywords: spatial association rules, LBS privacy protection, spatial-temporal K-anonymity and utilization

REFERENCES

Year: 2014 | Volume: 13 | Issue: 5 | Page No.: 801-815
DOI: 10.3923/itj.2014.801.815