INTRODUCTION
Rough set theory, presented by Pawlak (1982) and Pawlak and Skowron (2007), is an effective mathematical method for analyzing and dealing with imprecise, inconsistent and vague information. After more than 30 years of development, it has been widely used in fields such as pattern recognition, machine learning, decision analysis, knowledge acquisition and data mining (Qian et al., 2010; Hu et al., 2007; Sen and Pal, 2009; Xu et al., 2012a; Liang et al., 2012; Gu et al., 2011).
The classical rough set theory is based on complete information systems, in which there are no unknown values. However, Incomplete Information Systems (IIS) (Yang et al., 2009a; Grzymala-Busse, 2004; Stefanowski and Tsoukias, 2001) arise everywhere for many unpredictable reasons. Therefore, mining rules from incomplete information systems is one of the important directions in the development of rough set theory. Most decision rules are positive (Pawlak, 1982; Xu et al., 2012b; Gu et al., 2011; Yang et al., 2009b; Grzymala-Busse, 2004; Stefanowski and Tsoukias, 2001), but some form of negative rule is also needed: if an object does not satisfy certain attribute-value pairs in the condition part, then the object can be excluded from the decision class. Negative rules were first proposed and successfully applied in a medical diagnosis expert system by Tsumoto (2000). Yao (2010, 2011) solved the acquisition of three-way decision rules (positive, negative and boundary) based on the model of probabilistic rough sets, but unfortunately did not consider reduction. To acquire negative decision rules in IIS, Yang et al. (2009a) put forward a difference relation based on rough sets and Xu et al. (2012a) proposed a variable precision rough set model based on the negative support set of a descriptor, but their attribute reduction algorithms, based on the discernibility matrix, are not well suited to automatic implementation by computer.
In this study, the notion of similarity and dissimilarity knowledge granules based on the similarity relation in an incomplete decision information system is first introduced. The lower and upper approximation sets based on the similarity and dissimilarity knowledge granules are constructed to acquire the positive and negative decision rules from the incomplete decision information system. Secondly, to obtain simplified decision rules, a heuristic algorithm of attribute reduction based on the significant attributes is proposed. The final results of the example analysis show that this algorithm is feasible and practical.
INCOMPLETE DECISION INFORMATION SYSTEM
A Decision Information System (DIS) is defined as a 4-tuple DIS = <U, AT, V, f>, where U is a non-empty finite set of objects, AT = C∪D is a non-empty finite set of attributes, C denotes the set of condition attributes, D denotes the set of decision attributes and C∩D = ø. Each attribute α∈AT is associated with a set V_{α} of its values; V_{α} is called the domain of α and V = V_{C}∪V_{D}. For any x∈U, f (x, α) is the value that x holds on α (α∈AT). If D = {d}, DIS = <U, C∪{d}, V, f> is a single-attribute decision information system; otherwise it is called a multiple-attribute decision information system. To simplify the discussion, only the single-attribute decision information system is considered here.
An Incomplete Decision Information System (IDIS) is a decision information system IDIS = <U, C∪{d}, V, f> in which the condition-attribute values of some objects are unknown. In this study, an unknown value of a condition attribute is treated as missing (Grzymala-Busse, 2004) and is denoted by the special symbol "*", i.e., f (x, α) = * for some x∈U, α∈C. It is assumed that "*" may appear only in the values of the condition attributes; thus, V = V_{C}∪V_{d}∪{*}.
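As a concrete illustration of this notation, an IDIS can be represented directly as data. The following Python sketch is entirely hypothetical (the objects, attributes and values are invented for illustration and are not those of Table 1); the symbol "*" encodes a missing condition-attribute value.

```python
# Hypothetical IDIS = <U, C ∪ {d}, V, f>; all values below are illustrative.
# "*" marks a missing (unknown) condition-attribute value.
U = ["x1", "x2", "x3", "x4"]
C = ["a", "b"]      # condition attributes
d = "d"             # single decision attribute

# f maps (object, attribute) to the value the object holds on that attribute
f = {
    ("x1", "a"): 1,   ("x1", "b"): "*", ("x1", "d"): 1,
    ("x2", "a"): 1,   ("x2", "b"): 2,   ("x2", "d"): 1,
    ("x3", "a"): 2,   ("x3", "b"): 2,   ("x3", "d"): 2,
    ("x4", "a"): "*", ("x4", "b"): 1,   ("x4", "d"): 2,
}

# "*" may appear only among condition-attribute values, never on d
assert all(f[(x, d)] != "*" for x in U)
```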
DECISION RULES
In rough set theory, the knowledge hidden in IDIS = <U, C∪{d}, V, f> is extracted in the form of decision rules. Through the investigation of an object x∈U, we can get a decision rule such as:

r_{x}: des ([x]_{C})→des ([x]_{d})
where, des ([x]_{C}) is the condition part of the rule; it shows the description of object x under the condition-attribute set C, i.e., des ([x]_{C}) = ∧_{c∈C} (c, v_{c}) with v_{c} ≠ *; des ([x]_{d}) is the decision part of the rule; it shows the description of object x under the decision attribute d, i.e., des ([x]_{d}) = ∨_{i∈V_{d}} (d, i), where i∈V_{d} is the label of the class. On the basis of des ([x]_{C})→des ([x]_{d}), we can conclude that the object x belongs to some certain decision class; therefore, des ([x]_{C})→des ([x]_{d}) is called a positive decision rule.
For each positive decision rule r_{x}, the certainty factor Cer (r_{x}) is defined to measure its decision ability:

Cer (r_{x}) = card ([x]_{C}∩[x]_{d})/card ([x]_{C})
where, card (X) is the cardinal number of the set X, [x]_{C} is the set of those elements having the same description as x on the condition attribute set C, denoted as [x]_{C} = {y∈U: f (y, c) = f (x, c), ∀c∈C} and [x]_{d} is the set of those elements having the same description as x on the decision attribute d, it is recorded as [x]_{d} = {y∈U: f (y, d) = f (x, d)}.
If Cer (r_{x}) = 1, then r_{x} is a definite positive decision rule; if 0<Cer (r_{x})<1, r_{x} is an indefinite positive decision rule. The degree of certainty is measured by the value of Cer (r_{x}).
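As a minimal sketch, the certainty factor can be computed directly from these definitions, taking Cer (r_{x}) as card ([x]_{C}∩[x]_{d})/card ([x]_{C}). The toy system below is hypothetical and complete (no "*" values), chosen only so the two cases of the definition are both exercised.

```python
def eq_class_C(x, U, C, f):
    # [x]_C: elements with the same description as x on every c in C
    return {y for y in U if all(f[(y, c)] == f[(x, c)] for c in C)}

def eq_class_d(x, U, d, f):
    # [x]_d: elements with the same decision value as x
    return {y for y in U if f[(y, d)] == f[(x, d)]}

def cer(x, U, C, d, f):
    # certainty factor of the positive rule induced by x
    xc = eq_class_C(x, U, C, f)
    return len(xc & eq_class_d(x, U, d, f)) / len(xc)

# hypothetical toy system (values are illustrative only)
U = ["x1", "x2", "x3"]
C = ["a"]
f = {("x1", "a"): 1, ("x2", "a"): 1, ("x3", "a"): 2,
     ("x1", "d"): 1, ("x2", "d"): 2, ("x3", "d"): 2}

print(cer("x1", U, C, "d", f))  # 0.5 -> indefinite positive rule
print(cer("x3", U, C, "d", f))  # 1.0 -> definite positive rule
```

Here [x1]_{C} = {x1, x2} but only x1 takes decision value 1, so the rule induced by x1 is indefinite; x3 induces a definite rule.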
In addition to the positive decision rule above, a negative decision rule is often needed: if an object's condition attribute-value pairs do not satisfy certain conditions in the condition part, then we can conclude that the object does not belong to a certain decision class; in other words, we can exclude it from that decision class. Accordingly, the general form of the negative decision rule can be expressed as:

r_{x}: des (¬[x]_{C})→des (¬[x]_{d})
where, des (¬[x]_{C}) is the condition part of the negative decision rule and shows the description of the object x under the condition-attribute set C, that is, des (¬[x]_{C}) = ∧_{c∈C}¬(c, v_{c}) with v_{c} ≠ *; correspondingly, des (¬[x]_{d}) is the decision part of the negative decision rule and shows the description of the object x under the decision attribute d, i.e., des (¬[x]_{d}) = ∨_{i∈V_{d}}¬(d, i), where i∈V_{d} is the label of the class. Similarly, for each negative decision rule, the certainty factor can be denoted by:

NCer (r_{x}) = card (¬[x]_{C}∩¬[x]_{d})/card (¬[x]_{C})
where, ¬[x]_{C} is the set of those elements whose description differs from that of x on every condition attribute in C, recorded as ¬[x]_{C} = {y∈U: f (y, c) ≠ f (x, c), ∀c∈C} and ¬[x]_{d} is the set of those elements whose decision value differs from that of x, recorded as ¬[x]_{d} = {y∈U: f (y, d) ≠ f (x, d)}. If NCer (r_{x}) = 1, then r_{x} is a definite negative decision rule; if 0<NCer (r_{x})<1, then r_{x} is an indefinite negative decision rule. The degree of certainty is measured by the value of NCer (r_{x}).
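The negative certainty factor can be sketched the same way, taking NCer (r_{x}) as card (¬[x]_{C}∩¬[x]_{d})/card (¬[x]_{C}); the toy system is again hypothetical.

```python
def neg_class_C(x, U, C, f):
    # ¬[x]_C: elements differing from x on every condition attribute in C
    return {y for y in U if all(f[(y, c)] != f[(x, c)] for c in C)}

def neg_class_d(x, U, d, f):
    # ¬[x]_d: elements with a decision value different from that of x
    return {y for y in U if f[(y, d)] != f[(x, d)]}

def ncer(x, U, C, d, f):
    # certainty factor of the negative rule induced by x
    nxc = neg_class_C(x, U, C, f)
    return len(nxc & neg_class_d(x, U, d, f)) / len(nxc)

# hypothetical toy system (values are illustrative only)
U = ["x1", "x2", "x3"]
C = ["a"]
f = {("x1", "a"): 1, ("x2", "a"): 1, ("x3", "a"): 2,
     ("x1", "d"): 1, ("x2", "d"): 2, ("x3", "d"): 2}

print(ncer("x1", U, C, "d", f))  # 1.0 -> definite negative rule
```

Here ¬[x1]_{C} = {x3} and x3 indeed does not take decision value 1, so the negative rule induced by x1 is definite.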
RULE ACQUISITION
For many unpredictable reasons in the real world, data often form an incomplete decision information system. Therefore, this study mainly discusses how to acquire the positive and negative decision rules in an incomplete decision information system.
Similarity knowledge and positive rule
Definition 1: Let IDIS = <U, C∪{d}, V, f>. For any A⊆C, S (A) is a similarity relation on the universe U, determined by the attribute subset A and defined as:

S (A) = {(x, y)∈U×U: ∀α∈A, f (x, α) = * or f (x, α) = f (y, α)}
Obviously, the binary relation S (A) is reflexive and transitive, but not necessarily symmetric. Let SIM_{A} (x) = {y∈U: (x, y)∈S (A)} be the set of those elements similar to x. Regarding SIM_{A} (x) as the basic knowledge granule (i.e., the similar knowledge granule), the lower and upper approximations of X can be constructed to acquire the positive decision rules.
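A minimal Python sketch of this relation and granule, under the common reading for incomplete data that (x, y)∈S (A) holds when every known value of x on A is matched by y (the data below are hypothetical):

```python
def similar(x, y, A, f):
    # assumed reading of (x, y) ∈ S(A): wherever x has a known value on A,
    # y must agree with it; "*" on x matches anything
    return all(f[(x, a)] == "*" or f[(x, a)] == f[(y, a)] for a in A)

def SIM(x, U, A, f):
    # the similar knowledge granule SIM_A(x) = {y : (x, y) ∈ S(A)}
    return {y for y in U if similar(x, y, A, f)}

# hypothetical incomplete system: x1's value on a is missing
U = ["x1", "x2"]
A = ["a"]
f = {("x1", "a"): "*", ("x2", "a"): 1}

print(similar("x1", "x2", A, f))  # True
print(similar("x2", "x1", A, f))  # False: the relation is not symmetric
```

This small case also exhibits the asymmetry noted above: the object with the missing value is similar to everything, but not conversely.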
Definition 2: Suppose IDIS = <U, C∪{d}, V, f>. For any X⊆U, A⊆C, the lower and upper approximations of X with regard to A can be defined with SIM_{A} (x) as the basic knowledge granule:

L_{SIM_A} (X) = {x∈U: SIM_{A} (x)⊆X}
U_{SIM_A} (X) = {x∈U: SIM_{A} (x)∩X ≠ ø}
Based on the similar knowledge granule SIM_{A} (x), the ordered pair <L_{SIM_A} (X), U_{SIM_A} (X)> of the lower and upper approximations is defined as the rough set of X with regard to A. Through the lower and upper approximations of X, the positive, boundary and negative regions can be respectively described as follows:

POS_{SIM} (X) = L_{SIM_A} (X)
BND_{SIM} (X) = U_{SIM_A} (X)−L_{SIM_A} (X)
NEG_{SIM} (X) = U−U_{SIM_A} (X)
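These three regions can be sketched in Python; the similarity reading and the toy data are the same hypothetical assumptions as before.

```python
def similar(x, y, A, f):
    # assumed similarity reading for incomplete data
    return all(f[(x, a)] == "*" or f[(x, a)] == f[(y, a)] for a in A)

def SIM(x, U, A, f):
    return {y for y in U if similar(x, y, A, f)}

def lower_sim(X, U, A, f):
    # {x in U : SIM_A(x) ⊆ X}
    return {x for x in U if SIM(x, U, A, f) <= X}

def upper_sim(X, U, A, f):
    # {x in U : SIM_A(x) ∩ X ≠ ø}
    return {x for x in U if SIM(x, U, A, f) & X}

def regions(X, U, A, f):
    lo = lower_sim(X, U, A, f)
    up = upper_sim(X, U, A, f)
    return lo, up - lo, set(U) - up   # POS, BND, NEG

# hypothetical incomplete system
U = ["x1", "x2", "x3"]
A = ["a"]
f = {("x1", "a"): 1, ("x2", "a"): "*", ("x3", "a"): 2}
X = {"x1", "x2"}

pos, bnd, neg = regions(X, U, A, f)
print(pos, bnd, neg)  # {'x1'} {'x2'} {'x3'}
```

The object with the missing value has the whole universe as its granule, so it falls into the boundary region of X.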
Theorem 1: Suppose IDIS = <U, C∪{d}, V, f>, U/d = {D_{1}, D_{2},…,D_{r}} is the partition of U induced by the decision attribute d, where D_{i} = {x∈U: f (x, d) = i} and i is the label of the class. Then, for any x∈U, we have:
• If SIM_{A} (x)⊆POS_{SIM} (D_{i}), then r_{x}: des ([x]_{C})→des ([x]_{d}) is a definite positive decision rule
• If SIM_{A} (x)⊆BND_{SIM} (D_{i}), then r_{x}: des ([x]_{C})→des ([x]_{d}) is an indefinite positive decision rule
Proof 1: Since SIM_{A} (x)⊆POS_{SIM} (D_{i}) and x∈SIM_{A} (x) by reflexivity, SIM_{A} (x) ≠ ø and SIM_{A} (x)⊆D_{i}. By the definition of SIM_{A} (x), every element similar to x takes the same decision attribute-value i, i∈V_{d}. So Cer (r_{x}) = 1 holds and r_{x} is a definite positive decision rule.
Proof 2: The proof is similar to proof 1.
Dissimilarity knowledge and negative rule: Through the similarity knowledge induced by a similarity relation, we can obtain the positive decision rules but cannot get the negative decision rules. The acquisition of the negative rules is discussed below.
Definition 3: Suppose IDIS = <U, C∪{d}, V, f>. For any A⊆C, S (A) is the similarity relation on the universe U determined by the condition attribute subset A; DIM_{A} (x) is the set of those elements dissimilar to x, defined as:

DIM_{A} (x) = {y∈U: (x, y)∉S (A)} = U−SIM_{A} (x)
To acquire the negative decision rules, the following discussion studies how to construct the lower and upper approximations of X with DIM_{A} (x) as the basic knowledge granule (that is, the dissimilar knowledge granule).
Definition 4: Suppose IDIS = <U, C∪{d}, V, f>. For any X⊆U, A⊆C, the lower and upper approximations of X with regard to A can be defined with DIM_{A} (x) as the basic knowledge granule:

L_{DIM_A} (X) = {x∈U: DIM_{A} (x)⊆X}
U_{DIM_A} (X) = {x∈U: DIM_{A} (x)∩X ≠ ø}
Based on the dissimilar knowledge granule DIM_{A} (x), the ordered pair <L_{DIM_A} (X), U_{DIM_A} (X)> of the lower and upper approximations is defined as the rough set of X with regard to A. Through the lower and upper approximations of X, the positive, boundary and negative regions can be respectively described as follows:

POS_{DIM} (X) = L_{DIM_A} (X)
BND_{DIM} (X) = U_{DIM_A} (X)−L_{DIM_A} (X)
NEG_{DIM} (X) = U−U_{DIM_A} (X)
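The dissimilar granule and its lower approximation can be sketched analogously, assuming DIM_{A} (x) is the complement of the similar granule; the negative class ¬D_{1} and all data below are hypothetical.

```python
def similar(x, y, A, f):
    # assumed similarity reading for incomplete data
    return all(f[(x, a)] == "*" or f[(x, a)] == f[(y, a)] for a in A)

def DIM(x, U, A, f):
    # dissimilar knowledge granule: DIM_A(x) = U \ SIM_A(x)
    return {y for y in U if not similar(x, y, A, f)}

def lower_dim(X, U, A, f):
    # {x in U : DIM_A(x) ⊆ X}
    return {x for x in U if DIM(x, U, A, f) <= X}

def upper_dim(X, U, A, f):
    # {x in U : DIM_A(x) ∩ X ≠ ø}
    return {x for x in U if DIM(x, U, A, f) & X}

# hypothetical system; ¬D1 = objects whose decision value differs from 1
U = ["x1", "x2", "x3"]
A = ["a"]
f = {("x1", "a"): 1, ("x2", "a"): "*", ("x3", "a"): 2,
     ("x1", "d"): 1, ("x2", "d"): 2, ("x3", "d"): 2}
notD1 = {x for x in U if f[(x, "d")] != 1}

print(lower_dim(notD1, U, A, f))
```

Here DIM_{A} (x1) = {x2, x3}⊆¬D_{1}: everything dissimilar to x1 lies outside class 1, which is exactly the situation exploited by a negative rule.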
Theorem 2: Let IDIS = <U, C∪{d}, V, f>, U/¬d = {¬D_{1}, ¬D_{2},…,¬D_{r}} is the cover of U induced by the decision attribute d, where ¬D_{i} = {x∈U: f (x, d) ≠ i} and i is the label of the class. Then, for any x∈U, we have:
• If DIM_{A} (x)⊆POS_{DIM} (¬D_{i}), then r_{x}: des (¬[x]_{C})→des (¬[x]_{d}) is a definite negative decision rule
• If DIM_{A} (x)⊆BND_{DIM} (¬D_{i}), then r_{x}: des (¬[x]_{C})→des (¬[x]_{d}) is an indefinite negative decision rule
Proof 1: Since DIM_{A} (x)⊆POS_{DIM} (¬D_{i}), we have DIM_{A} (x) ≠ ø and DIM_{A} (x)⊆¬D_{i}. By the definition of DIM_{A} (x), none of the elements dissimilar to x takes the decision value i, i∈V_{d}. So NCer (r_{x}) = 1 holds and r_{x} is a definite negative decision rule.
Proof 2: The proof is similar to proof 1.
REDUCTION
Reduction, also called attribute reduction or feature selection, is a key issue in the study of rough set theory. Through attribute reduction, simplified decision rules can be obtained. In this section, a heuristic algorithm based on attribute significance is introduced to simplify the positive and negative decision rules.
Definition 5: Suppose IDIS = <U, C∪{d}, V, f>, A⊆C, D = {D_{1}, D_{2},…,D_{r}} is the partition of U induced by the decision attribute d, where D_{i} = {x∈U: f (x, d) = i} and ¬D_{i} = {x∈U: f (x, d) ≠ i}. We define:

L_{SIM} (A) = (L_{SIM_A} (D_{1}), L_{SIM_A} (D_{2}),…, L_{SIM_A} (D_{r}))
U_{SIM} (A) = (U_{SIM_A} (D_{1}), U_{SIM_A} (D_{2}),…, U_{SIM_A} (D_{r}))
L_{DIM} (A) = (L_{DIM_A} (¬D_{1}), L_{DIM_A} (¬D_{2}),…, L_{DIM_A} (¬D_{r}))
U_{DIM} (A) = (U_{DIM_A} (¬D_{1}), U_{DIM_A} (¬D_{2}),…, U_{DIM_A} (¬D_{r}))

where L_{SIM_A} and U_{SIM_A} (resp. L_{DIM_A} and U_{DIM_A}) denote the lower and upper approximations based on the similar (resp. dissimilar) knowledge granule.
• If L_{SIM} (A) = L_{SIM} (C), then A is called a lower approximation distribution consistent set about the similar knowledge granule. If, in addition, no proper subset of A is a lower approximation distribution consistent set, then A is called a lower approximation distribution reduction about the similar knowledge granule
• If U_{SIM} (A) = U_{SIM} (C), then A is called an upper approximation distribution consistent set about the similar knowledge granule. If, in addition, no proper subset of A is an upper approximation distribution consistent set, then A is called an upper approximation distribution reduction about the similar knowledge granule
• If L_{SIM} (A) = L_{SIM} (C) and U_{SIM} (A) = U_{SIM} (C), then A is called a distribution consistent set about the similar knowledge granule. If, in addition, no proper subset of A is a distribution consistent set about the similar knowledge granule, then A is called a distribution reduction about the similar knowledge granule
• If L_{DIM} (A) = L_{DIM} (C), then A is called a lower approximation distribution consistent set about the dissimilar knowledge granule. If, in addition, no proper subset of A is a lower approximation distribution consistent set, then A is called a lower approximation distribution reduction about the dissimilar knowledge granule
• If U_{DIM} (A) = U_{DIM} (C), then A is called an upper approximation distribution consistent set about the dissimilar knowledge granule. If, in addition, no proper subset of A is an upper approximation distribution consistent set, then A is called an upper approximation distribution reduction about the dissimilar knowledge granule
• If L_{DIM} (A) = L_{DIM} (C) and U_{DIM} (A) = U_{DIM} (C), then A is called a distribution consistent set about the dissimilar knowledge granule. If, in addition, no proper subset of A is a distribution consistent set about the dissimilar knowledge granule, then A is called a distribution reduction about the dissimilar knowledge granule
Obtaining the distribution reductions based on the similar and dissimilar knowledge granules thus reduces to finding all of the lower and upper approximation distribution reductions of the decision classes. A heuristic algorithm of attribute reduction based on the significant attributes is introduced in the following discussion.
Suppose IDIS = <U, C∪{d}, V, f>, D = {D_{1}, D_{2},…,D_{r}} is the partition of U induced by the decision attribute d, where D_{i} = {x∈U: f (x, d) = i} and ¬D_{i} = {x∈U: f (x, d) ≠ i}. For any condition attribute α∈C, based on the similar and dissimilar knowledge granules, its dependence degrees on the lower and upper approximations of the decision classes are, respectively, expressed as follows (Δ denotes the symmetric difference of two sets and the subscript C or C−{α} indicates the attribute set used to form the granules):

S_{α} (SIM, D) = Σ_{i = 1}^{r} card (L_{SIM_C} (D_{i}) Δ L_{SIM_{C−{α}}} (D_{i}))
S^{α} (SIM, D) = Σ_{i = 1}^{r} card (U_{SIM_C} (D_{i}) Δ U_{SIM_{C−{α}}} (D_{i}))
S_{α} (DIM, D) = Σ_{i = 1}^{r} card (L_{DIM_C} (¬D_{i}) Δ L_{DIM_{C−{α}}} (¬D_{i}))
S^{α} (DIM, D) = Σ_{i = 1}^{r} card (U_{DIM_C} (¬D_{i}) Δ U_{DIM_{C−{α}}} (¬D_{i}))
Obviously, S_{α} (SIM, D), S^{α} (SIM, D), S_{α} (DIM, D) and S^{α} (DIM, D) are all greater than or equal to zero.
If S_{α} (SIM, D) = 0 (or S^{α} (SIM, D) = 0), then after eliminating the attribute α, the lower (or upper) approximation distribution of the decision classes based on the similar knowledge granule does not change.
If S_{α} (SIM, D)>0 (or S^{α} (SIM, D)>0), then after eliminating the attribute α, the lower (or upper) approximation distribution of the decision classes based on the similar knowledge granule will change, so the attribute α cannot be removed.
If S_{α} (DIM, D) = 0 (or S^{α} (DIM, D) = 0), then after eliminating the attribute α, the lower (or upper) approximation distribution of the decision classes based on the dissimilar knowledge granule does not change.
If S_{α} (DIM, D)>0 (or S^{α} (DIM, D)>0), then after eliminating the attribute α, the lower (or upper) approximation distribution of the decision classes based on the dissimilar knowledge granule will change, so the attribute α cannot be removed.
Based on Definition 5, to obtain the distribution reductions of the similar and dissimilar knowledge, the key is to find their lower and upper approximation distribution reductions. Below is a heuristic algorithm of attribute reduction based on the significant attributes.
Algorithm 1: The lower/upper approximation distribution reduction about the similar knowledge.
Suppose IDIS = <U, C∪{d}, V, f>, C = {α_{1}, α_{2},…,α_{m}} is the set of condition attributes and D = {D_{1}, D_{2},…,D_{r}} is the partition of U induced by the decision attribute d. The steps to obtain a lower approximation distribution reduction of C, denoted by R, are as follows:

Step 1: Set R = ø
Step 2: If L_{SIM} (R) = L_{SIM} (C), then go to Step 4; otherwise go to Step 3
Step 3: For each α_{i}∈(C−R), calculate S_{αi} (SIM, D); select the α_{i} with the minimal value of S_{αi} (SIM, D), set R = R∪{α_{i}} and return to Step 2
Step 4: Output R. R is a lower approximation distribution reduction of C about the similar knowledge granule

Note: If the judgment condition L_{SIM} (R) = L_{SIM} (C) in Step 2 is replaced by U_{SIM} (R) = U_{SIM} (C) and S_{αi} (SIM, D) is replaced by S^{αi} (SIM, D), then the resulting R is an upper approximation distribution reduction of the condition attribute set C about the similar knowledge granule.
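The steps above can be sketched in Python. This sketch rests on two assumptions: the similarity relation is read as "every known value of x on A is matched by y", and the significance of a candidate attribute is modelled as the remaining gap (symmetric difference of the lower approximations, summed over the decision classes) between L_{SIM} (R∪{α_{i}}) and L_{SIM} (C), so that the attribute with the minimal value is the greedy best choice. The toy system is hypothetical.

```python
def similar(x, y, A, f):
    # assumed similarity reading for incomplete data
    return all(f[(x, a)] == "*" or f[(x, a)] == f[(y, a)] for a in A)

def SIM(x, U, A, f):
    return {y for y in U if similar(x, y, A, f)}

def lower_dist(U, A, f, classes):
    # L_SIM(A): one lower approximation per decision class
    return [frozenset(x for x in U if SIM(x, U, A, f) <= Di) for Di in classes]

def gap(dist1, dist2):
    # assumed significance: total symmetric difference between distributions
    return sum(len(a ^ b) for a, b in zip(dist1, dist2))

def reduce_lower_sim(U, C, f, classes):
    target = lower_dist(U, C, f, classes)          # L_SIM(C)
    R = []                                         # Step 1: R = ø
    while lower_dist(U, R, f, classes) != target:  # Step 2
        # Step 3: add the attribute minimizing the remaining gap to L_SIM(C)
        best = min((a for a in C if a not in R),
                   key=lambda a: gap(lower_dist(U, R + [a], f, classes), target))
        R.append(best)
    return R                                       # Step 4

# hypothetical system: attribute "b" duplicates "a", so one of them is redundant
U = ["x1", "x2", "x3"]
C = ["a", "b"]
f = {("x1", "a"): 1, ("x1", "b"): 1, ("x1", "d"): 1,
     ("x2", "a"): 1, ("x2", "b"): 1, ("x2", "d"): 1,
     ("x3", "a"): 2, ("x3", "b"): 2, ("x3", "d"): 2}
classes = [{"x1", "x2"}, {"x3"}]

print(reduce_lower_sim(U, C, f, classes))  # a single attribute suffices
```

Replacing `lower_dist` by the analogous upper-approximation distribution yields the upper approximation distribution reduction described in the note; this greedy sketch, like the algorithm itself, yields one reduction rather than all of them.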
Algorithm 2: The lower/upper approximation distribution reduction about the dissimilar knowledge.
Suppose IDIS = <U, C∪{d}, V, f>, C = {α_{1}, α_{2},…,α_{m}} is the set of condition attributes and U/¬d = {¬D_{1}, ¬D_{2},…,¬D_{r}} is the cover of U induced by the decision attribute d, where ¬D_{i} = {x∈U: f (x, d) ≠ i}. The steps to obtain a lower approximation distribution reduction of the condition attribute set C, denoted by R, are as follows:

Step 1: Set R = ø
Step 2: If L_{DIM} (R) = L_{DIM} (C), then go to Step 4; otherwise go to Step 3
Step 3: For each α_{i}∈(C−R), calculate S_{αi} (DIM, D); select the α_{i} with the minimal value of S_{αi} (DIM, D), set R = R∪{α_{i}} and return to Step 2
Step 4: Output R. R is a lower approximation distribution reduction of the condition attribute set C about the dissimilar knowledge granule

Note: If the judgment condition L_{DIM} (R) = L_{DIM} (C) in Step 2 is replaced by U_{DIM} (R) = U_{DIM} (C) and S_{αi} (DIM, D) is replaced by S^{αi} (DIM, D), then the resulting R is an upper approximation distribution reduction of the condition attribute set C about the dissimilar knowledge granule.
Example: Table 1 gives an IDIS; compute all of the definite positive and negative decision rules.
From Table 1, we know that U = {x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{6}, x_{7}} is the domain of discourse, C = {a, b, c} is the set of condition attributes, d is the single decision attribute and V_{a} = V_{b} = V_{c} = V_{d} = {1, 2, 3}. By Definition 1, we can draw the following sets:
By the single decision attribute d, U/d = {D_{1}, D_{2}, D_{3}} = {{x_{1}, x_{7}}, {x_{3}, x_{4}}, {x_{2}, x_{5}, x_{6}}}
By Definition 2, we have:
According to the heuristic Algorithm 1, we can consider {a, b} as a reduction of the condition-attribute set C through the program. Combining Theorem 1, we can obtain the following definite positive decision rules:

r_{1}: f (x, b) = 1→f (x, d) = 1, // x_{1}, x_{7}∈POS_{SIM} (¬D_{1})
r_{2}: f (x, a) = 2→f (x, d) = 2, // x_{3}∈POS_{SIM} (¬D_{2})
r_{3}: f (x, a) = ∧f (x, b) = 2→f (x, d) = 3
r_{4}: f (x, a) = 3∧f (x, b) = 3→f (x, d) = 3, // x_{1}, x_{7}∈POS_{SIM} (¬D_{3})
Table 1: Incomplete decision information system

From Table 1, U/¬d = {¬D_{1}, ¬D_{2}, ¬D_{3}} = {{x_{2}, x_{3}, x_{4}, x_{5}, x_{6}}, {x_{1}, x_{2}, x_{5}, x_{6}, x_{7}}, {x_{1}, x_{3}, x_{4}, x_{7}}}. By Definition 3, we can draw the following sets:
By Definition 4, we have:
According to the heuristic algorithm of the lower approximation distribution reduction, we can consider {a, c} as a reduction of the condition-attribute set C through the program. Further, we can obtain the following definite negative decision rules:
CONCLUSION
In this study, the notion of similarity and dissimilarity knowledge granules based on the similarity relation is first introduced in IDIS. The lower and upper approximation sets based on the similarity and dissimilarity knowledge granules are constructed to acquire the positive and negative decision rules. By the properties of the new rough set model, the acquisition of the positive and negative decision rules is discussed, respectively. Finally, to obtain simplified decision rules, the heuristic algorithm of attribute reduction based on the significant attributes is further introduced and an example shows its effectiveness and feasibility.
ACKNOWLEDGMENTS
This study was supported by the Natural Science Foundation of China (No. 61100116), the Natural Science Foundation of Jiangsu Province of China (No. BK2011492) and the Natural Science Foundation of Jiangsu Higher Education Institutions of China (No. 11KJB520004).