Fuzzy Swarm Based Text Summarization
Mohammed Salem Binwahlan
Problem statement: The aim of automatic text summarization systems is to select the most relevant information from an abundance of text sources. The rapid daily growth of data on the internet makes achieving this aim a big challenge. Approach: In this study, we incorporated fuzzy logic with swarm intelligence, so that risks, uncertainty, ambiguity and imprecise values in choosing the feature weights (scores) could be flexibly tolerated. The weights obtained from the swarm experiment were used to adjust the text feature scores, and the adjusted scores were then used as inputs to the fuzzy inference system to produce the final sentence score. The sentences were ranked in descending order of their scores and the top n sentences were selected as the final summary. Results: The experiments showed that incorporating fuzzy logic with swarm intelligence can play an important role in selecting the most important sentences for the final summary. The results also showed that the proposed method performs well, outperforming the swarm model and the benchmark methods. Conclusion: Incorporating more than one technique for sentence scoring proved to be an effective mechanism. PSO was employed to produce the text feature weights; the purpose was to deal with the text features fairly, based on their importance, and to differentiate between more and less important features. The fuzzy inference system was employed to determine the final sentence score, on which the decision was made to include the sentence in the summary or not.
The particle velocity and position are updated as in (1) and (2):

v_id(t+1) = w × v_id(t) + c1 × r1 × (p_id − x_id(t)) + c2 × r2 × (p_gd − x_id(t))   (1)
x_id(t+1) = x_id(t) + v_id(t+1)                                                     (2)

where:
v_id(t)   = the velocity of particle i at time t in the search space along dimension d
p_id      = the best position in which the particle previously got a high fitness value, called pbest
x_id(t)   = the current position of particle i in the search space
r1 and r2 = randomly generated numbers in the range [0, 1]
p_gd      = the overall best position in which any particle got the best fitness value, called gbest
c1 and c2 = the acceleration coefficients
w         = the inertia weight; its value is decreased linearly over time from 0.9-0.4
x_id(t+1) = the new position to which the particle must move
v_id(t+1) = the new velocity of the particle resulting from the calculation in (1), which mainly determines the new position of the particle

The velocity of the particle must be in the range [Vmin, Vmax].
There are two types of PSO. Continuous particle swarm optimization optimizes continuous nonlinear problems. Binary particle swarm optimization is an extension of continuous PSO in which the particle position is represented as a bit string rather than real numbers. In continuous PSO the position is updated directly by adding the velocity to the previous position, but in binary PSO the velocity is used only inside the sigmoid function, as in (3), to calculate the probability of the bit value being changed to 1 or 0:

s(v_id) = 1 / (1 + e^(−v_id))   (3)

The value returned by the sigmoid function is compared with a randomly generated value in the range between zero and one.
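As an illustration of the binary update just described, the following Python sketch clamps each velocity component and sets the corresponding bit to 1 with probability given by the sigmoid (the `vmax` value and the clamping convention are our assumptions, not taken from the paper):

```python
import math
import random

def sigmoid(v):
    """Squash a velocity into a probability in (0, 1), as in Eq. 3."""
    return 1.0 / (1.0 + math.exp(-v))

def update_binary_position(velocity, vmax=4.0):
    """Binary PSO position update: each bit becomes 1 with probability
    sigmoid(v) and 0 otherwise, with v clamped to [-vmax, vmax]."""
    new_position = []
    for v in velocity:
        v = max(-vmax, min(vmax, v))  # keep the velocity in [Vmin, Vmax]
        bit = 1 if random.random() < sigmoid(v) else 0
        new_position.append(bit)
    return new_position
```

A large positive velocity pushes the bit toward 1, a large negative one toward 0, which is why the velocity range must be bounded.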
Text Features: The features used in this study are five:

• Sentence centrality (SC): the sentence centrality consists of three features: the similarity, the shared friends and the shared n-grams between the sentence in hand and all other document sentences, normalized by n-1, where n is the number of sentences in the document
• Title feature: this feature is formed as the average of two features: title-help sentence (THS), a sentence containing n-gram terms of the title, and title-help sentence relevance sentence (THSRS), a sentence containing n-gram terms of any title-help sentence
• Word sentence score (WSS): it is calculated as in (5) from the term weights, where a minimum score is assigned to the sentence in case its terms are not important, the term weight as in (5) is the TF-ISF weight of the term tij in the sentence, the summary length is fixed, and HTFS is the highest summation of term weights (TF-ISF) of a sentence in the document
• Key word feature: the top 10 words with the highest TF-ISF scores are chosen as key words
• The similarity to first sentence: this feature scores the sentence based on its similarity to the first sentence in the document; in news articles, the first sentence is very important
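A minimal sketch of the key word feature, under the common TF-ISF formulation (term frequency times the log of the inverse sentence frequency; the exact weighting used in the paper may differ):

```python
import math
from collections import Counter

def tf_isf(sentences):
    """Score each term by TF-ISF: its frequency in the document times
    the log inverse of the number of sentences containing it."""
    n = len(sentences)
    tf = Counter(t for s in sentences for t in s)
    sf = Counter()
    for s in sentences:
        for t in set(s):
            sf[t] += 1
    return {t: tf[t] * math.log(n / sf[t]) for t in tf}

def keywords(sentences, k=10):
    """The key word feature above: top-k terms by TF-ISF score."""
    scores = tf_isf(sentences)
    return [t for t, _ in sorted(scores.items(), key=lambda p: -p[1])[:k]]
```

A term that appears in every sentence gets an ISF of log(1) = 0, so it can never be a key word, which matches the intuition that such terms do not discriminate between sentences.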
Fuzzy logic: The term "fuzzy logic" emerged from the development of the theory of fuzzy sets by Zadeh. The limitation of classic logic, which deals with only two values, true or false, created the need to extend it to handle partial truth (neither completely true nor completely false). Fuzzy logic is an extension of classical logic, in the form of a generalization of the classical inference rules (such as modus ponens, modus tollens and hypothetical syllogism), that has the ability to deal with approximate reasoning. A fuzzy set is an elaboration of the traditional "crisp" set in which each member has a degree of membership to the set, determined by a membership function. The membership function assigns a membership degree, in the range between zero and one, to each member of the target set. The computer can translate a linguistic statement into actions based on a set of IF-THEN rules of the fuzzy logic. Fuzzy IF-THEN rules are normally created in the form "if A then B", in which a condition is connected with actions and A and B are fuzzy sets. Fuzzy logic has an advantage in terms of simplicity of development and modification, because the rules are well understandable and it is easy to modify them, add new rules or remove existing ones.
The typical fuzzy inference system consists of the following stages:
Fuzzification: Fuzzification handles the uncertainty that fuzzy sets represent: the input values are translated into grades of membership, in the range between zero and one, for the linguistic terms of the fuzzy sets, using a membership function that assigns a grade to each linguistic term.
Fig. 1: Typical fuzzy system
Inference: The inference is the core part of a fuzzy system, which merges the facts obtained from the fuzzification part with a series of production rules to perform the fuzzy reasoning process.
The most important fuzzy inference methods are Mamdani's fuzzy inference method and the Sugeno (or Takagi-Sugeno-Kang) method. The rule consequent in Mamdani fuzzy systems is represented using fuzzy sets, while in Takagi-Sugeno-Kang fuzzy systems it is formed as a linear function of the input variables. A typical fuzzy system is shown in Fig. 1.
Defuzzification: The goal of defuzzification is to convert the fuzzy results of the inference into a crisp output.
The swarm based summarization: The swarm model is defined as a combination of text feature scores, as in (6), where those feature scores are adjusted using the weights resulting from the training of the Particle Swarm Optimization (PSO). The first part of this model is for training the PSO: 100 documents were selected from the Document Understanding Conference (DUC) data collection, DUC 2002, and used as training and testing data. Figure 2-4 show the swarm model. The second part of this model is for testing the proposed model.
Score(s) = Σ (i = 1..5) w_i × score_i(s)   (6)

where:
Score(s)   = the score of the sentence s
w_i        = the weight of the feature i produced by PSO, i = 1-5
score_i(s) = the score of the feature i
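Equation (6) amounts to a weighted sum over the five feature scores, as in this small sketch (the weights shown in the usage are illustrative, not the PSO-trained values):

```python
def swarm_score(feature_scores, weights):
    """Sentence score as in (6): the sum over the five features of the
    PSO-produced weight times the feature score."""
    assert len(feature_scores) == len(weights) == 5
    return sum(w * f for w, f in zip(weights, feature_scores))

# Illustrative call with made-up weights and feature scores:
# swarm_score([0.5, 0.0, 0.0, 0.0, 0.0], [0.4, 0.15, 0.15, 0.15, 0.15])
```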
Fig. 2: Training the model
Fig. 3: Swarm identification process
Fig. 4: Testing the model
For more details and the experimental results refer to. In this study, we employ the fuzzy logic for scoring the sentences instead of the formula in (6).
The fuzzy swarm based summarization: To implement our fuzzy system, we use the built-in Mamdani fuzzy inference method of the Matlab Fuzzy Logic Toolbox. The main parts of the fuzzy inference process are described below:
Fuzzification: The inputs are the crisp numerical values of the five features used in our study; those values are limited to the universe of discourse in the range [0, 1]. The feature values are adjusted using the weights resulting from the training of the particle swarm optimization (PSO); this forms the central point of the merging of fuzzy logic with swarm intelligence. To determine the degree to which the input values belong to each of the appropriate fuzzy sets, we use the trapezoidal membership function, due to its simplicity and wide use. Three fuzzy sets are used: Low, medium and high.
The trapezoidal membership function has four parameters (a, b, c and d), the four breakpoints of the trapezium, which determine the shape of the function. The membership function is further described by two indices i and j: for example, the membership function Aij (aij, bij, cij, dij) belongs to the ith fuzzy set and the jth input variable, and Bi (ai, bi, ci, di) is the output membership function of the ith fuzzy set. The trapezoidal curve is a function of a vector x (the jth fuzzy variable) in the ith fuzzy set and depends on the four scalar parameters a, b, c and d, as given by:

f(x; a, b, c, d) = 0 for x ≤ a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for b ≤ x ≤ c; (d − x)/(d − c) for c ≤ x ≤ d; 0 for d ≤ x

Or, in short form:

f(x; a, b, c, d) = max(min((x − a)/(b − a), 1, (d − x)/(d − c)), 0)
The parameters a and d locate the "feet" of the trapezoid and the parameters b and c locate the "shoulders."
The output of the trapezoidal membership function is a fuzzy degree of membership (in the range [0, 1]) in the fuzzy set. Figure 5 shows the membership functions of fuzzification of the input value of the Sentence Centrality feature (SC).
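The trapezoid just described can be written directly in Python; this is a plain re-implementation for illustration, whereas the study itself uses the toolbox's built-in function:

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership degree of x for breakpoints a <= b <= c <= d:
    rises on [a, b], stays at 1 on [b, c], falls on [c, d], 0 outside."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge between the "foot" a and "shoulder" b
    return (d - x) / (d - c)      # falling edge between the "shoulder" c and "foot" d
```

With breakpoints (0, 0.2, 0.8, 1), for instance, an input of 0.5 gets full membership while 0.1 gets membership 0.5, partway up the rising edge.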
Inference: The facts resulting from the fuzzification step need to be merged with a series of production rules (IF-THEN rules) to perform the fuzzy reasoning process; we defined around 200 IF-THEN rules for that purpose. The following is an example of those rules:
• If (WSS is H) and (SC is H) and (S_FD is M) and (SS_NG is H) and (KWRD is H) then (output is important)
The antecedent of the fuzzy rule in this example has more than one part. To get the output of such a rule, the fuzzy operator is applied to obtain one number, which is then applied to the output function. We use the AND operator, set as min (minimum), to select the antecedent part with the minimum value as the output of the antecedent. That output is used as the input for the implication process, in which we again use min (minimum) to reshape the output fuzzy set by truncating it; the output fuzzy membership function used in this study is the trapezoid membership function, as shown in Fig. 6.
The implication process is implemented for each fuzzy rule. The next sub-step in the inference process is aggregation of outputs of all fuzzy rules and combining them into a single fuzzy set which represents the final output variable.
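The min/max machinery of the inference step can be sketched as follows (the rule representation, a firing strength paired with an output membership function, is our simplification for illustration):

```python
def rule_strength(antecedent_degrees):
    """AND operator set to min: the firing strength of a rule is the
    minimum membership degree among its antecedent parts."""
    return min(antecedent_degrees)

def aggregate(rule_outputs, universe):
    """Mamdani-style combination over a sampled output universe:
    implication truncates each rule's output set at its firing
    strength (min), and aggregation takes the max across rules."""
    aggregated = []
    for z in universe:
        degree = max(min(strength, mf(z)) for strength, mf in rule_outputs)
        aggregated.append(degree)
    return aggregated
```

Each entry of `rule_outputs` is a `(strength, membership_function)` pair, one per fired rule; the result is a single aggregated fuzzy set ready for defuzzification.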
Defuzzification: The last step in the fuzzy inference process is the defuzzification which is to convert the fuzzy results of the inference into a crisp output which represents the final score of the sentence.
Fig. 5: The trapezoid membership functions of the Sentence Centrality feature (SC)
Fig. 6: The trapezoid membership function of the output
We use the centroid method for defuzzification, Eq. 9, which returns the center (one crisp number) of the area under the curve of the output fuzzy set resulting from the aggregation process:

z* = Σ_j μ_c(z_j) × z_j / Σ_j μ_c(z_j)   (9)

where:
z*       = the center of mass
μ_c(z_j) = the membership in class c at value z_j
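A discretized version of the centroid in Eq. 9, sampling the aggregated output set at points z_j:

```python
def centroid(zs, memberships):
    """Centroid defuzzification (Eq. 9): the membership-weighted
    average of the sampled output values."""
    total = sum(memberships)
    if total == 0:
        return 0.0  # no rule fired; a convention of this sketch, not part of Eq. 9
    return sum(z * m for z, m in zip(zs, memberships)) / total
```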
After the scores of all sentences have been produced by the fuzzy inference system, the sentences are ranked by those scores in descending order and the top n sentences are selected as the summary, where n is determined by the compression rate, which is 20% of the total number of document sentences.
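The ranking-and-selection step can be sketched as follows (the rounding convention for n and the restoration of document order are our assumptions):

```python
def summarize(sentences, scores, rate=0.2):
    """Rank sentences by their fuzzy scores (descending) and keep the
    top n, where n is the compression rate times the document length."""
    n = max(1, round(rate * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    chosen = sorted(ranked[:n])  # present selected sentences in document order
    return [sentences[i] for i in chosen]
```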
Generalizing the proposed method results via confidence limits: The aim of generalization is to get one value which can express all values in the population of results. For each summary, evaluation values (recall, precision and f-measure) are created using the ROUGE evaluation measure. Measuring the performance of the proposed method would require checking each evaluation value separately; doing so is a tough job and a waste of resources. The solution is to use the sample of results (summary evaluation values) to calculate a range within which any value in the population is likely (95% of the time) to fall. This range is called the 95% confidence interval; the minimum and maximum values in that range are called the confidence limits, and the interval is all values between them. ROUGE generalizes the evaluation results using the bootstrapping (resampling) method.
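The bootstrapping idea can be sketched as follows (the resample count and percentile indexing are illustrative choices, not the ROUGE toolkit's internals):

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """95% confidence limits for the mean of a sample of evaluation
    scores via bootstrapping: resample with replacement, take each
    resample's mean, and read off the alpha/2 and 1-alpha/2 percentiles."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

The returned pair is the pair of confidence limits; all values between them form the confidence interval.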
We evaluate the proposed method using the DUC 2002 document sets (D061j, D062j, D063j, D064j, D065j, D066j, D067f, D068f, D069f, D070f, D071f, D072f, D073b and D077b) comprising 100 documents.
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) toolkit is used for evaluation: ROUGE compares a system generated summary against a human generated summary to measure its quality, and it is the main metric in the DUC text summarization evaluations. It has different variants; in our experiment we use ROUGE-N (N = 1 and 2) and ROUGE-L. These measures were selected because they have been reported to work well for single document summarization.
In the DUC 2002 document sets, each document set contains two model (human generated) summaries for each document. We gave the names H1 and H2 to those two model summaries. The human summary H2 is used as a benchmark to measure the quality of our proposed method's summaries, while the human summary H1 is used as the reference summary. Besides the human-with-human benchmark (H2-H1) (H2 against H1), we also use another benchmark, the MS Word summarizer (Msword).
Table 1 shows a comparison between the proposed method evaluation
and the other three methods (the swarm model and the two benchmarks (Msword
and H2-H1)) based on the average recall, precision and F-measure using ROUGE-1,
ROUGE-2 and ROUGE-L, where those averages for the four methods (the proposed
method, the swarm model, Msword and H2-H1) were generalized using the confidence
limits (95%-confidence interval). Figure 7-9 visualize the same results shown in Table 1.
Table 1: The fuzzy swarm, swarm model, MS Word summarizer and H2-H1 comparison: average recall using ROUGE-(1, 2 and L) at the 95%-confidence limits
Fig. 7: The fuzzy swarm, swarm model, MS Word summarizer and H2-H1 comparison: average recall using ROUGE-1
Fig. 8: The fuzzy swarm, swarm model, MS Word summarizer and H2-H1 comparison: average recall using ROUGE-2
Fig. 9: The fuzzy swarm, swarm model, MS Word summarizer and H2-H1 comparison: average recall using ROUGE-L
The purpose of using the human summarizer (H2-H1) as a benchmark is to show how acceptable the performance of the proposed method, the swarm model and the Msword summarizer is compared with the performance of the human (H2-H1).
Based on the generalization of the results obtained by the four methods (the proposed method, the swarm model, Msword and H2-H1), the proposed method performs better than the swarm model and the Msword summarizer. The low overlap between summaries generated manually by humans makes achieving high evaluation values difficult, even for a human summarizer: we found that the two human summaries used in this study are only 49% similar to each other.
The experimental results lead to two interesting observations. First, the proposed method, based on fuzzy logic and swarm intelligence (PSO), showed good performance on the text summarization problem compared with the other methods used in this study. Second, the low overlap between the summaries generated manually by the humans made achieving high evaluation values difficult; for instance, we found that the two human summaries (H2 and H1) used in this study are only 49% similar to each other. The weights suggested by PSO promoted the scores of the highly important features, giving each text feature the score its importance deserves. The experimental results supported the incorporation of fuzzy logic with swarm intelligence, allowing the risks, uncertainty, ambiguity and imprecise values in choosing the weights (scores) of the text features to be flexibly tolerated. In future work, we will incorporate the proposed method with diversity based methods in a different hybrid model.
In this study, we introduced a method based on fuzzy logic and swarm intelligence (PSO) for the text summarization problem. The weights suggested by PSO were used to adjust the text feature scores. The fuzzy inference system took the adjusted feature scores as inputs, based on which the sentences were evaluated and the most relevant sentences were selected for inclusion in the summary. The results showed that the proposed method outperforms the swarm model and the benchmark methods used in this study.
This project is sponsored partly by the Ministry of Science, Technology and Innovation under E-Science grant 01-01-06-SF0502, Malaysia.