ABSTRACT
In this study, we propose a fuzzy data mining method that changes the traditional assessment from a strict item order into a flexible one. Experimental results demonstrate that the method efficiently reduces the number of questions that must be assessed. Using fewer questions increases participants' willingness to respond, so high-risk groups can be identified rapidly and referred for further assessment and counseling.
DOI: 10.3923/itj.2010.1133.1141
URL: https://scialert.net/abstract/?doi=itj.2010.1133.1141
INTRODUCTION
The monitoring and support of university students is considered very important at many educational institutions. With the spread of higher education in recent years, universities and colleges have devoted increasing effort to assessments that enhance students' learning performance. For instance, the Consultation and Guidance Division of the Student Affairs Office at Tamkang University in Taiwan is responsible for helping more than 27,000 students assess their learning and study strategies, personality characteristics, career interests, etc. After the assessment, large amounts of survey data must be analyzed and learning suggestions provided. Unfortunately, the number of counselors available in each school to provide these services is very limited. Moreover, a larger number of questions in a survey reduces participants' willingness to answer. Therefore, determining significant patterns in the traditional assessments is essential for reducing the counselors' workload and increasing participants' willingness.
In recent years, computer technology has progressed rapidly, and more and more learning diagnosis approaches have been proposed to provide students with personalized learning suggestions. Researchers have attempted to develop more effective assessment programs and to improve students' learning performance. For example, Hwang (2005) proposed an algorithm for diagnosing students' learning problems and providing personalized learning suggestions for science and mathematics courses. Chu et al. (2006) presented a learning diagnosis approach that provides students with personalized learning suggestions by analyzing their test results and the concepts related to the test items. Onder (2006) designed a questionnaire to determine students' awareness of environmental issues and problems and their behaviour towards the environment at Selcuk University, Turkey. Lee (2007) presented a student model in the context of an integrated learning environment. Chen et al. (2007) used association rules for web-based learning diagnosis to identify learners' misconceptions and help them improve their learning performance during the learning process. Bayati et al. (2009) investigated the prevalence and related factors of depression among students in Arak, Iran. Madhyastha and Hunt (2009) proposed a method for mining multiple-choice assessment data to determine the similarity of the concepts represented by the multiple-choice responses.
Most of the studies mentioned above proposed assessments for specific purposes and aimed to explore the performance of those assessments. However, these assessments contain so many items that they reduce participants' willingness to respond. As a result, the expected results cannot be obtained and the counselors' burden increases. In this case study, we propose a fuzzy data mining method that changes the traditional assessment from a strict order into a flexible order. The purpose is not to replace the original assessment but to provide a complementary concept and practice.
By reducing the number of assessment items to increase participants' willingness, the method can rapidly identify high-risk groups for further assessment and counseling.
Learning and Study Strategies Inventory (LASSI):
The Learning and Study Strategies Inventory was developed at the University of Texas at Austin and has been shown to provide accurate diagnostic and prescriptive measures directly related to academic success. This case study uses a modified version that is more frequently used in Taiwan: an 11-scale, 87-item LASSI assessment, as shown in Table 1. The first column lists the 11 LASSI scales (study strategies), each abbreviated with a three-letter code. The second column lists the numbers of the items (i.e., questions) in each scale; each survey question directly affects the score of its related scale. Numbers in bold italics denote reverse-scored items.
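As a concrete illustration of how item answers feed a scale score, here is a minimal sketch. It assumes a 5-point answer format (answers 1 to 5, as in the motivation example later in the article), that a reverse-scored item is recoded as 6 minus the answer, and that a scale score is the sum of its item scores. None of these scoring details are stated in the article, and `scale_score` and the reverse-scored item set are hypothetical.

```python
# Sketch of LASSI-style scale scoring. Assumptions (not from the article):
# answers are on a 5-point scale (1..5), a reverse-scored item is recoded
# as 6 - answer, and a scale score is the sum of its item scores.

def scale_score(answers, items, reverse_items):
    """Return the score of one scale.

    answers: dict mapping item number -> answer in 1..5
    items: item numbers belonging to the scale (Table 1)
    reverse_items: item numbers marked as reverse-scored
    """
    total = 0
    for item in items:
        answer = answers[item]
        if not 1 <= answer <= 5:
            raise ValueError(f"item {item}: answer {answer} is outside 1..5")
        # Reverse-scored items count in the opposite direction.
        total += 6 - answer if item in reverse_items else answer
    return total


# MOT item numbers are taken from the text; the reverse-scored subset is
# hypothetical because Table 1's bold italic marks are not reproduced here.
MOT_ITEMS = [9, 12, 26, 30, 37, 44, 51]
HYPOTHETICAL_REVERSED = {12, 30}

answers = {9: 4, 12: 2, 26: 5, 30: 1, 37: 3, 44: 4, 51: 5}
print(scale_score(answers, MOT_ITEMS, HYPOTHETICAL_REVERSED))  # 30
```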
The data for this study were gathered from the LASSI responses of university freshmen and provided by the Consultation and Guidance Division of the Student Affairs Office at Tamkang University. The results have been provided to students to diagnose their strengths and weaknesses, together with suggestions for improving their capabilities through educational interventions such as learning and study skills courses. Most students complete the assessment in approximately 25-30 min. The LASSI scales can be administered separately or as a whole. For convenience, the percentile rank norms of LASSI for the average university student are summarized in simplified form in Table 2. Percentile ranks above 50 indicate good study strategies, while ranks below 50 indicate poor ones. Instructors can compare students' scores on each LASSI scale with the table to diagnose the students' situations in the related learning and study strategies.
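The good/poor reading of Table 2 can be sketched as a simple classifier. The article does not say how a rank of exactly 50 is treated, so counting it as good is an assumption, and `classify_percentile` is a hypothetical name. Converting raw scores into percentile ranks would require the norm table itself, which is not reproduced here.

```python
def classify_percentile(rank):
    """Map a LASSI percentile rank (0..100) to 'good' or 'poor'.

    Ranks above 50 are good and ranks below 50 are poor, as in the text;
    counting a rank of exactly 50 as good is an assumption.
    """
    if not 0 <= rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return "good" if rank >= 50 else "poor"


# Hypothetical percentile ranks for one student on three scales.
ranks = {"MOT": 72, "ANX": 35, "TMT": 50}
print({scale: classify_percentile(r) for scale, r in ranks.items()})
# {'MOT': 'good', 'ANX': 'poor', 'TMT': 'good'}
```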
Data mining technology:
Data mining techniques are concerned with discovering and extracting latent knowledge from a database (Chang et al., 2001). Many algorithms have been developed and applied, including decision trees, clustering, sequence clustering, association rules, Naïve Bayes, regression and neural networks. These techniques have become increasingly popular in real-world applications, including customer relationship management (Wang et al., 2010), credit and loan evaluation (Wang et al., 2005), fraud detection (Hilas, 2009), medical diagnostics (Tsipouras et al., 2008) and educational systems (Lee, 2007). This study concerns the data of the Learning and Study Strategies Inventory assessment. Because data mining can capture essential patterns and predict learner behavior, decision trees and association rules were selected for this study; both are described below.
Table 1: The item numbers for LASSI
The LASSI comprises 11 study-strategy scales, each abbreviated with a three-letter code and listed with its item numbers. Numbers in bold italics denote reverse-scored items.
Table 2: The percentile rank norms for LASSI
Decision tree:
Decision trees are popular and powerful tools for both classification and prediction. The attractiveness of tree-based methods is due largely to the fact that decision trees represent rules (Berry and Linoff, 2004). A decision tree is based on the methodology of tree graphs and can be considered one of the simplest inductive learning methods (Quinlan, 1986, 1993; Russell and Norvig, 1995). Even users without statistical knowledge can use a decision tree to analyze specific behavior. The rules can readily be expressed in English, so they are easy to understand and can easily be converted into a set of if-then rules. However, if a tree becomes too complex or too large to support decision-making, pruning some of its leaves or branches may be necessary to improve its effectiveness. The best-known decision tree algorithms include ID3, C4.5 (Quinlan, 1993), CART (Breiman et al., 1984) and CHAID (Magidson and Vermunt, 2004).
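The conversion of a tree into if-then rules can be sketched as a walk over every root-to-leaf path, with one rule per leaf. The nested-dict tree format, the function names and the two-level example tree (items 1 and 2) are hypothetical choices made for illustration only.

```python
# Sketch: flatten a decision tree into if-then rules, one rule per
# root-to-leaf path. Internal nodes are dicts with the item asked and
# branches keyed by answer groups; leaves are label strings.

def tree_to_rules(node, conditions=()):
    """Yield (conditions, label) pairs, one per leaf of the tree."""
    if isinstance(node, str):          # a leaf holds the predicted label
        yield list(conditions), node
        return
    for answers, child in node["branches"].items():
        condition = f"item {node['item']} in {sorted(answers)}"
        yield from tree_to_rules(child, conditions + (condition,))


def format_rule(conditions, label):
    return "IF " + " AND ".join(conditions) + f" THEN {label}"


# Small hypothetical tree with two levels.
tree = {
    "item": 1,
    "branches": {
        (4, 5): "good",
        (1, 2, 3): {"item": 2, "branches": {(1, 2): "poor", (3, 4, 5): "good"}},
    },
}

for conds, label in tree_to_rules(tree):
    print(format_rule(conds, label))
# IF item 1 in [4, 5] THEN good
# IF item 1 in [1, 2, 3] AND item 2 in [1, 2] THEN poor
# IF item 1 in [1, 2, 3] AND item 2 in [3, 4, 5] THEN good
```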
Association rule:
Association rule analysis is used to discover elements that frequently co-occur in a data set consisting of multiple independent selections of elements (such as purchase transactions) and to discover rules, such as implications or correlations, that relate the co-occurring elements (Agrawal et al., 1993; Srikant and Agrawal, 1995). A typical association rule is an implication of the form A⇒B, where A and B are itemsets in a transaction database and A∩B=∅. The rule A⇒B means that the occurrence of all items in itemset A implies the occurrence of all items in itemset B. The strength of an association rule is measured by two factors, support and confidence, both of which must reach user-specified thresholds (minimum support and minimum confidence). The support of the rule is the fraction of all transactions that contain both itemset A and itemset B. The confidence of the rule measures the strength and reliability of the implication, that is, the fraction of the transactions containing itemset A that also contain itemset B.
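The two definitions above translate directly into code. This is a minimal sketch over a toy list of transactions (each a set of items); the data and the function names are hypothetical.

```python
# Sketch: support and confidence of a rule A => B over a list of
# transactions, each represented as a set of items.

def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, a, b):
    """Fraction of transactions containing A that also contain B."""
    sup_a = support(transactions, a)
    if sup_a == 0:
        return 0.0
    return support(transactions, set(a) | set(b)) / sup_a


# Toy transactions (hypothetical data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # two of the three bread transactions
```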
Fuzzy set concepts:
The theory of fuzzy sets was first proposed by Zadeh (1965, 1999). A fuzzy set is an extension of a crisp set. For a crisp set, let U be the universal set, x an element of U and A a subset of U. Then either x∈A or x∉A; exactly one of these two states holds. The two states can be represented by the characteristic function I_A, which maps x into the two-element set {0, 1}:
$$
I_A(x) =
\begin{cases}
1, & x \in A \\
0, & x \notin A
\end{cases}
$$
That is, the characteristic function takes only the values 0 and 1, corresponding to the two states.
For a fuzzy set, A is characterized by a membership function μA: U→[0, 1] that assigns to each object x of U a degree of membership in the continuum [0, 1], where A = {(x, μA(x)) | x∈U} and μA(x)∈[0, 1]. A membership degree of 1 means that the object belongs completely to the set, and 0 means that it does not belong to the set at all; borderline cases are assigned values between 0 and 1. Just as crisp sets have their basic operations, fuzzy sets have their own, such as intersection, union and complement. Assume that A and B are fuzzy sets, A = {(x, μA(x)) | x∈U} and B = {(x, μB(x)) | x∈U}; the basic operations are then as follows:
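$$
\begin{aligned}
\mu_{A \cap B}(x) &= \min\big(\mu_A(x), \mu_B(x)\big) && \text{(intersection)} \\
\mu_{A \cup B}(x) &= \max\big(\mu_A(x), \mu_B(x)\big) && \text{(union)} \\
\mu_{\bar{A}}(x) &= 1 - \mu_A(x) && \text{(complement)}
\end{aligned}
\qquad x \in U
$$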
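The crisp characteristic function and the standard Zadeh operations on fuzzy sets (intersection as min, union as max, complement as one minus the membership degree) can be sketched for a finite universe as follows. The dict representation, the function names and the membership degrees of the three students are hypothetical; the article does not say how memberships are assigned to LASSI responses.

```python
# Sketch: fuzzy sets over a finite universe, stored as dicts that map each
# element to its membership degree in [0, 1]. Operations are Zadeh's
# standard ones: intersection = min, union = max, complement = 1 - mu.

def characteristic(universe, members):
    """Crisp set as a special case: I_A(x) is 1 if x is in A, else 0."""
    return {x: 1 if x in members else 0 for x in universe}


def _check_same_universe(a, b):
    if a.keys() != b.keys():
        raise ValueError("fuzzy sets must be defined on the same universe")


def intersection(a, b):
    _check_same_universe(a, b)
    return {x: min(a[x], b[x]) for x in a}


def union(a, b):
    _check_same_universe(a, b)
    return {x: max(a[x], b[x]) for x in a}


def complement(a):
    return {x: 1 - mu for x, mu in a.items()}


# Hypothetical membership degrees of three students in two fuzzy sets.
poor_motivation = {"s1": 1.0, "s2": 0.5, "s3": 0.0}
poor_attitude = {"s1": 0.75, "s2": 0.25, "s3": 0.5}

print(intersection(poor_motivation, poor_attitude))  # {'s1': 0.75, 's2': 0.25, 's3': 0.0}
print(union(poor_motivation, poor_attitude))         # {'s1': 1.0, 's2': 0.5, 's3': 0.5}
print(complement(poor_motivation))                   # {'s1': 0.0, 's2': 0.5, 's3': 1.0}
print(characteristic(["s1", "s2", "s3"], {"s1"}))    # {'s1': 1, 's2': 0, 's3': 0}
```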
FUZZY DATA MINING METHOD
This study proposes an efficient method that integrates fuzzy set concepts with generalized data mining techniques. Instead of presenting the traditional fixed sequence of items, the method recommends the items to assess based on each student's answers. The method not only reduces the number of assessment items but also increases participants' motivation. The fuzzy data mining method is based on fuzzy set theory and applies decision trees and association rules to classify responses and determine meaningful rules. Figure 1 shows the processing flowchart of LASSI data mining. We use 40% of the raw LASSI survey data as the training set to determine the characteristics and correlations of all scale items through decision tree analysis and association rule analysis. Another 30% of the raw data is used to refine the mining results, and the remaining 30% is used to evaluate the efficacy of the fuzzy data mining method.
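The 40/30/30 partition described above can be sketched as follows. The proportions come from the text; random shuffling with a fixed seed is an assumption, since the article does not say how records were assigned to each part, and `split_records` is a hypothetical name.

```python
import random


def split_records(records, seed=0):
    """Split records into 40% training, 30% refinement and 30% test data.

    The proportions follow the text; random shuffling with a fixed seed is
    an assumption, since the article does not say how records were drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 40 // 100    # integer arithmetic avoids float rounding
    n_refine = n * 30 // 100
    train = shuffled[:n_train]
    refine = shuffled[n_train:n_train + n_refine]
    test = shuffled[n_train + n_refine:]   # remainder goes to the test set
    return train, refine, test


# Hypothetical record IDs standing in for survey responses.
train, refine, test = split_records(range(1000))
print(len(train), len(refine), len(test))  # 400 300 300
```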
Fig. 1: The processing flowchart of LASSI data mining
Fig. 2: Results of the decision tree analysis for the motivation scale
Data cleaning and reorganization are necessary in the data preprocessing step: incomplete and inconsistent information is removed from the database, and the raw data are converted into a format suitable for data mining. The data mining step then consists of two major substeps. First, candidate items are selected through decision tree analysis, which constructs a classification tree of the items in each scale. Second, the LASSI scales are prioritized through association rule analysis, which decides the order of the scales from which the associated items are selected.
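The cleaning and reorganization step can be sketched as a filter followed by a reshaping. The record format (a dict from item number to answer) and the definitions of "incomplete" (a missing item) and "inconsistent" (an answer outside 1 to 5) are assumptions for illustration; the article does not specify its exact cleaning rules, and the function names are hypothetical.

```python
# Sketch of the cleaning step. Assumptions: each raw record is a dict
# mapping item numbers 1..87 to answers; a record is "incomplete" if any
# item is missing and "inconsistent" if any answer lies outside 1..5.

N_ITEMS = 87
VALID_ANSWERS = {1, 2, 3, 4, 5}


def is_clean(record):
    # record.get returns None for a missing item, which is not a valid answer.
    return all(record.get(i) in VALID_ANSWERS for i in range(1, N_ITEMS + 1))


def to_row(record):
    """Reorganize a clean record into a fixed-order list of 87 answers."""
    return [record[i] for i in range(1, N_ITEMS + 1)]


def preprocess(records):
    return [to_row(r) for r in records if is_clean(r)]


complete = {i: 3 for i in range(1, N_ITEMS + 1)}
missing = {i: 3 for i in range(2, N_ITEMS + 1)}   # item 1 unanswered
out_of_range = {**complete, 5: 7}                 # invalid answer to item 5
rows = preprocess([complete, missing, out_of_range])
print(len(rows), len(rows[0]))  # 1 87
```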
Selecting candidate items: Decision tree analysis classifies the responses and identifies the critical items in each scale. Figure 2 shows the results of a decision tree analysis of the survey for the motivation (MOT) scale, whose item numbers are 9, 12, 26, 30, 37, 44 and 51 (Table 1). In Fig. 2, the first column displays the tree shape, which classifies the answer conditions for all items in the MOT scale. The second column is the node ID, which denotes the depth level of the node along the tree path. The third column shows the predicted assessment result. The fourth column gives the proportion of records that satisfy the node's attribute conditions relative to all records. The fifth column shows the probability of the predicted result under those conditions. Based on repeated experiments, a probability threshold of 75% was adopted.
According to the decision tree for the motivation scale in Fig. 2, the inquiry starts with item 26. When the answer is 1, 2 or 3, the predicted Points Received is poor, but the probability of obstacles in the MOT scale is only 53.8%. Because this Support is below 75%, we follow the decision tree further until the Support exceeds 75%. If, on the other hand, the answer to item 26 is 4 or 5, the Points Received is good, and we conclude that there are no obstacles in the MOT scale with a probability of 91.6%. The resulting tree graph, shown in Fig. 3, was stored in the database as a rule base. The trees for the other LASSI scales were created by decision tree analysis in the same way.
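The adaptive questioning described above can be sketched as a walk down a scale's tree that asks only the items on the path actually taken. Leaves stand for nodes whose probability has passed the 75% threshold, which is why the 53.8% node under item 26 is not a leaf. Only the split of item 26 (answers 1-3 vs 4-5) and the best-case poor path through items 26, 9 and 44 come from the text; every answer group below item 26, the tree format and the names `MOT_TREE` and `assess_scale` are hypothetical.

```python
# Sketch of adaptive questioning on one scale's tree. Leaves hold the
# status reached once the node probability passes the 75% threshold.
# Answer groups below item 26 are hypothetical.

MOT_TREE = {
    "item": 26,
    "branches": {
        (4, 5): "good",
        (1, 2, 3): {
            "item": 9,
            "branches": {
                (1, 2): {"item": 44, "branches": {(1, 2, 3): "poor", (4, 5): "good"}},
                (3, 4, 5): "good",
            },
        },
    },
}


def assess_scale(tree, ask):
    """Ask only the items on the path actually taken.

    ask: callable mapping an item number to the participant's answer.
    Returns (status, items asked in order).
    """
    node, asked = tree, []
    while not isinstance(node, str):
        item = node["item"]
        answer = ask(item)
        asked.append(item)
        for answers, child in node["branches"].items():
            if answer in answers:
                node = child
                break
        else:
            raise ValueError(f"no branch for answer {answer} to item {item}")
    return node, asked


print(assess_scale(MOT_TREE, {26: 5}.get))               # ('good', [26])
print(assess_scale(MOT_TREE, {26: 2, 9: 1, 44: 2}.get))  # ('poor', [26, 9, 44])
```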
In the tree graph, internal nodes represent the items asked, terminal nodes represent the assessed statuses, and the answers are listed on the relevant paths. For each scale, the shortest path to a poor terminal node gives the Best Case for Poor (BCP) and the shortest path to a good terminal node gives the Best Case for Good (BCG), that is, the fewest candidate items needed to assess the scale. In the motivation example of Fig. 3, three items (26, 9 and 44) identify respondents with poor learning motivation in the best case, while one item (26) suffices to identify those without difficulty in learning motivation. Depending on the participants' answers, the number of assessed items falls between the Optimal Case (OC) and the Worst Case (WC). The numbers of items needed for each LASSI scale in the optimal and worst cases, obtained from the decision tree analysis, are compiled in Table 3. The INP and SOS scales need only one item to diagnose the status, whichever it is, so the four values Best Case for Poor (BCP), Worst Case for Poor (WCP), Best Case for Good (BCG) and Worst Case for Good (WCG) all equal one for these scales. The last row summarizes the 11 scales: the number of items needed to assess all LASSI scales falls between 11 (OC) and 54 (WC).
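The four per-scale counts can be derived mechanically by counting the items on every root-to-leaf path and taking the minimum and maximum per status; OC and WC are then the minimum and maximum of the four values, as in the note to Table 3. This is a sketch under assumptions: both trees below are hypothetical (only the best-case paths for MOT match the text), each tree is assumed to contain at least one poor and one good leaf, and the function names are illustrative.

```python
# Sketch: derive BCP, WCP, BCG and WCG from a scale's tree by counting the
# items on every root-to-leaf path; OC and WC are the min and max of the four.

def path_lengths(node, depth=0):
    """Yield (status, number of items asked) for each leaf."""
    if isinstance(node, str):
        yield node, depth
        return
    for child in node["branches"].values():
        yield from path_lengths(child, depth + 1)


def case_counts(tree):
    lengths = {"poor": [], "good": []}
    for status, n in path_lengths(tree):
        lengths[status].append(n)
    four = {
        "BCP": min(lengths["poor"]), "WCP": max(lengths["poor"]),
        "BCG": min(lengths["good"]), "WCG": max(lengths["good"]),
    }
    return {**four, "OC": min(four.values()), "WC": max(four.values())}


# Hypothetical MOT tree: item 26 alone gives "good"; items 26, 9, 44 give "poor".
MOT_TREE = {
    "item": 26,
    "branches": {
        (4, 5): "good",
        (1, 2, 3): {
            "item": 9,
            "branches": {
                (1, 2): {"item": 44, "branches": {(1, 2, 3): "poor", (4, 5): "good"}},
                (3, 4, 5): "good",
            },
        },
    },
}
# A one-item scale such as INP or SOS (the item number is hypothetical).
ONE_ITEM = {"item": 1, "branches": {(1, 2, 3): "poor", (4, 5): "good"}}

print(case_counts(MOT_TREE))  # {'BCP': 3, 'WCP': 3, 'BCG': 1, 'WCG': 3, 'OC': 1, 'WC': 3}
print(case_counts(ONE_ITEM))  # {'BCP': 1, 'WCP': 1, 'BCG': 1, 'WCG': 1, 'OC': 1, 'WC': 1}
```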
Fig. 3: The tree graph of the motivation scale
Table 3: Necessary items of LASSI based on decision tree analysis
OC is the minimum and WC the maximum of BCP, WCP, BCG and WCG. The number of items needed to assess all LASSI scales falls between 11 (OC) and 54 (WC).
Prioritizing LASSI scales:
Association rule analysis determines the relationships between different LASSI scales. Three parameters, support, confidence and lift, are used to select the necessary rules. Figure 4 shows part of the association rules produced between LASSI scales. For instance, the first rule, [Poor Motivation]⇒[Poor Attitude], means that a respondent assessed as Poor in learning motivation is also likely to be Poor in learning attitude. Support is the proportion of records that satisfy the rule: the Support of [Poor Motivation]⇒[Poor Attitude] is 28.7365%, meaning that 28.7365% of all records are both Poor Motivation and Poor Attitude. Confidence is the probability that a record satisfying the left-hand side of the rule also satisfies the right-hand side: the first rule's Confidence of 77.3% means that 77.3% of the records with Poor Motivation also have Poor Attitude. Lift measures the association between the two scales in a rule. When Lift is greater than 1, the rule predicts the outcome better than guessing from the frequency of the consequent in the data; when Lift is less than 1, the two scales are negatively correlated. This work omits rules with Lift below 1.3, Confidence below 75% or Support below 25%, as well as rules that conflict with each other.
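The threshold filter described above can be sketched as follows, with lift computed as confidence divided by the support of the consequent. The support and confidence of the MOT⇒ATT rule are taken from the text, but its lift value and the other two rules are hypothetical, and the removal of conflicting rules is not shown. `Rule`, `lift` and `keep_rule` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: str
    consequent: str
    support: float      # share of all records satisfying both sides
    confidence: float   # share of antecedent records satisfying the consequent
    lift: float         # confidence / support of the consequent


def lift(confidence, consequent_support):
    """Lift > 1 means the rule beats guessing from the consequent's frequency."""
    return confidence / consequent_support


def keep_rule(rule, min_support=0.25, min_confidence=0.75, min_lift=1.3):
    # Thresholds from the text: rules below any of them are omitted.
    return (rule.support >= min_support
            and rule.confidence >= min_confidence
            and rule.lift >= min_lift)


rules = [
    # Support and confidence from the text; the lift value is hypothetical.
    Rule("Poor MOT", "Poor ATT", 0.287365, 0.773, 1.6),
    # Entirely hypothetical rules that each fail one threshold.
    Rule("Poor X", "Poor Y", 0.30, 0.70, 1.5),
    Rule("Poor X", "Poor Z", 0.20, 0.80, 1.4),
]
print([f"{r.antecedent} => {r.consequent}" for r in rules if keep_rule(r)])
# ['Poor MOT => Poor ATT']
```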
The rules produced by the association rule analysis are shown in Table 4: nine association rules with poor status and three with good status. The Confidence values are used to judge whether a rule reaches the required prediction level; based on repeated experiments, a confidence threshold of 75% was adopted.
Fig. 4: Part of the association rules produced by association analysis
For instance, the second and third rules in Table 4 are [Poor SFT]⇒[Poor INP] and [Poor SFT]⇒[Poor SMI], and the confidence of both rules is greater than 75%. Therefore, according to these association rules, students found to be poor in self-testing are also likely to be poor in information processing and in the ability to select important points.
When prioritizing the LASSI scales for selecting questions, we consider only the rules with confidence above 75%, and the correlation of each scale is calculated from them. Let A⇒B be a rule in the set of association rules F, where A denotes a LASSI scale and B denotes the set of scales related to A. The correlation R of scale A is defined in terms of SUMRELA(B), the total number of scales related to A, and Q(A), the number of candidate items in scale A. SUMRELA(B) can be calculated by the following algorithm:
Algorithm 1: SUMRELA(B) (Summation of scales related to A)
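Algorithm 1 itself is not reproduced here, so the following is only one plausible reading of SUMRELA(B): count the distinct scales that appear on the right-hand side of rules whose left-hand side is scale A and whose confidence exceeds 75%. Whether the original also follows chains of rules is not stated; this sketch counts direct relations only. The rule list is hypothetical apart from the MOT⇒ATT confidence of 77.3% quoted in the text.

```python
# One plausible reading of SUMRELA: the number of distinct scales directly
# related to scale A through rules A => B with confidence above 75%.

def sumrela(scale, rules, min_confidence=0.75):
    """rules: iterable of (antecedent scale, set of consequent scales, confidence)."""
    related = set()
    for antecedent, consequents, conf in rules:
        if antecedent == scale and conf > min_confidence:
            related.update(consequents)
    related.discard(scale)  # a scale is not counted as related to itself
    return len(related)


POOR_RULES = [
    ("SFT", {"INP"}, 0.80),   # confidence values here are hypothetical,
    ("SFT", {"SMI"}, 0.78),   # except MOT => ATT (77.3%) from the text
    ("MOT", {"ATT"}, 0.773),
    ("ANX", {"CON"}, 0.70),   # hypothetical rule below the threshold
]
print(sumrela("SFT", POOR_RULES), sumrela("MOT", POOR_RULES), sumrela("ANX", POOR_RULES))
# 2 1 0
```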
According to this definition, the correlations of the 11 LASSI scales for poor associations are listed in Table 5. These rules are easily converted into if-then rules; for instance, the fourth rule in Table 5 reads: if SFT is poor, then INP and SMI are poor. The scales can be prioritized by comparing their correlation levels and average confidence values.
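Prioritizing the scales amounts to sorting them by correlation, with average confidence used to order scales of equal correlation. Treating average confidence purely as a tie-breaker is an assumption, and the values below are hypothetical rather than those of Table 5; `prioritize` is an illustrative name.

```python
# Sketch: order scales by correlation R (higher first), breaking ties by
# average confidence (higher first).

def prioritize(stats):
    """stats: dict mapping scale -> (correlation R, average confidence)."""
    return sorted(stats, key=lambda s: (-stats[s][0], -stats[s][1]))


# Hypothetical values for four scales.
stats = {
    "STA": (0.0, 0.0),
    "CON": (1.5, 0.78),
    "SOS": (2.0, 0.80),
    "SMI": (1.5, 0.82),
}
print(prioritize(stats))  # ['SOS', 'SMI', 'CON', 'STA']
```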
Table 4: The processed rules from association rule analysis
Accordingly, the suggested order for assessing the LASSI scales is SOS > SMI > CON > SFT > TTS > MOT > TMT > ANX > ATT > INP > STA. However, the assessed result of each scale affects the sequence of the remaining scales. For example, when the SOS scale is scored as poor, the SMI and TTS scales can be omitted from the suggested sequence. Consequently, the association rule analysis yields 10 possible scale-selection sequences (Fig. 5), which help accelerate the assessment.
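The flexible scale order can be sketched as a loop over the suggested sequence that skips any scale already inferred as poor from an earlier result. `SEQUENCE` is the order given in the text and the SOS entry of the omission table follows the stated example; the SFT entry is derived from the rules [Poor SFT]⇒[Poor INP] and [Poor SFT]⇒[Poor SMI] quoted earlier. The full rule set behind Fig. 5 is not reproduced, and all names are hypothetical.

```python
# Sketch of the flexible scale order with rule-based omission.

SEQUENCE = ["SOS", "SMI", "CON", "SFT", "TTS", "MOT",
            "TMT", "ANX", "ATT", "INP", "STA"]
OMIT_IF_POOR = {"SOS": {"SMI", "TTS"}, "SFT": {"INP", "SMI"}}


def run_sequence(assess):
    """assess: callable mapping a scale to 'good' or 'poor'.

    Returns (results, scales actually assessed). Scales inferred from a
    rule are marked 'poor (inferred)' and skipped.
    """
    results, assessed = {}, []
    for scale in SEQUENCE:
        if scale in results:          # already inferred from an earlier scale
            continue
        status = assess(scale)
        results[scale] = status
        assessed.append(scale)
        if status == "poor":
            for other in sorted(OMIT_IF_POOR.get(scale, ())):
                results.setdefault(other, "poor (inferred)")
    return results, assessed


results, assessed = run_sequence(lambda scale: "poor")
print(assessed)  # ['SOS', 'CON', 'SFT', 'MOT', 'TMT', 'ANX', 'ATT', 'STA']
```

When every scale is good, no rule fires and all 11 scales are assessed in the suggested order.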
Association rules are used to identify related scales, which reduces the number of scales to be assessed and thus, indirectly, the number of items. As Fig. 5 indicates, between zero and three scales can be omitted in the association rules tree model. For instance, the SMI, TTS and TMT scales are omitted in case 1. Therefore, the optimal case for case 1 is the sum of the optimal cases of the remaining eight scales (SOS, CON, ANX, ATT, INP, STA, MOT and SFT), giving an Optimal Case (OC) of eight. Similarly, the worst case for case 1 is the sum of the worst cases of these eight scales, giving a Worst Case (WC) of 37. Fig. 5 contains 10 cases, summarized in Table 6; the average OC and WC values are 10.4 and 43.2. Therefore, whereas the original LASSI assessment needs 87 questions to diagnose study-strategy obstacles, the fuzzy data mining method requires only about 11 to 44 items.
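The case arithmetic can be checked in a few lines. Because Table 3 gives OC = 11 summed over 11 scales and every scale needs at least one item, each scale's optimal case must be exactly one, so a case's OC is simply the number of scales it still assesses. Reading the reported 11 to 44 range as the averages 10.4 and 43.2 rounded up to whole items is our interpretation; `case_oc` is a hypothetical name.

```python
import math

# Each scale's OC is 1: Table 3 sums to 11 over 11 scales, each needing >= 1 item.
SCALES = ["SOS", "SMI", "CON", "SFT", "TTS", "MOT",
          "TMT", "ANX", "ATT", "INP", "STA"]
OC_PER_SCALE = {scale: 1 for scale in SCALES}


def case_oc(omitted):
    """Optimal-case item count when the given scales are omitted."""
    return sum(OC_PER_SCALE[s] for s in SCALES if s not in omitted)


print(case_oc({"SMI", "TTS", "TMT"}))    # case 1 -> 8
# The reported averages over the 10 cases, rounded up to whole items.
print(math.ceil(10.4), math.ceil(43.2))  # 11 44
```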
Fig. 5: The association rules tree
Table 5: The correlation of all study strategies for poor association
Table 6: Necessary items for the 10 cases
Association rule analysis yields 10 cases. OC denotes the Optimal Case and WC the Worst Case. The average OC and WC values are 10.4 and 43.2, so only about 11 to 44 items are required to assess the LASSI scales.
PERFORMANCE EVALUATIONS
In this study, the goal is to diagnose obstacles in LASSI with as few items as possible, and we proposed a fuzzy data mining method to achieve it. We designed a web-based LASSI self-assessment system (Web-LSA) that changes the traditional assessment from a strict order into a flexible order. The scale association rules and related candidate items are stored in its rule base. Web-LSA provides university students with an efficient self-assessment tool for diagnosing study or learning problems, and provides counselors in the Office of Student Affairs with a counseling-support system that organizes assessment results in the database. All assessment results are scored and kept in the Web-LSA database, and the performance evaluation experiments were carried out through the Web-LSA system.
Table 7: Necessary items for fuzzy data mining method prediction
Table 8: Accuracy of fuzzy method prediction
Evaluation of assessing efficacy:
To evaluate the efficacy of the fuzzy data mining method, 250 records were randomly selected, five times, from the 30% of raw data reserved for evaluation. The experimental results are compiled in Tables 7 and 8. Table 7 shows the results of the five selections; each experiment lists the average number of required items for each of the 11 scales and the total number of required items, and the last column gives the average over the five experiments. As mentioned in the previous section, the testing responses were traced along the recommended item sequence based on the association rules tree in Fig. 5, and some scales were omitted because they are related to other scales. Therefore, some per-scale values in Table 7 are less than one. Whereas the original LASSI assessment needs 87 items to diagnose study-strategy obstacles, the fuzzy data mining method requires only 16 items on average.
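Why per-scale averages can fall below one is easiest to see in code: a scale omitted for a record (because it was inferred through an association rule) contributes zero items to that scale's average. The record format and the function name are hypothetical choices for illustration.

```python
# Sketch: average number of items per scale over a sample of records.
# A scale omitted for a record contributes 0 items to that scale's average.

def average_items(records, scales):
    """records: list of dicts mapping scale -> items asked for that record."""
    n = len(records)
    per_scale = {s: sum(r.get(s, 0) for r in records) / n for s in scales}
    return per_scale, sum(per_scale.values())


# Four hypothetical records: SMI was omitted for three of them.
records = [{"SOS": 1, "SMI": 2}, {"SOS": 1}, {"SOS": 1}, {"SOS": 1}]
per_scale, total = average_items(records, ["SOS", "SMI"])
print(per_scale, total)  # {'SOS': 1.0, 'SMI': 0.5} 1.5
```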
Evaluation of predicting accuracy:
Table 8 shows the accuracy of the fuzzy method's predictions, including the average accuracy of each scale over the five experiments. The last row is the average over the 11 scales and the last column is the average over the five experiments. The overall experimental accuracy is 80.5%, meaning that about 80 of every 100 participants in the testing group were classified into the same learning and study situations as by the original LASSI assessment. Compared with the original assessment, the proposed fuzzy data mining method substantially reduces the number of test items while agreeing with the original results for about 80% of participants.
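The article does not give the exact accuracy formula; the sketch below assumes the layout of Table 8, computing for each scale the share of participants whose predicted status matches the full-LASSI status and then averaging over scales. The participants and statuses are hypothetical, as are the function names.

```python
# Sketch: per-scale agreement with the full LASSI result, then the mean
# over scales.

def per_scale_accuracy(predicted, reference, scales):
    """Share of participants whose predicted status equals the reference, per scale."""
    n = len(reference)
    return {s: sum(p[s] == r[s] for p, r in zip(predicted, reference)) / n
            for s in scales}


def mean_accuracy(accuracies):
    return sum(accuracies.values()) / len(accuracies)


# Four hypothetical participants assessed on two scales.
reference = [{"MOT": "poor", "ANX": "good"}, {"MOT": "good", "ANX": "good"},
             {"MOT": "good", "ANX": "poor"}, {"MOT": "poor", "ANX": "poor"}]
predicted = [{"MOT": "poor", "ANX": "good"}, {"MOT": "good", "ANX": "good"},
             {"MOT": "poor", "ANX": "poor"}, {"MOT": "poor", "ANX": "poor"}]
acc = per_scale_accuracy(predicted, reference, ["MOT", "ANX"])
print(acc, mean_accuracy(acc))  # {'MOT': 0.75, 'ANX': 1.0} 0.875
```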
CONCLUSIONS AND FUTURE DIRECTIONS
In this case study, we proposed a fuzzy data mining method for LASSI that recommends a flexible order of items in the LASSI scales and effectively reduces the number of items. The method is not intended to replace the original LASSI assessment but to provide an efficient model that supports counselors in prediction and reduces students' unwillingness to be assessed. In the survey of freshmen at Tamkang University, the fuzzy data mining method reduced the number of assessment items to as few as 11 questions in the best case, and the experimental results demonstrate its effectiveness. Furthermore, an assessment through the Web-LSA system takes only three to five minutes to complete. The rule base of Web-LSA can be revised using surveys of all students in the university as raw data to improve the accuracy and practicality of the approach.
Moreover, the proposed fuzzy data mining technique is also an efficient model for diagnostic and predictive web-based assessment. In the future, this method can be applied to other questionnaires and research areas. It is expected to be especially useful for questionnaires with many questions, such as censuses and vote prediction, opening the way to more diverse applications.
REFERENCES
- Agrawal, R., T. Imielinski and A. Swami, 1993. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, May 25-28, 1993, Washington, DC., USA., pp: 207-216.
- Bayati, A., A.M. Beigi and N.M. Salehi, 2009. Depression prevalence and related factors in Iranian students. Pak. J. Biol. Sci., 12: 1371-1375.
- Berry, M.J.A. and G.S. Linoff, 2004. Data Mining Techniques: For Marketing, Sales and Customer Relationship Management. 2nd Edn., Wiley Computer Publishing, New York, ISBN-10: 0471470643, pp: 672.
- Chen, C.M., Y.L. Hsieh and S.H. Hsu, 2007. Mining learner profile utilizing association rule for web-based learning diagnosis. Expert Syst. Appl., 33: 6-22.
- Chu, H.C., G.J. Hwang, J.C.R. Tseng and G.H. Hwang, 2006. A computerized approach to diagnosing student learning problems in health education. Asian J. Health Inf. Sci., 1: 43-60.
- Hilas, C.S., 2009. Designing an expert system for fraud detection in private telecommunications networks. Expert Syst. Appl., 36: 11559-11569.
- Hwang, G.J., 2005. A data mining approach to diagnosing student learning problems in science courses. Int. J. Distance Educ. Technol., 3: 35-50.
- Lee, C.S., 2007. Diagnostic, predictive and compositional modeling with data mining in integrated learning environments. Comput. Educ., 49: 562-580.
- Madhyastha, T. and E. Hunt, 2009. Mining diagnostic assessment data for concept similarity. J. Educ. Data Mining, 1: 1-19.
- Magidson, J. and J.K. Vermunt, 2004. An extension of the CHAID tree-based segmentation algorithm to multiple dependent variables. Classification: The Ubiquitous Challenge. Proceedings of the 28th Annual Conference of the Gesellschaft fur Klassifikation e.V., March 9-11, University of Dortmund, pp: 1-8.
- Onder, S., 2006. A survey of awareness and behaviour in regard to environmental issues among Selcuk University students in Konya, Turkey. J. Applied Sci., 6: 347-352.
- Srikant, R. and R. Agrawal, 1995. Mining generalized association rules. Proceedings of the 21st International Conference on Very Large Databases, September 11-15, 1995, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA., pp: 407-419.
- Tsipouras, M.G., T.P. Exarchos, D.I. Fotiadis, A.P. Kotsia, K.V. Vakalis, K.K. Naka and L.K. Michalis, 2008. Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling. IEEE Trans. Inf. Technol. Biomed., 12: 447-458.
- Wang, Y., S. Wang and K.K. Lai, 2005. A new fuzzy support vector machine to evaluate credit risk. IEEE Trans. Fuzzy Syst., 13: 820-831.
- Wang, Y.H., D.A. Chiang, S.W. Lai and C.J. Lin, 2010. Applying data mining techniques to WIFLY in customer relationship management. Inform. Technol. J., 9: 488-493.
- Zadeh, L.A., 1999. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst., 100: 9-34.