Tzung-Pei Hong
Ya-Fang Tung
Shyue-Liang Wang
Yu-Lung Wu
Min-Thai Wu

Fuzzy data mining is used to extract fuzzy knowledge from linguistic or quantitative data. It is an extension of traditional data mining, and the knowledge it derives is more meaningful to human beings. In the past, we proposed a mining algorithm based on ant colony systems to find suitable membership functions for fuzzy association rules. In that approach, precision was limited by the use of binary bits to encode the membership functions. This paper extends the original approach with multi-level processing to increase the accuracy of the results. A multi-level ant colony framework is designed, and an algorithm based on this structure is proposed to achieve this purpose. The proposed approach first transforms the fuzzy mining problem into a multi-stage graph, with each route representing a possible set of membership functions. It then uses multi-level processing to handle problems in which the maximum quantities of item values in the transactions may be large. The membership functions derived at a given level are refined at the subsequent level, and the final membership functions from the last level are passed to the rule-mining phase to find fuzzy association rules. Experiments are also performed to evaluate the proposed approach, and the results show that the proposed multi-level ant colony mining approach obtains improved results. [All rights reserved Elsevier].
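
As a rough illustration of the rule-mining phase that consumes the final membership functions, the sketch below maps quantities to linguistic terms with triangular membership functions and computes an itemset's fuzzy support as the average, over all transactions, of the minimum membership among its items. The term names, breakpoints, and functions (`TERMS`, `triangular`, `fuzzy_support`) are hypothetical; in the described approach the breakpoints would be produced by the multi-level ant colony search rather than fixed by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical linguistic terms: (left foot, peak, right foot) of each triangle.
# In the described approach these breakpoints would come from the ACS search.
TERMS: Dict[str, Tuple[float, float, float]] = {
    "Low": (0.0, 0.0, 6.0),
    "Middle": (0.0, 6.0, 12.0),
    "High": (6.0, 12.0, 12.0),
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in the triangle with feet a, c and peak b."""
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def fuzzy_support(transactions: List[Dict[str, float]],
                  itemset: List[Tuple[str, str]]) -> float:
    """Fuzzy support of (item, term) pairs, using min as the fuzzy AND."""
    total = 0.0
    for t in transactions:
        # An item missing from a transaction gets membership 0.
        degrees = [triangular(t[item], *TERMS[term]) if item in t else 0.0
                   for item, term in itemset]
        total += min(degrees)
    return total / len(transactions)


data = [{"milk": 3, "bread": 6}, {"milk": 9}, {"milk": 6, "bread": 12}]
print(fuzzy_support(data, [("milk", "Middle")]))
print(fuzzy_support(data, [("milk", "Middle"), ("bread", "Middle")]))
```

With these made-up breakpoints, the first call averages 0.5, 0.5, and 1.0 (about 0.667), and the second drops to about 0.167 because bread is absent from the second transaction and its quantity of 12 has zero membership in Middle.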

Frequent-itemset mining considers only the frequency of occurrence of items and does not reflect other factors, such as price or profit. Utility mining is an extension of frequent-itemset mining that considers cost, profit, or other measures based on user preference. Traditionally, the utility of an itemset is the sum of its utilities over all transactions, regardless of its length. This paper therefore adopts the average-utility measure, which reflects the utility effect of combining several items better than the original utility measure does. It is defined as the total utility of an itemset divided by the number of items in it. Like the original high-utility itemsets, average-utility itemsets do not have the downward-closure property. A mining algorithm is then proposed to efficiently find the high average-utility itemsets. It processes the data in two phases, using as an upper bound the sum, over the transactions containing the target itemset, of the maximal item utility in each such transaction; this bound overestimates the actual average utility of the itemset. As expected, the proposed approach mines fewer high average-utility itemsets than high-utility itemsets under the same threshold. It can thus be executed with a larger threshold than the original, giving a more significant and relevant criterion. Experimental results also show the performance of the proposed algorithm. [All rights reserved Elsevier].
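
The two measures and the upper bound can be made concrete with a small sketch, assuming each transaction already stores the utility of every item it contains. `average_utility` divides an itemset's utility in each containing transaction by its length, and `au_upper_bound` sums the maximal item utility of those transactions, which can only shrink as items are added. The level-wise join in `high_average_utility_itemsets` is a simple stand-in for the paper's two-phase procedure, and all names and data are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Each transaction maps an item to its utility in that transaction
# (e.g. quantity times unit profit); the values below are made up.
Transaction = Dict[str, float]


def average_utility(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Sum over containing transactions of u(X, T) / |X|."""
    return sum(sum(t[i] for i in itemset) / len(itemset)
               for t in db if itemset.issubset(t))


def au_upper_bound(db: List[Transaction], itemset: FrozenSet[str]) -> float:
    """Sum of each containing transaction's maximal item utility.

    The mean of the itemset's utilities in T never exceeds the largest
    utility in T, and a superset occurs in no more transactions, so this
    bound is downward-closed even though average utility is not."""
    return sum(max(t.values()) for t in db if itemset.issubset(t))


def high_average_utility_itemsets(db: List[Transaction],
                                  threshold: float) -> Dict[FrozenSet[str], float]:
    # Phase 1: level-wise search pruned by the upper bound.
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    survivors: List[FrozenSet[str]] = []
    k = 1
    while level:
        level = [c for c in level if au_upper_bound(db, c) >= threshold]
        survivors.extend(level)
        k += 1
        level = list({a | b for a in level for b in level if len(a | b) == k})
    # Phase 2: compute the actual average utility of each survivor.
    result = {}
    for c in survivors:
        au = average_utility(db, c)
        if au >= threshold:
            result[c] = au
    return result


db = [{"A": 4, "B": 2}, {"A": 2, "C": 6}, {"B": 2, "C": 4}]
print(high_average_utility_itemsets(db, 5))
```

Under a threshold of 5, only {A} (6) and {C} (10) qualify. {A, C} passes phase 1 with an upper bound of 6 but has an average utility of only 4, even though its plain utility of 8 would make it a high-utility itemset.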

Shyue-Liang Wang
Jyun-Da Chen
Paul A. Stirpe
Tzung-Pei Hong

Based on a given data center network topology and risk-neutral management, this work proposes a simple but efficient probability-based model to calculate, when a data center is under a security breach, the probability of insecurity of each protected resource and the optimal investment in each security protection device. We present two algorithms that calculate the probability of threat and the optimal investment for data center security, respectively. Based on the insecurity flow model (Moskowitz and Kang 1997) for analyzing security violations, we first model the data center topology using two basic components, namely resources and filters, where resources represent the protected resources and filters represent the security protection devices. Four basic patterns are then identified as the building blocks for the first algorithm, called Accumulative Probability of Insecurity, which calculates the accumulative probability of realized threat (insecurity) on each resource. To calculate the optimal security investment, a risk-neutral algorithm called Optimal Security Investment, which maximizes the total expected net benefit, is then proposed. Numerical simulations show that the results of the proposed approach coincide with Gordon's single-system analytical model (Gordon and Loeb, ACM Transactions on Information and System Security 5(4):438–457, 2002). In addition, numerical results on two common data center topologies are analyzed and compared to demonstrate the effectiveness of the proposed approach. The proposed technique can facilitate the analysis and design of more secure data centers.
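
The abstract does not spell out the four basic patterns, so the sketch below assumes only two of the simplest plausible compositions under independence: filters in series (a threat must pass each one) and parallel paths (a breach on any path suffices). For the investment step it uses the class-I breach probability function from Gordon and Loeb, S(z, v) = v / (alpha*z + 1)^beta, whose expected net benefit [v - S(z, v)]L - z has the closed-form maximizer shown. Function names, the topology, and all parameter values are hypothetical, and feeding the composed probability in as the vulnerability v is an illustrative assumption, not the paper's Optimal Security Investment algorithm.

```python
from typing import Iterable


def series(pass_probs: Iterable[float]) -> float:
    """Assumed pattern: the threat reaches the resource only if it passes every filter."""
    p = 1.0
    for q in pass_probs:
        p *= q
    return p


def parallel(path_probs: Iterable[float]) -> float:
    """Assumed pattern: breach if at least one independent path lets the threat through."""
    all_blocked = 1.0
    for q in path_probs:
        all_blocked *= 1.0 - q
    return 1.0 - all_blocked


def expected_net_benefit(z: float, v: float, loss: float,
                         alpha: float, beta: float) -> float:
    """Gordon-Loeb ENBIS(z) = [v - S(z, v)] * L - z with S = v / (alpha*z + 1)**beta."""
    return (v - v / (alpha * z + 1.0) ** beta) * loss - z


def optimal_investment(v: float, loss: float, alpha: float, beta: float) -> float:
    """Closed-form maximizer of expected_net_benefit for the class-I breach function."""
    x = alpha * beta * v * loss
    if x <= 1.0:  # marginal benefit at z = 0 does not exceed the marginal cost
        return 0.0
    return (x ** (1.0 / (beta + 1.0)) - 1.0) / alpha


# Hypothetical topology: two paths to one resource, one through two filters in series.
p = parallel([series([0.5, 0.2]), 0.3])
z = optimal_investment(v=p, loss=1000.0, alpha=0.01, beta=1.0)
print(p, z, expected_net_benefit(z, p, 1000.0, 0.01, 1.0))
```

Here the composed vulnerability is 1 - 0.9 * 0.7 = 0.37, so alpha * beta * v * L = 3.7 and the optimal investment is about 92.4.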

Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. A data set in which some attribute values are unknown is called incomplete, and learning from incomplete data sets is usually more difficult than learning from complete ones. In the past, rough-set theory was widely used to deal with data classification problems. Most conventional mining algorithms based on rough-set theory identify relationships among data using crisp attribute values. Data with quantitative values, however, are common in real-world applications. In this paper, we thus deal with the problem of learning from incomplete quantitative data sets based on rough sets. A learning algorithm is proposed that can simultaneously derive certain and possible fuzzy rules from incomplete quantitative data sets and estimate the missing values during the learning process. Quantitative values are first transformed into fuzzy sets of linguistic terms using membership functions. Each unknown attribute value is initially assumed to be any possible linguistic term and is gradually refined according to the fuzzy incomplete lower and upper approximations derived from the given quantitative training examples. The examples and the approximations then interact with each other to derive certain and possible rules and to estimate appropriate unknown values. The derived rules can then serve as knowledge about the incomplete quantitative data set. [All rights reserved Elsevier].
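
The lower and upper approximations that drive certain and possible rules can be sketched in their crisp, complete-data form. This simplification is an assumption: it treats each attribute value as a single linguistic term and ignores the membership degrees and unknown values that the proposed fuzzy incomplete approximations handle. Names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Row = Dict[str, str]


def equivalence_classes(rows: List[Row], attrs: Sequence[str]) -> List[FrozenSet[int]]:
    """Group row indices that share the same values on attrs."""
    groups: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for idx, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(idx)
    return [frozenset(g) for g in groups.values()]


def approximations(rows: List[Row], attrs: Sequence[str],
                   target: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Lower approximation: classes inside target; upper: classes touching it."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for cls in equivalence_classes(rows, attrs):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


rows = [
    {"age": "Low", "income": "High", "class": "Y"},
    {"age": "Low", "income": "High", "class": "N"},
    {"age": "High", "income": "Low", "class": "Y"},
    {"age": "Middle", "income": "Low", "class": "N"},
]
target = {i for i, r in enumerate(rows) if r["class"] == "Y"}
print(approximations(rows, ["age", "income"], target))
```

Rows 0 and 1 are indiscernible on age and income but belong to different classes, so the lower approximation of class Y is {2} (a certain rule: age High and income Low imply Y), while the upper approximation {0, 1, 2} also supports a possible rule for age Low and income High.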

Tzung-Pei Hong
Ya-Fang Tung
Shyue-Liang Wang
Min-Thai Wu
Yu-Lung Wu

Data mining is often used to find interesting and meaningful patterns in huge databases. It may generate different kinds of knowledge, such as classification rules, clusters, and association rules, among others. Much research has been conducted on data mining, and most of it has focused on mining binary-valued data. Fuzzy data mining was thus proposed to discover fuzzy knowledge from linguistic or quantitative data. Recently, ant colony systems (ACS) have been successfully applied to optimization problems. However, little work has been done on applying ACS to fuzzy data mining. This thesis thus proposes an ACS-based framework for fuzzy data mining. In the framework, the membership functions are first encoded as binary bits and then fed into the ACS to search for the optimal set of membership functions. The problem is thereby transformed into a multi-stage graph, with each route representing a possible set of membership functions. When the termination condition is reached, the best membership function set (the one with the highest fitness value) can be used to mine fuzzy association rules from a database. Finally, experiments are conducted to compare the framework with other approaches and to show its performance. [All rights reserved Elsevier].
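
A minimal sketch of the search, assuming each stage of the multi-stage graph has two nodes (bit 0 and bit 1), so that a route from the first to the last stage is one bit string. Ants choose each bit with probability proportional to its pheromone, and a global update evaporates all pheromone and reinforces the best route found so far. The decoding into parameters, the fitness function, and every parameter value here are hypothetical stand-ins; the thesis's actual encoding, fitness, and update rules are not reproduced.

```python
import random
from typing import Callable, List, Tuple


def acs_search(n_bits: int, fitness: Callable[[List[int]], float],
               n_ants: int = 10, n_iter: int = 50, rho: float = 0.1,
               seed: int = 0) -> Tuple[List[int], float]:
    """Search a multi-stage graph whose stage i has two nodes (bit 0 or 1).

    A route is one bit per stage. Fitness is assumed to be non-negative."""
    rng = random.Random(seed)
    tau = [[1.0, 1.0] for _ in range(n_bits)]  # pheromone on each stage's two edges
    best: List[int] = []
    best_fit = float("-inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant picks a bit per stage with probability proportional to pheromone.
            route = [0 if rng.random() < t0 / (t0 + t1) else 1 for t0, t1 in tau]
            f = fitness(route)
            if f > best_fit:
                best, best_fit = route, f
        # Global update: evaporate everywhere, then reinforce the best route so far.
        for stage, bit in enumerate(best):
            tau[stage][0] *= 1.0 - rho
            tau[stage][1] *= 1.0 - rho
            tau[stage][bit] += rho * best_fit
    return best, best_fit


def decode(route: List[int], bits_per_param: int = 4) -> List[int]:
    """Hypothetical decoding: each group of bits is one membership-function parameter."""
    return [int("".join(map(str, route[i:i + bits_per_param])), 2)
            for i in range(0, len(route), bits_per_param)]


def toy_fitness(route: List[int]) -> float:
    """Hypothetical stand-in for the real fitness: prefer center 5 and span 10."""
    center, span = decode(route)
    return 1.0 / (1.0 + abs(center - 5) + abs(span - 10))


best, score = acs_search(8, toy_fitness)
print(decode(best), score)
```

Because the search is stochastic, the best route is not guaranteed to reach the toy optimum [5, 10]; the fixed seed only makes a run repeatable.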

The frequent-pattern tree (FP-tree) is an efficient data structure for association-rule mining without the generation of candidate itemsets. It represents a database as a tree structure that stores only frequent items. It, however, needs to process all transactions in batch. In the past, Hong et al. thus proposed an efficient incremental mining algorithm for handling newly inserted transactions. In addition to record insertion, record deletion from databases is also common in real-world applications. In this paper, we thus attempt to modify the FP-tree construction algorithm to efficiently handle the deletion of records. A fast updated FP-tree (FUFP-tree) structure is used, which makes the tree update process easier. An FUFP-tree maintenance algorithm for the deletion of records is also proposed to reduce the execution time of reconstructing the tree when records are deleted. Experimental results show that the proposed FUFP-tree maintenance algorithm runs faster than the batch FP-tree construction algorithm when handling deleted records and generates nearly the same tree structure as the FP-tree algorithm. The proposed approach can thus achieve a good trade-off between execution time and tree complexity. [All rights reserved Elsevier].
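
The core bookkeeping of handling a deleted record in a prefix tree can be sketched as follows: walk the record's sorted frequent items down the tree, decrement the counts along that path, and unlink nodes whose count reaches zero. This is a simplified, hypothetical `PrefixTree`, not the FUFP-tree maintenance algorithm itself; it assumes every deleted record was previously inserted and omits the header table, the handling of items whose support changes, and any rescans.

```python
from typing import Dict, Iterable, List, Optional


class Node:
    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


class PrefixTree:
    """Simplified FP-tree-like structure; `order` ranks the frequent items."""

    def __init__(self, order: Dict[str, int]) -> None:
        self.root = Node(None, None)
        self.order = order

    def _path(self, transaction: Iterable[str]) -> List[str]:
        # Keep only frequent items, sorted by their fixed rank.
        return sorted((i for i in set(transaction) if i in self.order),
                      key=self.order.__getitem__)

    def insert(self, transaction: Iterable[str]) -> None:
        node = self.root
        for item in self._path(transaction):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child

    def delete(self, transaction: Iterable[str]) -> None:
        # Locate the whole path first so a missing path leaves the tree intact.
        path: List[Node] = []
        node = self.root
        for item in self._path(transaction):
            node = node.children[item]  # raises KeyError if the path is absent
            path.append(node)
        # Decrement bottom-up and unlink nodes whose count drops to zero.
        for node in reversed(path):
            node.count -= 1
            if node.count == 0:
                del node.parent.children[node.item]


tree = PrefixTree({"a": 0, "b": 1, "c": 2})
for t in (["b", "a"], ["a", "c"], ["a", "b", "c"]):
    tree.insert(t)
tree.delete(["a", "b"])
tree.delete(["a", "b", "c"])
print(tree.root.children["a"].count, list(tree.root.children["a"].children))
```

After the two deletions, only the path a then c, with count 1, remains below the root.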

Association-rule mining is among the most commonly used techniques for knowledge discovery in databases (KDD). It is used to discover relationships among items or itemsets. Temporal data mining, in turn, is concerned with the analysis of temporal data and the discovery of temporal patterns and regularities. In this paper, a new concept of up-to-date patterns is proposed, which is a hybrid of association rules and temporal mining. An itemset may not be frequent (large) for an entire database but may be large in its most recent period, since items that seldom occurred early may occur often later. An up-to-date pattern is thus composed of an itemset and its up-to-date lifetime, within which the user-defined minimum-support threshold must be satisfied. The proposed approach can mine more useful large itemsets than conventional approaches, which discover only large itemsets valid for the entire database. Experimental results show that the proposed algorithm is more effective than traditional ones in discovering such up-to-date temporal patterns, especially when the minimum-support threshold is high.
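
One way to compute a lifetime can be sketched under a stated hypothesis, not the paper's exact definition: the up-to-date lifetime runs from some transaction to the most recent one, and the longest such window meeting the minimum support is wanted. The sketch scans backwards from the newest transaction, growing the window and tracking the itemset's support. The function name and data are illustrative.

```python
from typing import List, Optional, Set


def up_to_date_start(transactions: List[Set[str]], itemset: Set[str],
                     min_support: float) -> Optional[int]:
    """Earliest index s such that the itemset's support over transactions[s:]
    reaches min_support, or None if no such window exists (assumed definition)."""
    hits = 0
    start: Optional[int] = None
    n = len(transactions)
    for s in range(n - 1, -1, -1):  # grow the window backwards from the newest transaction
        if itemset <= transactions[s]:
            hits += 1
        if hits / (n - s) >= min_support:
            start = s
    return start


log = [{"a"}, {"b"}, {"b"}, {"a", "b"}, {"a"}, {"a"}]
print(up_to_date_start(log, {"a"}, 0.75))
```

For {a}, support over the whole log is 4/6 and misses 0.75, but the window from index 2 onward has support 3/4, so the function returns 2.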

Wireless networks and mobile applications have grown very rapidly and have made a significant impact on computer systems. In particular, the use of mobile phones and PDAs has increased very rapidly, and additional functions and value-added services for these devices have been greatly developed. If regularities in user mobility behavior can be discovered, these functions and services can be further expanded and used intelligently. This paper thus attempts to discover fuzzy personal mobility patterns that help systems provide personalized services in a wireless network. The arrival time and the duration of each location area visited by a mobile user are used as the key attributes for representing the results. Since both the arrival time and the duration are numeric, fuzzy concepts are used to process them and to form linguistic terms. A fuzzy mining algorithm is then proposed, which is based on the AprioriAll algorithm but differs from it in several ways. These differences require delicate consideration in the design of the algorithm. An example is also given to demonstrate the algorithm. The linguistic representation of personal mobility patterns is more natural and understandable for system managers, helping them provide better personalized services in a wireless network. [All rights reserved Elsevier].
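
The fuzzification of arrival times and durations can be illustrated with triangular membership functions. The term names and breakpoints below are hypothetical, and combining the two attributes of a visit with the minimum operator is an assumption borrowed from common fuzzy mining practice; the AprioriAll-based sequence mining that follows is not shown.

```python
from typing import Dict, Tuple

Triangle = Tuple[float, float, float]  # (left foot, peak, right foot)

# Hypothetical linguistic terms; the paper's actual membership functions are not given here.
ARRIVAL_HOUR: Dict[str, Triangle] = {
    "Morning": (6.0, 9.0, 12.0),
    "Noon": (10.0, 12.5, 15.0),
    "Evening": (14.0, 18.0, 22.0),
}
DURATION_MIN: Dict[str, Triangle] = {
    "Short": (0.0, 0.0, 30.0),
    "Medium": (15.0, 45.0, 75.0),
    "Long": (60.0, 120.0, 120.0),
}


def triangular(x: float, tri: Triangle) -> float:
    """Membership degree of x in a triangular fuzzy set."""
    a, b, c = tri
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def fuzzify(x: float, terms: Dict[str, Triangle]) -> Dict[str, float]:
    """Keep only the linguistic terms with non-zero membership."""
    degrees = {name: triangular(x, tri) for name, tri in terms.items()}
    return {name: d for name, d in degrees.items() if d > 0.0}


def fuzzy_visit(location: str, arrival: float,
                duration: float) -> Dict[Tuple[str, str, str], float]:
    """Fuzzy regions for one visit; min combines arrival and duration degrees."""
    return {(location, at, dt): min(ad, dd)
            for at, ad in fuzzify(arrival, ARRIVAL_HOUR).items()
            for dt, dd in fuzzify(duration, DURATION_MIN).items()}


print(fuzzy_visit("LA1", arrival=11.0, duration=20.0))
```

An arrival at 11:00 belongs to Morning (1/3) and Noon (0.4), and a 20-minute stay belongs to Short (1/3) and Medium (1/6), so the visit yields four fuzzy regions.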

Machine learning can extract desired knowledge and ease the development bottleneck in building expert systems. Among the proposed approaches, deriving classification rules from training examples is the most common. Given a set of examples, a learning program tries to induce rules that describe each class. Rough-set theory has served as a good mathematical tool for dealing with data classification problems. It adopts the concept of equivalence classes to partition training instances according to some criteria. In the past, we proposed a fuzzy-rough approach to produce a set of certain and possible rules from quantitative data. In real applications, however, attributes are usually organized into hierarchies. This paper thus extends our previous approach to produce a set of cross-level, maximally general fuzzy certain and possible rules from examples with hierarchical and quantitative attributes. The proposed approach combines rough-set theory and fuzzy-set theory to learn. It is more complex than learning from single-level values, but it may derive more general knowledge from data. Fuzzy boundary approximations, instead of upper approximations, are used to find possible rules, thus reducing subsumption checking. Some pruning heuristics are adopted in the proposed algorithm to avoid unnecessary search. A simple example is also given to illustrate the proposed approach.
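
The effect of moving up an attribute hierarchy on the lower approximation and the boundary region (upper minus lower approximation) can be sketched in crisp form. This is a simplification: the proposed approach works with fuzzy boundary approximations, membership degrees, and several attributes, none of which are modeled here, and the hierarchy, names, and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def generalize(value: str, parent: Dict[str, str], levels_up: int) -> str:
    """Climb the attribute hierarchy; values at the top stay unchanged."""
    for _ in range(levels_up):
        value = parent.get(value, value)
    return value


def lower_and_boundary(rows: List[Dict[str, str]], attr: str, target: Set[int],
                       parent: Dict[str, str],
                       levels_up: int) -> Tuple[Set[int], Set[int]]:
    """Crisp lower approximation and boundary region (upper minus lower)."""
    groups: Dict[str, Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        groups[generalize(row[attr], parent, levels_up)].add(i)
    lower: Set[int] = set()
    upper: Set[int] = set()
    for members in groups.values():
        if members <= target:
            lower |= members
        if members & target:
            upper |= members
    return lower, upper - lower


parent = {"apple": "fruit", "banana": "fruit", "carrot": "vegetable", "onion": "vegetable"}
rows = [{"food": f} for f in ("apple", "banana", "carrot", "onion")]
target = {0, 1, 2}  # indices of rows in the class being described
print(lower_and_boundary(rows, "food", target, parent, 0))
print(lower_and_boundary(rows, "food", target, parent, 1))
```

At the leaf level, rows 0 to 2 are all certain. One level up, "fruit" still gives a certain, more general rule, while "vegetable" falls into the boundary region {2, 3} and supports only a possible rule.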

Tzung-Pei Hong
Chyan-Yuan Horng
Chih-Hung Wu
Shyue-Liang Wang

In this paper, we present a mining algorithm that improves the efficiency of finding large itemsets. Based on the concept of prediction proposed in the (n, p) algorithm, our method considers the data dependency in the given transactions to predict promising and non-promising candidate itemsets. For each level, our method estimates a different support threshold, derived from a data dependency parameter, and uses it to determine directly whether an item should be included in a promising candidate itemset. In this way, we improve the efficiency of finding large itemsets by reducing the number of scans of the input dataset and the number of candidate items. Experimental results show that our method is more efficient than the apriori and (n, p) algorithms when the minimum support value is small.