  • Self-adjust Local Connectivity Analysis for Spectral Clustering

    Hui Wu   Guangzhi Qu   Xingquan Zhu

    Spectral clustering has been applied in many applications, but some important issues remain unresolved, among which the two major ones are (1) specifying the scale parameter used to calculate the similarity between data objects, and (2) selecting proper eigenvectors to reduce data dimensionality. Although these topics have been studied extensively, existing methods do not work well in some complicated scenarios, which limits the wide deployment of spectral clustering. In this work, we revisit the above two problems and make three contributions: 1) a unified framework for studying the impact of the scale parameter on the similarity between data objects, which can easily accommodate various state-of-the-art spectral clustering methods for determining the scale parameter; 2) a novel approach based on local connectivity analysis for specifying the scale parameter; and 3) a new method for eigenvector selection. Compared with existing techniques, the proposed approach has a rigorous theoretical basis and is efficient from a practical perspective. Experimental results show the efficacy of our approach on data from different scenarios. (A generic sketch of a locally adaptive scale parameter appears after this listing.)
  • Special issue on data mining applications and case study

    Xingquan Zhu  

  • Relational pattern discovery across multiple databases

    A system and method of identifying relational patterns across a plurality of databases using a data structure, and the data structure itself. The data structure includes one or more data node branches; each data node branch includes one or more data nodes; and each data node represents a data item of interest together with that item's support values across the plurality of databases, in relation to the other data items represented in the branch. The data structure can be used to mine relational patterns while considering pattern support across the plurality of databases at the same time. (A hedged illustration of such a node structure appears after this listing.)
  • Rule Synthesizing from Multiple Related Databases

    Dan He   Xindong Wu   Xingquan Zhu

    In this paper, we study the problem of rule synthesizing from multiple related databases where the items representing the databases may differ and the databases may not be relevant or similar to each other. We argue that, for such multi-related databases, simple rule synthesizing without a detailed understanding of the databases cannot reveal meaningful patterns inside the data collections. Consequently, we propose a two-step clustering of the databases at both the item and the rule level, such that the databases in the final clusters contain both similar items and similar rules. A weighted rule synthesizing method is then applied to each such cluster to generate the final rules. Experimental results demonstrate that the new rule synthesizing method is able to discover important rules that cannot be synthesized by other methods.
  • Cross-Domain Semi-Supervised Learning Using Feature Formulation

    Xingquan Zhu  

  • One-class learning and concept summarization for data streams

    Xingquan Zhu   Wei Ding   Philip S. Yu   Chengqi Zhang  

  • CLAP: Collaborative pattern mining for distributed information systems

    Xingquan Zhu   Bin Li   Xindong Wu   Dan He   Chengqi Zhang  

  • Active Learning From Stream Data Using Optimal Weight Classifier Ensemble

    Xingquan Zhu   Peng Zhang   Xiaodong Lin   Yong Shi  

  • A lazy bagging approach to classification

    Xingquan Zhu   Ying Yang  

    In this paper, we propose lazy bagging (LB), which builds bootstrap replicate bags based on the characteristics of test instances. Upon receiving a test instance xk, LB trims the bootstrap bags by taking into consideration xk's nearest neighbors in the training data. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information to enhance local learning and generate a classifier with refined decision boundaries that emphasize the test instance's surrounding region. In particular, by taking full advantage of xk's nearest neighbors, the classifiers are able to reduce classification bias and variance when classifying xk. As a result, LB, which is built on these classifiers, can significantly reduce classification error compared with the traditional bagging (TB) approach. To investigate LB's performance, we first use carefully designed synthetic data sets to gain insight into why LB works and under which conditions it can outperform TB. We then test LB against four rival algorithms on a large suite of 35 real-world benchmark data sets using a variety of statistical tests. Empirical results confirm that LB can statistically significantly outperform the alternative methods in terms of reducing classification error. (A minimal sketch of the neighbor-aware bagging idea appears after this listing.)
  • Using WordNet to Disambiguate Word Senses for Text Classification

    Ying Liu   Peter Scheuermann   Xingsen Li   Xingquan Zhu

    In this paper, we propose an automatic text classification method based on word sense disambiguation. We use the “hood” algorithm to resolve word ambiguity, so that each word is replaced by its sense in context. The nearest ancestors of the senses of all non-stopwords in a given document are selected as the classes for that document. We apply the algorithm to the Brown Corpus and evaluate its effectiveness by comparing the classification results with those obtained using the manual disambiguation provided by Princeton University. (A hedged sketch of mapping words to WordNet hypernyms appears after this listing.)
  • Pushing Frequency Constraint to Utility Mining Model

    Jing Wang   Ying Liu   Lin Zhou   Yong Shi   Xingquan Zhu

    Traditional association rule mining (ARM) only considers the frequency of itemsets, and frequent itemsets may not bring a large amount of profit. Utility mining only focuses on itemsets with high utilities, but the number of customers rich enough to buy them is limited. To overcome the weaknesses of the two models, we propose a novel model, called general utility mining, which takes both frequency and utility into consideration simultaneously. By adjusting the weights of the frequency factor and the utility factor, the model can meet the different preferences of different applications, making it flexible and practicable in a broad range of applications. We evaluate the proposed model on a real-world database; experimental results demonstrate that the mining results are valuable for business decision making. (A hedged sketch of such a weighted frequency-utility score appears after this listing.)
  • Editorial: Special issue on mining low-quality data

    Xingquan Zhu   Taghi M. Khoshgoftaar   Ian Davidson   Shichao Zhang  

  • Scalable Inductive Learning on Partitioned Data

    Qijun Chen   Xindong Wu   Xingquan Zhu

    With the rapid advancement of information technology, scalability has become a necessity for learning algorithms that deal with large, real-world data repositories. In this paper, scalability is accomplished through a data reduction technique that partitions a large data set into subsets, applies a learning algorithm to each subset sequentially or concurrently, and then integrates the learned results. Five strategies for achieving scalability (Rule-Example Conversion, Rule Weighting, Iteration, Good Rule Selection, and Data Dependent Rule Selection) are identified, and seven corresponding scalable schemes are designed and developed. A substantial number of experiments have been performed to evaluate these schemes. Experimental results demonstrate that, through data reduction, some of our schemes can effectively generate accurate classifiers from the weak classifiers built on data subsets. Furthermore, our schemes require significantly less training time than generating a global classifier. (A minimal partition-learn-integrate sketch appears after this listing.)
  • Exploring video content structure for hierarchical summarization

    Xingquan Zhu   Xindong Wu   Jianping Fan   Ahmed K. Elmagarmid   Walid G. Aref  

    In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide users with a scalable, multilevel video summary. First, video-shot-segmentation and keyframe-extraction algorithms are applied to parse video sequences into physical shots and discrete keyframes. Next, an affinity (self-correlation) matrix is constructed to merge visually similar shots into clusters (supergroups). Since high visual similarity between shots does not necessarily imply that they belong to the same story unit, temporal information is then used: temporally adjacent shots (within a specified distance) in each supergroup are merged into video groups. A video-scene-detection algorithm then merges temporally or spatially correlated video groups into scenario units, followed by a scene-clustering algorithm that eliminates visual redundancy among the units. A hierarchical video content structure with increasing granularity is thus constructed, from clustered scenes and video scenes through video groups down to keyframes. Finally, we introduce a hierarchical video summarization scheme that executes different approaches at different levels of the video content hierarchy to statically or dynamically construct the video summary. Extensive experiments based on real-world videos validate the effectiveness of the proposed approach.
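
The abstract of "Self-adjust Local Connectivity Analysis for Spectral Clustering" does not spell out its local connectivity analysis, so the following is only a generic sketch of a locally adaptive scale parameter in the spirit of self-tuning spectral clustering: each point's scale is set to the distance to its k-th nearest neighbor. The function name `local_scale_affinity` and the choice k=7 are illustrative assumptions, not the paper's method.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import SpectralClustering

def local_scale_affinity(X, k=7):
    """Affinity matrix with a per-point scale: sigma_i = distance to the k-th nearest neighbor."""
    D = cdist(X, X)                    # pairwise Euclidean distances
    sigma = np.sort(D, axis=1)[:, k]   # column 0 is the point itself, so column k is the k-th neighbor
    A = np.exp(-D ** 2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)
    return A

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    A = local_scale_affinity(X, k=7)
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(A)
    print(labels)
```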
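
The patent abstract for "Relational pattern discovery across multiple databases" describes data node branches whose nodes carry per-database support values. The class below is only a hedged illustration of such a node; the names (`DataNode`, `record`) and the dictionary-of-counts layout are assumptions, not the patented structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DataNode:
    """One node in a data node branch: a data item of interest plus its
    support value in each database, relative to the items above it in the branch."""
    item: str
    support: Dict[str, int] = field(default_factory=dict)    # database id -> support count
    children: List["DataNode"] = field(default_factory=list)

    def add_child(self, item: str) -> "DataNode":
        child = DataNode(item)
        self.children.append(child)
        return child

    def record(self, database_id: str, count: int = 1) -> None:
        self.support[database_id] = self.support.get(database_id, 0) + count

# toy usage: the branch root -> "bread" -> "butter", counted in two databases
root = DataNode("root")
bread = root.add_child("bread")
butter = bread.add_child("butter")
for db in ("db_A", "db_B"):
    bread.record(db)
    butter.record(db)
print(butter.support)   # {'db_A': 1, 'db_B': 1}
```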
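
Lazy bagging, as described in "A lazy bagging approach to classification", trims each bootstrap bag using the test instance's nearest neighbors. The sketch below is a minimal reading of that idea: it simply forces the k nearest neighbors of the test instance into every bag before training. The bag size, the decision-tree base learner, and the way neighbors are injected are assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def lazy_bagging_predict(X_train, y_train, x_test, n_bags=25, k=10, seed=0):
    """Predict a single test instance with test-instance-aware bootstrap bags."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    # indices of the test instance's k nearest neighbors in the training data
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    neighbor_idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]
    votes = []
    for _ in range(n_bags):
        # ordinary bootstrap sample, then make sure the neighbors are present
        bag = rng.integers(0, n, size=n - k)
        bag = np.concatenate([bag, neighbor_idx])
        clf = DecisionTreeClassifier(random_state=0).fit(X_train[bag], y_train[bag])
        votes.append(clf.predict(x_test.reshape(1, -1))[0])
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(lazy_bagging_predict(X, y, np.array([0.5, 0.5])))
```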
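
The WordNet paper above replaces ambiguous words with their senses via a “hood” algorithm and labels documents with the nearest ancestors of those senses. The snippet below only illustrates the general word-to-ancestor mechanism with NLTK; it uses the most frequent sense instead of the “hood” disambiguation and a single hypernym step instead of the paper's nearest-ancestor selection.

```python
# Assumes NLTK is installed and nltk.download("wordnet") has been run once.
from nltk.corpus import wordnet as wn

def hypernym_label(word):
    """Map a word to the name of the hypernym of its most frequent noun sense."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return None
    sense = synsets[0]                  # most-frequent sense, NOT the "hood" algorithm
    hypernyms = sense.hypernyms()
    return (hypernyms[0] if hypernyms else sense).name()

for w in ["bank", "dog", "computer"]:
    print(w, "->", hypernym_label(w))
```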
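
The general utility mining model in "Pushing Frequency Constraint to Utility Mining Model" combines frequency and utility with an adjustable weight, but the abstract does not give the formula. The scoring function below is one plausible reading, blending normalized support and normalized utility with a hypothetical weight `alpha`.

```python
from itertools import combinations
from collections import defaultdict

def general_utility_scores(transactions, utility, alpha=0.5, max_len=2):
    """Score itemsets by alpha * normalized support + (1 - alpha) * normalized utility.
    transactions: list of dicts {item: quantity}; utility: per-unit profit of each item."""
    support = defaultdict(int)
    total_utility = defaultdict(float)
    for t in transactions:
        items = sorted(t)
        for r in range(1, max_len + 1):
            for itemset in combinations(items, r):
                support[itemset] += 1
                total_utility[itemset] += sum(t[i] * utility[i] for i in itemset)
    max_sup = max(support.values())
    max_util = max(total_utility.values())
    return {s: alpha * support[s] / max_sup + (1 - alpha) * total_utility[s] / max_util
            for s in support}

transactions = [{"bread": 2, "milk": 1}, {"bread": 1, "caviar": 1}, {"milk": 3}]
utility = {"bread": 0.5, "milk": 1.0, "caviar": 20.0}
scores = general_utility_scores(transactions, utility, alpha=0.7)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:3])
```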
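
For "Scalable Inductive Learning on Partitioned Data", the sketch below shows only the basic partition-learn-integrate loop with unweighted majority voting; the five strategies named in the abstract (rule weighting, good-rule selection, and so on) are not reproduced, and the decision-tree base learner is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def partitioned_ensemble(X, y, n_partitions=4, seed=0):
    """Split the data into disjoint partitions and learn one classifier per partition."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    models = []
    for part in np.array_split(order, n_partitions):
        models.append(DecisionTreeClassifier(random_state=0).fit(X[part], y[part]))
    return models

def vote(models, X):
    """Integrate the partition-level classifiers by unweighted majority vote."""
    preds = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
models = partitioned_ensemble(X, y)
print(vote(models, X[:5]), y[:5])
```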