This paper proposes a method for grouping trajectories as two-dimensional time-series data. Our method employs a two-stage approach. First, it compares two trajectories based on their structural similarity and determines the best correspondence of partial trajectories. It then calculates the value-based dissimilarity for all pairs of matched segments and outputs their total sum as the dissimilarity of the two trajectories. We evaluated this method on two datasets. Experimental results on the Australian Sign Language dataset and the chronic hepatitis dataset demonstrate that our method can capture the structural similarity between trajectories even in the presence of noise and local differences, and can provide better proximity for discriminating objects.
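The two-stage idea can be sketched as follows. This is a simplified illustration, not the paper's actual algorithm: equal-length segmentation, a direction-vector structural cost, and DTW-style matching of segments are all assumptions made for the sketch.

```python
import numpy as np

def split_segments(traj, k):
    # Stand-in segmentation: split the 2-D trajectory into k partial trajectories.
    return np.array_split(np.asarray(traj, dtype=float), k)

def direction(seg):
    # Structural descriptor of a segment: its normalized net displacement.
    d = seg[-1] - seg[0]
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

def dtw_match(segs_a, segs_b):
    # Stage 1: dynamic-programming alignment of segments using a
    # structural cost (1 - cosine similarity of segment directions).
    na, nb = len(segs_a), len(segs_b)
    cost = np.array([[1.0 - direction(a) @ direction(b)
                      for b in segs_b] for a in segs_a])
    acc = np.full((na + 1, nb + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack to recover the matched segment pairs.
    path, i, j = [], na, nb
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: acc[p])
    return path[::-1]

def trajectory_dissimilarity(ta, tb, k=4):
    # Stage 2: sum a value-based dissimilarity (here, distance between
    # segment means) over all matched segment pairs.
    segs_a, segs_b = split_segments(ta, k), split_segments(tb, k)
    return sum(np.linalg.norm(segs_a[i].mean(axis=0) - segs_b[j].mean(axis=0))
               for i, j in dtw_match(segs_a, segs_b))
```

Identical trajectories get dissimilarity zero, while a shifted copy matches structurally but accumulates value-based dissimilarity, which mirrors the separation of structure matching from value comparison.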

A Pearson residual is defined as the residual between the actual and expected values of each cell in a contingency table. This paper shows that this residual can be represented as a linear sum of 2 × 2 determinants, which suggests that the geometrical nature of the residuals can be viewed from the standpoint of Grassmann algebra.
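In the simplest case this relation is exact: for a 2 × 2 table every cell residual has absolute value det(table)/N. A quick check on a hypothetical table (the numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2x2 contingency table.
table = np.array([[30.0, 10.0],
                  [20.0, 40.0]])
N = table.sum()

# Expected values from the marginals, then the residual matrix.
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / N
residual = table - expected

det = np.linalg.det(table)  # n11*n22 - n12*n21
print(residual)             # every entry is +-(det / N)
print(det / N)
```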

Chance discovery aims at understanding the meaning of functional dependency from the viewpoint of unexpected relations. One of the most important observations is that such a chance is hidden among the huge number of co-occurrences extracted from given data. On the other hand, conventional data-mining methods depend strongly on frequencies and statistics rather than on interestingness or unexpectedness. This paper discusses some limitations of the idea of statistical dependence, focusing especially on the formal characteristics of Simpson's paradox from the viewpoint of linear algebra. Theoretical results show that Simpson's paradox can be observed when a given contingency table, viewed as a matrix, is not regular; in other words, when the rank of the contingency matrix is not full. Thus, data-oriented evidence has limitations, which should be compensated for by human-oriented reasoning.
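The paradox itself is easy to reproduce numerically. The following uses the classic kidney-stone data (Charig et al., 1986), not data from this paper: within each stratum treatment A has the higher success rate, yet treatment B wins on the pooled table.

```python
import numpy as np

# (success, failure) counts; row 0 = treatment A, row 1 = treatment B.
small = np.array([[81, 6],      # A: 81/87 successes on small stones
                  [234, 36]])   # B: 234/270
large = np.array([[192, 71],    # A: 192/263 on large stones
                  [55, 25]])    # B: 55/80
pooled = small + large          # aggregating over the stratum variable

def success_rate(t):
    return t[:, 0] / t.sum(axis=1)

print(success_rate(small))   # A higher within the stratum
print(success_rate(large))   # A higher within the stratum
print(success_rate(pooled))  # B higher after pooling -- the reversal
```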

A hospital information system (HIS) collects data from all departments of a hospital, including laboratory tests, physiological tests, and electronic patient records. Thus, a HIS can be viewed as a large heterogeneous database that stores chronological changes in patients' status. In this paper, we apply a trajectory mining method to data extracted from a HIS. Experimental results demonstrate that the method can find groups of trajectories that reflect the temporal covariance of laboratory examinations.

As the standard of living in our society has risen, people have come to pay more attention to societal risks in order to keep life safe. Under this increasing demand, modern science and engineering must now provide efficient measures to reduce social risk in various respects. Meanwhile, the introduction of information technology into society is producing an accumulation of large amounts of data on our activities. These data can be used to manage the risks in society efficiently. The Workshop on Risk Mining 2006 (RM2006) was held in June 2006 in response to this demand and situation, focusing on risk management based on data mining techniques [1,2]. However, the study of risk management has a long history grounded in mathematical statistics, and mathematical statistics is now making remarkable progress in the field of data analysis. This year's successor workshop, the International Workshop on Risk Informatics (RI2007), extended its scope to include risk management through data analysis based on both data mining and mathematical statistics.

Finding temporally covariant variables is very important for clinical practice because the measurements of some examinations can be obtained very easily, while others take a long time to measure. Moreover, unexpected covariant patterns give us new knowledge about the temporal evolution of chronic diseases. This paper focuses on clustering trajectories of temporal sequences of two laboratory examinations. First, we map a set of time series containing different types of laboratory tests into directed trajectories representing temporal changes in patients' status. Then the trajectories of individual patients are compared at multiple scales and grouped into similar cases by clustering methods. Experimental results on the chronic hepatitis data demonstrate that the method can find groups of trajectories that reflect the temporal covariance of platelet count, albumin, and cholinesterase.
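The overall pipeline can be sketched with synthetic data. This is only a minimal stand-in: the trajectories, the crude pointwise dissimilarity, and the use of average-linkage hierarchical clustering are all assumptions; the paper itself uses a multiscale structural comparison.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)

def make_traj(trend, n=30):
    # Hypothetical patient: two lab values over time form a 2-D trajectory.
    t = np.linspace(0, 1, n)
    x = trend[0] * t + rng.normal(0, 0.02, n)
    y = trend[1] * t + rng.normal(0, 0.02, n)
    return np.c_[x, y]

# Two synthetic groups: both labs falling together vs. moving oppositely
# (a stand-in for covariant patterns such as PLT and ALB in hepatitis).
trajs = [make_traj((-1, -1)) for _ in range(5)] + \
        [make_traj((-1, 1)) for _ in range(5)]

def dissim(a, b):
    # Crude value-based dissimilarity: mean pointwise distance.
    return np.linalg.norm(a - b, axis=1).mean()

n = len(trajs)
D = np.array([[dissim(trajs[i], trajs[j]) for j in range(n)]
              for i in range(n)])
Z = linkage(squareform(D, checks=False), method='average')
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # the two covariance patterns fall into separate clusters
```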

This paper shows problems with the combination of rule induction and attribute-oriented generalization: if the given hierarchy includes inconsistencies, then the application of hierarchical knowledge generates inconsistent rules. We then introduce two approaches to solving this problem, one of which suggests that the combination of rule induction and attribute-oriented generalization can be used to validate a concept hierarchy. Interestingly, fuzzy linguistic variables play an important role in solving these problems.
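The failure mode can be illustrated with a toy example. Everything here is hypothetical (the drug names, the hierarchy, and the rules are invented for illustration): when one low-level value has two parents in the hierarchy, generalizing the induced rules yields a contradictory rule set.

```python
# Hypothetical concept hierarchy for attribute-oriented generalization.
# 'loxoprofen' is inconsistently placed under two parent concepts.
hierarchy = {
    'aspirin':    ['NSAID'],
    'ibuprofen':  ['NSAID'],
    'loxoprofen': ['NSAID', 'antipyretic'],  # the inconsistency
}

# Toy induced rules: (attribute value, conclusion).
rules = [('aspirin', 'safe'), ('loxoprofen', 'unsafe')]

# Generalize each rule by replacing the value with its parent concept(s).
generalized = set()
for value, conclusion in rules:
    for parent in hierarchy[value]:
        generalized.add((parent, conclusion))

# 'NSAID' now carries both conclusions: the rule set is inconsistent.
conclusions_for_nsaid = {c for concept, c in generalized if concept == 'NSAID'}
print(generalized)
print(conclusions_for_nsaid)
```

Conversely, detecting such contradictory generalized rules is one way the combination can be used to validate the hierarchy itself.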

Pawlak showed that knowledge can be captured by data partitions and proposed the rough set method, in which comparison between data partitions gives knowledge about classification, formalized as the lower and upper approximations of a set. Interestingly, these approximations correspond to the focusing mechanism of differential medical diagnosis: the upper approximation to the selection of candidates, and the lower approximation to concluding a final diagnosis. This paper focuses on several models of medical reasoning and shows that the core ideas of rough set theory can be observed in these diagnostic models.
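A minimal sketch of the two approximations, using invented toy records (the symptoms and diseases are hypothetical): equivalence classes are formed from attribute values, the upper approximation collects every case indiscernible from some target case (candidate selection), and the lower approximation keeps only classes entirely inside the target (certain diagnosis).

```python
from collections import defaultdict

# (symptoms, disease) -- hypothetical toy records.
records = [
    (('fever', 'headache'), 'flu'),
    (('fever', 'headache'), 'cold'),   # indiscernible from record 0
    (('fever', 'myalgia'),  'flu'),
    (('rash',),             'measles'),
]

# Partition record indices into equivalence classes by attribute values.
classes = defaultdict(set)
for idx, (attrs, _) in enumerate(records):
    classes[attrs].add(idx)

# Target concept X: the cases diagnosed as flu.
X = {i for i, (_, d) in enumerate(records) if d == 'flu'}

# Lower approximation: classes wholly contained in X (final diagnosis).
lower = {i for c in classes.values() if c <= X for i in c}
# Upper approximation: classes intersecting X (candidate diagnoses).
upper = {i for c in classes.values() if c & X for i in c}
print(lower, upper)
```

Here record 1 is symptomatically indiscernible from a flu case, so it enters the upper approximation (a candidate) but excludes its class from the lower one (no certain conclusion).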

The International Workshop on Risk Mining (RM2006) was held in conjunction with the 20th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI2006), Tokyo, Japan, June 2006. The workshop aimed at sharing and comparing experiences with risk mining techniques applied to risk detection, risk clarification, and risk utilization. In summary, the workshop provided a discussion forum for researchers working on both data mining and risk management, where the attendees discussed various aspects of data-mining-based risk management.

This paper focuses on the statistical independence of three variables from the viewpoint of linear algebra. While information granules of statistical independence of two variables can be viewed as determinants of 2 × 2 submatrices, those of three variables consist of linear combinations of odds ratios.
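For two binary variables the two views coincide: independence is exactly the vanishing of the 2 × 2 determinant, equivalently an odds ratio of 1. A quick check on hypothetical tables:

```python
import numpy as np

indep = np.array([[20.0, 30.0],
                  [40.0, 60.0]])  # rows proportional -> independent
dep = np.array([[50.0, 10.0],
                [10.0, 30.0]])    # positively associated

def det2(t):
    # The 2x2 determinant n11*n22 - n12*n21.
    return t[0, 0] * t[1, 1] - t[0, 1] * t[1, 0]

def odds_ratio(t):
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])

print(det2(indep), odds_ratio(indep))  # 0.0 and 1.0: independence
print(det2(dep), odds_ratio(dep))      # nonzero determinant, OR != 1
```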

This paper presents a formal analysis of a contingency table based on its marginal distributions. The main approach is to construct an expected matrix from the two given marginal distributions and to take the difference between the original cell values and the expected values to construct a residual matrix. The most important characteristics of a residual matrix are the following: (1) its determinant is equal to 0, which implies that the rank of this matrix is less than the rank of the original matrix; (2) each element of a residual matrix can be represented as a linear combination of 2 × 2 subdeterminants. These characteristics show that the residual of a contingency matrix is closely related to 2 × 2 subdeterminants; they also show that the χ² test statistic is a function of 2 × 2 subdeterminants and marginal sums, and suggest that the distribution of determinants has an important meaning for this statistic.
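Property (1) can be checked directly: because the expected matrix shares the original marginals, every row of the residual matrix sums to zero, so the matrix is singular. A minimal check on a hypothetical 3 × 3 table:

```python
import numpy as np

# Hypothetical 3x3 contingency table.
table = np.array([[20.0,  5.0,  5.0],
                  [ 4.0, 30.0,  6.0],
                  [ 6.0,  5.0, 19.0]])
N = table.sum()

# Expected matrix built from the two marginal distributions.
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / N
residual = table - expected

print(np.linalg.det(residual))           # ~0: residual rows sum to zero
chi2 = (residual ** 2 / expected).sum()  # the chi-square test statistic
print(chi2)
```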