CrossMine
Efficient Classification Across Multiple Database Relations
Authors
Introduction
CrossMine is a tool for classification in relational databases. The training
set is a relational database that contains one target relation, in which every
tuple is associated with a class label. The test set is a database with the same
schema (usually all relations are identical except the target relation), but
the tuples in its target relation are unlabeled. The target relation contains
the objects to be classified, and the other relations provide supporting
information for classification. Given a training set, CrossMine builds a
rule-based classifier, which can be applied to the test set to predict the
class labels of unseen tuples.
Usage Agreement
- The download is for internal research use only. Redistribution and
commercial use are not permitted.
- The downloaded software may be used only for performance testing. Please
contact the authors about any other purpose.
Note
- Feedback is always welcome.
- Please contact xyin1@uiuc.edu to report bugs or ask
questions about CrossMine.
User Manual
1. Data Format
The training and test sets consist of the following files:
- relations_train.des and relations_test.des: Descriptions of the
database schema for the training and test data (these two files usually
differ only in the name of the target relation). Each of them contains the
following information: (1) the number and names of the relations; (2) all
joinable attributes (only keys and foreign keys need to be listed); (3) the
target relation and the target attribute (the class label). Only the two
values 1 and 2 can be used as class labels. For multi-class classification,
build one classifier per class and select the class with the highest
score as the predicted class.
- xxx.des and xxx.dat: Description and data of each relation.
The description of each relation should contain the names of its attributes,
their types (nominal or numerical), and their representations (number or
string). In the data file, values are separated by spaces. The first
attribute of the target relation must be its ID, which must be distinct for
every target tuple. It is suggested that the first attribute of every other
relation also contain the IDs of the tuples in that relation.
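The ID requirement above can be checked before loading a data file into CrossMine. The following Python sketch is not part of CrossMine. It assumes one tuple per line (the format description only states that values are separated by spaces), and the function names `read_dat` and `duplicate_ids` are hypothetical helpers introduced for illustration.

```python
from collections import Counter

def read_dat(lines):
    """Split each non-empty line into its space-separated values.

    Assumption: one tuple per line; string values containing spaces
    are not handled.
    """
    return [line.split() for line in lines if line.strip()]

def duplicate_ids(rows):
    """Return the IDs (first value of each row) that occur more than once."""
    counts = Counter(row[0] for row in rows)
    return sorted(tuple_id for tuple_id, n in counts.items() if n > 1)

# Made-up sample rows: ID, two attributes, class label (1 or 2).
sample = ["1 red 3.5 1", "2 blue 1.0 2", "2 red 2.2 1"]
print(duplicate_ids(read_dat(sample)))
```

An empty result means every ID in the file is distinct. To check a real file, pass an open file object (for example `open("xxx.dat")`) in place of `sample`.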
2. Usage Instructions
- Run CrossMine.exe.
- Click "Operation→Read Trainset" and read in "relations_train.des". Now all
training data are read in, and you may navigate the data by clicking on
relations.
- Click "Operation→Build Rules" to build rules. Rules are output to "rule.txt".
- Click "Operation→Read Testset" to read in the testing data.
- Click "Operation→Classify All Tuples" to perform classification.
- Click on the ID of any target tuple (either in training or testing set) to
see the rule used for classifying this tuple.
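For multi-class problems, the Data Format section advises building one classifier per class and choosing the class with the highest score, which means repeating the steps above once per class. The Python sketch below shows the bookkeeping around those runs. It is an illustration, not CrossMine code: treating label 1 as "belongs to the class" and 2 as "does not" is an assumption, as is the availability of a per-tuple score from each run, and both function names are hypothetical.

```python
def relabel_one_vs_rest(labels, positive_class):
    """Map original labels to the two labels CrossMine accepts.

    Assumption: 1 marks tuples of positive_class, 2 marks all others.
    """
    return [1 if label == positive_class else 2 for label in labels]

def predict_multiclass(scores_by_class):
    """Pick, for each tuple ID, the class whose classifier scored highest.

    scores_by_class maps class name -> {tuple_id: score}; how the scores
    are obtained from each binary run is assumed, not specified here.
    """
    tuple_ids = next(iter(scores_by_class.values())).keys()
    return {
        tid: max(scores_by_class, key=lambda c: scores_by_class[c][tid])
        for tid in tuple_ids
    }

# Made-up labels and scores for two classes and two target tuples.
print(relabel_one_vs_rest(["a", "b", "c", "a"], "a"))
scores = {"a": {"t1": 0.9, "t2": 0.1}, "b": {"t1": 0.2, "t2": 0.7}}
print(predict_multiclass(scores))
```

Each relabeled label list would be written back into the target relation's data file before the corresponding training run.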
Download CrossMine
The download is for internal research use only. Redistribution and
commercial use are not permitted.
Download
References
- Xiaoxin Yin, Jiawei Han, Jiong Yang, and Philip S. Yu, "CrossMine:
Efficient Classification across Multiple Database Relations", in
Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004.
- Xiaoxin Yin, Jiawei Han, and Jiong Yang, "Efficient Multi-relational
Classification by Tuple ID Propagation", in Workshop on Multi-relational
Data Mining in conjunction with KDD 2003, Washington, DC, Aug. 2003.