A Decomposition of the Outlier Detection Problem into a Set of Supervised Learning Problems

Outlier detection methods automatically identify instances that deviate from the majority of the data. In this paper, we propose a novel approach for unsupervised outlier detection, which reformulates the outlier detection problem in numerical data as a set of supervised regression learning problems. For each attribute, we learn a predictive model which predicts the values of that attribute from the values of all other attributes, and compute the deviations between the predictions and the actual values. From those deviations, we derive both a weight for each attribute and a final outlier score using those weights. The weights help separate the relevant attributes from the irrelevant ones, and thus make the approach well suited for discovering outliers that would otherwise be masked in high-dimensional data. An empirical evaluation shows that our approach outperforms existing algorithms and is particularly robust in datasets with many irrelevant attributes. Furthermore, we show that if a symbolic machine learning method is used to solve the individual learning problems, the approach is also capable of generating concise explanations for the detected outliers.
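The attribute-wise decomposition described in the abstract can be illustrated with a minimal, self-contained sketch. The abstract does not fix the regression learner, the weight formula, or how deviations are aggregated, so this sketch makes several assumptions: each sub-problem is solved with ordinary linear least squares, attributes are min-max normalised first, residuals are computed in-sample (no cross-validation), each attribute's weight is taken as 1 - min(1, RRSE) where RRSE is the relative root squared error of its model, and the final score is a weighted root-mean-square of the per-attribute deviations. The names `also_scores` and `_fit_linear` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def _fit_linear(X: List[List[float]], y: List[float], ridge: float = 1e-6) -> List[float]:
    """Least-squares fit of y ~ b0 + sum(b_j * x_j) via the normal equations.

    A tiny ridge term keeps the system solvable when columns are collinear.
    """
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= f * A[col][c]
            b[k] -= f * b[col]
    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef


def also_scores(data: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical sketch: one regression problem per attribute -> weighted outlier score."""
    n, m = len(data), len(data[0])
    # Assumption: min-max normalise each attribute so deviations are comparable.
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    norm = [[(row[j] - lo[j]) / span[j] for j in range(m)] for row in data]

    deviations = [[0.0] * m for _ in range(n)]
    weights = []
    for i in range(m):
        # Supervised sub-problem i: predict attribute i from all other attributes.
        X = [[row[j] for j in range(m) if j != i] for row in norm]
        y = [row[i] for row in norm]
        coef = _fit_linear(X, y)
        preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]
        mean_y = sum(y) / n
        sse = sum((pr - t) ** 2 for pr, t in zip(preds, y))
        sst = sum((t - mean_y) ** 2 for t in y)
        rrse = math.sqrt(sse / sst) if sst > 0 else 1.0
        # Assumed weight: a model no better than predicting the mean gets weight 0.
        weights.append(1.0 - min(1.0, rrse))
        for k in range(n):
            deviations[k][i] = preds[k] - y[k]

    total = sum(weights)
    if total == 0:  # no attribute is predictable: nothing to score against
        return [0.0] * n, weights
    # Weighted root-mean-square of the per-attribute deviations.
    scores = [math.sqrt(sum(w * d * d for w, d in zip(weights, dev)) / total)
              for dev in deviations]
    return scores, weights


if __name__ == "__main__":
    # Ten points on the line y = 2x plus one point (5, -10) that breaks the pattern.
    data = [[x, 2 * x] for x in range(10)] + [[5, -10]]
    scores, weights = also_scores(data)
    print(max(range(len(scores)), key=scores.__getitem__))  # prints 10
```

In this toy dataset neither value of the last row is extreme on its own scale relative to the other rows, but it violates the dependency between the two attributes, so both per-attribute models leave it with the largest deviation. Swapping `_fit_linear` for a symbolic learner such as a regression tree would be the route to the explanations the abstract mentions; that substitution is not shown here.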

Focus: Methods or Design
Source: Machine Learning Journal
Readability: Expert
Type: PDF Article
Open Source: No
Keywords: Outlier detection, Machine learning, Outlier explanations
Learn Tags: Data Collection/Data Set Data Tools Solution
Summary: A discussion of the attribute-wise learning for scoring outliers (ALSO) approach, which detects outliers by learning one regression model per attribute, and a comparison of this method with classic outlier detection methods.