Wednesday, July 22, 2015

SETDS is PADBI: Based on "Performance Anomaly Detection and Bottleneck Identification" ARTICLE in ACM COMPUTING SURVEYS · JUNE 2015

The ResearchGate site detected citations of two SEDS papers in a survey article and brought that survey to my attention.

Performance Anomaly Detection and Bottleneck Identification 
Olumuyiwa Ibidunmoye, Francisco Hernández-Rodriguez, Erik Elmroth
Umeå University, Sweden. July 3, 2015

Abstract
In order to meet stringent performance requirements, system administrators must
effectively detect undesirable performance behaviours, identify potential root causes
and take adequate corrective measures. The problem of uncovering and understanding
performance anomalies and their causes (bottlenecks) in different system and application
domains is well studied. In order to assess progress, research trends and identify
open challenges, we have reviewed major contributions in the area and present our
findings in this survey. Our approach provides an overview of anomaly detection and
bottleneck identification research as it relates to the performance of computing systems.
By identifying fundamental elements of the problem, we are able to categorize existing
solutions based on multiple factors such as the detection goals, nature of applications
and systems, system observability, and detection methods.

Reading this survey (also published on the ResearchGate site), I got the impression that it is a very good overview of "PADBI" systems, where SEDS has its place among other SPC/MASF ones. By the way, the paper gives a short definition of MASF, referencing the work of Buzen and Bereznay:

"According to Bereznay ... [100], SPC is not suitable for interval based sampling data
such as system performance traces. This motivates the development of the Multivariate
Adaptive Statistical Filtering (MASF) method. MASF, [101] is a SPC framework for detecting
changes in a Gaussian distribution."
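
To make the quoted definition more concrete, here is a minimal Python sketch of a MASF-style check. It rests on assumptions that are not in the survey text above: history is grouped into hour-of-week slots (0 to 167), each slot gets limits of mean plus or minus `k` standard deviations, and the names `masf_limits` and `find_exceptions` are made up for illustration. It is not the SEDS/SETDS implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def masf_limits(history, k=3.0):
    """Build per-slot control limits from (slot, value) pairs.

    `slot` is assumed to be an hour-of-week index; `k` is the assumed
    width of the band in standard deviations.
    """
    groups = defaultdict(list)
    for slot, value in history:
        groups[slot].append(value)
    limits = {}
    for slot, values in groups.items():
        m = mean(values)
        s = pstdev(values)  # population standard deviation of the slot
        limits[slot] = (m - k * s, m + k * s)
    return limits


def find_exceptions(observations, limits):
    """Return the (slot, value) pairs that fall outside their slot's limits."""
    found = []
    for slot, value in observations:
        if slot not in limits:
            continue  # no reference data for this slot
        lower, upper = limits[slot]
        if value < lower or value > upper:
            found.append((slot, value))
    return found


history = [(0, 10), (0, 12), (0, 11), (1, 50), (1, 52), (1, 51)]
limits = masf_limits(history)
print(find_exceptions([(0, 12), (0, 50), (1, 50)], limits))
```

The point of the per-slot grouping is that the same value can be normal in one slot and exceptional in another: here 50 is flagged in slot 0 but not in slot 1.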

SEDS (with two references to SEDS CMG papers) has its place under the SPC section of this survey:

Where: 

[100] Frank M Bereznay and Kaiser Permanente. Did something change? using statistical
techniques to interpret service and resource metrics. In Int. CMG Conference, pages
229–242, 2006.

[101] Jeffrey P Buzen and Annie W Shum. Masf-multivariate adaptive statistical filtering.
In Int. CMG Conference, pages 1–10, 1995.

[105] Igor A Trubin and Linwood Merritt. Mainframe global and workload level statistical
exception detection system, based on MASF. In Int. CMG Conference, pages 671–678,
2004.

[106] Igor Trubin et al. Capturing workload pathology by statistical exception detection
system. In Proceedings of the Computer Measurement Group. Citeseer, 2005.

Nice to see our CMG folks mentioned in the review! In general, this is the most complete high-level overview of all types of SETDS-like systems and methods I have ever read. And there are a lot of them mentioned in the article!

But a few things may be missing there, for instance the idea of using the EV (Exception Value) to rank anomalies and to detect phases in the historical sample by analyzing this EV meta-metric. That is effectively a way to cluster sample data so that it can then be used for better prediction or correlation. See more details about EV here: The Exception Value Concept to Measure Magnitude of Systems Behavior Anomalies.
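
As a rough illustration of the EV idea, the Python sketch below treats EV as the signed sum of excursions of the actual data beyond its control limits (positive above the upper limit, negative below the lower one) and then groups consecutive periods by the sign of EV as a crude phase split. Both the exact formula and the helper names `exception_value` and `split_phases` are assumptions made here for illustration; the linked post is the place to look for the actual definition.

```python
from itertools import groupby


def exception_value(actual, lower, upper):
    """Signed sum of excursions outside [lower, upper] (assumed EV form)."""
    ev = 0.0
    for a, lo, hi in zip(actual, lower, upper):
        if a > hi:
            ev += a - hi   # excursion above the upper limit
        elif a < lo:
            ev -= lo - a   # excursion below the lower limit
    return ev


def _sign(x):
    return (x > 0) - (x < 0)


def split_phases(ev_series):
    """Group consecutive periods with the same EV sign.

    Returns a list of (sign, first_index, last_index) tuples.
    """
    phases = []
    start = 0
    for sign, run in groupby(ev_series, key=_sign):
        length = len(list(run))
        phases.append((sign, start, start + length - 1))
        start += length
    return phases


# One period of data against flat limits of 4 and 10.
print(exception_value([5, 12, 3], [4, 4, 4], [10, 10, 10]))

# EV computed per period (for example per day) becomes a meta-metric.
print(split_phases([0, 0, 3.5, 2.0, 0, -1.0]))
```

Ranking anomalies then amounts to sorting periods by the absolute value of their EV, and runs of non-zero EV mark candidate phase boundaries in the history.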