
Tuesday, December 29, 2009

Exception Value (EV) and OPNET Panorama

I have recently looked at the following OPNET resources to get an impression of the OPNET Panorama tool:

1. Link to website: http://www.opnet.com/solutions/application_performance/panorama.html
2. White paper downloaded from that site: "Understanding OPNET Panorama’s Performance Analysis Engines"

General comment: OPNET has become the next tool provider with "learning behavior" capabilities similar to what I have been doing for years with my SEDS and to the tools from other vendors, such as Netuitive, Integrien, and ProactiveNet (BMC), that I have recently studied (check my older postings: http://itrubin.blogspot.com/2009/02/realtime-statistical-exception.html).

Special comment: In the OPNET white paper I read: "Metrics that exhibit deviations from normal are automatically identified and assigned scores based on "how abnormal" their behavior is."
This is very close to what I introduced in my first CMG paper in 2001 ("Exception Detection System, Based on the Statistical Process Control Concept") and called the ExtraVolume of a metric (I now call it the Exception Value (EV) meta-metric). OPNET's white paper references that CMG'01 paper of mine, but it does not mention that they use a very similar approach to rank exceptions ("Area Out vs. Limit Range In Metric Scoring").
Here is an example from my first CMG paper of using the EV metric to build the list of TOP exceptional Unix servers:


I even tried to normalize that metric against some Unix benchmarks (TPC) to compare ranges of exceptional capacity usage across different types of servers and configurations. An example of that report is in the paper.
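For readers who want to experiment, below is a minimal R sketch of the EV idea: the meta-metric is the "area" of the actual data lying outside the control limits. The function, data, and the way EV+ and EV- are combined here are my illustration only; the exact SEDS formulas are in the papers.

## A minimal R sketch of the Exception Value (EV) idea: the "area"
## of the actual data lying outside the control limits. Names and
## the EV+/EV- combination are illustrative, not the SEDS formulas.
ev <- function(actual, ucl, lcl) {
  ev.plus  <- sum(pmax(actual - ucl, 0), na.rm = TRUE)  # area above upper limit
  ev.minus <- sum(pmax(lcl - actual, 0), na.rm = TRUE)  # area below lower limit
  c(EVplus = ev.plus, EVminus = ev.minus, EVtotal = ev.plus + ev.minus)
}

## One day of hourly CPU usage vs. its historical limits:
actual <- c(40, 42, 45, 90, 95, 93, 50, 44)
ucl    <- rep(70, 8)
lcl    <- rep(20, 8)
ev(actual, ucl, lcl)   # sorting servers by EVtotal gives the TOP list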

All in all, it is good news that another vendor has adopted this technology (maybe with some influence from my work!). Based on my experience with OPNET tools (very limited - just IT Guru for network and some server behavior simulations), the tool most likely can be trusted. To say more, I would need at least to play with a demo....

Tuesday, November 10, 2009

SEDS-Lite Introduction

To share in code some ideas of my exception detection methodology, I am developing a SEDS-Lite version using R scripting (http://www.r-project.org/). One of the scripts (cchrt.r - see the front picture of this post), which builds control charts against CSV data, has already been published on my blog: http://itrubin.blogspot.com/2009/03/power-of-control-charts.html

How exactly that works, along with more R scripts, will be presented at my workshop at the upcoming CMG'09 conference in Dallas (http://itrubin.blogspot.com/2009/07/my-cmg09-sunday-workshop.html) - you are welcome to attend!

Some additional R scripts can be found in my SCMG presentation: http://itrubin.blogspot.com/2009/05/seds-charts-at-scmg.html

The trick is that SAS 9.2 can execute R scripts. You can also try another SAS-like product (http://www.teamwpc.co.uk/products/wps), which also understands R. Plus, there are some ways to use SAS data for R graphing: http://www.hollandnumerics.co.uk/pdf/SAS2R2SAS_paper.pdf. Or just use the good book "SAS and R" (I recently bought it and highly recommend it): http://sas-and-r.blogspot.com/

Wednesday, October 21, 2009

Lower Control Limit Usage Examples for IT Capacity Management

I recently posted the following question as a LinkedIn discussion subject in the "Statistical Process Control" group: "Does it make any sense to use Control Charts for capacity management?" I got one pessimistic comment, which included the following statement:

"...The only situation I can think of using a control chart for capacity is if you had a piece of equipment that if over utilized would cause damage or premature wear in which case you would only have an upper control..."

I disagree. My system (SEDS) has a special part (updated lists) called "Unusual Capacity Usage OUTSIDERS" that helps capture some serious issues with servers, such as a database going down, an LPAR migrating off a host, and other unusual capacity releases that are not necessarily good things.
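As a minimal sketch of the idea (hypothetical numbers, not the SEDS implementation), the lower-limit check in R is simply:

## Catching "Unusual Capacity Usage OUTSIDERS": flag hours where
## usage drops below the lower control limit (baseline mean minus
## 3 standard deviations). Hypothetical data, not the SEDS code.
base.mean <- rep(60, 24)                            # per-hour baseline mean, %
base.sd   <- rep(5, 24)                             # per-hour baseline st. dev.
today     <- c(rep(60, 10), rep(5, 4), rep(60, 10)) # usage collapses mid-day

lcl <- base.mean - 3 * base.sd
outsiders <- which(today < lcl)                     # hours breaching the LCL
if (length(outsiders) > 0)
  cat("Unusual capacity release at hours:", outsiders, "\n")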

The following control charts from my upcoming CMG'09 workshop presentation are good illustrations of the types of findings SEDS captures:

1. VMware host issue (VM migration):



2. Unisys server database is down:




3. Mainframe application unusually low CPU usage:


Sunday, September 20, 2009

Near-Real-Time IT-Control Charts

Next Thursday, September 24, 2009, at the Richmond SCMG meeting, I am going to present an updated version of my previous presentation, "Power of Control Charts". This time the focus is on Near-Real-Time IT-Control Charts. Below is a clip that shows an example of a Near-Real-Time IT-Control Chart simulated by an R program:

The presentation will be published on the SCMG site: http://regions.cmg.org/regions/scmg/fall_09/richmond/meeting_09_24_09b.htm
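The animation in that clip can be reproduced with a simple R loop that redraws the chart as each new observation "arrives". This is my own minimal reconstruction, not the actual script from the presentation:

## Simulating a near-real-time IT-Control Chart in R: redraw the
## chart as each new point "arrives". A minimal reconstruction,
## not the actual presentation script.
center <- rep(50, 48); ucl <- center + 15; lcl <- center - 15
set.seed(1)
metric <- center + rnorm(48, 0, 5)
metric[40:44] <- 75                       # injected exception to be caught

for (t in 2:48) {
  plot(1:t, metric[1:t], type = "l", lwd = 2, xlim = c(1, 48),
       ylim = c(0, 100), xlab = "interval", ylab = "metric",
       main = "Near-Real-Time Control Chart")
  lines(1:48, ucl, col = "red")           # upper control limit
  lines(1:48, lcl, col = "blue")          # lower control limit
  lines(1:48, center, col = "green")      # historical mean
  Sys.sleep(0.2)                          # pause so the refresh is visible
}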

Thursday, August 27, 2009

IT-Chart: The Best Way to Visualize IT Systems Performance

How can you see the current metric data, the most recent data, and the historical retrospective, all in one picture? Is that possible? Yes, it is.
Think of the simple radar in a plane or ship cockpit: it refreshes the current data on top of the most recent data and shows the approaching "future". The SEDS control chart is similar; it uses a border line to separate the current data from the most recent data. Plus, it gives a historical baseline to show what can be expected, for comparison.

I believe it is a powerful way to visualize IT Systems Performance, so I made up a name for that chart: "IT-CHART".
(Not only because IT is my initials....)

I plan to add the pictures from this blog as additional slides to my CMG'09 workshop, which includes the R script to build an IT-CHART from CSV input (see abstract here).

Thursday, July 23, 2009

Real-Time Control Charts for SEDS

I am still analyzing different tools that capture computer application abnormalities based on real-time data. In addition to Integrien Alive (Integrien is now a part of VMware) and Netuitive, I have recently looked at BMC ProactiveNet Analytics. I have spoken with BMC SMEs, and they showed me a live demo of the tool. I have always respected BMC (and especially BGS) as effectively the inventor of this approach (MASF), and long ago I used to analyze statistical exceptions using BMC Visualizer and BMC Perceive (BTW, I have published a few examples of how I did that in my papers). Now they have another, very good tool for the same purpose (http://documents.bmc.com/products/documents/49/13/84913/84913.pdf)

Watching the live presentation, I got a positive impression of how it works for complex applications and transactions, correlating different abnormal events with the possibility of reducing false-positive situations. Interestingly, a combination of dynamic and static thresholds is used there to generate alarms - just like SEDS does: static ones to capture hot issues (run-aways and leaks) and statistical ones for early warnings.
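In R-like form, that combination might look as follows; the thresholds, names, and rule structure here are illustrative, not the actual SEDS or BMC parameters:

## Combining static and statistical thresholds: a static limit
## catches hot issues (run-aways, leaks), a dynamic statistical
## limit gives early warnings. Values are illustrative only.
classify <- function(value, base.mean, base.sd, static.limit = 95) {
  if (value > static.limit)                "ALARM: hot issue (static threshold)"
  else if (value > base.mean + 3*base.sd)  "WARNING: statistical exception"
  else                                     "OK"
}
classify(97, base.mean = 50, base.sd = 8)  # ALARM
classify(80, base.mean = 50, base.sd = 8)  # WARNING
classify(55, base.mean = 50, base.sd = 8)  # OK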

Now I have the very difficult task of choosing from those three products (plus SEDS) which to recommend to my management...

Speaking of SEDS, I decided to play with near-real-time data to see how difficult it would be to redesign SEDS to work more like the modern and serious tools mentioned above. Fortunately, SEDS is just a bunch of SAS macros with parameters, which helped me make the adjustments needed to include today's data. Surprisingly, it was a pretty easy task! I spent only a couple of days developing a "real-time SEDS" prototype. Currently, all it does is build, every hour, the real-time Control Charts that can be seen at the beginning of this post.

I plan to include some details about real-time Control Charts in my upcoming CMG'09 Workshop.

Thursday, July 16, 2009

The Performance and Capacity Analyst Bookshelf

0. PERSONAL: Alex Podelko's Capacity/Performance links collection - the richest on the Internet!

1. ON-LINE
- Guerrilla Capacity Planning by Neil J. Gunther, M.Sc., Ph.D. and Guerrilla Capacity Planning PART II: Weapons of Mass Instruction by Neil J. Gunther

- Ray Wicks: Getting Started in z/OS Capacity Planning
Part 2: Getting Started in z/OS Capacity Planning
Part 3: Getting Started in z/OS Capacity Planning
Part 4: Getting Started in z/OS Capacity Planning
Part 5: Getting Started in z/OS Capacity Planning

2. TO BUY
- John Allspaw: The Art of Capacity Planning ...
- The Performance and Capacity Analyst Bookshelf by Rick Ralston and Dan Schwarz

Monday, July 6, 2009

My CMG’09 Sunday Workshop

(2010 UPDATE: based on this workshop, a CMG'10 paper was written and will be published at the CMG conference - http://itrubin.blogspot.com/2010/11/my-cmg10-presentation-it-control-charts.html)

My workshop, entitled
"Power of Control Charts: How to Read, How to Build, How to Use",
has been accepted for the CMG'09 Sunday Workshop program, to be held at the Gaylord Texan in Dallas, Texas, on December 6, 2009 (http://cmg.org/conference/cmg2009/).

The workshop proposal is as follows:

One of the most powerful ways to visualize computer system behavior is the Control Chart. Originally used in Mechanical Engineering, it has become one of the main Six Sigma tools for optimizing business processes, and after some adjustments it is used in the IT Capacity Management area, especially in "behavior learning" products.

During the workshop the following topics will be discussed: What is the Control Chart? Where is the Control Chart used: a review of some systems performance tools that use it. Control chart types: MASF charts vs. SPC. A gallery of charts already published in CMG papers, plus some new charts, with explanations of how to read them. How to build a Control Chart: using Excel for interactive analysis and R to do it automatically. The session includes a live demonstration of using Excel to build different types of control charts against real performance data. Attendees will be provided with CDs containing the data in spreadsheets and will build Control Charts themselves, even with their own data. Finally, they will be able to run an R script to build a Control Chart based on input CSV data.

This workshop is based on a series of CMG papers published by the author. The prototype of the workshop was presented twice this year at Southern CMG meetings in VA and NC.

The presentation slides are already published here: https://www.researchgate.net/profile/Igor_Trubin/publication/259486489_TrubinCMG2009_IT_ControlCharts_SCMG_Fall/data/59c14e9e0f7e9b21a82657b6/CMG2009-Workshop-Trubin.pptx 

Thursday, July 2, 2009

Capacity Management Found in Translation

I have just created my second blog to share my technical ideas and thoughts in Russian. If you can read Russian, please visit http://www.ukor.blogspot.com/.

The name of my new blog is "Управление Вычислительной Мощностью", which simply means "Capacity Management". I recently found that translation of the term in a Russian article (click here to read), published in 2008 by the Enterprise Systems and Software Laboratory, HP Laboratories Palo Alto. I was so glad to have finally figured out how to say "Capacity Management" in Russian! For the past 10 years of doing Capacity Management, I always had a problem explaining to my Russian friends and relatives what my occupation was. Now I know, and that fact inspired me to start my new blog for Russian readers.

Another reason is the 20th anniversary of the first program I wrote and sold. It was a graphical editor with some CAD features, written in FORTRAN for a PC with a PDP-type processor (DVK-3). The name of that program was UKOR (in Russian that means "REPROOF"). That is why the link to my new blog is "http://www.UKOR.blogspot.com/"!

Wednesday, June 17, 2009

Management by Exception: Business vs. System

Management by Exception is actually an old idea; it is used in Business Process management and even in Accounting, as defined on the following website that I recently found: http://www.allbusiness.com/glossaries/management-by-exception/4944378-1.html

Wikipedia, referring to the same source, defines Management by Exception as a
"policy by which management devotes its time to investigating only those situations in which actual results differ significantly from planned results. The idea is that management should spend its valuable time concentrating on the more important items (such as shaping the company's future strategic course). Attention is given only to material deviations requiring investigation."

I would say that if one applies this definition to IT, it turns into my term "System Management by Exception", where the "management" is capacity management analysts or capacity planners and the "material deviations" are server or application exceptions.

Speaking of application exceptions, I am currently working on applying the "Management by Exception" approach not to server farm capacity management (I think I have already done that successfully) but to a set of applications, to automatically produce a list of only those applications that have some exceptions (unusual, but not yet deadly, behavior), in order to help provide proactive application capacity/performance management. Why? Because in some IT environments with a large number of applications, centralized capacity management does not exist, and application support teams have to play that role; SEDS (a System Management by Exception tool) should automatically deliver to them which systems need attention within each exceptional application, as sketched below.
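As a minimal sketch of that idea in R (hypothetical data and column names, not the SEDS schema): aggregate the per-server exceptions by application and report only the applications that have any:

## Application-level exception list sketch: aggregate server
## exceptions by application, keep only the exceptional ones.
## Data and column names are hypothetical.
excp <- data.frame(
  application = c("billing", "billing", "crm", "hr"),
  server      = c("srv01", "srv02", "srv07", "srv11"),
  EV          = c(12.5, 3.1, 0.0, 7.8))         # Exception Value per server

by.app <- aggregate(EV ~ application, data = excp, FUN = sum)
exceptional <- by.app[by.app$EV > 0, ]
exceptional[order(-exceptional$EV), ]           # TOP exceptional applications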

Wednesday, June 10, 2009

CMG Board of Directors Nomination

2020 UPDATE:

#CMGnews: I have been re-elected again to Computer Measurement Group (www.CMG.org) #BoardOfDirectors


2015 UPDATE:
Thanks for all who voted for me last year! I am resubmitting my nomination again for this year.

2014 UPDATE:
This year I was nominated again!
So, if you are a CMG.org member, please vote! For how to vote, check HERE.

The Chair of the 2009 CMG Nominating Committee asked me to nominate myself for the CMG Board of Directors. Apparently I am qualified for that, and I believe it is a great honor. I have decided to do it, and below is my nomination statement.

Willingness to Serve:
CMG has been an extremely valuable part of my professional life for the past ten years. Because of CMG, I became a known specialist in IT Capacity Management discipline! I have already worked at the local level to support the organization and would like to serve on CMG's Board of Directors to continue promoting the organization throughout the IT community. My company and family members support my involvement with and commitment to CMG.

Professional Work Experience: I have over 30 years of experience in the IT field. I started my career in 1979 as an IBM 370 system engineer. In 1986, I received my PhD in Robotics at St. Petersburg Technical University (Russia), where I then taught full-time such subjects as CAD/CAM, Robotics, and Computer Science for about 12 years. I have published more than 30 papers and given several presentations at international conferences in the Robotics, Artificial Intelligence, and Computer fields. In 1999, I moved to the US and worked at Capital One bank as a Capacity Planner. My first CMG paper was written and presented in 2001. The next one, "Global and Application Level Exception Detection System Based on MASF Technique," won a Best Paper award at CMG'02 and was presented again at UKCMG'03 in Oxford, England. My CMG'04 paper was republished at the IBM zSeries Expo. I have also presented my papers at the Central Europe CMG conference (Austria) and at numerous US regional meetings. After working for more than two years as a Capacity Management Team Lead at IBM, in 2007 I accepted a Senior Capacity Planner position at SunTrust Bank, where I am currently employed.

Other Professional Experience: I have long experience working as a programmer. I have also acquired extensive managerial experience working as the head of a university CAD/CAM lab and as a Team Lead at IBM. Since March 2005, I have been serving as Vice Chair of the Southern CMG, providing vendor connections.

Candidate Statement: I believe that I am uniquely qualified and motivated to serve CMG and its future development as the IT landscape changes. My major accomplishment is the Statistical Exception Detection System (SEDS) for IT Capacity Management. SEDS ideas and techniques have been published in a series of my CMG papers over the last ten years and also in this technical blog. My position as a Capacity Management expert and my dedication to the CMG organization will allow me to contribute in substantial ways. I further believe that my teaching experience could enhance CMG's training and educational services for the technical community. If elected, I will diligently pursue innovative ways to strengthen the organization's membership. I will continue CMG's dedicated tradition of volunteerism and will actively seek ways to support and improve CMG's commitment to supporting its members.

If you are a CMG member, please vote for me!

Thursday, May 7, 2009

SEDS charts at SCMG


SCMG has just held two great meetings:


I was able to demonstrate live some control chart building techniques, including R scripting. That encouraged me to submit a workshop proposal for CMG'09: "Power of Control Charts: How to Read, How to Build, How to Use".



If it is accepted, please come to see my workshop on December 6 in Dallas, Texas: http://www.cmg.org/conference/!

P.S. Interesting news I got from the SCMG meeting about R: SAS v9.2 can execute R scripts. Has anybody tried it?

Wednesday, March 25, 2009

Performance Anomaly ("Perfomaly") Detection. Parts 1-4: Power of Control Charts


_______
Based on my old workshop (Power of Control Charts), which I ran a few times several years ago, I am developing an updated version that will be part of a training course:

Performance Anomalies ("Perfomalies") Detection

It will consist of the following parts:

1. Introduction to Performance Anomaly Detection
2. Detecting performance anomalies with Control Charts - lecture.
3. Building control charts using Excel - hands-on exercises.
4. Detecting performance anomalies with Control Charts using R on a cloud server (AWS) - hands-on exercises.
            - includes an instruction video on how to build the R environment in the AWS cloud
5. Detecting novelties in performance data using the Exception Value (EV) approach (a type of "knee" detection) - lecture.
6. Detecting novelties in performance data - hands-on exercises.
7. Detecting normality in performance workload data with neural nets and deep learning - lecture.
8. Detecting normality using R and R NN packages - hands-on exercises.
9. Detecting anomalous short-lived objects using entropy calculation - lecture.
10. Detecting anomalous short-lived objects - hands-on exercises.

So parts 1-4 are the updated version of my old workshop, covering:

- What is the Control Chart? - a little bit of theory and history.
- Where the Control Chart is used: a review of some systems performance tools on the market that build and use control charts.
- How SEDS uses it - MASF charts vs. SPC ones; a long gallery of charts already published in CMG papers, plus some new ones, with explanations of how to read them.
- How to build a Control Chart: using Excel for interactive analysis and R to automate control chart generation, with a live demonstration of the technique.

- NEW: How to build the R environment (RStudio) on a cloud (AWS) server (EC2) and use R code to build control charts against your own data.

Data and an R script for testing an AWS EC2 instance with the R environment, and for the first R exercise:

Below is the data in CSV format (copy it to a test.csv file) and a simple R script that builds the monthly profile of some real Unix file system space utilization in the form of a monthly Control Chart.

test.csv

day,CurrentMonthData,UpLimit,Mean,LowLimit
1,0.45,0.54,0.42,0.31
2,0.45,0.54,0.42,0.31
3,0.45,0.54,0.42,0.31
4,0.45,0.54,0.42,0.31
5,0.45,0.54,0.42,0.31
6,0.45,0.53,0.43,0.32
7,0.45,0.54,0.43,0.32
8,0.45,0.54,0.43,0.32
9,0.45,0.53,0.43,0.33
10,0.45,0.53,0.43,0.33
11,0.45,0.53,0.43,0.33
12,0.72,0.53,0.43,0.33
13,0.72,0.53,0.43,0.33
14,0.72,0.53,0.42,0.32
15,0.45,0.53,0.42,0.32
16,0.45,0.55,0.43,0.31
17,0.45,0.55,0.44,0.33
18,1.00,0.54,0.44,0.33
19,0.84,0.54,0.44,0.33
20,0.84,0.54,0.44,0.34
21,0.84,0.54,0.44,0.34
22,,0.54,0.44,0.34
23,,0.52,0.44,0.36
24,,0.52,0.44,0.36
25,,0.51,0.43,0.36
26,,0.66,0.46,0.26
27,,0.66,0.46,0.25
28,,0.62,0.45,0.28
29,,0.62,0.45,0.28
30,,0.54,0.43,0.32
31,,0.54,0.43,0.32


## R script to plot a control chart from CSV input - I.Trubin
###############################################################
# read the monthly profile: day, current month data, limits, mean
cchrt <- read.csv('test.csv')

# current month data - thick black line
plot(cchrt$day, cchrt$CurrentMonthData, type="l", col="black",
     ylim=c(0,1), lwd=2, ann=FALSE)

# historical baseline: statistical limits and mean
lines(cchrt$day, cchrt$UpLimit,  col="red")
lines(cchrt$day, cchrt$Mean,     col="green")
lines(cchrt$day, cchrt$LowLimit, col="blue")

mtext("file system space utilization", side=2, line=3.0)
mtext("days of month",                 side=1, line=3.0)
mtext("CONTROL CHART",                 side=3, line=1.0)

legend(9, 0.3, c("Current Month","UpperLimit","Mean","LowerLimit"),
       col=c("black","red","green","blue"), lwd=c(2,1,1,1), bty="n")
###############################################################


The result is in the picture.

(Other examples posted here: Near-Real-Time IT-Control Charts )


If you would like to attend my workshop, put your contact information in a comment on this post.


Raw time-series  CSV data for the case is below:

MonthlyRawData.csv

date,metric
6/1/2008,0.39
6/2/2008,0.39
6/3/2008,0.39
6/4/2008,0.39
6/5/2008,0.39
6/6/2008,0.39
6/7/2008,0.39
6/8/2008,0.39
6/9/2008,0.39
6/10/2008,0.39
6/11/2008,0.39
6/12/2008,0.39
6/13/2008,0.39
6/14/2008,0.39
6/15/2008,0.39
6/16/2008,0.39
6/17/2008,0.39
6/18/2008,0.39
6/19/2008,0.39
6/20/2008,0.39
6/21/2008,0.39
6/22/2008,0.39
6/23/2008,0.4
6/24/2008,0.4
6/25/2008,0.4
6/26/2008,0.63
6/27/2008,0.63
6/28/2008,0.59
6/29/2008,0.57
6/30/2008,0.37
7/1/2008,0.37
7/2/2008,0.37
7/3/2008,0.37
7/4/2008,0.37
7/5/2008,0.37
7/6/2008,0.38
7/7/2008,0.38
7/8/2008,0.38
7/9/2008,0.39
7/10/2008,0.39
7/11/2008,0.39
7/12/2008,0.39
7/13/2008,0.39
7/14/2008,0.37
7/15/2008,0.37
7/16/2008,0.37
7/17/2008,0.37
7/18/2008,0.37
7/19/2008,0.37
7/20/2008,0.38
7/21/2008,0.38
7/22/2008,0.38
7/23/2008,0.39
7/24/2008,0.39
7/25/2008,0.39
7/26/2008,0.38
7/27/2008,0.37
7/28/2008,0.37
7/29/2008,0.37
7/30/2008,0.37
7/31/2008,0.37
8/1/2008,0.37
8/2/2008,0.37
8/3/2008,0.37
8/4/2008,0.37
8/5/2008,0.37
8/6/2008,0.37
8/7/2008,0.37
8/8/2008,0.37
8/9/2008,0.38
8/10/2008,0.38
8/11/2008,0.38
8/12/2008,0.38
8/13/2008,0.38
8/14/2008,0.38
8/15/2008,0.38
8/16/2008,0.38
8/17/2008,0.45
8/18/2008,0.45
8/19/2008,0.45
8/20/2008,0.45
8/21/2008,0.45
8/22/2008,0.45
8/23/2008,0.46
8/24/2008,0.46
8/25/2008,0.44
8/26/2008,0.44
8/27/2008,0.44
8/28/2008,0.44
8/29/2008,0.44
8/30/2008,0.44
8/31/2008,0.45
9/1/2008,0.45
9/2/2008,0.45
9/3/2008,0.45
9/4/2008,0.45
9/5/2008,0.45
9/6/2008,0.45
9/7/2008,0.45
9/8/2008,0.45
9/9/2008,0.45
9/10/2008,0.45
9/11/2008,0.45
9/12/2008,0.46
9/13/2008,0.46
9/14/2008,0.44
9/15/2008,0.44
9/16/2008,0.44
9/17/2008,0.44
9/18/2008,0.44
9/19/2008,0.44
9/20/2008,0.45
9/21/2008,0.45
9/22/2008,0.45
9/23/2008,0.46
9/24/2008,0.46
9/25/2008,0.45
9/26/2008,0.46
9/27/2008,0.46
9/28/2008,0.45
9/29/2008,0.45
9/30/2008,0.45
10/1/2008,0.45
10/2/2008,0.45
10/3/2008,0.45
10/4/2008,0.45
10/5/2008,0.45
10/6/2008,0.45
10/7/2008,0.45
10/8/2008,0.45
10/9/2008,0.45
10/10/2008,0.45
10/11/2008,0.45
10/12/2008,0.45
10/13/2008,0.45
10/14/2008,0.45
10/15/2008,0.45
10/16/2008,0.45
10/17/2008,0.45
10/18/2008,0.45
10/19/2008,0.45
10/20/2008,0.45
10/21/2008,0.45
10/22/2008,0.45
10/23/2008,0.45
10/24/2008,0.45
10/25/2008,0.45
10/26/2008,0.45
10/27/2008,0.45
10/28/2008,0.45
10/29/2008,0.45
10/30/2008,0.45
10/31/2008,0.45
11/1/2008,0.45
11/2/2008,0.45
11/3/2008,0.45
11/4/2008,0.45
11/5/2008,0.45
11/6/2008,0.45
11/7/2008,0.45
11/8/2008,0.45
11/9/2008,0.45
11/10/2008,0.45
11/11/2008,0.45
11/12/2008,0.45
11/13/2008,0.45
11/14/2008,0.45
11/15/2008,0.45
11/16/2008,0.48
11/17/2008,0.48
11/18/2008,0.48
11/19/2008,0.48
11/20/2008,0.48
11/21/2008,0.48
11/22/2008,0.48
11/23/2008,0.44
11/24/2008,0.44
11/25/2008,0.44
11/26/2008,0.44
11/27/2008,0.44
11/28/2008,0.44
11/29/2008,0.44
11/30/2008,0.45
12/1/2008,0.45
12/2/2008,0.45
12/3/2008,0.45
12/4/2008,0.45
12/5/2008,0.45
12/6/2008,0.45
12/7/2008,0.45
12/8/2008,0.45
12/9/2008,0.45
12/10/2008,0.45
12/11/2008,0.45
12/13/2008,0.46
12/14/2008,0.45
12/15/2008,0.45
12/16/2008,0.45
12/17/2008,0.45
12/18/2008,0.45
12/19/2008,0.45
12/20/2008,0.45
12/21/2008,0.45
12/22/2008,0.45
12/23/2008,0.45
12/24/2008,0.45
12/25/2008,0.45
12/26/2008,0.45
12/27/2008,0.45
12/28/2008,0.45
12/29/2008,0.44
12/30/2008,0.44
12/31/2008,0.44
1/1/2009,0.45
1/2/2009,0.45
1/3/2009,0.45
1/4/2009,0.45
1/5/2009,0.45
1/6/2009,0.45
1/7/2009,0.45
1/8/2009,0.45
1/9/2009,0.45
1/10/2009,0.45
1/11/2009,0.45
1/12/2009,0.45
1/13/2009,0.45
1/14/2009,0.45
1/15/2009,0.45
1/16/2009,0.45
1/17/2009,0.45
1/18/2009,0.45
1/19/2009,0.45
1/20/2009,0.45
1/21/2009,0.45
1/22/2009,0.45
1/23/2009,0.45
1/24/2009,0.45
1/25/2009,0.45
1/26/2009,0.45
1/27/2009,0.45
1/28/2009,0.45
1/29/2009,0.45
1/30/2009,0.45
1/31/2009,0.45
2/1/2009,0.45
2/2/2009,0.45
2/3/2009,0.45
2/4/2009,0.45
2/5/2009,0.46
2/6/2009,0.46
2/7/2009,0.46
2/8/2009,0.46
2/9/2009,0.45
2/10/2009,0.45
2/11/2009,0.45
2/12/2009,0.45
2/13/2009,0.45
2/14/2009,0.45
2/15/2009,0.45
2/16/2009,0.45
2/17/2009,0.45
2/18/2009,0.45
2/19/2009,0.45
2/20/2009,0.45
2/21/2009,0.45
2/22/2009,0.45
2/23/2009,0.45
2/24/2009,0.45
2/25/2009,0.45
2/26/2009,0.45
2/27/2009,0.45
2/28/2009,0.45
3/1/2009,0.45
3/2/2009,0.45
3/3/2009,0.45
3/4/2009,0.45
3/5/2009,0.45
3/6/2009,0.45
3/7/2009,0.45
3/8/2009,0.45
3/9/2009,0.45
3/10/2009,0.45
3/11/2009,0.45
3/12/2009,0.72
3/13/2009,0.72
3/14/2009,0.72
3/15/2009,0.45
3/16/2009,0.45
3/17/2009,0.45
3/18/2009,1
3/19/2009,0.84
3/20/2009,0.84
3/21/2009,0.84

How do you transform that raw data into the profile data (used above to build the control chart)? That and much more are covered by the workshop. SIGN UP!
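Here is a minimal sketch of one possible transformation (not the exact SEDS base-lining, which is more elaborate): for each day of the month, take the mean and standard deviation across the historical months and set the limits at the mean plus/minus three standard deviations.

## One possible raw-to-profile transformation (a minimal sketch;
## the actual SEDS base-lining is more elaborate): per day of month,
## average across the historical months, limits = mean +/- 3 st. dev.
raw <- read.csv('MonthlyRawData.csv')
raw$date <- as.Date(raw$date, format = "%m/%d/%Y")
raw$day  <- as.numeric(format(raw$date, "%d"))

current <- raw$date >= as.Date("2009-03-01")    # the month being charted
ref <- raw[!current, ]; cur <- raw[current, ]   # history vs. current month

m <- tapply(ref$metric, ref$day, mean)          # per-day mean across months
s <- tapply(ref$metric, ref$day, sd)            # per-day st. deviation
prof <- data.frame(day = as.numeric(names(m)),
                   CurrentMonthData = cur$metric[match(as.numeric(names(m)),
                                                       cur$day)],
                   UpLimit = m + 3*s, Mean = m, LowLimit = m - 3*s)
prof <- prof[order(prof$day), ]                 # days in calendar order
write.csv(prof, 'test.csv', row.names = FALSE, na = "")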
____________________________
APPENDIX: a script to install R, Shiny, and RStudio on an AWS EC2 instance.

#!/bin/bash
#install R
yum install -y R

#install RStudio-Server 1.1.423-x86_64
wget https://download2.rstudio.org/rstudio-server-rhel-1.1.423-x86_64.rpm
yum install -y --nogpgcheck rstudio-server-rhel-1.1.423-x86_64.rpm
rm rstudio-server-rhel-1.1.423-x86_64.rpm

#install shiny and shiny-server (2017-08-25)
R -e "install.packages('shiny', repos='http://cran.rstudio.com/')"
wget https://download3.rstudio.org/centos5.9/x86_64/shiny-server-1.5.4.869-rh5-x86_64.rpm
yum install -y --nogpgcheck shiny-server-1.5.4.869-rh5-x86_64.rpm
rm shiny-server-1.5.4.869-rh5-x86_64.rpm

#add user(s)
useradd username
echo username:username | chpasswd

Saturday, February 28, 2009

Real-Time Statistical Exception Detection

Does it make sense to apply statistical filtering to real-time computer performance data? I have not tried it, as I believe analyzing the last day's data against a historical baseline (based on dynamic statistical thresholds) is enough to give a good alert of an upcoming issue, while at the same time a classical alerting system (based on constant thresholds, for instance Patrol or SiteScope) captures severe incidents when something is completely dying.

But I see some companies do that, using the following (at least) three products available on the market:

1. Integrien Alive™ (http://www.integrien.com/ )
2. Netuitive (http://netuitive.com/ )
3. ProactiveNet (now BMC), (http://documents.bmc.com/products/documents/49/13/84913/84913.pdf )

(Plus, FireScope (http://www.firescope.com/default.htm) and Managed Objects (http://managedobjects.com/) do something similar.)

I recently had a discussion with Integrien sales people when they did a live presentation of the Alive product for the company I currently work for.
I was impressed; it looks like it works well. Most interesting for me is the difference between SEDS (my approach) and their technology.

Apparently, both approaches use dynamic statistical thresholds to issue alerts.

But I think they do that using some patented, complex statistical algorithms that should work well even if the sample data is not normally distributed. They are based on research that Dr. Mazda A. Marvasti did, and I am aware of this research, as some of his thoughts were published by CMG (in MeasureIT) a couple of years ago. It contains a very good critique of SPC (Statistical Process Control) concepts applied to IT data: SPC works perfectly if the data is normally distributed, and not so perfectly if it is not. The first attempt to improve SPC was MASF, which regroups the analyzed data; after regrouping, the data may be closer to normal. SEDS is based on MASF and, for instance, looks at history in a different dimension: instead of comparing hours (calculating standard deviations) within the same day, it groups hours by weekday, and it calculates statistics across weeks, not days.
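In R, that MASF-style regrouping can be sketched as follows; this is a minimal illustration with synthetic data, not the actual SEDS code:

## MASF-style regrouping sketch: group hourly data by (weekday, hour)
## and compute statistics ACROSS WEEKS, not days. Synthetic data;
## a minimal illustration, not the actual SEDS code.
set.seed(1)
stamps <- seq(as.POSIXct("2009-01-05 00:00", tz = "GMT"),
              by = "hour", length.out = 24*7*8)          # 8 weeks of hours
cpu <- 50 + 20*sin(2*pi*as.numeric(format(stamps, "%H"))/24) +
       rnorm(length(stamps), 0, 5)                       # daily cycle + noise

cell <- paste(weekdays(stamps), format(stamps, "%H"))    # weekday-hour group
baseline <- data.frame(mean = tapply(cpu, cell, mean),   # ~8 samples per cell
                       sd   = tapply(cpu, cell, sd))
head(baseline)   # per-(weekday,hour) baseline for the control limits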

(You can find more details in my latest paper. Links to some papers related to this subject, including mine, can be found in this blog.)

BTW, in response to his publication, I did a special analysis to see how far from normal the data used by SEDS is, and some results of that research have been published in one of my papers. My opinion is that some data is close to normal and some indeed is not so close; it depends on the metrics, the subsystems, the environment (prod/non-prod), and how the data is grouped.

The key is what type of threshold a SEDS-like product uses to establish a baseline. It could be very simple - a static one, or one based on standard deviations - but it could also be a more complex threshold, such as a combination of static ones (based on expert experience - empirical) and simple statistical ones (based on standard deviations). SEDS uses that combination, and it has several tuning parameters to tune it to capture meaningful exceptions. I believe this approach is valuable (and cheap) for practical usage, and several successful implementations of SEDS prove that.

But for more accurate analysis of the data, especially if it is far from a normal distribution, other, more advanced statistical techniques could be applied, and it looks like this product implements them. For me it is just another (more sophisticated) threshold calculation for baselining. Anyway, I continue improving my approach and will keep thinking about what they and others do in this area.

Another interesting observation I got from the Integrien tool's live presentation:
The rate of dynamic threshold crossings is so large that they have to add an additional (static?) threshold, on the assumption that some number of exceptions is kind of normal and just noise that should be ignored. That means the smart alert is issued only if the number of exceptions is bigger than that threshold. I did not learn how this threshold is set or calculated, but it is very high - HUNDREDS (!!!) of exceptions per interval. I believe the reason for this is that they apply the "anomaly" detector to too-granular data. As I stated in my latest paper, a better result can be reached by doing the statistical analysis after some summarization (SEDS does that, mostly by averaging the data to hourly values).

BTW, SEDS uses an original meta-metric to detect only meaningful exceptions (EV, or Exception Value - see my latest paper), which allows SEDS to keep its false positive rate very low.