
Tuesday, July 20, 2021

Presenting in London - "Performance Anomaly and Change Point Detection For Large-Scale System Management"

I will be presenting my paper at the Worlds3 conference in London:

"Performance Anomaly and Change Point Detection For Large-Scale System Management"

(https://www.researchgate.net/publication/340926055_Performance_Anomaly_and_Change_Point_Detection_For_Large-Scale_System_Management)

Time slot in London time: 04:30 - 06:00 on 29th July 2021



Thursday, March 25, 2021

SEDS based "CLOUD RESOURCES WORKLOAD PROFILING"

Workload profiling of the main cloud objects (AWS EC2, RDS and EBS), based on the SEDS method, has been implemented at my current job.

Next Tuesday, 3/30, at 12:30 pm EST, I will be sharing my experience of building and using this method at the Data Centers and Cloud Infrastructure virtual CMG.org conference. You are welcome to join! The topic of the presentation is "CLOUD RESOURCES WORKLOAD PROFILING".

ABSTRACT: How can you be sure that a cloud object’s (e.g., AWS EC2, RDS or EBS) workload fits its rightsized resources (compute, RAM, IO/s and network traffic)? This is very difficult to do using raw system performance data from monitoring tools. The best way is to use a weekly workload profile, which is a graphical visualization in the form of a MASF IT-Control chart. This chart shows the stability of the workload and reveals recent anomalies, such as runaway processes, memory leaks or, especially important for cloud objects, an unusual number of hours during which the object is down, all compared with the usual weekly pattern.

This presentation will describe how to build, read, and use workload profiles using real data examples, and will demonstrate how cloud capacity scaling can be verified.
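To make the idea of a weekly workload profile more concrete, here is a minimal Python sketch of MASF-style control limits. It is not the actual SEDS implementation: the function names, the grouping into 168 hour-of-week buckets, and the 3-standard-deviation band width are all assumptions chosen for illustration. Each hour of the week gets its own mean and limits from several baseline weeks, and the latest week is then compared against them.

```python
from statistics import mean, stdev

HOURS_PER_WEEK = 168  # 7 days x 24 hours (assumed profile granularity)


def build_weekly_profile(baseline_weeks, k=3.0):
    """Return one (mean, lower_limit, upper_limit) tuple per hour of the week.

    baseline_weeks: list of weeks, each a list of 168 hourly metric values.
    k: band width in standard deviations (3 is an assumption, not a SEDS rule).
    """
    if len(baseline_weeks) < 2:
        raise ValueError("need at least two baseline weeks to estimate spread")
    profile = []
    for hour in range(HOURS_PER_WEEK):
        # Collect the same hour-of-week across all baseline weeks.
        samples = [week[hour] for week in baseline_weeks]
        m = mean(samples)
        s = stdev(samples)  # sample standard deviation
        profile.append((m, m - k * s, m + k * s))
    return profile


def find_exceptions(profile, latest_week):
    """List (hour, value, direction) for hours outside the control limits."""
    exceptions = []
    for hour, value in enumerate(latest_week):
        _, lower, upper = profile[hour]
        if value > upper:
            exceptions.append((hour, value, "above"))
        elif value < lower:
            exceptions.append((hour, value, "below"))
    return exceptions


if __name__ == "__main__":
    # Hypothetical data: four baseline weeks alternating between 10 and 12.
    baseline = [[10.0 if w % 2 == 0 else 12.0] * HOURS_PER_WEEK for w in range(4)]
    latest = [11.0] * HOURS_PER_WEEK
    latest[5] = 20.0  # simulated run-away at hour 5 of the week
    print(find_exceptions(build_weekly_profile(baseline), latest))
```

Plotting the mean and the two limits against the latest week's values, hour by hour, gives the kind of chart the abstract describes; hours with no data at all could be counted separately to track how long an object was down.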



Thursday, January 21, 2021

"Performance problem diagnosis in cloud infrastructures" (#CloudComputing #AnomalyDetection #ControlChart)

I found this interesting research (2016) by Olumuyiwa Ibidunmoye, which references my 2004 paper ("Capturing Workload Pathology by Statistical Exception").

Abstract 

Cloud datacenters comprise hundreds or thousands of disparate application services, each having stringent performance and availability requirements, sharing a finite set of heterogeneous hardware and software resources. The implication of such complex environment is that the occurrence of performance problems, such as slow application response and unplanned downtimes, has become a norm rather than exception resulting in decreased revenue, damaged reputation, and huge human-effort in diagnosis. Though causes can be as varied as application issues (e.g. bugs), machine-level failures (e.g. faulty server), and operator errors (e.g. mis-configurations), recent studies have attributed capacity-related issues, such as resource shortage and contention, as the cause of most performance problems on the Internet today. As cloud datacenters become increasingly autonomous there is need for automated performance diagnosis systems that can adapt their operation to reflect the changing workload and topology in the infrastructure. In particular, such systems should be able to detect anomalous performance events, uncover manifestations of capacity bottlenecks, localize actual root-cause(s), and possibly suggest or actuate corrections. This thesis investigates approaches for diagnosing performance problems in cloud infrastructures. We present the outcome of an extensive survey of existing research contributions addressing performance diagnosis in diverse systems domains. We also present models and algorithms for detecting anomalies in real-time application performance and identification of anomalous datacenter resources based on operational metrics and spatial dependency across datacenter components. Empirical evaluations of our approaches shows how they can be used to improve end-user experience, service assurance and support root-cause analysis.

Control Charting Example from the paper



Anomalies Detection and Cloud Platform Selection During DevOps - #CMGimpact conference interesting session (#CMGnews #CloudComputing #AnomalyDetection)

The topic is interesting because it relates to two of my current interests: cloud computing and anomaly detection (AD). It is scheduled for this evening - https://cmgimpact.com/anomalies-detection-and-cloud-platform-selection-during-devops/

Here is the abstract:

"In this session, the presenters will review the challenges of anomaly detection during DevOps and discuss the methodology and use case of cloud platform selection for the application. There will be a focus on applying iterative modeling and gradient optimization to determine the minimum configuration and cost required to support new applications Service Level Goals in different clouds"






Wednesday, January 20, 2021

Enjoying Virtual CMG Impact conference. #CMGnews

 


- this is a snapshot from last night's Q&A with Chris Molloy!
IMPACT 2021 is a great place for getting answers & networking with your peers! Come join us and take advantage of what IMPACT 2021 has to offer: click here https://hubs.li/H0F87V00

Cloud Capacity Management (#CloudComputing #CapacityManagement)

I have created a LinkedIn group to discuss this. Please join: https://www.linkedin.com/groups/13935809/