Digital transformation is moving the world in which we live at breakneck speed, with IT the driving force behind this revolution. We have seen rapid adoption of cloud technologies, and the types and number of devices we have to manage have increased dramatically.
We all have laptops, tablets, smart phones and IoT devices, and they are all communicating data about what we are doing and where we are doing it. Storage systems now deal with Terabytes and Petabytes of data rather than merely the Megabytes of yesteryear.
The result of all of this is a significant increase in the scale and complexity of our IT systems. So, it’s critical that the IT Operations function evolves with this new landscape.
Yet it occurred to me recently that historically the IT Operations profession has some striking similarities to the Cobbler’s children and their lack of shoes.
With all the architecture, design and development being focused on the customer’s requirements, there was little time – or money – left for ensuring that the appropriate tools were in place to effectively and efficiently manage the solution.
Dynamic alerting – understanding the norm
Thankfully, the growth of AI and machine learning algorithms now allows us to quickly extract insights from our data that we previously never knew existed. With enough data, it is possible to train a model to identify the symptoms of a failure before the failure occurs.
Most importantly, we can utilise machine learning techniques in order to identify anomalies, and even start to predict when things may go wrong. That’s where AIOps comes in.
Using the data we have gathered from our monitoring solutions we can now train a machine learning model to understand what is normal. The model can learn when the peak usage is and accept that as the norm. Furthermore, should I start seeing an increase in alerts, this can be identified as an anomaly.
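As a rough illustration of learning the norm, here is a minimal Python sketch. It assumes the monitoring data is simply a list of (hour of day, alert count) pairs, and it uses a per-hour mean and standard deviation in place of a trained model. The names `learn_baseline` and `is_anomaly`, the sample figures and the threshold of three standard deviations are all illustrative assumptions, not part of any particular AIOps product.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_baseline(history):
    """Learn a per-hour norm from (hour_of_day, value) samples.

    Returns {hour: (mean, standard deviation)} for hours with >= 2 samples.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomaly(baseline, hour, value, threshold=3.0):
    """Flag a value that sits more than `threshold` deviations from the norm."""
    if hour not in baseline:
        return False  # no learned norm for this hour, so nothing to compare with
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical alert counts: quiet at 03:00, busy at 09:00 (the daily peak).
history = [(3, 2), (3, 3), (3, 2), (3, 4), (9, 40), (9, 42), (9, 38), (9, 41)]
baseline = learn_baseline(history)

peak_is_normal = is_anomaly(baseline, 9, 41)    # busy, but expected at the peak
night_is_unusual = is_anomaly(baseline, 3, 41)  # same count at 03:00 stands out
```

The point of keying the baseline by hour is that the same alert count can be normal at the peak and anomalous overnight, which a single static threshold cannot express.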
This unsupervised learning can be further enhanced by removing any false alarms or associations. So, following a failure event, if a server is incorrectly associated with the failure, the ITOps team can easily update the tool’s rules by removing this rogue association.
Through a combination of AIOps learning and human review, AIOps platforms can quickly build an accurate picture of your IT infrastructure, without the painstaking work of having to enter all associations, particularly when these change so rapidly.
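The combination of learnt associations and human review might be sketched as follows. This assumes associations are inferred purely from components appearing together in past incidents, which is a deliberate simplification. The function names, the component names and the `min_count` cut-off are hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_associations(incidents, min_count=2):
    """Associate components that appear together in at least `min_count` incidents."""
    counts = Counter()
    for components in incidents:
        for pair in combinations(sorted(set(components)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def remove_association(associations, a, b):
    """Human review step: drop an association the ITOps team judges to be wrong."""
    return associations - {tuple(sorted((a, b)))}


# Hypothetical incident history: web-03 was caught up in two storage incidents
# by coincidence, so the learnt link between web-03 and san-01 is a rogue one.
incidents = [
    ["db-01", "san-01", "web-03"],
    ["db-01", "san-01"],
    ["db-01", "san-01", "web-03"],
]
learned = learn_associations(incidents)
reviewed = remove_association(learned, "web-03", "san-01")
```

Nobody has to enter the associations by hand; the review step only removes the ones that turn out to be wrong.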
Having learnt how my system performs, and in particular when a failure has occurred, I can now look to leverage any of the automation tools that I have at my disposal. So, if I have discovered an issue with a storage device, for instance, I may wish to automatically restart it.
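One way to picture that hand-off from detection to automation is a simple lookup from issue type to remediation action. The sketch below is an assumption-laden illustration: `restart_storage` only returns a description of the action, where a real deployment would call whatever automation tooling is in place, and the issue types and device names are invented.

```python
def restart_storage(device):
    """Placeholder action: a real version would call the automation tool."""
    return f"restart requested for {device}"


# Hypothetical mapping from detected issue type to remediation action.
REMEDIATIONS = {"storage": restart_storage}


def remediate(issue_type, device, dry_run=True):
    """Pick the action for an issue; in dry-run mode, only report what would run."""
    action = REMEDIATIONS.get(issue_type)
    if action is None:
        return f"no automated action for {issue_type}; escalate {device} to ITOps"
    if dry_run:
        return f"dry run: would call {action.__name__}({device!r})"
    return action(device)


planned = remediate("storage", "san-01")
executed = remediate("storage", "san-01", dry_run=False)
unknown = remediate("network", "sw-07")
```

Defaulting to a dry run is a cautious design choice: an automated restart is only triggered once the team trusts the detection behind it.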
Predictive maintenance delivers improved availability
A good example of this dynamic alerting in practice is where Fujitsu is using data from its systems to build models that can now identify hard disk failures before they actually occur. With this insight, Fujitsu can arrange proactive maintenance activity, providing a replacement before the disk has actually failed, without risking any data loss.
In addition, having been given several days’ warning, we are also better able to arrange for the maintenance activity when an engineer is either already onsite or nearby, reducing the cost and time to perform the fix.
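The post does not describe how Fujitsu's models work, so the following is only a simplified stand-in to show where several days' warning could come from. It fits a straight line to a hypothetical daily error counter for a disk and estimates how many days remain before the counter crosses a threshold. The counter, the threshold value and the function name are all assumptions.

```python
def days_until_threshold(daily_counts, threshold):
    """Estimate days until a rising counter reaches `threshold`.

    Fits a least-squares line to the daily readings (day 0, 1, 2, ...) and
    extrapolates. Returns None if there is no upward trend to extrapolate.
    """
    n = len(daily_counts)
    if n < 2:
        return None
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_counts) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_counts))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or improving: no predicted failure
    intercept = y_mean - slope * x_mean
    crossing_day = (threshold - intercept) / slope
    # Days remaining, measured from the most recent reading (day n - 1).
    return max(0.0, crossing_day - (n - 1))


# Hypothetical error counts for one disk over five days, rising by 2 per day.
warning_days = days_until_threshold([0, 2, 4, 6, 8], threshold=20)
healthy_disk = days_until_threshold([5, 5, 5], threshold=20)
```

With an estimate like this in hand, the replacement can be scheduled for a day when an engineer is already onsite or nearby, rather than being rushed after the disk has failed.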
This type of predictive maintenance activity can be adapted to many different scenarios, ranging from monitoring the performance of retail self-service terminals to IoT devices monitoring water levels.
Further real-life examples are outlined in the associated White Paper, which expands on the thoughts I have captured here.
The future lies in an augmented approach
Now I’m not for one moment suggesting it’s the end of the IT Operations function as we know it. What I’m suggesting is that, with this additional insight, appropriate focus can be directed to who should review proposed changes and the level of scrutiny they require.
The good news is that this doesn’t replace the traditional event and metric gathering tools that we use today. It simply augments them.
So, our modern-day Cobbler’s children can have their shoes after all.
Find out more…
To find out more about how AI is being used by the IT Operations function to manage this increasing scale and complexity, download our latest White Paper, ‘The Cobbler’s Children Have No Shoes’.
Latest posts by Kevin Yeo
- AIOps – teaching the machines proactive maintenance - May 28, 2020