What is Event Management in the ServiceNow ITOM module?

Before talking about what Event Management is, we need to understand that in large companies, the devices on the network are watched by one or more monitoring tools. The main purpose of these tools is to watch the devices automatically and, if something has gone wrong or is about to go wrong, generate a warning or error message.

These error and warning messages are called Events.

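So why do we need Event Management? We need it because monitoring tools are not smart: each tool only reports what it sees on the device it is watching, and it does not know how that device relates to the others.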

In a complex network where multiple devices work together to perform a single function, one failing device sets off more than one monitoring tool. The tool watching the failed device and the tools watching the devices that worked with it all start generating their own error and warning messages. Although the number of messages is large, the actual problem is just one failed device.

Let's try to understand this with the example of the Customer Order System that we talked about earlier. We know that the front-end of this system is on one server, the REST API is on another server, and the database is on a third server. Now let's suppose that the database server breaks down. The tool monitoring the database server will go crazy and start generating an error message every 60 seconds. But things won't stop there. Because the REST API is now unable to send requests to the database server, the tool monitoring the REST API server would also start generating an error message every 60 seconds. And because the front-end can no longer get its requests served by the REST API, the tool monitoring the front-end server would start generating an error message every 60 seconds as well.

So now we have a huge bunch of error messages from multiple machines even though only one machine broke down. How do we make sense of these error messages and find the root cause of the error messages? 

This is what Event Management helps us with. In the above scenario, each monitoring tool is generating the same error message every 60 seconds, so that's a lot of duplicate messages. Event Management takes each set of duplicate messages and presents it to us as a single message.

Then Event Management uses Service Mapping to determine that the three servers generating these error messages actually work together to perform a single function (the Customer Order System here). It also uses Service Mapping to determine which of the three servers depends on which. We can sum this up by saying that the server sending the requests is the dependent server, and the server that receives those requests and responds to them is the provider. So, in our example, the front-end server depends on the REST API server to do its job, and the REST API server depends on the database server. Since the database server does not send requests to any other server, it does not depend on either of them.

This means that when the database server goes down, the servers that depend on it also sort of go down with it and start generating error messages. Using Service Mapping, Event Management works this out and gets to the root of the problem, i.e. the broken database server. It deduces that, since all three servers are generating error messages and the other two depend on the database server to do their job, the database server must be the one that is broken. Event Management can then trigger the Orchestration module to try to fix the database server automatically.
We are going to cover Orchestration in a future post.
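
The de-duplication step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ServiceNow implements it: the tool names, the messages, the event fields and the choice of (source, node, message) as the matching key are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One alert standing in for many identical events (hypothetical structure)."""
    source: str
    node: str
    message: str
    first_seen: int
    last_seen: int
    count: int = 0


def deduplicate(events):
    """Collapse events that share the same source, node and message into one alert."""
    alerts = {}
    for ev in events:
        # Assumed matching key: identical source + node + message means "duplicate".
        key = (ev["source"], ev["node"], ev["message"])
        if key not in alerts:
            alerts[key] = Alert(ev["source"], ev["node"], ev["message"],
                                first_seen=ev["time"], last_seen=ev["time"])
        alert = alerts[key]
        alert.count += 1
        alert.last_seen = ev["time"]
    return list(alerts.values())


# Made-up monitoring tools and messages for the Customer Order System example.
FAILING = [
    ("db-monitor", "database", "Database is not responding"),
    ("api-monitor", "rest-api", "Cannot reach database"),
    ("web-monitor", "front-end", "Cannot reach REST API"),
]

# Five minutes of events: every tool repeats its message every 60 seconds.
events = [
    {"time": t, "source": source, "node": node, "message": message}
    for t in range(0, 300, 60)
    for source, node, message in FAILING
]

alerts = deduplicate(events)
for alert in alerts:
    print(f"{alert.node}: {alert.message} (seen {alert.count} times)")
```

In this sketch, fifteen raw events shrink to three alerts, one per server, each remembering how many times its message was repeated.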

Just to be thorough, if the front-end server had broken down instead, the REST API server and the database server would not have generated any errors, because they simply respond to requests. Even if no requests come in (because the front-end server is broken), the REST API server and the database server don't care and do not generate errors. They simply keep waiting for a request to come.

Using this "dependence" information, the Event Management module is able to deduce the root cause of the error messages.
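
To make the "dependence" reasoning concrete, here is a minimal Python sketch. It assumes the dependency map has already been discovered (in ServiceNow that information would come from Service Mapping), and the function names and the simple rule used here are illustrative assumptions, not the product's actual algorithm. The rule is: a server that is raising errors is a likely root cause only if none of the servers it depends on, directly or indirectly, is raising errors too.

```python
# Dependency map for the Customer Order System: each server lists its providers.
depends_on = {
    "front-end": ["rest-api"],
    "rest-api": ["database"],
    "database": [],
}


def providers_of(depends_on, node):
    """Return every server that `node` depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(node, []))
    while stack:
        provider = stack.pop()
        if provider not in seen:
            seen.add(provider)
            stack.extend(depends_on.get(provider, []))
    return seen


def find_root_causes(depends_on, alerting):
    """Keep only the alerting servers whose providers are all quiet."""
    alerting = set(alerting)
    return sorted(
        node for node in alerting
        if not (providers_of(depends_on, node) & alerting)
    )


# Database breaks: all three servers raise errors, but only one is the cause.
print(find_root_causes(depends_on, ["front-end", "rest-api", "database"]))

# Front-end breaks: the other two stay quiet, so the front-end is the cause.
print(find_root_causes(depends_on, ["front-end"]))
```

The first call singles out the database server and the second singles out the front-end server, which matches the two scenarios walked through above.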

