Sunday, March 8, 2015

API Health Monitoring using MEF & Microsoft Azure

Memphis: that's the name I have given to the custom tool I wrote to monitor the health & availability of our APIs.

Background:
There are thousands of tools that can monitor the performance, usage and availability of websites, but when it comes to APIs, the usual availability test that just pings your server doesn't suffice. The server hosting the API might be up and running while the methods behind its endpoints fail because a backend service or database is down. The only way to confirm that those methods work is to call them regularly and verify the responses. This is what API monitoring tools offer: you write tests for your endpoint methods and upload them to their websites. Although these tools are configurable, they don't give you exactly what you need, and if your APIs are as diverse as ours, that becomes a real problem. There is also very little reusability across your tests.

Our APIs are hosted both on-premise and in the cloud, and are further wrapped by an API management tool such as APIGEE. Regardless of where they are hosted, all of them are secured by some form of authentication.

Memphis:
The name Memphis comes from the underlying framework: the tool is based on Microsoft's Managed Extensibility Framework (MEF) and hosted on Microsoft's cloud, Azure. The components involved in this tool are described below:

It may look like a complicated architecture, but the components are really straightforward and simple to use. My guiding idea while writing this tool was that writing and adding tests to it should be extremely simple; if it isn't, people won't use it. And since people like copy-pasting a lot, adding a test should be as simple as drag and drop, which is why I used MEF. MEF lets you discover things at runtime, such as finding DLLs that contain implementations of a specific type in a given location. There should also be a minimum of code required to write a test, which is why "Skeleton" was added. All the components are explained below:

1. IPlugin: MEF can discover DLLs at runtime in a couple of ways. One way is to find all implementations of a specific interface, and that interface is IPlugin. You write a test as a class implementing IPlugin in a simple class library project, drop the DLL into Memphis, and it will be picked up for execution.

2. Skeleton: MEF and the IPlugin interface make it easy to add tests to Memphis, but what about making the tests easy to write? This is where Skeleton comes into the picture. Skeleton is an abstract class implementing IPlugin. It provides default implementations of most of IPlugin's methods and lets users override them. It also has helpers for APIGEE and Microsoft Azure that simplify authentication. Skeleton takes care of logging and error handling too. All that is left for the test author is to write the code that calls the method and verifies the result.

3. Memphis Executor: Memphis Executor is a WCF REST service hosted as an Azure Website in the cloud. Its job is to execute the tests written by inheriting from Skeleton. It executes all tests found in the DLLs available in the Plugins folder of the solution. Each test's results are stored in a separate table in Azure Storage. Memphis Executor is run periodically by Azure Scheduler at a predefined interval of either 15 or 30 minutes.

4. Memphis Reporter: Memphis Reporter is an Azure WebJob that runs every 5 minutes. Its job is to analyse the test results and report failures, drops in availability, and average response times that breach each test's configured limits. The configuration is described later. Alerts are sent to the configured admins and to any additional recipients defined at test level.

5. Memphis Web: Memphis Web is a web portal hosted in the cloud as an Azure Website. The portal's dashboard shows the tests configured in Memphis Executor. A details page shows a test's performance for the selected day, plus its configured and calculated availability and average response. The details page also lists the failed tests and the errors behind them.
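
Memphis itself is a .NET tool, and the post doesn't show the actual members of IPlugin or Skeleton. As a language-neutral illustration, here is a minimal Python sketch of the same pattern under stated assumptions:

- A hypothetical `export` decorator stands in for MEF discovery. In .NET, this is typically an `[Export]` attribute on each test, plus a `DirectoryCatalog` over the Plugins folder and an `[ImportMany]` collection.
- `Skeleton.execute` is a template method that handles timing, logging and error handling.
- `run_all` plays the executor, grouping results per table. An in-memory dict stands in for Azure Table Storage.

All names other than IPlugin and Skeleton are hypothetical.

```python
import abc
import logging
import time
from dataclasses import dataclass
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("memphis")


@dataclass
class CheckResult:
    test_name: str
    success: bool
    elapsed_seconds: float
    error: Optional[str] = None


class IPlugin(abc.ABC):
    """Contract every test implements (method names are hypothetical)."""

    table_name: str = "DefaultResults"

    @abc.abstractmethod
    def execute(self) -> CheckResult:
        ...


# Stand-in for MEF discovery: in .NET a DirectoryCatalog plus [ImportMany]
# would populate this collection from the DLLs in the Plugins folder.
PLUGIN_REGISTRY: list = []


def export(cls):
    """Register a plugin class, playing the role of MEF's [Export] attribute."""
    PLUGIN_REGISTRY.append(cls)
    return cls


class Skeleton(IPlugin):
    """Base class: default auth hook, timing, logging and error handling."""

    def authenticate(self) -> None:
        """Hook for auth helpers (e.g. an APIGEE or Azure token); no-op here."""

    @abc.abstractmethod
    def call_and_verify(self) -> None:
        """Call the API method and raise if the response is not as expected."""

    def execute(self) -> CheckResult:
        name = type(self).__name__
        start = time.perf_counter()
        try:
            self.authenticate()
            self.call_and_verify()
            return CheckResult(name, True, time.perf_counter() - start)
        except Exception as exc:  # any exception counts as a failed check
            log.warning("%s failed: %s", name, exc)
            return CheckResult(name, False, time.perf_counter() - start, repr(exc))


def run_all(registry) -> dict:
    """Executor: run every discovered test and group results by table name."""
    tables: dict = {}
    for plugin_cls in registry:
        plugin = plugin_cls()
        tables.setdefault(plugin.table_name, []).append(plugin.execute())
    return tables


@export
class SearchServiceCheck(Skeleton):
    table_name = "SearchServiceResults"

    def call_and_verify(self) -> None:
        # A real test would call the service URL and inspect the reply;
        # a canned payload keeps this sketch offline and deterministic.
        response = {"status": 200, "results": ["a", "b"]}
        if response["status"] != 200 or not response["results"]:
            raise AssertionError(f"unexpected response: {response}")


@export
class BrokenBackendCheck(Skeleton):
    table_name = "BrokenBackendResults"

    def call_and_verify(self) -> None:
        # Simulates a server that is up while its database is down.
        raise ConnectionError("database unavailable")


if __name__ == "__main__":
    for table, rows in run_all(PLUGIN_REGISTRY).items():
        print(table, [(r.test_name, r.success) for r in rows])
```

With this shape, a new test is a single small subclass that only implements the call-and-verify step. Authentication, timing and failure handling stay in one place.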

Of the five components explained above, only Memphis Executor needs to be redeployed every time tests are changed or added. Memphis Web and Reporter work independently and pick up everything that gets added or changed in the Executor. Everything in a Memphis test is configurable. Below is the configuration for a single test:

<monitor name="sectionName"
         methodName="methodname"
         serviceName="serviceEndpointName"
         tableName="azureTableName"
         additionalRecipient="piyush.gupta@bupa.com"
         serviceUrl="https://mywcfserviceurl.com/SearchService.svc"
         disableLogging="false"
         minTimeBetweenAvailabilityAlerts="120"
         availabilitySLA="88"
         disableReporting="true"
         averageResponse="7"
         failDuration="35"
         failCount="2"
         minTimeBetweenResponseAlerts="120" />

MonitorSection is the name of the main config section. It is defined in Memphis Executor's web.config and is required for the Executor to run the tests properly. MonitorSection holds one "monitor" node per test, and each monitor node defines the configuration properties of that test only. The properties are described below:

| Property | Type | Required | Default value | Description |
|---|---|---|---|---|
| Name | String | Yes | Null | Name of the section; must be unique across all tests |
| MethodName | String | Yes | Null | Name of the service method being tested |
| ServiceName | String | Yes | Null | Name of the service whose method is being tested |
| TableName | String | Yes | Null | Name of the table that holds the test results |
| AdditionalRecipients | String | No | Null | Additional email addresses that should receive service failure notifications along with the admin team |
| ServiceUrl | String | Yes | Null | URL of the service being tested |
| DisableLogging | Bool | No | False | Whether logging of test results is disabled |
| DisableReporting | Bool | No | False | Whether email reporting of test results is disabled |
| AvailabilitySLA | Int | Yes | 99 | Expected availability of the service, in percent. Used when analysing test results; an email notification is sent if the calculated availability drops below this value |
| AverageResponse | Int | Yes | 0.5 | Expected average response time, in seconds. Used when analysing test results; email notifications are sent if the calculated average response breaches this value |
| FailDuration | Int | No | 35 | Window, in minutes, in which failures are counted; if FailCount or more failures occur within it, failure alerts are sent |
| FailCount | Int | No | 2 | Minimum number of test failures within FailDuration that triggers failure alerts |
| MinTimeBetweenAvailabilityAlerts | Int | No | 30 | Minimum gap, in minutes, between availability alerts. While availability stays below the expected value, an alert is sent at most once per this many minutes |
| MinTimeBetweenResponseAlerts | Int | No | 30 | Minimum gap, in minutes, between average response alerts. While the average response stays outside the expected value, an alert is sent at most once per this many minutes |
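
The following is a minimal Python sketch, not Memphis's actual .NET configuration code. It reads every `monitor` node and applies the default values from the table, to make the required attributes and defaults concrete. It rests on these assumptions:

- Attribute names follow the sample `monitor` element.
- `averageResponse` is parsed as a float, because its listed default is 0.5.
- Only attributes without a default are treated as mandatory.

`MonitorConfig` and `parse_monitors` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Attributes with no default value in the property table.
REQUIRED = ("name", "methodName", "serviceName", "tableName", "serviceUrl")


def _bool(value: Optional[str], default: bool) -> bool:
    return default if value is None else value.strip().lower() == "true"


@dataclass
class MonitorConfig:
    name: str
    method_name: str
    service_name: str
    table_name: str
    service_url: str
    additional_recipient: Optional[str] = None
    disable_logging: bool = False
    disable_reporting: bool = False
    availability_sla: int = 99                      # percent
    average_response: float = 0.5                   # seconds
    fail_duration: int = 35                         # minutes
    fail_count: int = 2
    min_time_between_availability_alerts: int = 30  # minutes
    min_time_between_response_alerts: int = 30      # minutes


def parse_monitors(xml_text: str) -> list:
    """Read every <monitor> node, applying the documented defaults."""
    root = ET.fromstring(xml_text.strip())
    configs, seen = [], set()
    for node in root.iter("monitor"):
        a = node.attrib
        missing = [key for key in REQUIRED if not a.get(key)]
        if missing:
            raise ValueError(f"monitor is missing required attributes: {missing}")
        if a["name"] in seen:  # Name must be unique across all tests
            raise ValueError(f"duplicate monitor name: {a['name']}")
        seen.add(a["name"])
        configs.append(MonitorConfig(
            name=a["name"],
            method_name=a["methodName"],
            service_name=a["serviceName"],
            table_name=a["tableName"],
            service_url=a["serviceUrl"],
            additional_recipient=a.get("additionalRecipient"),
            disable_logging=_bool(a.get("disableLogging"), False),
            disable_reporting=_bool(a.get("disableReporting"), False),
            availability_sla=int(a.get("availabilitySLA", 99)),
            average_response=float(a.get("averageResponse", 0.5)),
            fail_duration=int(a.get("failDuration", 35)),
            fail_count=int(a.get("failCount", 2)),
            min_time_between_availability_alerts=int(
                a.get("minTimeBetweenAvailabilityAlerts", 30)),
            min_time_between_response_alerts=int(
                a.get("minTimeBetweenResponseAlerts", 30)),
        ))
    return configs


SAMPLE = """
<MonitorSection>
  <monitor name="search" methodName="Search" serviceName="SearchService"
           tableName="SearchResults"
           serviceUrl="https://mywcfserviceurl.com/SearchService.svc"
           availabilitySLA="88" averageResponse="7" disableReporting="true" />
</MonitorSection>
"""

if __name__ == "__main__":
    print(parse_monitors(SAMPLE))
```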


All of the above configuration properties are set per test. Each test has its own set, which Memphis Executor and Reporter use when running the test and sending email alerts. The other config settings available in the Executor and Reporter are explained in their respective sections.
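
The post doesn't spell out the exact rules the Reporter applies. What follows is only one plausible reading of how these settings could interact, written as a Python sketch:

- Failure alerts fire when FailCount or more failures fall within the last FailDuration minutes.
- Availability and average-response alerts fire when the calculated value breaches AvailabilitySLA or AverageResponse. A breach of AverageResponse is assumed to mean the average exceeds it.
- Both of those alert types are throttled by their MinTimeBetween... settings.

The `Result`, `Limits`, `AlertState` and `analyse` names are hypothetical. In a real deployment, the last-alert times would have to be persisted between the 5-minute runs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Result:
    at: datetime
    success: bool
    elapsed_seconds: float


@dataclass
class Limits:
    availability_sla: float = 99.0                  # percent
    average_response: float = 0.5                   # seconds
    fail_duration: int = 35                         # minutes
    fail_count: int = 2
    min_time_between_availability_alerts: int = 30  # minutes
    min_time_between_response_alerts: int = 30      # minutes


@dataclass
class AlertState:
    """When each throttled alert type last fired (assumed to be persisted)."""
    last_availability: Optional[datetime] = None
    last_response: Optional[datetime] = None


def _throttled(last: Optional[datetime], now: datetime, gap_minutes: int) -> bool:
    return last is not None and now - last < timedelta(minutes=gap_minutes)


def analyse(results: List[Result], limits: Limits,
            state: AlertState, now: datetime) -> List[str]:
    """Return the alerts one reporter run would send (hypothetical rules)."""
    alerts: List[str] = []
    if not results:
        return alerts

    window = timedelta(minutes=limits.fail_duration)
    recent_failures = [r for r in results if not r.success and now - r.at <= window]
    if len(recent_failures) >= limits.fail_count:
        alerts.append(f"failure: {len(recent_failures)} failures "
                      f"in {limits.fail_duration} min")

    availability = 100.0 * sum(r.success for r in results) / len(results)
    if availability < limits.availability_sla and not _throttled(
            state.last_availability, now, limits.min_time_between_availability_alerts):
        alerts.append(f"availability: {availability:.1f}% < {limits.availability_sla}%")
        state.last_availability = now

    average = sum(r.elapsed_seconds for r in results) / len(results)
    if average > limits.average_response and not _throttled(
            state.last_response, now, limits.min_time_between_response_alerts):
        alerts.append(f"response: average {average:.2f}s > {limits.average_response}s")
        state.last_response = now
    return alerts
```

Availability and average response are computed over whatever results are passed in, for example the current day's rows from the test's table.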

That's all about Memphis. It may not be a perfect tool, but it handles the basic requirement of availability checking pretty well.

I hope you enjoyed reading about Memphis. If you have any feedback, feel free to share it.
