Monday, April 20, 2015

Service Monitoring in SOA 11g: Using Oracle BTM 12c

Oracle Business Transaction Management (OBTM) is a service monitoring tool for Oracle SOA environments. Oracle acquired it from AmberPoint, rebranded it and added several features to make it compatible with the Oracle FMW stack. Oracle Enterprise Manager (OEM), which provides server monitoring capabilities, can be used together with BTM to provide end-to-end monitoring at the enterprise level.

SOA developers usually rely on the EM console to monitor and debug SOA instances, on the Oracle Service Bus console (if reporting is enabled) for Service Bus flows, or on log files in the case of Java components. When a landscape ties all of these components together, debugging it becomes a nightmare for the people supporting and maintaining the system.

BTM comes in handy in these scenarios:
  • It is a centralized service monitoring tool that gives a lot of insight into service transactions.
  • It collects data automatically through BTM observers, which are non-intrusive (unlike BAM, they don't require any modifications to existing service code).
  • It provides end-to-end visibility of a service transaction spanning different types of components (BPEL processes, OSB flows, Java web services, database calls, etc.).
For this article, I used SOA Suite/OSB 11.1.1.7 with BTM 12.1.0.6.7 on an 11g R2 database. I won't cover the installation and configuration steps for the tool, as they are well documented in the Oracle link below.


I would like to touch upon some of the features the tool provides and a typical topology used in most SOA deployments.

Service Endpoint Monitoring
By default, BTM observers monitor most service endpoints, such as BPEL endpoints, OSB proxy and business services, database adapter calls (create connection, close connection, execute, etc.), POJOs and JMS queues. Even without defining any transactions, you can analyze the data for each service endpoint, and this data is retained for a configurable period (based on the logger system policy). You can also create custom views of frequently viewed service endpoints that belong to a particular domain or container.

Transactions
Transactions are very useful when a service interface spans multiple components such as OSB, BPEL, Java and DB adapter calls. Asynchronous service interactions can also be correlated by linking them through the Message Fingerprint (a unique key identified by BTM), the ECID or a custom message property. Transactions provide end-to-end visibility of the service message flow through the various components, along with analysis data such as timings and fault details at the individual message level (if logging is enabled).
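To make the correlation idea concrete, here is a minimal Python sketch that groups observed message exchanges by a shared key (the ECID in this example, though a message fingerprint or custom property would work the same way) and derives the end-to-end duration and fault status of each transaction. The `Observation`/`Transaction` types and their fields are hypothetical illustrations; BTM performs this correlation internally and does not expose such an API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    """One observed message exchange at a single endpoint (hypothetical shape)."""
    ecid: str        # correlation key; a fingerprint or custom property would also work
    component: str   # e.g. "OSB proxy", "BPEL", "DB adapter"
    start_ms: int    # request timestamp in milliseconds
    end_ms: int      # response timestamp in milliseconds
    fault: bool = False


@dataclass
class Transaction:
    ecid: str
    hops: List[Observation]

    @property
    def duration_ms(self) -> int:
        # End-to-end time: earliest request in to latest response out
        return max(h.end_ms for h in self.hops) - min(h.start_ms for h in self.hops)

    @property
    def faulted(self) -> bool:
        # A transaction is faulted if any hop along the flow faulted
        return any(h.fault for h in self.hops)


def correlate(observations: List[Observation]) -> Dict[str, Transaction]:
    """Group observations sharing a correlation key into ordered transactions."""
    groups: Dict[str, List[Observation]] = defaultdict(list)
    for obs in observations:
        groups[obs.ecid].append(obs)
    return {
        key: Transaction(key, sorted(hops, key=lambda h: h.start_ms))
        for key, hops in groups.items()
    }


# Illustrative input: one flow through OSB -> BPEL -> DB adapter, and one short flow
observations = [
    Observation("ecid-1", "OSB proxy", 0, 900),
    Observation("ecid-1", "BPEL", 100, 850),
    Observation("ecid-1", "DB adapter", 200, 300, fault=True),
    Observation("ecid-2", "OSB proxy", 50, 120),
]
txns = correlate(observations)
for key, txn in txns.items():
    print(key, [h.component for h in txn.hops], txn.duration_ms, txn.faulted)
```

Sorting hops by start time gives the order in which components were entered, which is roughly how a transaction view lays out the message flow.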

SLA Policies/Alerts
SLA policies can be defined on various service endpoints, and email alerts can be sent to the relevant stakeholders when they are breached. Typical examples include the average response time of a service exceeding a threshold over a given period, the maximum response time of a service breaching a threshold, or the number of faults on a critical business service exceeding a high-water mark.
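The three kinds of SLA rule described above can be sketched as a simple evaluation over a time window of response samples. This is a minimal Python illustration under assumed names (`SlaPolicy`, `evaluate` and the sample tuple layout are hypothetical, not BTM's policy schema); in BTM the policies are configured in the UI and the tool sends the alert emails itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (timestamp in seconds, response time in ms, faulted?) for one service endpoint
Sample = Tuple[float, float, bool]


@dataclass
class SlaPolicy:
    """Hypothetical SLA thresholds, for illustration only."""
    window_s: float        # evaluation period in seconds
    max_avg_ms: float      # threshold on average response time
    max_single_ms: float   # threshold on maximum response time
    fault_high_water: int  # faults allowed within the window


def evaluate(policy: SlaPolicy, samples: List[Sample], now: float) -> List[str]:
    """Return a description of every threshold breached in the current window."""
    window = [s for s in samples if now - policy.window_s <= s[0] <= now]
    if not window:
        return []
    breaches = []
    times = [s[1] for s in window]
    avg = sum(times) / len(times)
    if avg > policy.max_avg_ms:
        breaches.append(f"avg response {avg:.0f} ms > {policy.max_avg_ms:.0f} ms")
    if max(times) > policy.max_single_ms:
        breaches.append(f"max response {max(times):.0f} ms > {policy.max_single_ms:.0f} ms")
    faults = sum(1 for s in window if s[2])
    if faults > policy.fault_high_water:
        breaches.append(f"{faults} faults > high-water mark {policy.fault_high_water}")
    return breaches


# Illustrative values: 5-minute window, 2 s average, 10 s max, at most 1 fault
policy = SlaPolicy(window_s=300, max_avg_ms=2000, max_single_ms=10000, fault_high_water=1)
samples = [(10, 1500, False), (60, 2500, True), (120, 12000, True), (400, 100, False)]
for breach in evaluate(policy, samples, now=300):
    # A real deployment would email the stakeholders instead of printing
    print("ALERT:", breach)
```

Evaluating over a window rather than per request is what separates "average response time exceeding a threshold for a given period" from a one-off slow call.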

Based on my experience with the tool:
  • Of the 5 JVM components in a typical BTM installation, the most important are btmMonitor and btmMain. If the transaction load on the servers is high, it is advisable to scale btmMonitor horizontally across cluster nodes, as it is the component that interacts with the observers installed on all monitored domains. btmPerformance and btmTransaction are two of the other JVMs...
  • In my view, this is a very good tool for technical staff and developers rather than something business users would work with. The UI provides very detailed, granular information for each service transaction, such as throughput, average/maximum response times and faults, which is very useful during performance testing of services.
  • The SLA alert emails are very helpful for the support team when debugging production issues in which end systems aren't responding in a timely fashion.
  • The product ships with a BTM CLI (Command Line Interface) utility that can extract service and transaction information for a given time range. This is useful for reporting, for example to find how many transactions for a specific service interface ran for more than 30 seconds in a day.
  • BTM's underlying database tables are highly denormalized (perhaps to optimize read performance and render data faster in the UI), which makes querying specific data very difficult. It would be an understatement to say these tables are not straightforward to interpret.
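For the CLI reporting use case above (counting transactions that ran longer than 30 seconds in a day), the extracted data still has to be post-processed. The Python sketch below shows that step. It assumes, hypothetically, that the extract has been saved as CSV with `service`, `start_time` (ISO 8601) and `duration_ms` columns. The real BTM CLI commands and output format are not shown here, so adjust the column names to whatever your extract actually contains.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime


def slow_transactions_per_service(csv_text: str, day: date, threshold_s: float = 30.0) -> Counter:
    """Count, per service, transactions on `day` whose duration exceeded `threshold_s`.

    Assumes a hypothetical CSV export with columns: service, start_time, duration_ms.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        started = datetime.fromisoformat(row["start_time"])
        if started.date() != day:
            continue  # outside the reporting day
        if float(row["duration_ms"]) > threshold_s * 1000:
            counts[row["service"]] += 1
    return counts


# Illustrative extract; in practice this would be read from the CLI output file
export = """service,start_time,duration_ms
OrderService,2015-04-20T09:15:00,45000
OrderService,2015-04-20T10:00:00,12000
OrderService,2015-04-21T08:00:00,60000
InvoiceService,2015-04-20T11:30:00,31000
"""
report = slow_transactions_per_service(export, date(2015, 4, 20))
for service, count in report.most_common():
    print(f"{service}: {count} transaction(s) over 30 s")
```

Doing this aggregation on an extract sidesteps the denormalized BTM tables, which, as noted above, are hard to query directly.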