There are several use cases where the integration/middleware layer is expected to process large payloads (XML or non-XML). This kind of implementation poses a few technical challenges:
- How to process such large files (>1 GB) without running into Out Of Memory/Heap Space issues.
- How to fine tune the design to ensure SLAs are met and data transformations are done efficiently without eating up server resources.
Oracle has published several white papers and articles outlining the various use-cases where Oracle FMW/SOA Suite can be leveraged to process large payloads. These also cover the various best practices/configurations while designing/implementing such integrations.
In this blog post I will cover one such use case: processing large XML files with repeating structures. The requirement was to concurrently process more than 10 large XML files (each > 1 GB). Oracle recommends the following approaches for this use case:
- De-batching XML
- Chunked Read
- Streaming XPath functions
Steps followed:
- In my case the input and output files had to keep the same sequence of data, so de-batching the input XML wasn't an option: it would have generated several separate output files, and merging them at the end in the right order was a challenge.
- Created a BPEL process and leveraged the File Adapter's ChunkedRead operation. The approach was to invoke the read operation on the file adapter inside a while loop, reading one chunk per iteration based on the configured chunk size. This ensures that chunks of data are loaded into memory instead of the whole XML file. JCA property: <property name="ChunkSize" value="1000"/>
- XSLT was applied to the smaller chunks rather than to the entire large payload. Used properties such as streamResultToTempFile, which streams XSLT results to a temporary file and then loads them from that file, instead of caching the whole document in memory in binary XML format (which results in OOM errors).
- Ensured that the JVM heap size (4-6 GB), transaction timeout (15-20 mins) and audit settings were configured appropriately at the server level to avoid OOM errors.
With the SOA approach I was able to process only 1 to 2 large files of 1-1.5 GB concurrently. BPEL took a long time (around 5 minutes for a single file) and heap usage was very high, even though the XSLT was a straightforward direct mapping of only a few fields. The BPEL solution didn't scale to concurrent processing of 3 or more files and started giving "Out Of Memory: Heap space" errors.
Alternative Approach:
With BPEL ruled out, Plan "B" was to implement the core processing logic in a Java layer and invoke the Java static method from OSB/BPEL using Java callout/Java embedding activities.
Steps followed:
- File streaming (read/write) was done using java.nio packages as these are faster and more efficient.
- A method was implemented for file chunking, i.e. instead of reading and transforming the whole >1 GB file at once, it was split into smaller chunks (4 MB each) and processing was carried out on each chunk (byte stream).
- JAXB libraries were used for marshalling and unmarshalling of data and the transformation logic was embedded inside the java code itself.
- Proper exception handling was implemented to reject bad records in any chunk of data and generate output data with only good data.
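The chunked read/write step above can be sketched roughly as follows. This is only a minimal illustration, not the code used in the project: the class name ChunkedFileProcessor, the process method and the transform callback are hypothetical, and the JAXB unmarshalling/transformation is replaced by a generic per-chunk function. It also assumes a chunk can be transformed on its own; a real implementation would have to align chunk boundaries with XML record boundaries so that no record is split across two chunks.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.UnaryOperator;

// Hypothetical helper: streams a file in fixed-size chunks using java.nio.
public class ChunkedFileProcessor {

    // 4 MB, the chunk size mentioned in the post.
    public static final int DEFAULT_CHUNK_SIZE = 4 * 1024 * 1024;

    /**
     * Reads 'in' chunk by chunk, applies 'transform' to each chunk and
     * appends the result to 'out'. Only one chunk is held in memory at a time.
     * Returns the number of bytes written.
     */
    public static long process(Path in, Path out, int chunkSize,
                               UnaryOperator<ByteBuffer> transform) throws IOException {
        long written = 0;
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {

            ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
            while (src.read(buffer) != -1) {
                buffer.flip();                        // switch buffer to read mode
                ByteBuffer result = transform.apply(buffer);
                while (result.hasRemaining()) {       // a write may be partial
                    written += dst.write(result);
                }
                buffer.clear();                       // reuse the same buffer
            }
        }
        return written;
    }
}
```

A call such as ChunkedFileProcessor.process(input, output, ChunkedFileProcessor.DEFAULT_CHUNK_SIZE, chunk -> chunk) copies the file unchanged; the identity function is where the unmarshal/transform/marshal logic and the bad-record handling would plug in. Because the buffer is reused, heap usage per file stays close to the chunk size regardless of the file size.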
Results:
This alternative approach turned out to be a highly scalable solution and also helped in achieving the required SLAs with minimal heap usage.
- Was able to process 10-12 such large files with less heap usage and in less time: around 12 minutes to concurrently process 12 files of 1 GB each, with an overall heap usage of about 1 GB. [JConsole screenshot taken after processing the files]
- Load Balancing in Cluster environment was achieved by leveraging OSB Business Service's load balancing capability. This helped in distributing the file processing load across nodes of the cluster.
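To give a feel for why chunking keeps the heap small under concurrency, here is a rough plain-Java sketch of running one task per file on a fixed thread pool. It is an assumption-laden illustration only: in the setup described above the concurrency came from OSB, not from a hand-written executor, and the class ConcurrentFileRunner and its runAll method are hypothetical names. If each task holds a single 4 MB chunk buffer, 12 concurrent files need roughly 12 x 4 MB = 48 MB for buffers, independent of the 1 GB file size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical driver: runs one task per input on a bounded thread pool.
public class ConcurrentFileRunner {

    /**
     * Applies 'task' to every input using at most 'threads' worker threads
     * and returns the results in the same order as the inputs.
     */
    public static <T, R> List<R> runAll(List<T> inputs, int threads, Function<T, R> task)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                Callable<R> job = () -> task.apply(input);
                futures.add(pool.submit(job));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());   // waits for that task to finish
            }
            return results;
        } finally {
            pool.shutdown();                 // let worker threads exit
        }
    }
}
```

Here each input would be a file path and the task would be the per-file chunked processing; bounding the pool size also bounds the number of chunk buffers alive at any moment.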