jump to navigation

CDIs–Non Functional Requirement–Performance March 12, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.
Tags: , , , , ,

In today’s post I will talk about the NFR which I consider as one of the most important for the stability of the system for a CDI implementation – Performance.

This topic is vast and intense and there are several factors that needs to be considered while architecting or designing a system. A thought passes through my mind whenever, and even now – writing principles to achieve performance and scalability is easy which is mostly scholarly but the challenge lies during realization phase. If a systems performance and scalability has to be achieved then the appropriate solution blue print, right design, best coding, precise testing and deployment has to be in place. See as I said the moment this topic is opened up so much of lateral and tangential thinking comes into place. Now I am going to hold this and let’s talk about the above NFRs significance and then how it could be used for building CDI solution.

Performance – A short description and explanation that I picked from one of the sources I have shared in my earlier post, but modified for my better understanding and of course to suit the solution that I was building for the client.

So here it is, “performance is about the resources used to service a request and how quickly an operation can be complete”, e.g., response time, number of events processed. Its known that in a CDI type solution where the data is processed in GB to TB, and in large number of cycles response time could be less appropriate, but identifying the pre-determined response time for a particular process will help in defining the SLAs (SLA not from monitoring or maintenance perspective) for the downstream applications on when the data would be available.

I will give a short explanation here on what I intended by defining the performance. Say for e.g. I have to bring data from different data sources, apply lot of cleansing rules on the in-flow data, massage it for the intermediary schema, go map and publish per need of business and this is done on few millions of rows on different tables, or on GBs of data. I can’t assess the performance of batch applications at row basis or seconds basis. In a transactional system I would have gone by response time to be in seconds, and this doesn’t mean I can define that the job /batch application should finish the processing in an hour to 4 hours defined in the maintenance windows. I would prefer to abide per business needs of maintenance windows, however while looking for information on this NFR I am pre-defining the systems behavior – predictably, and also ensuring that to meet this NFR I need to have batch applications, real-time processing, appropriate design to handle volumes and velocity of the data. Yes, too much of thought process and explanation but the short message of the story is that “Meet the business needs of maintenance window by defining the SLAs for the system”. This is key for an integration solution or in specific to CDI.

Now I shall get back to the questions that I posed across to assess on what is the performance or rather SLA levels should the system adhere. The below list of questions are repeat of my previous post, but I shall put a bit of more text on why this question was asked.

  • What is the batch window available for completion of complete the batch cycle?

The batch window is that number of hours that are available for the CDI system to process data. This usually is mutually agreed inputs from the business and the IT folks, and knowing this information helps in deciding if the system can meet the proposed number of hours with the existing data sources, complexity in the processing logics, and volumes of data that needs to be processed.

  • What is the batch window available for completion of individual jobs?

The response to this question will help in deciding the dependency of batch job, and also in designing batches efficiently.

  • What is the frequency of the batch that could be considered by default?

The inputs for frequency of the job will help in assessing on how the loads on the system would vary during different weekdays. At the end this information would essentially help in deciding the frequencies of jobs, how to optimally use the resources of the system knowing that a typical day may not have the requisite load.

  • What are the data availability SLAs provided by the source systems of the batch load?

The source systems data availability would help in assessing if the proposed SLAs would be met or not. Essentially it is close to realistic if the data availability of the source system is close to what is being considered as SLA requirement for CDI.

  • What is the expected load (peak, off-peak, average)?

This would be more of a response from the IT team who would have set internal benchmarks on peak, off-peak and average load that a particular system should have. The historical data of the existing source systems or integration systems would be good references, and the answer to this question would help in designing optimal batches, frequencies and in proposing the relevant hardware for implementation.

  • What is the hardware and software configuration of the environment where the batch cycle could be run?

Strange question isn’t it!? Yes indeed, but trust me this questions response sets the pulse of what is expected from the CDI’s hardware and software if it is to be procured based on fresh inputs or if the solution has to be co-hosted on the existing systems with a pre-defined or prescribed hardware or software.

The following question will help in digging deeper and further on the standards that needs to be adhered for new solutions.

  • Are the resources available exclusive or shared?

The last and final question in evaluating on what is the expected performance behavior is to find out if the new solution has to be hosted on existing systems or new systems in to-to.

This has been a long post, and I am thinking I should pause for now. In the next post I shall talk about the NFR – Scalability.

Please feel free to comment and let me know your thoughts.



No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: