jump to navigation

CDIs-Non Functional Requirement–Scalability March 27, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.
Tags: , , , , , ,

n the last post I spoke about Non Functional Requirement – Performance, and in this post I am going to talk a bit about Scalability and this  is a closest parameter related to performance too. Generally during in a requirement gathering  phase this requirement documented as – need a scalable solution and should have high performance or quick response time. Well I don’t think I would write this note technically, but if anyone else I would expect this type of “single line” statement.

What is the definition of scalability, and what should be those finer line items that needs to be considered for a CDI solution to be scalable?

Scalability definition from Wikipedia (http://en.wikipedia.org/wiki/Scalability)

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

and the definition of the related word Scalable system is

A system whose performance improves after adding hardware, proportionally to the capacity added.

All these sound simple, yes, true but these sentences have great meanings inside, and to realize this as implementation is the task for any engineer, manager, technology specialist or an architect. Now this is done, let me get into next point and the most important one for this post.

Scalable database system is the system that can handle additional users, a number of transactions and more in future, data growth and all this with speed, preciseness, accuracy. Any system can be scaled up or out, and when it comes to database system like SQL Server, a scale up option is most easiest but could get costlier, and scale out needs to be well planned. When I say well planned it starts from non functional requirements understanding, solution design, design of data model, development, capacity planning with infrastructure setup.

Now that we know definition of scalability and scalable database system, I would like to list down those questions that helps me in deciding the needs of a scalable database system in a CDI solution.

  • What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

This is in same line to what a scalable database system should meet, however the importance is more when dealing with CDI – a data integration solution. An approximate projection of data growth of different data sources would help in deciding if the CDI solutions database could handle that growing data. The volumes of data plays very important part in meeting your response times or the SLAs for data processing, and then publish for analytics consumption.

  • What is the projected increase in number of source systems?

Similar question as above, but an idea of how many data source systems should be integrated along with the volume of data generated from these systems would help in knowing the volume of data that needs to be handled.

  • How many jobs to be executed in parallel?

I couldn’t disagree with an already documented point on this question, so mentioning as a repeat of someone who said about this meaningfully.

There is a difference between executing jobs sequentially and doing so in parallel. The answer to this question impacts performance and could affect the degree to which the given application can scale.

  • Is there any need for distributed processing?

An answer to this question will give a thought of existing systems distributed processing capabilities or the capacities, the organizations policies towards distributed processing. But this is at the level of understanding the the clients requirements, and should be taken as only leads or inputs to decide if distributed processing should be considered or not. A distributed processing of data is double-edged decision, and needs to be considered for solution after plenty of thought given on trade-offs.

The list of questions for scalability is short, but all these are essential to be addressed to have a high-performing and scalable system. I have nothing to give as an expert advise, but the last suggestion is to get these responses to have predictable behavior of any CDI solution.

Please feel free to comment and let me know your thoughts too.



No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: