
CDIs-Non Functional Requirements-Other few May 21, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Design, Integration, Security.

In the CDIs-Non Functional Requirements series I have covered NFRs like Performance, Scalability, Availability, Reliability, and Utilization Benchmarks that are useful when building a CDI system. In this post I shall talk about less known but important NFRs: Maintainability, Extensibility, and Security. Strictly speaking I wouldn't term these NFRs, but questions on these topics will give you an insight into how the system is expected to behave, from both a business and a technical perspective. Also, these don't work in silos; they link back to some of the NFRs in the list above.

Maintainability

  1. What is the maintenance/release cycle?

    This question gives us an idea of the release-cycle practices followed by the business and technical teams. Each release cycle can mean a change in the source system that may ripple into downstream applications. The more release cycles there are, the harder the job of maintaining the code base. To avoid long-term overheads, the system, the data model, and the low-level design of the ETLs should be built on the assumption that such changes will be constant and frequent.

  2. How frequently do source or target structures change? What is the probability of such change?

    This point is related to the first question, but elicits information one level deeper by asking, "given more maintenance cycles, what are those changes at the source, and what are they expected to be in the target?" If the changes are constant, frequent, and not too complex, then the data model and the ETLs have to be configurable enough to accommodate 'certain' changes in the source. That configurability comes with a rider and a trade-off against other NFRs like performance: changes in the data source can degrade ETL performance to the point where the business-defined SLA can't be met.
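To make the point concrete, here is a minimal, hypothetical sketch (in Python, with made-up column names) of what "configurable" can mean at the ETL level: the source-to-target mapping lives in metadata, so a constant, frequent class of source changes becomes a configuration edit rather than a code release.

```python
# Minimal sketch of a configuration-driven column mapping, so that
# 'certain' source changes are absorbed by editing metadata, not ETL code.
# Column names and transforms here are hypothetical.

COLUMN_MAP = {
    # source column -> (target column, transform)
    "cust_nm":  ("customer_name", str.strip),
    "cust_phn": ("phone",         lambda v: v.replace("-", "")),
    "crtd_dt":  ("created_date",  str),  # pass-through
}

def map_row(source_row: dict) -> dict:
    """Apply the configured mapping to one source row."""
    target = {}
    for src_col, (tgt_col, transform) in COLUMN_MAP.items():
        if src_col in source_row:          # tolerate dropped columns
            target[tgt_col] = transform(source_row[src_col])
    return target

# A renamed or added source column becomes a one-line change in COLUMN_MAP.
print(map_row({"cust_nm": " Asha ", "cust_phn": "080-1234"}))
```

In practice the mapping would live in a control table rather than in code, but the trade-off noted above stands either way: every layer of indirection costs some ETL performance.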

Having said this, the next NFR is closely related to Maintainability.

Extensibility

The answers to queries under this topic tend to be challenging for the business and technical teams. Not everyone is sure what should and shouldn't be configurable, because that depends on the changes business expects at the source-system level. One typically gets answers like "not sure", "maybe", "quite possible in the near future", or "probably" the source will change, while what that change will be remains an open question. Providing an appropriate solution at the different layers is then a daunting task for the technical team.

  1. How many different/divergent sources (different source formats) are expected to be supported?

    The answer to this question helps in understanding which source formats (CSV, TSV, XML, etc.) have to be supported. If they diverge widely, alternate design practices can be implemented on the target side to provide extensibility across formats (see the sketch after this list).

  2. What kind of enhancements/changes to source formats are expected to come in?

    An answer to this point helps in deciding whether to build abstract transformations or reusable mappings.

  3. What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?

    This information tells us how often source formats change, and whether the change comes from existing sources or new ones. Again, it helps in deciding between abstract transformations and reusable mappings.
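As promised above, here is a small, hypothetical Python sketch of one way to keep divergent formats from leaking into the pipeline: each format gets its own reader behind a common dispatch table, so supporting a new format means adding an entry, not rewriting the transformations. Function and format names are illustrative only.

```python
import csv, io
import xml.etree.ElementTree as ET

# Hypothetical sketch: one reader per source format behind a common
# dispatch table, so a new format is a new entry rather than a pipeline change.

def read_delimited(text: str, delimiter: str):
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def read_xml(text: str):
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root]

READERS = {
    "csv": lambda t: read_delimited(t, ","),
    "tsv": lambda t: read_delimited(t, "\t"),
    "xml": read_xml,
}

def load_source(fmt: str, payload: str):
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported source format: {fmt}")
    return reader(payload)

print(load_source("csv", "id,name\n1,Asha"))
print(load_source("xml", "<rows><row><id>1</id></row></rows>"))
```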

The last NFR is Security, usually the least preferred topic in any system architecture and design, but among the most important.

Security

In a CDI we are dealing with sensitive customer information and transaction details. It is important to understand how the business treats this type of data, and how the security compliance team views data being gathered from different source systems and consolidated in a single place – the "Data hub". The questions below cover data protection rather than the access levels or privileges of different users.

  1. Are there any special security requirements (such as data encryption or privacy) applicable?

    The answer to this question is usually "no", but certain fields brought in from CRM and ERP systems need to be protected from misuse in case of a security breach. It is best to walk through this question with a real scenario, and then decide whether data or database encryption should be enabled (see the sketch after these questions).

  2. Are there any logging and auditing requirements?

    This is rarely required, since the data from different systems is mostly massaged and made available in reporting format through a different data sink. A discussion here helps in deciding whether security should be handled at the reporting level (by enabling the reporting tool's security features) rather than inside the heavy data-processing jobs.
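Referring back to the first question, the sketch below illustrates (hypothetically, in Python, using only standard-library primitives) one way to protect sensitive CRM/ERP fields before they land in the hub. A real implementation would rely on the platform's own encryption features; the field names and key handling here are assumptions.

```python
import hashlib, hmac, os

# Hypothetical sketch: protect sensitive CRM/ERP fields before they land
# in the hub. Real deployments would use the platform's own features
# (e.g., database-level encryption); this only illustrates the idea.

SECRET = os.environ.get("HUB_HASH_KEY", "change-me").encode()
SENSITIVE_FIELDS = {"ssn", "credit_card"}   # assumed field names

def protect(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            # Keyed hash: still joinable/deduplicable, but not reversible.
            digest = hmac.new(SECRET, str(value).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()
        else:
            out[field] = value
    return out

print(protect({"customer": "Asha", "ssn": "123-45-6789"}))
```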

I hope these posts on NFRs for CDIs help you in architecting and designing a Data Hub system that is highly available, scalable, high performing, and reliable.

Cheers!


CDIs-Non Functional Requirement-Utilization Benchmarks May 16, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

In this post I shall share my thoughts on how the utilization questions help in designing a system that meets its response-time and performance targets. Some of them are about the history of the existing systems, and a few are about future requirements. Lot of English, eh, so the straight message is this: the answers to these questions help me gather "baselines and benchmarks" for the past and the future. Once I have those, I should have

  1. A system designed to perform with low latency (quick response times) for any operation
  2. An infrastructure (especially servers) sized to meet future performance requirements

Let me group these questions by purpose and tag them with some keywords for easy understanding.

  1. How many rows are created every day in transactional DBs?
     Purpose: current baselines on rows generated from business transactions – row size, storage space, disk size, CPU size, computation power
  2. What is the size of data that is generated across LOBs?
     Purpose: current baselines on rows generated from business transactions per line of business – row size, storage space, disk size
  3. What is the agreeable processing time of data in the data hub?
     Purpose: future benchmarks of the data hub – CPU sizing
  4. How stale can the reporting data be?
     Purpose: future benchmarks of the data hub – storage sizing
  5. How much of the processed data and data used for reporting has to be archived /purged?
     Purpose: future benchmarks on how archived data would be used in the reporting system – server sizing, storage sizing
  6. How many years of data have to be archived in the system?
     Purpose: future benchmarks on how archived data would be used in the reporting system – server sizing, storage sizing
  7. How many years of yesteryear's data have to be processed?
     Purpose: future benchmarks on how archived data would be used in the reporting system – server sizing, storage sizing, CPU sizing, RAM sizing
  8. What will be the year-on-year growth of data in the transactional source system?
     Purpose: current baselines and future benchmarks on the data to be processed – server sizing, storage sizing, CPU sizing, RAM sizing
  9. What other sources of data could be added over a period of 1-2 years, and how much data could these sources feed into the data hub?
     Purpose: current baselines and future benchmarks on the data to be processed – server sizing, storage sizing, CPU sizing, RAM sizing
  10. What are the agreeable database system downtime hours?
     Purpose: future benchmarks on how quickly the data processing should complete – CPU sizing, RAM sizing
  11. What is the possible backup and recovery time required for the Data Hub and Reporting system?
     Purpose: how quickly the whole data hub and BI system should be back up and running – server sizing, storage sizing, CPU sizing, RAM sizing
  12. How many users will be added year on year for the reporting system, and what are the types of users?
     Purpose: future benchmarks on how the reporting and data hub system resources will be utilized – server sizing, storage sizing, CPU sizing, RAM sizing
  13. Are there any other external systems that would require the data from the data hub?
     Purpose: current baselines of data processing and BI systems – server sizing
  14. How many rows of the transactional /business system need to be processed for the Data Hub?
  15. How much data is currently processed for reports?
  16. What type of data processing queries exist in the system which provide Static /Ad-Hoc reports?
  17. What types of reports are currently available, and what is the resource usage?
  18. What are the query profiling inputs for these data processing /reporting queries?
     • CPU usage
     • Memory usage
     • Disk usage
     Purpose (questions 14-18): current baselines of data processing and BI systems – server sizing
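To show how these answers feed server and storage sizing, here is a back-of-envelope calculation in Python. Every number in it is an assumed answer to the questions above, not a recommendation.

```python
# Back-of-envelope storage sizing from the answers above; all numbers are
# assumptions for illustration only.

rows_per_day  = 2_000_000    # Q1: rows created per day
avg_row_bytes = 500          # Q1: average row size
growth_rate   = 0.20         # Q8: 20% year-on-year growth
retention_yrs = 3            # Q6: years to keep online

total_bytes = 0.0
daily = rows_per_day
for year in range(retention_yrs):
    total_bytes += daily * avg_row_bytes * 365
    daily *= 1 + growth_rate   # next year's daily volume

print(f"~{total_bytes / 1024**4:.2f} TB raw over {retention_yrs} years")
# Add an index/overhead factor (often 2-3x) before fixing the storage size.
```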

In the follow-up post for this NFR I shall let you all know how I translated these inputs into the design and server sizing. Until then, happy reading.

Please feel free to share your comments.

Cheers!


CDIs-Non Functional Requirement-Reliability May 6, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

This is a strange NFR, but a very important one. The word reliable sounds like plain English and looks less applicable here?! Well, no. We are talking about reliability in the sense of data reliability in a data integration system – CDI. Note a point I mentioned earlier: a CDI does not change the meaning of data; it tries to align, across the organization, the perspective of what the data in the CDI system means and how true it is. It is necessary that all business and technical teams agree on the definition of these data terms. Once this is agreed, the technical teams have to implement the logic – business and solution – that brings that "value and truth" to the data or information consolidated from different sources.

In this post, I shall share a few thoughts on the questions we need to find answers to.

  • What is the tolerance level or percentage of erroneous data that is permitted?

With the above context set, this question gathers the information needed to plan error handling and threshold levels in case of any issue. The usual approach is to have a failed process restarted or aborted. To do this we need to identify the tolerance (threshold levels) and handle errors so that the essential, business-valid data still gets through (see the sketch after these questions).

  • Is there any manual process defined to handle potential failure cases?

There should always be a process that restarts any process that failed or aborted due to technical or business validation errors, whether it is done manually or is automated. In less tolerant systems this process should be semi-automated or fully automated, and manual only in some cases. A known fact and a familiar thought, but we should elicit it from the stakeholders explicitly so that everyone understands the business information at the same level.

  • Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such recurrence?

This question should not be mistaken to mean that certain levels of error could be ignored and left unattended. Rather, it helps in understanding which processes should be considered business critical, and the inputs gathered here should be used in designing the error-handling solution – especially for the business rules across all modules, based on the importance agreed with the business stakeholders.
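As mentioned under the first question, here is a hypothetical Python sketch of tolerance-threshold handling: the batch keeps processing and collecting rejects until the agreed error percentage is crossed, at which point it aborts so the restart process can take over. The threshold value and validation rule are illustrative assumptions.

```python
# Hypothetical sketch of threshold-based error handling: the batch keeps
# going until the agreed tolerance is crossed, then aborts for restart.

class ToleranceExceeded(Exception):
    pass

def process_batch(rows, handler, tolerance_pct=1.0):
    """Abort once rejected rows exceed the agreed tolerance percentage."""
    errors, rejected = 0, []
    for i, row in enumerate(rows, start=1):
        try:
            handler(row)
        except ValueError as exc:          # business-validation failure
            errors += 1
            rejected.append((row, str(exc)))
            if errors / i * 100 > tolerance_pct:
                raise ToleranceExceeded(f"{errors} errors in {i} rows")
    return rejected                         # feed the manual-review queue

def validate(row):
    if row.get("amount", 0) < 0:
        raise ValueError("negative amount")

print(process_batch([{"amount": 10}, {"amount": 5}], validate))
```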

Although there are just three questions in this NFR, all the inputs work in tandem with other NFRs, helping to build a less error-prone, fault-tolerant, and robust CDI. These questions carry a lot of business importance, which feeds directly into solution building and the design of components, modules, and the data model.

So please share your thoughts or feel free to comment on this NFR.

Cheers!

CDIs-Non Functional Requirement-Availability March 28, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, High Availability, Integration.

The word availability is used so commonly while building systems, and it draws different, often spontaneous, responses. I would like to share a few of those which I know of and hear most often.

  • I want the system to be available all the time – 24×7. This response is not from recent times, but from maybe at least 5 years back, when the business or application or program managers didn't know what system availability really means.
  • The business needs the system to be available on weekdays, and during the peak hours between 8 AM – 4 PM, with an acceptable failure or non-availability time of 5 minutes. During the weekends the business can accept a few hours of downtime during the early hours of the day for patches or software upgrades. This is the more stringent version, for applications whose database systems must be continuously available to the business.

Thankfully the responses to the availability questions are getting better as more techno-functional business managers get involved in building a new application or system. It has been saving my day, as I no longer have to explain a bunch of terms, formulae, and what not.

Availability in the world of a CDI solution is a little different, although the system should still be accessible to users all the while. What is that small difference? I had been dealing with transactional systems extensively, and my idea of availability changed when I had to look at this NFR from the perspective of a CDI or data integration solution. Trust me, as I write this post I can't pin down the exact point of difference between the transactional-system context and the data-integration context. I shall try defining it for both, and hopefully that gives some sense.

Availability in a transactional system – the system should be up and running within the defined SLAs, except for unwarranted outages. In case of failure the system should be recovered within the SLA-defined period. This can be addressed by using several availability features of SQL Server, and transactional systems are usually less bulky than CDI /Data Hub databases. The key points that come to the top of my head are

  1. No business /user data should be lost during a transaction, and data has to be accessible all the time
  2. No transaction should fail because of a system or process failure
  3. And in case of any failure the system should handle the failed transaction, and data should be recovered

Availability in a data integration system – the system should be up and running – available for business users to consume the aggregated data from various sources. Again, this is within pre-defined and agreed SLAs, and the availability requirements of these bulkier databases can be addressed by the different availability features of SQL Server. The key points are

  1. No business data should be lost in transit (porting time and path), and there should be enough mechanisms to trace a lost row or record back to its originating point
  2. Any lost row or record should not be due to a hardware or system failure; it may be due to a failure of the implemented business processes, and this should be recorded in the system
  3. In case of any hardware failure, the system should be up and running within the agreed SLAs, and a somewhat longer recovery or restoration period is acceptable
  4. And the data should be available near real-time, accurate, and on time
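For the first two points – tracing a lost row back to its originating point – a simple stage-by-stage reconciliation of row counts is often the first mechanism put in place. Below is a hypothetical Python sketch; in practice the audit log would be a database table written by each ETL stage.

```python
# Hypothetical sketch of stage-by-stage reconciliation, so a lost row can
# be traced back to the stage (and hence the process) where it dropped.

from datetime import datetime, timezone

audit_log = []   # in practice a database audit table

def record_count(batch_id: str, stage: str, count: int) -> None:
    audit_log.append({
        "batch": batch_id,
        "stage": stage,
        "rows": count,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def reconcile(batch_id: str):
    """Return (from_stage, to_stage, rows_lost) where rows went missing."""
    stages = [e for e in audit_log if e["batch"] == batch_id]
    return [
        (prev["stage"], nxt["stage"], prev["rows"] - nxt["rows"])
        for prev, nxt in zip(stages, stages[1:])
        if nxt["rows"] < prev["rows"]
    ]

record_count("B1", "extracted", 1000)
record_count("B1", "cleansed", 998)     # 2 rows rejected by business rules
record_count("B1", "published", 998)
print(reconcile("B1"))                   # [('extracted', 'cleansed', 2)]
```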

I believe that after writing the above lists I have brought out that 'thin' difference between the availability of integration solutions and transactional systems. For both, data is what matters most, and the system, processes, and hardware should be in place to protect it; that's the objective. Great! Now that this knowledge sharing is done, let me get into the questions I put forward to learn what the availability requirement is, and to judge what it all means for the customer. I am sure there are organizations that have adopted MDM solutions, but an MDM-Data Hub or CDI is seldom done because of its direct or indirect impact on the business. Okay, I am not getting into that part of the discussion, so here are the pointers on which we should gather inputs.

  • How important is the system and how quickly does it need to be returned to online in case of an outage or failure?

This question addresses the recoverability or restoration of the system in case of a hardware or system process failure. Ultimately, the response helps in defining the SLA for the data hub database and its downstream ecosystem of data sources.

  • How up-to-date does your information need to be?

The question here looks closest to a near real-time data requirement, but please hold on and look at it once again. The response addresses "data staleness": in short, how old the data can be before the business can't bear it, and technically how often the data refresh should happen.

  • Do you need real-time or are delays acceptable?

This question is an offshoot of the previous one, and the response sets the expectation, for the business and techno-functional teams, of whether there should be real-time data pulls, processing, and analytics.

  • What is the data retention policy?

The last question addresses the archival policy, and also gives an idea of what type of availability feature should be used to make large volumes of data available as part of the recovery process.
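Tying the staleness and retention answers together, here is a small, hypothetical Python sketch of the checks a hub could run: is the freshest data older than the agreed staleness limit, and which partitions fall outside the retention policy? The limits used are assumed values, not recommendations.

```python
# Hypothetical sketch tying the staleness and retention answers to checks
# the hub can run: how old is the freshest data, and what can be purged?

from datetime import datetime, timedelta

MAX_STALENESS = timedelta(hours=4)      # assumed business answer
RETENTION = timedelta(days=365 * 3)     # assumed retention policy

def is_stale(last_refresh: datetime, now: datetime) -> bool:
    return now - last_refresh > MAX_STALENESS

def purge_candidates(partitions, now: datetime):
    """Partitions older than the retention policy; archive before purging."""
    return [p for p, loaded_at in partitions if now - loaded_at > RETENTION]

now = datetime(2013, 5, 1, 12, 0)
print(is_stale(datetime(2013, 5, 1, 6, 0), now))             # True: > 4h old
print(purge_candidates([("p_2009", datetime(2009, 12, 31))], now))
```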

In the end I probably managed to pull together the questions for the tricky word "availability" in the context of CDI. All the inputs here help in designing a solution that meets the availability requirements – data staleness, data retention, and data recoverability.

I shall stop here, and I feel this is the end of the NFR availability. Please feel free to comment and share your thoughts.

CDIs-Non Functional Requirement–Scalability March 27, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

In the last post I spoke about the Non Functional Requirement – Performance, and in this post I am going to talk a bit about Scalability, which is the parameter most closely related to performance. Generally, during a requirements-gathering phase this requirement is documented as: need a scalable solution that should have high performance or quick response time. I wouldn't write the requirement that way myself, but from anyone else I would expect this type of "single line" statement.

What is the definition of scalability, and what are the finer line items that need to be considered for a CDI solution to be scalable?

Scalability definition from Wikipedia (http://en.wikipedia.org/wiki/Scalability)

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

and the definition of the related term scalable system is

A system whose performance improves after adding hardware, proportionally to the capacity added.

All this sounds simple, yes, true, but these sentences carry great meaning, and realizing them in an implementation is the task for any engineer, manager, technology specialist, or architect. Now that this is done, let me get to the next point, the most important one for this post.

A scalable database system is one that can handle additional users, a larger number of transactions, and future data growth, all with speed, precision, and accuracy. Any system can be scaled up or out; for a database system like SQL Server, scaling up is the easiest option but can get costly, while scaling out needs to be well planned. By well planned I mean it starts from understanding the non-functional requirements, through solution design, data model design, and development, to capacity planning and infrastructure setup.

Now that we know the definitions of scalability and a scalable database system, let me list the questions that help me decide what a scalable database system in a CDI solution needs.

  • What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

This is in the same vein as what any scalable database system should meet, but it matters more when dealing with CDI – a data integration solution. An approximate projection of the data growth of the different data sources helps in deciding whether the CDI solution's database can handle that growing data (see the sketch after these questions). The volume of data plays a very important part in meeting your response times, the SLAs for data processing, and the subsequent publishing for analytics consumption.

  • What is the projected increase in number of source systems?

A similar question to the one above, but an idea of how many data source systems will have to be integrated, along with the volume of data generated by these systems, helps in knowing the total volume of data that needs to be handled.

  • How many jobs are to be executed in parallel?

I couldn’t disagree with an already documented point on this question, so mentioning as a repeat of someone who said about this meaningfully.

There is a difference between executing jobs sequentially and doing so in parallel. The answer to this question impacts performance and could affect the degree to which the given application can scale.

  • Is there any need for distributed processing?

An answer to this question gives a picture of the existing systems' distributed-processing capabilities and capacities, and of the organization's policies towards distributed processing. But this only establishes an understanding of the client's requirements, and should be taken just as leads or inputs for deciding whether distributed processing should be considered. Distributed data processing is a double-edged decision, and should be adopted only after plenty of thought is given to the trade-offs.
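As referenced under the first question, here is a hypothetical Python sketch of the growth projection: compound the current volume at the answered growth rate and see what years three to five look like. The starting volume and rate are assumptions.

```python
# Hypothetical projection of source data volume at an assumed growth rate,
# to sanity-check whether the proposed hub can absorb years 3-5.

def project_volume(current_gb: float, annual_growth: float, years: int):
    """Yearly data volume assuming compound annual growth."""
    return [current_gb * (1 + annual_growth) ** y for y in range(1, years + 1)]

# e.g., 800 GB today, 30% growth per the source-system answer
for year, gb in enumerate(project_volume(800, 0.30, 5), start=1):
    print(f"year {year}: {gb:,.0f} GB")
```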

The list of questions for scalability is short, but all of them are essential to address if you want a high-performing and scalable system. I have nothing to offer as expert advice except one last suggestion: get these responses so that the behavior of your CDI solution is predictable.

Please feel free to comment and let me know your thoughts too.

CDIs–Non Functional Requirement–Performance March 12, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

In today's post I will talk about the NFR that I consider one of the most important for the stability of a CDI implementation – Performance.

This topic is vast and intense, and there are several factors that need to be considered while architecting or designing a system. A thought passes through my mind whenever I touch this topic, even now: writing principles for achieving performance and scalability is easy, and mostly scholarly; the challenge lies in the realization phase. If a system's performance and scalability are to be achieved, then the appropriate solution blueprint, the right design, the best coding, and precise testing and deployment have to be in place. See, as I said, the moment this topic is opened up, so much lateral and tangential thinking comes into play. Now I am going to hold that thought and talk about this NFR's significance and how it can be used in building a CDI solution.

Performance – a short description and explanation that I picked from one of the sources shared in my earlier post, modified for my better understanding and, of course, to suit the solution I was building for the client.

So here it is: "performance is about the resources used to service a request and how quickly an operation can be completed", e.g., response time, number of events processed. It is known that in a CDI-type solution, where data is processed in GBs to TBs and in a large number of cycles, response time may be a less appropriate measure; but identifying the pre-determined response time for a particular process helps in defining the SLAs (not SLAs from a monitoring or maintenance perspective) for downstream applications on when the data will be available.

Let me explain what I intend by defining performance this way. Say, for example, I have to bring data in from different data sources, apply a lot of cleansing rules to the in-flowing data, massage it into an intermediary schema, and then map and publish it per the needs of the business – and all this on a few million rows across different tables, or on GBs of data. I can't assess the performance of batch applications on a per-row or per-second basis. In a transactional system I would have gone by a response time in seconds, but here I instead define that the job /batch application should finish its processing within the one to four hours set by the maintenance window. I prefer to abide by the business's maintenance-window needs, while using the information gathered for this NFR to pre-define the system's behavior – predictably – and to ensure that meeting this NFR is factored into the batch applications, real-time processing, and a design appropriate for the volume and velocity of the data. Yes, too much thought process and explanation, but the short message of the story is: "Meet the business needs of the maintenance window by defining the SLAs for the system." This is key for an integration solution, and for CDI in particular.

Now let me get back to the questions I posed to assess what performance, or rather SLA, levels the system should adhere to. The list below repeats my previous post, but I shall add a bit more text on why each question is asked.

  • What is the batch window available for completion of the complete batch cycle?

The batch window is the number of hours available for the CDI system to process data. It is usually a mutually agreed input from the business and the IT folks, and knowing it helps in deciding whether the system can meet the proposed number of hours given the existing data sources, the complexity of the processing logic, and the volume of data to be processed (see the sketch at the end of these questions).

  • What is the batch window available for completion of individual jobs?

The response to this question helps in deciding the dependencies between batch jobs, and also in designing the batches efficiently.

  • What is the frequency of the batch that could be considered by default?

The frequency of a job helps in assessing how the load on the system varies across weekdays. Ultimately this information helps in deciding job frequencies, and in using the system's resources optimally, knowing that a typical day may not carry the full load.

  • What are the data availability SLAs provided by the source systems of the batch load?

The source systems' data availability helps in assessing whether the proposed SLAs can be met. Essentially, the SLA is realistic only if the source systems' data availability is close to what is being considered as the SLA requirement for the CDI.

  • What is the expected load (peak, off-peak, average)?

This is more of a response from the IT team, who would have set internal benchmarks on the peak, off-peak, and average load a particular system should carry. Historical data from the existing source or integration systems makes a good reference, and the answer helps in designing optimal batches and frequencies, and in proposing the relevant hardware for the implementation.

  • What is the hardware and software configuration of the environment where the batch cycle could be run?

A strange question, isn't it!? Yes indeed, but trust me, the response sets the pulse of what is expected from the CDI's hardware and software: whether it is to be procured based on fresh inputs, or whether the solution has to be co-hosted on existing systems with pre-defined or prescribed hardware and software.

The following question digs deeper into the standards that need to be adhered to for new solutions.

  • Are the resources available exclusive or shared?

This last and final question in evaluating the expected performance behavior finds out whether the new solution has to be hosted on existing systems or on new systems in toto.
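Pulling the batch-window questions together, here is a hypothetical Python sketch of the feasibility check I'd run once volumes and throughput are known: does the expected load fit the agreed window, with some headroom for bad days? All numbers are assumptions.

```python
# Hypothetical check that expected volumes fit the agreed batch window,
# given a measured (or assumed) processing throughput.

def fits_window(rows: int, rows_per_hour: float, window_hours: float,
                headroom: float = 0.8) -> bool:
    """True if the load finishes inside the window with headroom to spare."""
    hours_needed = rows / rows_per_hour
    return hours_needed <= window_hours * headroom

# e.g., 50M rows, 10M rows/hour measured in tests, 4-hour window
print(fits_window(50_000_000, 10_000_000, 4))   # False: needs 5h > 3.2h
```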

This has been a long post, and I think I should pause for now. In the next post I shall talk about the NFR – Scalability.

Please feel free to comment and let me know your thoughts.

CDIs – Non functional requirements January 24, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

It's been quite some time since I posted on this topic, and I believe it's time to share the next post, which is about the technical work I did in this assignment.

NFR – non-functional requirements! Are you aware of this term? Yes, you would be, should be, and could be. But when I talk to architects, DBAs, and project management teams, I notice that these groups understand it at a layman's level, but not in depth. The conversation usually ends with "Yes, NFRs are very important – like response time, total number of users, number of concurrent users".

Somehow I don't like stopping at that level. If you are building a solution for a transactional or analytical system, you have to dig deeper to know how the system behaves in terms of Availability, Scalability, Performance, and Multi-Tenancy when the business is at work. The words I have brought up here are well known, but when you start digging further you will notice there is a lot more to cover. In the end, the built solution should comply with the NFRs as an integrated whole of all the above factors.

So for the CDI solution I built, I considered the list below to contain the most important NFRs. I don't claim it is exhaustive, but I know these had to be addressed for me to build the architecture.

  • Availability
    • Disaster Recovery
      • How important is the system and how quickly does it need to be returned to online in case of an outage or failure?
  • Data
    • Data Staleness
      • How up-to-date does your information need to be?
      • Do you need real-time or are delays acceptable?
    • Data Retention
    • Internationalization
  • Performance
    • What is the batch window available for completion of the complete batch cycle?
    • What is the batch window available for completion of individual jobs?
    • What is the frequency of the batch that could be considered by default?
    • What are the data availability SLAs provided by the source systems of the batch load?
    • What is the expected load (peak, off-peak, average)?
    • What is the hardware and software configuration of the environment where the batch cycle could be run?
    • Are the resources available exclusive or shared?
  • Scalability
    • What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?
    • What is the projected increase in number of source systems?
    • How many jobs are to be executed in parallel?
    • Is there any need for distributed processing?
  • Reliability
    • What is the tolerance level or percentage of erroneous data that is permitted?
    • Is there any manual process defined to handle potential failure cases?
    • Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such recurrence?
  • Maintainability
    • What is the maintenance/release cycle?
    • How frequently do source or target structures change? What is the probability of such change?
  • Extensibility
    • How many different/divergent sources (different source formats) are expected to be supported?
    • What kind of enhancements/changes to source formats are expected to come in?
    • What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?
  • Security
    • Are there any special security requirements (such as data encryption or privacy) applicable?
    • Are there any logging and auditing requirements?
  • Capacity
    • How many rows are created every day in transactional DB, CRM and ERP?
    • What is the size of data that is generated across LOBs in DB, CRM, ERP systems?
    • What is the agreeable processing time of data in the data hub?
    • How stale can the reporting data be?
    • What are the agreeable database system downtime hours?
      • Administration & Maintenance
      • Data Refresh /Processing
    • How many concurrent users will access the report /application?
    • What is the total number of users expected to use the reporting system?
    • What are the expected complexity levels of the reporting solution? (High, Medium, Low)
    • How much of the processed data and data used for reporting has to be archived /purged?
    • How many years of data have to be archived in the system?
    • How many years of yesteryear's data have to be processed?
    • What is the possible backup and recovery time required for the Data Hub and Reporting system?
    • What are the availability requirements of the data hub and reporting system?
    • How many users will be added year on year for the reporting system, and what are the types of users?
    • What will be the year-on-year growth of data in the transactional source system?
    • What could be the other sources of data that could be added during a period of 1-2 years, and how much of these data sources could provide data to the data hub?
    • Are there any other external systems that would require the data from the data hub?
    • How many rows of the transactional DB, CRM and ERP need to be processed for the Data Hub?
    • How much data is currently processed for reports?
    • What type of data processing queries exist in the system which provide Static /Ad-Hoc reports?
    • What types of reports are currently available, and what is the resource usage?
    • What are the query profiling inputs for these data processing /reporting queries?
      • CPU usage
      • Memory usage
      • Disk usage
  • Resource Utilization
    • Is there an upper limit for CPU time or central processing systems, etc.?
    • Are there any limitations on memory that can be consumed?
    • Do the target store /database /file need to be available 24×7?
    • Is any downtime allowed?
    • Are there any peak or off-peak hours during which loading can happen?
    • Are there crucial SLAs that need to be met?
    • If SLAs are missed, is there any critical system /business impact?

This list was prepared after researching the web for similar implementations, best practices, and standards, and based on past experience.

I am sharing a few links where I found information on capacity planning, with questions that revolve around NFRs:

http://searchdatamanagement.techtarget.com/news/2240033663/Capacity-planning-for-the-data-warehouse-environment

www.information-management.com/newsletters/data_management_warehouse_store_integration-10020567-1.html?zkPrintable=1&nopagination=1

Please feel free to comment.

Cheers and Enjoy!