jump to navigation

SSIS is for ETL, ELT or EL….clamor continued September 3, 2013

Posted by msrviking in Business Intelligence, Data Integration, Design, Integration Services.
Tags: , , , , , , , ,
add a comment

In my yesterday’s post I had mentioned that we had to design and implement a SSIS solution that does EL, and had my own gripe about addressing only the non-functional requirements. Also it was just about a process of extraction & loading – EL.

Today let me start off with the definition of ETL, ELT, and then the new word with EL where T is missing, or supposedly not to be considered here. While writing this post I realized I should put across few notes on what an ETL or ETL means, or probably the differences between these implementations. I could have probably listed the points as a neat comparison table but blame it on my laziness that I am sharing these links. Trust me I have read these links, and I completely acknowledge the technical content’s relevance to ETL vs ELT.

http://www.dataacademy.com/files/ETL-vs-ELT-White-Paper.pdf

http://blog.performancearchitects.com/wp/2013/06/13/etl-vs-elt-whats-the-difference/

At the end ETL would mean that you have a powerful infrastructure in your destination, to handle transformation using SET operations, but then you would need a staging area to do the transformation and push the data into the destination tables. Whereas in ELT processes, the tools have the power to do the transformation in parallel infrastructure setup and in-memory, and away from the destination systems. After the data is transformed it is loaded into the databases for business to consume.

So having these principles in mind, we had yet to implement the EL instead of ETL or ELT. I personally didn’t find the approach to be good to implement, and which could have been done easily without any development effort using a CDC for Oracle in SQL Server, Replication. However I found some finer pros of this approach, and I remember noting those in one of my earlier post over here. Those few are,

  1. SSIS doesn’t need any additional changes to be done on the source end, except that we configure a connector to pull the data.

    Whereas Replication from Oracle had to create temporary tables, triggers in the source schema which probably is an overhead along with changes on the source schema. This is not definitely acceptable by any business or system owner. CDC for Oracle in SQL Server helped to a larger extent to address the above problem, but then enabling certain properties of the source instance would be overhead for security considerations.

  2. SSIS is essentially for a flow-control of data, and we could synch data on a pre-set precedence of steps. For example the master table’s data should ported first, and then the transactional tables.
  3. SSIS could be configured to pull data from the source based on business criteria by using parameters in the WHERE statements.
  4. SSIS gives finer control on error logging, handling and of course we could have retry packages to pull in data of failure.
  5. SSIS also gives me a perfect control on the defensive programming so that all business data is in place.
  6. SSIS could be tweaked, tuned to have optimal performance from extraction through loading.

Overall we will have control on what is happening when the data is extracted and loaded. Now I am wondering all these are perfect design guidelines for any ETL or ELT, and how I wished or rather would have not wanted on just an EL process, huh.

Cheers!