jump to navigation

SSIS for Extraction & Loading (EL) only September 2, 2013

Posted by msrviking in Business Intelligence, Data Integration, DBA Rant, Design, Heterogeneous, Integration, Integration Services.
Tags: , , , , , , , ,
add a comment

There were series of posts earlier on the different ways to pull data from Oracle to SQL Server where I shared on how to port data either through Replication, CDC for Oracle, and few others like SSIS. I thought I will give a starter to a series of post(s) on how we went about picking up the option of implementing SSIS as a way to bring data from Oracle to SQL Server.

The major drivers (business and non-functional) for picking the solution were

  • The destination database on SQL Server should be replica of the source
  • No business rules or transformation need to be implemented when data is pulled
  • The performance of pull and loading should be optimal with no overheads or bottlenecks either on the source or destination

The destination system was mentioned as ODS, but it surely isn’t the same as Operational Data Store of DWH systems. Sadly the naming convention had been adopted by the shop where we had to implement the solution, and you might see me using the word ODS for sake of explaining the implementation, and I am sorry for putting the incorrect usage of the word. You would probably see little more of conventional or standards being skewed in the implementation, and of course a bit of my comments on how this could have been handled, better.

So the ODS was to hold data that would be used by downstream business analytics applications for the business users, with an intent of providing Self-Service BI. The ODS is to be of exact – in the form of schema, and data as in the source. The job was to pull data from the source without any performance overhead, and failures. Not to forget to mention that there needn’t be any transformation of data. At the end we use SSIS as only an Extraction & Loading tool instead of ETL – Extraction, Transformation and Loading.

This sounds simple, eh, but beware this wasn’t that simple because the design was to be kept simple to address all of the above factors. The SSIS package(s) had to handle these at a non-functional level,

  • The package should be configurable to pick and pull data from any set date
  • The package should be re-runnable from the point of failure
  • The package should have configurability to address performance needs – package execution time should be less, the source should not be overloaded, the destination should be full in its efficiency while data is loading.
  • The package should be configurable to pull data from source in preset chunks. The preset chunks could be based on a period basis – days /months /years, or number of records per period
  • The package should have the option to flag any other dependent packages to run or not run during initial and incremental loads
  • The package should have defensive design to handle bunch of different type of errors
  • The package should have error logging and handling at a package level, and at record levels

In Toto this solution was more to how do we design for the non-functional requirements and implement, leaving the functional side of the data. This type of implementation is half-blind and I will talk more on those cons we had to face, when I get in to details of each implementation.

A post after few weeks of gap, and one with learnings and summaries on how it could have been better.

What is 360° view of a Customer? January 3, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.
Tags: , , ,
add a comment

A much delayed post as part of my series on CDI (Customer Data Integration). My earlier post had an introduction on “How do I see 360° view of my Customer?” . In this post I shall talk a bit about on what is 360° in business terms, and what it is for a person like me.

There are multiple definitions of 360° view of a customer in a business. Let me pick up an example from the Travel industry and this is the client for whom I proposed the solution. The customer is well known online booking agent, and has booking businesses across several modes of transportation. It starts from a Car Rental to Airline booking, and all this happens through two well known modes of booking – online & offline.

Whichever is the mode of booking, the booking agent website finally deals with the end customer who is either a traveler or a flyer. The business verticals – marketing through customer relationship teams would want to know the “behavior” of the customer. Sadly this type of information could be captured mostly for online transaction than of offline. Hence I could manage understanding only the “online behavior”.

Why does business need a customer behavior? In today’s world everything revolves around interest of a customer and providing an optimal travel package based on his past, recent and probably future interests. Its definitely not in lines of earlier way of marketing and selling pre-packaged travel solutions. So what does a customer behavior mean here? It could be any of these at least, and many more than the below list.

  1. How many times has a registered customer visited and clicked the search flights or other website features link?
  2. How many times has a customer reached a booking stage but dropped off?
  3. What are the type of these customers? Are they regular visitors (registered customers)?
  4. What are the age groups? What season do these customers peak on the website, particular links?
  5. Have these customers done any booking earlier on the website? What are their past transactions (successful, failed, abandoned)?
  6. Have these customers ever interacted with the customer support? How had been the interaction? What is that customer support could be of more help?
  7. How does all the above data help the marketing and sales team? Have the sales and marketing team of different business (travel mode) units made in-roads to a customer need?
  8. And finally, is the customer genuine by his identity – name, age, mobile /cell #, mail id?

These were the key pointers from the business perspective, and what does all this mean to person like me? Here is the list of things that came up as first thought and answers to these helped me in bringing up a solution.

  1. Are there any implemented mechanisms that capture the customer behavioral data?
  2. What are those data sources?
  3. How clean are these data sources? How authentic and genuine are these data sources? Are there any duplication of data or information at a master level. For e.g. Is customer data duplicated?
  4. How many data sources should be dealt to bring that single view of the customer? Is this for master and transactional data?
  5. What are the type of data sources? Are these heterogeneous at data technology and platform levels?
  6. What are different forms of data – structured, semi-structured, and non-structured?
  7. What are the volumes of data or # of transactions that generate data in these data sources?
  8. Finally what is the one key that could be used to tie the transactions, behavior of a customer with that one key?

This post was self-interrogative and pointers to these questions from business and technology team would bring up the solution – architecture, data model, ETL, report design.

Do share what you think or what else could be included?

Cheers and Enjoy!

How do I see 360° view of my Customers? December 6, 2012

Posted by msrviking in Business Intelligence, Integration.
Tags: ,
1 comment so far

What does 360° view of the Customers mean? This is one of the first questions that came to my mind when my client had asked us “How do I see 360° view of my Customers”? Trust me, this opened a new world for me where I spent working on core-technical subjects all the time. This is the eye-opener for me into the world of “understanding data from business perspective.” This question didn’t stop me in asking back the customer on what is that he wants to do with the data (for sure I know its all about data at EOD)? My first question started here and I ended in doing the Architecture of Customer Data Integration (CDI) solution.

I am going to walk through several posts on how this journey of CDI or otherwise commonly known as Data Hub had happened. In this post I shall share few links that talk about definitions, architecture, implementation pointers of Data Hub or CDI and what it means to different businesses, CIOs and so on.

I surely have gained lot of knowledge after reading through several articles, blog posts, books, forums, internal and external discussions however not all could be condensed as blog posts. So you would see bundle of links, and of course my views around these articles (per article or all together) so that I too add my experience and learning’s. After all that is what this blog and blogging is about, isn’t it Smile?

So here are few reference links (talks at Architecture and Business levels) that I would want to share to have your thought process to kick-start if you are in search of the meaning or definition of the word CDI or Data Hub or 360° view of the customer?

http://searchsystemschannel.techtarget.com/tip/Data-management-concerns-of-MDM-CDI-architecture

>> This is one of the most important articles I have noted all over the web. The excerpts in this article is from a book called as Master Data Management and Customer Data Integration for a Global Enterprise authored by Alex Berson, and Larry Dubov. You could find a copy here if you want to own one.

The content of this article was revelation to me on the concept of CDI and I loved reading it. I kept reading the same article several times over a period of time until I published the intended architecture for implementation.

I wish to talk a lot about the article and the book but I will reserve those for other posts where I will pick each of the topics – used as guidelines and talk in detail on how I brought up the architecture. So don’t wait to hit the above link to read more.

http://datamanagement.manjeetss.com/tag/mdm

This is the next website, where Manjeet has written extensively on how CDI could be adopted, who could adopt, different models, best practices, principles, its relation with Master Data Management (MDM). I would say that one should read further into these topics as a supplementary for the above, first link.

http://www.b-eye-network.com/view/8109

This article gives more or less same information, but was relevant for me so that I could validate on what I was reading. Good one.

This is how I started my journey with CDI, and stay tune to read more. I would probably share more links as and when I dig my mail folders.

Cheers and Enjoy!