
Oracle to SQL Server Replication-Pre-Requisites May 10, 2013

Posted by msrviking in Configuration, Heterogeneous, Integration, Replication.

In the first post on this topic I shared my plan, or approach, for getting Oracle to SQL Server replication done. Today I shall talk a bit about why I listed those steps, and also share the links to the forums and blog posts I used while working through each step.

1. Environment setup

This was the first step, where I had to decide what should run on the box at the OS level, and what hardware configuration I would need to run an instance of Oracle and an instance of SQL Server. Here are the specs I chose to bring up this demo-able replication setup quickly.

  • A virtual machine. This is the fastest and best way to set up a POC for understanding the internals and sharing it with teams.
  • A virtual machine with enough memory, CPU power and hard disk space. What configuration would that be? I knew I wasn't demonstrating anything about performance while data was being replicated between the systems, just a simple walk-through to understand the pain of getting replication done from Oracle to SQL Server. Here are the specs,
    • Memory /RAM > 4GB
    • CPU > Dual core processor
    • Hard Disk space > 150GB
  • The VM with an appropriate OS. While deciding on these specs I realized it made little sense to configure replication between Oracle and SQL Server running on the same box, and that too on Windows. So there was a little change of course in what should run where. That made me pick Windows Server 2008 R2 to run SQL Server 2012 with SP1, and Oracle on a Linux box. The next challenge was to find a box running Linux with Oracle on top of it. I got some help because one of the boxes with Oracle on Linux was running for another customer, so after a few requests to the team I could just use it.

Here is the link that describes the software and hardware requirements to run SQL Server 2012 SP1 Developer Edition (this edition has the Replication features). Most of us know this link already; nevertheless, I thought sharing it costs me nothing and could help someone.

2. Installation of Oracle

As mentioned earlier, I wanted the setup with Oracle on Linux, and an "Oracle edition that supports replication setup". Take note of the phrase in quotes: I wanted Oracle Standard /Enterprise Edition because the Oracle Gateway option is part of these editions, and it is not available in Oracle Express Edition. Oracle Gateway can be installed separately, without an additional license, on the same box as the Oracle instance or on a different box. I didn't want too many installations for a quick POC, so I chose an Oracle Enterprise Edition instance that was already up and running in a 64-bit Linux environment. Made my job precise and easy, eh!

All this said, I shall still share the links on the Oracle editions, and the license information /feature availability per edition.

Oracle Standard Edition (SE), and Enterprise Edition (EE) features availability


Oracle Express Edition (XE)


Another point I realized, although it wasn't and probably won't be of great importance, is that Oracle Express Edition on Windows is 32-bit only; there is no 64-bit build. I checked and verified that no such installation exists. Of course, I didn't have to bother much because I had dropped the idea of running Oracle XE on Windows in favor of a 64-bit EE on Linux. Nevertheless, in case you are hunting for one for whatever reason, this should help you stop the mad hunt. Here is the link which confirms there is no 64-bit version.


3. Installation of SQL Server

This was the simplest step because I have done it a zillion times before, except that I had to be choosy in the features list when installing the SQL Server instance on Windows Server 2008 R2 64-bit. The feature I had to make sure was selected was Replication, and that too was done with ease. The pre-requisites were taken care of when I chose the OS and hardware for this piece of work, but make sure you have the latest service pack. I haven't seen anything break when I tried with SQL Server 2012 without the SP, though.

4. Configuration of Oracle publisher and SQL Server Transactional Replication – Subscriber

Under this heading I won't write about how the configuration was done; the intent is to list the pre-requisites.

  • What is needed before Oracle publisher is configured?
Oracle Database Client (Oracle Database 11g Release 2 Client) on the Windows box where the SQL Server 2012 instance is running; you can get this from here. Pay attention to the bit version you download; trust me, this is the most important step in establishing a connection from SQL Server on Windows to Oracle on Linux. Mess up or miss this step for whatever reason, and you will realize later that it was a simple step and a bad mistake.
64-bit Oracle Data Access Components (ODAC) to be installed on the SQL Server box. You will get the installable from here, and you will want to choose the right one for your environment. I had to pick 64-bit ODAC 11.2 Release 5 for Windows x64.
  • What is needed for the SQL Server Transactional Replication – Subscriber setup? Primarily, everything done in the first step is enough to establish a connection with Oracle, and then configuring a publication – subscription (pull) is quick.
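To make the pre-requisites concrete, here is a hedged sketch of what the distributor-side configuration looks like once the Oracle client and ODAC are in place. The TNS alias, login and password below are placeholders, and this assumes the distributor itself is already configured; sp_adddistpublisher is the standard SQL Server replication procedure for registering a non-SQL Server publisher.

```sql
-- Sketch only: register the Oracle instance as a publisher at the SQL Server
-- distributor. 'ORCLDB' is a placeholder TNS alias; it must resolve via the
-- tnsnames.ora of the Oracle client installed on the SQL Server box.
EXEC sp_adddistpublisher
    @publisher       = N'ORCLDB',      -- TNS service name of the Oracle instance
    @distribution_db = N'distribution',
    @publisher_type  = N'ORACLE',      -- Standard /Enterprise Edition publisher
    @security_mode   = 0,              -- Oracle authentication
    @login           = N'repl_admin',  -- placeholder replication admin schema
    @password        = N'<password>';
```

If the connection fails at this point, it is almost always the bit version of the Oracle client or a TNS alias that does not resolve, which is why the two bullets above matter so much.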

I shall share finer details under each of the configuration steps later; the intent of this post was to tell you what you should have installed before you start putting the Oracle Publisher – SQL Server Transactional Replication – Subscriber in place.

Finally, this link – Configuring an Oracle Publisher – talks about the same steps I have put in here, but falls short of pointing to the right places for the appropriate downloads.

I hope these help, and please feel free to comment.



Oracle to SQL Server Replication – First steps May 6, 2013

Posted by msrviking in Configuration, Heterogeneous, Integration, Replication.

There was a request in my shop to set up Transactional Replication on SQL Server 2012 with Oracle as the publisher. A known setup for SQL Server to SQL Server, but not so straightforward when Oracle is the publisher. The setup had to be such that nothing would be done on the Oracle end; everything should be configured from SQL Server. Sounds good and easy, eh! That was my first thought, but after this exercise I realized it definitely isn't that straightforward: it needs a bit of planning and careful installation of dependent features, and in the end it helped me refresh replication knowledge I hadn't applied in the last few years.

In this post I shall share steps to configure transactional replication on SQL Server 2012, and the publisher is Oracle 11g enterprise edition running on Linux.

As mentioned earlier, I first had to think through what steps are involved and what other features or dependent components I need to install before configuring replication. I will try my best to post images of the important installation steps; otherwise it would be all text.

Here is the high-level activity list I put together for the sake of clarity and to know where I stand. It helped me judge how much time the whole setup would take, whatever my in-depth experience. A bit of planning and picking the right steps (an approach) did help me avoid some rework.

  1. Environment setup
  2. Installation of Oracle
  3. Installation of SQL Server
  4. Configuration of Oracle Publisher
  5. Configuration of SQL Server transactional replication-subscriber

I shall share a few thoughts against some of the above steps, because I realized every one of them has pre-requisites. For example: an environment means a box that can run both SQL Server and Oracle; installation means the appropriate version and edition of Oracle or SQL Server, plus drivers and client network /connectivity tools; and then connecting to Oracle through SQL Server replication and publishing the tables. It all sounds easy and familiar, but things go bad and take a few hours more than expected, however much you plan and jot down your points.

I shall write details in the next post as part of this series, so stay tuned.


CDIs-Non Functional Requirement-Reliability May 6, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

This is a strange NFR, but a very important one. The word reliable is plain English and looks less applicable here?! Well, no. We are talking about reliability in the sense of data reliability in a data integration system – a CDI. Please note a point I mentioned earlier: in a CDI there is no change in the meaning of data; we are aligning, across the organization, the perspective of what the data in the CDI system means and how true it is. It is necessary that all business and technical teams agree on the definition of these data terms. Once that is agreed, the technical teams have to implement the business and solution logic that brings that "value and truth" to the data or information consolidated from the different sources.

In this post, I shall share a few thoughts on the questions we need to find answers to.

  • What is the tolerance level or percentage of erroneous data that is permitted?

With the above context set, this question gathers the information needed to plan error handling and threshold levels in case of any issue. The usual way is to have a failed process restarted, or aborted. To do this we need to identify the tolerance (threshold levels) and handle errors so that the essential, business-valid data gets through.
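One way to read this requirement: each load step either continues or aborts depending on the agreed error threshold. A minimal sketch of that logic; the 2% tolerance, the record structure and the validation rule are all illustrative assumptions:

```python
# Sketch: enforce an agreed error-tolerance threshold on a batch of records.
# The 2% threshold and the validation rule below are illustrative assumptions.

ERROR_TOLERANCE = 0.02  # business-agreed fraction of erroneous rows permitted

def load_batch(records, is_valid):
    """Return (loaded, rejected); abort the whole batch if the error
    rate exceeds the agreed tolerance."""
    rejected = [r for r in records if not is_valid(r)]
    error_rate = len(rejected) / len(records) if records else 0.0
    if error_rate > ERROR_TOLERANCE:
        raise RuntimeError(
            f"Batch aborted: error rate {error_rate:.1%} exceeds "
            f"tolerance {ERROR_TOLERANCE:.1%}"
        )
    loaded = [r for r in records if is_valid(r)]
    return loaded, rejected

# Example: 1 bad row in 100 (1%) is within the 2% tolerance, so it loads.
rows = [{"id": i, "amount": 10} for i in range(99)] + [{"id": 99, "amount": None}]
ok, bad = load_batch(rows, lambda r: r["amount"] is not None)
```

The rejected rows would then feed the manual or automated restart process discussed under the next question.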

  • Is there any manual process defined to handle potential failure cases?

There should always be a process that restarts any process that failed or was aborted due to technical or business validation errors, whether this is done manually or automatically. In less tolerant systems this process should be semi-automated or fully automated, and manual only in some cases. A known fact and a familiar thought, but we should elicit this from the stakeholders explicitly so everyone understands the business information at the same level.

  • Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such recurrence?

This question is not to be mistaken as saying we could ignore certain levels of error in the system and leave them unattended. Rather, it helps in understanding which processes are business critical, and the inputs gathered here could and should be used in designing the error-handling solution – especially for business errors – across all modules, based on the importance agreed with the business stakeholders.

Although there are just three questions in this NFR, all the inputs work in tandem with the other NFRs, helping to build a less error-prone, fault-tolerant and robust CDI. There is a lot of business importance in these questions, and it feeds directly into solution building and the design of components, modules, and the data model.

So please share your thoughts or feel free to comment on this NFR.


CDIs-Non Functional Requirement-Availability March 28, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, High Availability, Integration.

The word availability is so commonly used while building systems, and the responses to it are varied, and of course spontaneous too. I would like to share a few of those which I know of and hear most often.

  • I want the system to be available all the time – 24×7. This response is not from recent times, but from maybe at least 5 years back, when business, application or program managers didn't know what system availability really means.
  • The business needs the system to be available during weekdays, particularly the peak hours between 8 AM – 4 PM, with an acceptable failure or non-availability time of 5 minutes. During weekends the business can accept a few hours of downtime in the early hours of the day for patches or software upgrades. This is the more stringent kind of requirement, for applications whose database systems have to be continuously available to the business.
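The second response above is already a measurable target. A quick back-of-the-envelope calculation, using the figures quoted in the bullet and the standard availability ratio:

```python
# Availability during the agreed peak window: weekdays, 8 AM - 4 PM,
# with 5 minutes of acceptable non-availability per week (figures from above).
peak_minutes_per_week = 5 * 8 * 60       # 5 weekdays x 8 hours x 60 min = 2400
acceptable_downtime = 5                  # minutes per week

availability = (peak_minutes_per_week - acceptable_downtime) / peak_minutes_per_week
print(f"{availability:.4%}")  # -> 99.7917%
```

Putting a percentage like this on the table early makes the later SLA discussions far less hand-wavy.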

Thankfully the response to the availability question is getting better as more techno-functional business managers are involved in building a new application or system. It has been saving my day: I don't have to explain a bunch of terms, formulae and what not.

Availability in the world of a CDI solution is a little different, although the system should still be available to its users all the while. What is that small difference? I had been dealing with transactional systems extensively, and my idea of availability changed when I had to look at this NFR from the perspective of a CDI or data integration solution. Trust me, as I write this post I can't pin down the exact point of difference between the transactional-system and data-integration contexts. I shall try defining it for both, and hopefully that gives some sense.

Availability in a transactional system – the system should be up and running within the defined SLAs, except for unwarranted outages. In case of failure, the system should be recovered within the SLA-defined period. This can be addressed by several availability features in SQL Server, and transactional systems are usually less bulky than CDI /Data Hub databases. The key points, off the top of my head, are

  1. No business /user data should be lost during a transaction, and data has to be accessible all the time
  2. No transaction should be a failure because of a system or process failure
  3. And in case of any failure the system should handle the failed transaction, and data should be recovered

Availability in a data integration system – the system should be up and running – available for business users to consume the aggregated data from the various sources. Again, this too within pre-defined and agreed SLAs, and the availability requirements of these bulkier databases can be addressed by the different availability features of SQL Server. The key points are

  1. No business data should be lost in transit (porting time and path), and there should be enough mechanisms to trace a lost row or record back to its point of origin
  2. Any lost row or record should not be due to a hardware or system failure; it may be due to the failure of an implemented business process, and this should be recorded in the system
  3. In case of any hardware failure, the system should be back up within the agreed SLAs, and a somewhat longer period of recovery or restoration is acceptable
  4. And the data should be available near real-time, accurate and on time
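Point 1 above (tracing a lost row back to its origin) is usually implemented as a reconciliation step that compares source and target keys after each load. A minimal sketch; the key values and the shape of the extracts are assumed for illustration:

```python
# Reconciliation sketch: find rows present in the source extract but
# missing from the data hub, so they can be traced back to their origin.
def find_lost_rows(source_keys, target_keys):
    """Return the set of business keys that never arrived at the target."""
    return set(source_keys) - set(target_keys)

source = ["C001", "C002", "C003", "C004"]   # keys extracted from a source system
target = ["C001", "C002", "C004"]           # keys landed in the data hub
lost = find_lost_rows(source, target)
print(sorted(lost))  # -> ['C003']
```

In a real pipeline the "keys" would be queried from the staging and hub tables, and any non-empty result logged per point 2.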

I believe that in writing the above list I am probably bringing out that 'thin' difference between availability for integration solutions and for transactional systems. For both, data is what matters most, and the system, processes, and hardware should be in place for it; that's the objective. Great! Now that this knowledge sharing is done, I am going to get into the questions I put forward to learn the availability requirement and judge what it all means for the customer. I am sure there are organizations that have adopted an MDM solution, but MDM-Data Hub or CDI is seldom done because of its direct or indirect impact on the business. Okay, I am not getting into that part of the discussion, so here are the pointers on which we should gather inputs.

  • How important is the system and how quickly does it need to be returned to online in case of an outage or failure?

This question addresses the recoverability or restoration of the system in case of a hardware or system process failure. In the end, the response helps in defining the SLA for the data hub database and its downstream eco-system of data sources.

  • How up-to-date does your information need to be?

The question looks closest to a near real-time data requirement, but please hold on and look at it once again. The response helps address "data staleness": in short, how old the data can be before the business can't bear it, and technically how often the data refresh should happen.

  • Do you need real-time or are delays acceptable?

This question is an off-shoot of the previous one, and the response will set the expectations of the business and techno-functional teams on whether there should be real-time data pulls, processing and analytics.

  • What is the data retention policy?

The last question addresses the archival policy, and also gives an idea of what type of availability feature should be used to make large volumes of data available as part of the recovery process.

In the end I probably managed to pull together the questions for the tricky word "availability" in the context of a CDI. All the inputs here help in designing a solution that meets the availability requirements – data staleness, data retention, and data recoverability.

I shall stop here and of course I feel this is the end of the NFR availability. Please feel free to comment and share your thoughts.

CDIs-Non Functional Requirement–Scalability March 27, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Integration.

In the last post I spoke about the Non Functional Requirement – Performance, and in this post I am going to talk a bit about Scalability, the parameter most closely related to performance. Generally, during a requirement gathering phase this requirement is documented as: need a scalable solution with high performance and quick response time. I wouldn't write the requirement that way myself, but from anyone else I would expect this type of "single line" statement.

What is the definition of scalability, and what are the finer line items that need to be considered for a CDI solution to be scalable?

Scalability definition from Wikipedia (http://en.wikipedia.org/wiki/Scalability)

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

and the definition of the related word Scalable system is

A system whose performance improves after adding hardware, proportionally to the capacity added.

All these sound simple, yes, but these sentences carry great meaning inside, and realizing them in an implementation is the task for any engineer, manager, technology specialist or architect. Now that this is done, let me get to the next point, and the most important one for this post.

A scalable database system is one that can handle additional users, a growing number of transactions, and future data growth, all with speed, precision, and accuracy. Any system can be scaled up or out. For a database system like SQL Server, scaling up is the easiest option but can get costly, while scaling out needs to be well planned. When I say well planned, it starts from understanding the non-functional requirements, through solution design, data model design, development, and capacity planning with infrastructure setup.

Now that we know the definitions of scalability and a scalable database system, I would like to list the questions that help me in deciding the needs of a scalable database system in a CDI solution.

  • What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

This is in the same line as what any scalable database system should meet, but it matters more when dealing with a CDI – a data integration solution. An approximate projection of the data growth of the different data sources helps in deciding whether the CDI solution's database can handle that growing data. The volume of data plays a very important part in meeting your response times or the SLAs for data processing and then publishing for analytics consumption.
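The answer to this question usually arrives as an annual growth percentage, and projecting it over the three-to-five-year horizon is simple compound growth. A sketch with assumed figures (500 GB today, growing 30% per year):

```python
# Project data volume from an assumed starting size and annual growth rate.
def projected_volume_gb(current_gb, annual_growth, years):
    """Compound the current volume by the annual growth rate."""
    return current_gb * (1 + annual_growth) ** years

# Assumed: 500 GB today, 30% annual growth, over a 5-year horizon.
for year in range(1, 6):
    print(f"Year {year}: {projected_volume_gb(500, 0.30, year):,.0f} GB")
```

Even rough numbers like these make it obvious whether the proposed storage and processing capacity will survive the projection period.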

  • What is the projected increase in number of source systems?

A similar question to the one above, but an idea of how many data source systems are to be integrated, along with the volume of data generated from these systems, helps in knowing the total volume of data that needs to be handled.

  • How many jobs are to be executed in parallel?

I couldn't disagree with an already documented point on this question, so I am repeating what someone else said about it, meaningfully:

There is a difference between executing jobs sequentially and doing so in parallel. The answer to this question impacts performance and could affect the degree to which the given application can scale.

  • Is there any need for distributed processing?

The answer to this question gives a view of the existing systems' distributed processing capabilities and capacities, and of the organization's policies towards distributed processing. But this is only at the level of understanding the client's requirements, and should be taken as leads or inputs for deciding whether distributed processing should be considered. Distributed processing of data is a double-edged decision, and should be brought into the solution only after plenty of thought about the trade-offs.

The list of questions for scalability is short, but all of them are essential to address in order to have a high-performing and scalable system. I have nothing to offer as expert advice, except this last suggestion: get these responses so that the CDI solution behaves predictably.

Please feel free to comment and let me know your thoughts too.

CDIs–Non Functional Requirement–Performance March 12, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

In today’s post I will talk about the NFR which I consider as one of the most important for the stability of the system for a CDI implementation – Performance.

This topic is vast and intense, and there are several factors that need to be considered while architecting or designing a system. A thought passes through my mind whenever I touch it, even now: writing principles for achieving performance and scalability is easy, and mostly scholarly; the challenge lies in the realization phase. If a system's performance and scalability are to be achieved, then the appropriate solution blueprint, the right design, the best coding, and precise testing and deployment have to be in place. See, as I said, the moment this topic is opened up, so much lateral and tangential thinking comes in. I am going to hold that now, and talk instead about this NFR's significance and how it can be used in building a CDI solution.

Performance – a short description and explanation that I picked from one of the sources shared in my earlier post, modified for my better understanding and of course to suit the solution I was building for the client.

So here it is: "performance is about the resources used to service a request and how quickly an operation can be completed", e.g., response time, or number of events processed. It is known that in a CDI-type solution, where GBs to TBs of data are processed over a large number of cycles, response time can be a less appropriate measure; but identifying a pre-determined response time for a particular process helps in defining the SLAs (not SLAs from a monitoring or maintenance perspective) for the downstream applications on when the data will be available.

I will give a short explanation of what I intended by defining performance this way. Say, for example, I have to bring data in from different sources, apply a lot of cleansing rules to the in-flowing data, massage it into the intermediary schema, then map and publish it per the needs of the business, and all this on a few million rows across different tables, or on GBs of data. I can't assess the performance of batch applications on a per-row or per-second basis. In a transactional system I would have gone by a response time in seconds, but that doesn't mean I can decree that the job /batch application must finish its processing within the one-to-four-hour maintenance window. I would prefer to abide by the business's maintenance window; the point is that while gathering information for this NFR I am pre-defining the system's behavior – predictably – and ensuring that to meet this NFR I need batch applications, real-time processing, and an appropriate design to handle the volume and velocity of the data. Yes, too much thought process and explanation, but the short message of the story is: "Meet the business needs of the maintenance window by defining the SLAs for the system". This is key for an integration solution, and for a CDI in particular.

Now I shall get back to the questions I posed to assess what performance, or rather SLA, levels the system should adhere to. The list of questions below repeats my previous post, but I shall add a bit more text on why each question was asked.

  • What is the batch window available for completion of the complete batch cycle?

The batch window is the number of hours available for the CDI system to process data. This is usually a mutually agreed input from the business and IT folks, and knowing it helps in deciding whether the system can meet the proposed number of hours given the existing data sources, the complexity of the processing logic, and the volume of data to be processed.
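A simple way to use the answer is to compare the sum of the estimated job durations against the agreed window. A sketch; the window, the job names and the duration estimates are all assumed for illustration:

```python
# Check whether the estimated batch jobs fit within the agreed batch window.
BATCH_WINDOW_HOURS = 6  # assumed business-agreed window

jobs = {  # assumed per-job duration estimates, in hours
    "extract_crm": 1.5,
    "extract_erp": 1.0,
    "cleanse_and_match": 2.0,
    "publish_to_hub": 0.5,
}

total = sum(jobs.values())
fits = total <= BATCH_WINDOW_HOURS
print(f"Total {total:.1f}h of {BATCH_WINDOW_HOURS}h window - "
      f"{'fits' if fits else 'OVERRUNS'}")
```

The same per-job estimates feed directly into the next question, on the window available for individual jobs and their dependencies.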

  • What is the batch window available for completion of individual jobs?

The response to this question helps in deciding the dependencies between batch jobs, and also in designing the batches efficiently.

  • What is the frequency of the batch that could be considered by default?

The inputs on job frequency help in assessing how the load on the system varies across weekdays. In the end, this information helps in deciding the frequencies of the jobs and in using the system's resources optimally, knowing that a typical day may not carry the full load.

  • What are the data availability SLAs provided by the source systems of the batch load?

The source systems' data availability helps in assessing whether the proposed SLAs can be met. Essentially, the SLA is realistic only if the data availability of the source systems is close to what is being considered as the SLA requirement for the CDI.

  • What is the expected load (peak, off-peak, average)?

This would mostly be a response from the IT team, who would have set internal benchmarks for the peak, off-peak and average load a particular system should carry. Historical data from the existing source or integration systems are good references, and the answer helps in designing optimal batches and frequencies and in proposing the relevant hardware for the implementation.

  • What is the hardware and software configuration of the environment where the batch cycle could be run?

A strange question, isn't it!? Yes indeed, but trust me, the response sets the pulse of what is expected from the CDI's hardware and software: whether it is to be procured based on fresh inputs, or whether the solution has to be co-hosted on existing systems with pre-defined or prescribed hardware and software.

The following question digs deeper into the standards that need to be adhered to for new solutions.

  • Are the resources available exclusive or shared?

This last question in evaluating the expected performance behavior is to find out whether the new solution has to be hosted on existing systems or on new systems in toto.

This has been a long post, and I am thinking I should pause for now. In the next post I shall talk about the NFR – Scalability.

Please feel free to comment and let me know your thoughts.

CDIs – Non functional requirements January 24, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

It's been quite some time since I posted on this topic, and I believe it's time to share the next post, which concerns the technical work I did in this assignment.

NFR – non-functional requirements! Are you aware of this term? Yes, you would be, should be and could be. But when I talk to architects, DBAs, and project management teams, I notice that these groups understand it at a layman's level but not in depth. It usually ends with "Yes, NFRs are very important, like response time, total number of users, number of concurrent users".

Somehow I don't like stopping at that level. If you are building a solution for a transactional or analytical system, you have to go deeper, to know the behavior of the system in terms of Availability, Scalability, Performance, and Multi-Tenancy when the business is at work. The words I have brought up here are well known, but when you start digging further you will notice we cover a lot more. In the end, the built solution should comply with the NFRs as an integrated piece of all the above factors.

So for the CDI solution I built, I considered the list below the most important NFRs. I don't claim this is an exhaustive list, but I know these had to be addressed for me to build the architecture.

· Availability

o Disaster Recovery

§ How important is the system and how quickly does it need to be returned to online in case of an outage or failure?

· Data

o Data Staleness

§ How up-to-date does your information need to be?

§ Do you need real-time or are delays acceptable?

o Data Retention

o Internationalization

· Performance

o What is the batch window available for completion of complete the batch cycle?

o What is the batch window available for completion of individual jobs?

o What is the frequency of the batch that could be considered by default?

o What are the data availability SLAs provided by the source systems of the batch load?

o What is the expected load (peak, off-peak, average)?

o What is the hardware and software configuration of the environment where the batch cycle could be run?

o Are the resources available exclusive or shared?

· Scalability

o What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

o What is the projected increase in number of source systems?

o How many jobs to be executed in parallel?

o Is there any need for distributed processing?

· Reliability

o What is the tolerance level or percentage of erroneous data that is permitted?

o Is there any manual process defined to handle potential failure cases?

o Under what conditions is it permissible for the system/process to not be completely accurate, and what is the frequency/probability of such recurrence?

· Maintainability

o What is the maintenance/release cycle?

o How frequently do source or target structures change? What is the probability of such change?

· Extensibility

o How many different/divergent sources (different source formats) are expected to be supported?

o What kind of enhancements/changes to source formats are expected to come in?

o What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?

· Security

o Are there any special security requirements (such as data encryption or privacy) applicable?

o Are there any logging and auditing requirements?

· Capacity

o How many rows are created every day in transactional DB, CRM and ERP?

o What is the size of data that is generated across LOBs in DB, CRM, ERP systems?

o What is the agreeable processing time of data in data hub?

o How stale can the reporting data be?

o What are the agreeable database system downtime hours?

§ Administration & Maintenance

§ Data Refresh / Processing

o How many concurrent users will access the report/application?

o What is the total number of users expected to use the reporting system?

o What is the expected complexity level of the reporting solution? (High, Medium, Low)

o How much of the processed data and the data used for reporting has to be archived/purged?

o How many years of data have to be archived in the system?

o How many years of historical data have to be processed?

o What is the possible backup and recovery time required for the Data Hub and Reporting system?

o What are the availability requirements of the data hub and reporting system?

o How many users will be added year on year for the reporting system, and what are the types of users?

o What will be the year-on-year growth of data in the transactional source systems?

o What other data sources could be added over a period of 1-2 years, and how much data could these sources feed into the data hub?

o Are there any other external systems that would require data from the data hub?

o How many rows of the transactional DB, CRM and ERP need to be processed for the Data Hub?

o How much data is currently processed for reports?

o What types of data processing queries exist in the system that produce static/ad-hoc reports?

o What types of reports are currently available, and what is the resource usage?

o What are the query profiling inputs for these data processing/reporting queries?

§ CPU usage

§ Memory usage

§ Disk usage

· Resource Utilization

o Is there an upper limit on CPU time or processing capacity?

o Are there any limitations on memory that can be consumed?

o Does the target store/database/file need to be available 24x7?

o Is any downtime allowed?

o Are there any peak or off-peak hours during which loading can happen?

o Are there crucial SLAs that need to be met?

o If SLAs are missed, is there any critical system/business impact?

This list was prepared after researching the web for similar implementations, best practices, and standards, and based on past experience.
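The capacity and growth questions above ultimately feed a back-of-the-envelope sizing estimate. A minimal sketch of that arithmetic in Python – all inputs (rows per day, average row size, growth rate, retention) are hypothetical assumptions for illustration, not figures from any real system:

```python
# Back-of-the-envelope storage sizing from the capacity-planning answers.
# All input figures below are hypothetical placeholders.

def yearly_storage_gb(rows_per_day, avg_row_bytes, growth_rate, years):
    """Estimate cumulative raw storage (GB) per year, compounding YoY volume growth."""
    estimates = []
    daily = rows_per_day
    total_bytes = 0.0
    for year in range(1, years + 1):
        total_bytes += daily * avg_row_bytes * 365
        estimates.append((year, total_bytes / 1024 ** 3))
        daily *= 1 + growth_rate  # apply year-on-year growth for the next year
    return estimates

# Example: 500k rows/day across DB + CRM + ERP, ~1 KB/row, 20% YoY growth, 5 years
for year, gb in yearly_storage_gb(500_000, 1024, 0.20, 5):
    print(f"Year {year}: ~{gb:,.0f} GB cumulative")
```

Even a rough estimate like this helps answer the archival/purge and backup-window questions before any hardware is provisioned.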

I am sharing a few links where I found information on capacity planning, with questions centered around NFRs.



Please feel free to comment.

Cheers and Enjoy!

A few learnings and suggestions based on experience January 10, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.
Tags: , , , ,
add a comment

This is a continuation of my CDI series, though less technical; I felt these points were essential for the success of CDI projects. Some of them are pointers drawn from industry CDI experiences, and a few from my own.

The list isn’t exhaustive, but these items carry high importance and criticality. So let’s start off.

  1. The type of CDI to be adopted for implementation needs buy-in from several teams, starting with the Business Teams, the Technical Teams (usually Deployment), and most importantly Senior Management. Stakeholder participation is one of the most important points.
  2. Constant engagement with the business teams is critical for understanding the functional requirements and should be maintained through design.
  3. CDI aims to bring a customer-centric solution, which involves pulling customer and customer-related information from different lines of business into one repository. Sometimes the customer information from the several businesses is so overwhelming that it leads to scope-leakage and schedule hits. Why does that matter to a technical person like me? Well, the solution and the design would take the hit.
     To avoid pain in later stages, it’s essential to focus on small and easy business groups for implementation. This may seem to contradict the purpose of the solution & design – “solution and design should cover customer-centricity without loss of any information” – but it is harmless when considered only for implementation.
  4. Establish a Data Governance team to monitor the enterprise-level information flow into and out of the CDI. This team has to work closely with the business teams from the early phases of the project, continue to work with the implementation teams, and remain responsible for the data after release to production.
  5. Last but never the least: have a proficient testing team that understands the functional requirements along with the implementation teams and carries out well-defined test strategies at each phase of the project once development kicks off.

I haven’t elaborated on why I listed the above points, because they are self-explanatory and their importance is evident.

I am sharing a few links that I have been reading from a project execution perspective; these links cover the nuances of executing such projects and are surely worth sharing.






Cheers and Enjoy!

What is 360° view of a Customer? January 3, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.
Tags: , , ,
add a comment

A much delayed post as part of my series on CDI (Customer Data Integration). My earlier post had an introduction on “How do I see 360° view of my Customer?”. In this post I shall talk a bit about what 360° means in business terms, and what it means for a person like me.

There are multiple definitions of the 360° view of a customer in a business. Let me pick an example from the travel industry – this is the client for whom I proposed the solution. The customer is a well-known online booking agent with booking businesses across several modes of transportation, from car rentals to airline bookings, all through two well-known modes of booking – online & offline.

Whatever the mode of booking, the booking agent’s website finally deals with the end customer, who is either a traveler or a flyer. The business verticals – marketing, through the customer relationship teams – want to know the “behavior” of the customer. Sadly, this type of information can be captured mostly for online transactions rather than offline ones. Hence I could manage to understand only the “online behavior”.

Why does the business need customer behavior? In today’s world everything revolves around the interests of a customer and providing an optimal travel package based on his past, recent, and probable future interests. It’s definitely not in line with the earlier way of marketing and selling pre-packaged travel solutions. So what does customer behavior mean here? It could be at least any of these, and many more beyond the list below.

  1. How many times has a registered customer visited and clicked the search-flights link or other website feature links?
  2. How many times has a customer reached the booking stage but dropped off?
  3. What type of customers are these? Are they regular visitors (registered customers)?
  4. What are their age groups? In which seasons do these customers peak on the website, or on particular links?
  5. Have these customers done any booking earlier on the website? What are their past transactions (successful, failed, abandoned)?
  6. Have these customers ever interacted with customer support? How has the interaction been? How could customer support be of more help?
  7. How does all the above data help the marketing and sales teams? Have the sales and marketing teams of the different business (travel mode) units made inroads into the customer’s needs?
  8. And finally, is the customer genuine by his identity – name, age, mobile/cell #, mail id?

These were the key pointers from the business perspective – and what does all this mean to a person like me? Here is the list of things that came up as first thoughts; answers to these helped me in bringing up a solution.

  1. Are there any implemented mechanisms that capture the customer behavioral data?
  2. What are those data sources?
  3. How clean are these data sources? How authentic and genuine are they? Is there any duplication of data or information at a master level – for example, is customer data duplicated?
  4. How many data sources must be dealt with to build that single view of the customer? Does this apply to both master and transactional data?
  5. What types of data sources are these? Are they heterogeneous at the technology and platform levels?
  6. What are the different forms of data – structured, semi-structured, and unstructured?
  7. What are the volumes of data, or the number of transactions that generate data, in these data sources?
  8. Finally, what is the one key that could be used to tie together a customer’s transactions and behavior?

This post was self-interrogative; answers to these questions from the business and technology teams would shape the solution – architecture, data model, ETL, and report design.
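On the last question – the one key that ties a customer’s records together – here is a minimal sketch of the idea: matching records from different sources on normalized identifiers to assign a single master customer key. The source names, field names, and records are invented for illustration, and a real implementation would need proper record linkage (fuzzy matching, survivorship rules); this only shows the master-key concept.

```python
# Sketch: assign a master customer key by matching on normalized identifiers.
# Source names, fields, and sample records are hypothetical.

def normalize(record):
    """Build candidate match keys from whatever identifiers a source carries."""
    keys = set()
    if record.get("email"):
        keys.add(("email", record["email"].strip().lower()))
    if record.get("phone"):
        digits = "".join(ch for ch in record["phone"] if ch.isdigit())
        keys.add(("phone", digits[-10:]))  # compare on the last 10 digits
    return keys

def assign_master_keys(records):
    """Give records that share any normalized identifier the same master key."""
    key_to_master = {}
    out = []
    next_id = 1
    for rec in records:
        found = next((key_to_master[k] for k in normalize(rec) if k in key_to_master), None)
        master = found or f"CUST-{next_id:05d}"
        if not found:
            next_id += 1
        for k in normalize(rec):
            key_to_master[k] = master  # remember every identifier seen for this customer
        out.append({**rec, "master_key": master})
    return out

rows = [
    {"source": "booking_db", "email": "A.Rao@example.com", "phone": None},
    {"source": "crm", "email": "a.rao@example.com", "phone": "+91 98450 12345"},
    {"source": "support", "email": None, "phone": "9845012345"},
]
for r in assign_master_keys(rows):
    print(r["source"], r["master_key"])
```

All three hypothetical records collapse to one master key because the CRM record carries both the email and the phone, bridging the other two sources.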

Do share what you think, or what else could be included.

Cheers and Enjoy!

How do I see 360° view of my Customers? December 6, 2012

Posted by msrviking in Business Intelligence, Integration.
Tags: ,
1 comment so far

What does a 360° view of the Customers mean? This was one of the first questions that came to my mind when my client asked us, “How do I see 360° view of my Customers?” Trust me, this opened up a new world for me, having spent all my time working on core technical subjects. It was an eye-opener into the world of “understanding data from a business perspective.” The question didn’t stop me from asking the customer back what he wanted to do with the data (for sure, I knew it was all about data at the end of the day). My first question started here, and I ended up architecting the Customer Data Integration (CDI) solution.

I am going to walk through, over several posts, how this journey of CDI – otherwise commonly known as Data Hub – happened. In this post I shall share a few links that talk about the definitions, architecture, and implementation pointers of Data Hub or CDI, and what it means to different businesses, CIOs, and so on.

I have surely gained a lot of knowledge after reading through several articles, blog posts, books, forums, and internal and external discussions; however, not all of it can be condensed into blog posts. So you will see a bundle of links, and of course my views around these articles (per article or all together), so that I add my own experience and learnings. After all, that is what this blog and blogging are about, isn’t it?

So here are a few reference links (at the architecture and business levels) that I want to share to kick-start your thought process if you are in search of the meaning or definition of CDI, Data Hub, or the 360° view of the customer.


>> This is one of the most important articles I have come across on the web. The excerpts in this article are from a book called Master Data Management and Customer Data Integration for a Global Enterprise, authored by Alex Berson and Larry Dubov. You can find a copy here if you want to own one.

The content of this article was a revelation to me on the concept of CDI, and I loved reading it. I kept re-reading it over a period of time until I published the intended architecture for implementation.

I wish to talk a lot more about the article and the book, but I will reserve that for other posts, where I will pick up each of the topics – used as guidelines – and talk in detail about how I built the architecture. So don’t wait – hit the above link to read more.


This is the next website, where Manjeet has written extensively on how CDI could be adopted, who could adopt it, the different models, best practices, principles, and its relation to Master Data Management (MDM). I would say one should read further into these topics as a supplement to the first link above.


This article gives more or less the same information, but it was relevant for me because it let me validate what I was reading. A good one.

This is how I started my journey with CDI – stay tuned to read more. I will probably share more links as and when I dig through my mail folders.

Cheers and Enjoy!