CDIs – Non-functional requirements January 24, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

It's been quite some time since I posted on this topic, and I believe it's time to share the next post, which deals with the technical work I did in this assignment.

NFR – Non-functional requirements! Are you aware of this term? You would be, should be and could be. But when I talk to Architects, DBAs and Project Management Teams, I notice that these groups understand NFRs at a layman's level but not in much depth. The conversation usually ends with “Yes, NFRs are very important, like response time, total number of users, number of concurrent users”.

I don't like stopping at this level. If you are building a solution for a transactional or analytical system, you have to go deeper to understand how the system behaves in terms of Availability, Scalability, Performance and Multi-Tenancy while the business is running on it. These words are well known, but when you start digging further you will notice that they cover a lot more. In the end, the built solution should comply with the NFRs, treating all of these factors as one integrated whole.

So for the CDI solution I built, I considered the list below to be the most important NFRs. I don't believe this is an exhaustive list, but I know these had to be addressed before I could build the architecture.

· Availability

o Disaster Recovery

§ How important is the system, and how quickly does it need to be brought back online in case of an outage or failure?

· Data

o Data Staleness

§ How up-to-date does your information need to be?

§ Do you need real-time or are delays acceptable?

o Data Retention

o Internationalization

· Performance

o What is the batch window available for completing the entire batch cycle?

o What is the batch window available for completing individual jobs?

o What batch frequency should be assumed by default?

o What are the data availability SLAs provided by the source systems of the batch load?

o What is the expected load (peak, off-peak, average)?

o What is the hardware and software configuration of the environment where the batch cycle could be run?

o Are the available resources exclusive or shared?

· Scalability

o What is the expected annual growth in data volume (for each of the source systems) in the next three to five years?

o What is the projected increase in number of source systems?

o How many jobs need to be executed in parallel?

o Is there any need for distributed processing?

· Reliability

o What is the tolerance level or percentage of erroneous data that is permitted?

o Is there any manual process defined to handle potential failure cases?

o Under what conditions is it permissible for the system/process not to be completely accurate, and what is the frequency/probability of such occurrences?

· Maintainability

o What is the maintenance/release cycle?

o How frequently do source or target structures change? What is the probability of such change?

· Extensibility

o How many different/divergent sources (different source formats) are expected to be supported?

o What kind of enhancements/changes to source formats are expected to come in?

o What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?

· Security

o Are there any special security requirements (such as data encryption or privacy) applicable?

o Are there any logging and auditing requirements?

· Capacity

o How many rows are created every day in the transactional DB, CRM and ERP systems?

o What is the size of data generated across LOBs (lines of business) in the DB, CRM and ERP systems?

o What is the acceptable processing time for data in the data hub?

o How stale can the reporting data be?

o What are the acceptable database downtime hours for:

§ Administration & Maintenance

§ Data Refresh /Processing

o How many concurrent users will access the report/application?

o What is the total number of users expected to use the reporting system?

o What is the expected complexity level of the reporting solution (High, Medium, Low)?

o How much of the processed data and the data used for reporting has to be archived/purged?

o How many years of data have to be archived in the system?

o How many years of historical data have to be processed?

o What backup and recovery times are required for the Data Hub and Reporting system?

o What are the availability requirements of the data hub and reporting system?

o How many users will be added to the reporting system year on year, and what types of users are they?

o What will the year-on-year growth of data in the transactional source system be?

o What other data sources could be added over the next 1-2 years, and how much data could they provide to the data hub?

o Are there any other external systems that would require data from the data hub?

o How many rows of the transactional DB, CRM and ERP need to be processed for the Data Hub?

o How much data is currently processed for reports?

o What types of data processing queries exist in the system to provide static/ad-hoc reports?

o What types of reports are currently available, and what is their resource usage?

o What are the query profiling inputs for these data processing/reporting queries?

§ CPU usage

§ Memory usage

§ Disk usage

· Resource Utilization

o Is there an upper limit on CPU time or other processing resources?

o Are there any limits on the memory that can be consumed?

o Does the target store/database/file need to be available 24x7?

o Is any downtime allowed?

o Are there peak or off-peak hours during which loading can happen?

o Are there crucial SLAs that need to be met?

o If SLAs are missed, is there any critical system/business impact?

I prepared this list after researching similar implementations, best practices and standards on the web, and based on my past experience.
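To put numbers behind the Scalability and Capacity questions, a rough projection is often enough to start the conversation. The sketch below is only an illustration, not what I ran for this assignment: it assumes a constant compound annual growth rate, the function names are my own, and every figure in it (the daily row counts, the 25% growth, the three-year retention) is a hypothetical placeholder.

```python
# Rough capacity projection for the Scalability/Capacity NFRs.
# Assumption: a constant compound annual growth rate. All figures
# in the example are hypothetical placeholders.


def project_yearly_volume(current: float, annual_growth: float, years: int) -> list[float]:
    """Projected volume for year 0..years, e.g. annual_growth=0.25 for 25%/year."""
    return [current * (1 + annual_growth) ** y for y in range(years + 1)]


def retained_rows(rows_per_day: int, retention_years: int) -> int:
    """Rows held if each day's rows are kept for retention_years
    (365-day years; ignores updates, deletes and archiving)."""
    return rows_per_day * 365 * retention_years


if __name__ == "__main__":
    # Hypothetical daily row counts for the transactional DB, CRM and ERP.
    daily_rows = {"transactional_db": 2_000_000, "crm": 150_000, "erp": 300_000}
    growth = 0.25  # hypothetical 25% annual growth for every source

    for source, rows in daily_rows.items():
        year5 = project_yearly_volume(rows, growth, years=5)[-1]
        print(f"{source}: {rows:,} rows/day today, ~{year5:,.0f} rows/day in year 5")

    total = sum(daily_rows.values())
    print(f"3-year retention at today's rate: {retained_rows(total, 3):,} rows")
```

In practice each source would get its own growth rate from the answers to the questions above; a single rate is used here only to keep the sketch short.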

I am sharing a few links where I found capacity-planning information that included questions around NFRs.



Please feel free to comment.

Cheers and Enjoy!


A few learnings and suggestions based on experience January 10, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

This post continues my CDI series, though it is less technical; I felt these points were essential for the success of CDI projects. Some are pointers drawn from industry CDI experiences, and a few come from my own experience.

The list isn't exhaustive, but these points are highly important and critical. So let's start off.

  1. The type of CDI to be implemented had to have buy-in from several teams: Business Teams, Technical Teams (usually Deployment) and, most importantly, Senior Management. So stakeholder participation is one of the most important points.
  2. Constant engagement with business teams is critical for understanding the functional requirements and should be maintained through design.
  3. A CDI is meant to deliver a customer-centric solution, which involves bringing customer and customer-related information from different lines of business into one repository. Sometimes the customer-related information from several businesses is so overwhelming that it leads to scope creep and schedule slips. Why does this matter to a technical person like me? Because the solution and the design take the hit.
    To avoid pain in later stages, it's essential to start the implementation with small, easy business groups. This may seem to contradict the purpose of the solution and design (“the solution and design should cover customer-centricity without loss of any information”), but it is harmless when applied only to how the implementation is phased.
  4. Establishing a Data Governance team to monitor the enterprise-level information flow into and out of the CDI. This team has to work closely with the business teams from the early phases of the project, continue to work with the implementation teams, and be responsible for the data after release to production.
  5. The last but not least point is having a proficient testing team that understands the functional requirements alongside the implementation teams and carries out well-defined test strategies at each phase of the project once development is kicked off.

I haven't explained why I listed the points above because they are self-explanatory and their importance is evident.

I am sharing a few links I have been reading from a project-execution perspective; they go beyond the nuances of executing such projects and are surely worth sharing.






Cheers and Enjoy!

Goal set... what next? January 8, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration.

The last post in this series covered the Vision, Mission and Goal of building the CDI solution. Here I am going to list a few top rules to be adhered to when building the architecture and design.

The solution should be built around these standards:

  1. Data Hub or Customer Data Integration (CDI) is a step towards Master Data Management (MDM), so the solution should, as far as possible, include features that can accommodate an MDM implementation in the future.
  2. The solution should have a “single truth” of customer information, which means data from different data sources should be cleansed and consolidated.
  3. The architecture of the CDI should not miss any of the customer and customer-centric attributes. Note that I have introduced a new word, “customer-centric”, and it's deliberate, because a CDI is not relevant if the solution is not customer-centric.
  4. The solution and design should meet the pre-defined NFRs (non-functional requirements) and, of course, the FRs (functional requirements), which go without saying.
  5. The solution should be built so that downstream activities like design and development can actually be realized.

I felt it was worth bringing up this last point to emphasize that the solution has to be both built and realized. In large, complex implementations this path is often lost, and the pinch is felt later during production deployments.

So we have high-level standards that need to be adhered to when building the architecture and design. In my next post I shall talk about the principles and decisions.

Cheers and Enjoy.

DB Map January 7, 2013

Posted by msrviking in General, Technical Documentation.

I was reading tweets today and came across one by @victoria_holt that led me to a link with a nice map. It was created by @maslett, whose blog is hosted over here.

This map, with its different lines and a circle for each DB technology, reminded me of the London Tube map. I don't have to describe it in detail because it's self-explanatory.

But as a SQL Server fan, I looked for its position on the map, and here is what I found.

SQL Server sits on the operational, analytical and appliance lines, all within the Relational zone. The map doesn't grade it against any other database system, but it gives a picture of the whole range of database technologies and their existing lines of features. I hope you like it too.


Cheers and Enjoy!

Vision, Mission and Goal of Data Hub January 7, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration.

In my last post I talked about what a “360° view of a Customer” means for a data person like me. To achieve the objective of the implementation, I had to set the Vision, Mission and Goal.

Vision – A golden record of truth of each customer

Mission – To build a central data hub encompassing all attributes of the customer and depicting the business done with them, through a single view

Goal – To cleanse key customer information to identify a single business-valid row, gather customer-related information and consolidate it into one picture, and report the consolidated view for business consumption

These pointers were important for architecting the solution, designing the data model and ETLs, developing the ETLs, reporting, and finally delivering business information to the end user.

The vision, mission and goal were derived from various Data Hub/CDI implementations in the industry, the opinions of CDI experts and, of course, a detailed study of the business requirements.

I shall speak briefly about each point so that it is clear what it all means.

An enterprise typically has a mix of technologies, adopted and adapted to various business requirements. Each of these technologies has its own back-end systems, which support the structured, semi-structured and unstructured data generated by applications and websites. In such a heterogeneous setup, it is likely that the systems are “duplicating” data.

Data can be duplicated in two ways, master and transactional, and both have a high business impact: skewed numbers and erratic, invalid references. A Data Hub is a central repository of data coming from various data sources, so it is important to find, identify and correct duplicated master or transactional data to get a complete view of a customer. So the vision was to “Identify the Golden Record of Truth of each customer”.
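To make the “golden record” idea concrete, here is a minimal sketch of deduplication with a simple survivorship rule. It is an illustration under loud assumptions, not the matching logic used in this solution: records are matched only on a normalized e-mail (or phone digits when e-mail is missing), and for each attribute the most recently updated non-empty value survives. The `CustomerRecord` layout, the function names and the source names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    # Hypothetical record layout; real source systems will differ.
    source: str      # source-system name, e.g. "web", "crm"
    name: str
    email: str
    phone: str
    updated: date    # when the source last changed this record


def match_key(rec: CustomerRecord) -> Optional[str]:
    """Deterministic match key: normalized e-mail, else phone digits, else None."""
    email = rec.email.strip().lower()
    if email:
        return "email:" + email
    digits = "".join(ch for ch in rec.phone if ch.isdigit())
    return "phone:" + digits if digits else None


def golden_records(records: list[CustomerRecord]) -> dict[str, dict[str, str]]:
    """Group duplicates by match key, then keep, per attribute, the most
    recently updated non-empty value (a 'most recent wins' survivorship rule)."""
    groups: dict[str, list[CustomerRecord]] = {}
    for i, rec in enumerate(records):
        key = match_key(rec) or f"unmatched:{i}"  # nothing to match on: keep alone
        groups.setdefault(key, []).append(rec)

    result: dict[str, dict[str, str]] = {}
    for key, dupes in groups.items():
        newest_first = sorted(dupes, key=lambda r: r.updated, reverse=True)
        golden = {
            attr: next((getattr(r, attr) for r in newest_first
                        if getattr(r, attr).strip()), "")
            for attr in ("name", "email", "phone")
        }
        golden["sources"] = ",".join(sorted({r.source for r in dupes}))
        result[key] = golden
    return result


if __name__ == "__main__":
    records = [
        CustomerRecord("web", "Ravi Kumar", "RKumar@example.com", "", date(2012, 11, 2)),
        CustomerRecord("crm", "R Kumar", " rkumar@example.com", "+91 98450 00000", date(2012, 6, 1)),
        CustomerRecord("call_centre", "Anita Rao", "", "98860-11111", date(2012, 9, 9)),
    ]
    for key, golden in golden_records(records).items():
        print(key, golden)
```

Real CDI matching usually adds fuzzy comparison of names and addresses; with this purely deterministic rule, a customer known only by e-mail in one source and only by phone in another stays as two separate records.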

This is not easy to implement technically, nor is it easy to convince the business users who have been using the existing systems for years.

After identifying the “Golden Record”, it is necessary to gather all the attributes of the customer needed for a 360° view. This is a short explanation of the mission, but it lays the foundation for the architecture and design.

The Golden Record with all of a customer's attributes is good, but not complete until it is cleansed at each level and then reported as a complete picture for business users to consume.

All this sounds good at 10,000 ft, but it becomes a real challenge once the execution phase begins. I shall walk through as much of the solution and design encompassing these three points (vision, mission and goal) as possible.

Until then enjoy reading and Cheers! BTW, comments are welcome.

Who am I – DBA or Database Architect or Data Architect? January 4, 2013

Posted by msrviking in Architecture, Career.

This post is not about the output of a command. There is a Unix command, who am i, that reports the user under which the current login happened. Here I am trying to run the same command in the context of my work.

Who am I? I have always been trying to figure out whether I am a Database Administrator, a Database Architect or a Data Architect. Which of these is the ultimate goal? What should I do to become the best among the best in one of them? What does all this mean in practice?

These are high-level questions that need a lot of thinking, self-reflection, resolutions and action items: try, implement, evaluate and mark complete. I don't think this is going to happen in a year or two, or even a little more. It all depends on how quickly one wants to reach that goal. Okay, now back to my question.

As of today, I think of myself as having been a Database Administrator (DBA) on SQL Server and Oracle (a bit, at least), then a DBA-turned-Database Architect. I believe I am still on the path to becoming a complete Database Architect. To get there I have set the following criteria for myself, based on personal experience, interactions, and following industry experts and idols.

A Database Architect is complete if he/she has:

  1. Expertise in one or two RDBMS or non-RDBMS technologies
  2. Expertise in one of the business domains
  3. Expertise in transactional and analytical systems
  4. Expertise in infrastructure management and architecture
  5. Expertise in application architecture and a few domain-specific products
  6. And finally, breadth of knowledge in all these areas, with depth in 1-4

So if one becomes a complete DB Architect, what next? Surely the traits and experience of the DB Architect role lead to the Data Architect, who is the Enterprise Information Management Architect. Don't ask me how I came up with the name EDIMA, and I haven't checked the web to see whether someone has already claimed the term EDIMA.

An EDIM Architect is the person who has the vision of how information in the enterprise flows into and out of the business systems. The Architect decides which technologies and products to adopt over the short and long term. He/she also decides how data models are built and what types are needed. That is all I can think of for now as the ultimate goal and its roles and responsibilities. I am sure more could be added, and if anyone knows better, please feel free to comment.

Here is a link that neatly describes the Data Architect, Database Architect and Database Administrator roles.

Cheers and Enjoy.

What is 360° view of a Customer? January 3, 2013

Posted by msrviking in Architecture, Business Intelligence, Integration.

A much-delayed post in my series on CDI (Customer Data Integration). My earlier post was an introduction to “How do I see a 360° view of my Customer?”. In this post I shall talk a bit about what 360° means in business terms, and what it means for a person like me.

There are multiple definitions of a 360° view of a customer in a business. Let me pick an example from the Travel industry, where the client for whom I proposed the solution operates. The client is a well-known online booking agent with booking businesses across several modes of transportation, from Car Rental to Airline booking, all through two well-known modes of booking: online and offline.

Whatever the mode of booking, the booking agent's website ultimately deals with the end customer, who is either a traveler or a flyer. The business verticals, from marketing through the customer relationship teams, want to know the “behavior” of the customer. Sadly, this type of information could be captured mostly for online rather than offline transactions. Hence I could only manage to understand the “online behavior”.

Why does the business need customer behavior? In today's world everything revolves around the customer's interests and providing an optimal travel package based on his past, recent and probable future interests. This is definitely not along the lines of the earlier way of marketing and selling pre-packaged travel solutions. So what does customer behavior mean here? It could be any of the following, and much more.

  1. How many times has a registered customer visited and clicked the search-flights link or links to other website features?
  2. How many times has a customer reached the booking stage but dropped off?
  3. What type of customers are these? Are they regular visitors (registered customers)?
  4. What are their age groups? In which seasons do these customers' visits to the website, or to particular links, peak?
  5. Have these customers made any bookings on the website before? What are their past transactions (successful, failed, abandoned)?
  6. Have these customers ever interacted with customer support? How did the interaction go? How could customer support have been more helpful?
  7. How does all the above data help the marketing and sales teams? Have the sales and marketing teams of the different business (travel mode) units made inroads into the customer's needs?
  8. And finally, is the customer's identity genuine: name, age, mobile/cell #, mail id?
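Several of these behavioral questions reduce to counting over clickstream data. As a hedged sketch, assuming a hypothetical event log of `(customer_id, session_id, stage)` tuples with stage names like `"search"`, `"booking"` and `"confirmed"` (none of which come from the actual client system, and with function names of my own), questions 1 and 2 could be answered like this:

```python
from collections import defaultdict

# Hypothetical stage names; a real clickstream would use its own page/event codes.
SEARCH, BOOKING, CONFIRMED = "search", "booking", "confirmed"


def search_clicks(events, registered):
    """Q1: number of search events per registered customer."""
    counts = defaultdict(int)
    for customer_id, _session_id, stage in events:
        if stage == SEARCH and customer_id in registered:
            counts[customer_id] += 1
    return dict(counts)


def booking_drop_offs(events):
    """Q2: per customer, sessions that reached the booking stage but were
    never confirmed. Assumes session ids are unique across customers."""
    stages = defaultdict(set)
    owner = {}
    for customer_id, session_id, stage in events:
        stages[session_id].add(stage)
        owner[session_id] = customer_id

    drops = defaultdict(int)
    for session_id, seen in stages.items():
        if BOOKING in seen and CONFIRMED not in seen:
            drops[owner[session_id]] += 1
    return dict(drops)


if __name__ == "__main__":
    events = [
        ("C1", "s1", SEARCH), ("C1", "s1", BOOKING),
        ("C1", "s2", SEARCH), ("C1", "s2", BOOKING), ("C1", "s2", CONFIRMED),
        ("C2", "s3", SEARCH), ("C2", "s3", BOOKING),
        ("C2", "s4", SEARCH),
    ]
    print(search_clicks(events, registered={"C1"}))  # {'C1': 2}
    print(booking_drop_offs(events))                 # {'C1': 1, 'C2': 1}
```

Counting per session rather than per event keeps a customer who clicks "book" several times in one visit from being counted as several drop-offs.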

These were the key pointers from the business perspective, but what does all this mean to a person like me? Here is the list of things that came to mind first; the answers to these helped me come up with a solution.

  1. Are there any implemented mechanisms that capture the customer behavioral data?
  2. What are those data sources?
  3. How clean are these data sources? How authentic and genuine are they? Is there any duplication of data or information at the master level? For example, is customer data duplicated?
  4. How many data sources have to be dealt with to bring about that single view of the customer? Does this apply to both master and transactional data?
  5. What types of data sources are there? Are they heterogeneous at the data technology and platform levels?
  6. What are the different forms of data: structured, semi-structured and unstructured?
  7. What are the volumes of data or # of transactions that generate data in these data sources?
  8. Finally, what is the one key that could be used to tie a customer's transactions and behavior together?
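Question 8 is where the matching work pays off. A common approach is a cross-reference (crosswalk) table that maps each source system's customer id to one master id; once it exists, building the single view is an aggregation. The sketch below assumes such a table already exists, and the ids, source names, transaction kinds and the `single_view` function are all hypothetical:

```python
from collections import defaultdict

# Hypothetical crosswalk from the matching step:
# (source system, customer id in that source) -> master customer id.
CROSSWALK = {
    ("web", "U-1001"): "M-1",
    ("call_centre", "CC-77"): "M-1",
    ("crm", "42"): "M-2",
}


def single_view(transactions, crosswalk):
    """Roll (source, source_customer_id, kind, amount) rows up to one row per
    master id. Rows whose id is not in the crosswalk are returned separately
    so data stewards can review them instead of silently dropping them."""
    view = defaultdict(lambda: {"bookings": 0, "revenue": 0.0, "support_calls": 0})
    unmatched = []
    for source, source_id, kind, amount in transactions:
        master_id = crosswalk.get((source, source_id))
        if master_id is None:
            unmatched.append((source, source_id))
            continue
        row = view[master_id]
        if kind == "booking":
            row["bookings"] += 1
            row["revenue"] += amount
        elif kind == "support_call":
            row["support_calls"] += 1
    return dict(view), unmatched


if __name__ == "__main__":
    transactions = [
        ("web", "U-1001", "booking", 120.0),
        ("call_centre", "CC-77", "support_call", 0.0),
        ("crm", "42", "booking", 80.0),
        ("web", "U-9999", "booking", 45.0),  # not yet matched to a master id
    ]
    view, unmatched = single_view(transactions, CROSSWALK)
    print(view)
    print("needs review:", unmatched)
```

Returning unmatched rows rather than dropping them is a deliberate choice in this sketch: silently losing them would quietly undermine the single view.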

This post was self-interrogative; the business and technology teams' answers to these questions would shape the solution: architecture, data model, ETL and report design.

Do share what you think, or what else could be included.

Cheers and Enjoy!