
What am I doing nowadays, over the last few months? January 30, 2014

Posted by msrviking in General, MySQL, Performance tuning.

For a few months I was busy with SSIS, and since my last post on this blog I have been working on MySQL. Why am I working on MySQL and writing about it on a SQL Server blog? Well, I am trying to get a hold on how MySQL works while I stabilize the performance of a system. I am certainly not a MySQL geek who can dig deep into the OS, the hardware and the MySQL configuration, but with the little knowledge I had, and what I am gaining as time goes by, I am troubleshooting query performance, indexing and table partitioning. These are the first things I am trying to put in place before I move on to other levels of performance engineering.

A parallel discussion is already under way on whether I could shard the database and introduce read-write splitting to provide a scale-out solution, either with out-of-the-box features like MySQL Cluster or with a custom approach: partitioning tables, sharding them across different nodes, and read-write splitting using MemCache.
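To make read-write splitting a little more concrete, here is a minimal sketch in Python, assuming the routing decision is made in the application rather than by MySQL itself. The class name `ReadWriteRouter` and the server labels are hypothetical; a real setup would hand the chosen name to an actual connection pool.

```python
import itertools
from typing import Iterable


class ReadWriteRouter:
    """Hypothetical application-level router: reads go to replicas, writes to the primary."""

    # Statements treated as reads; everything else is sent to the primary.
    READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over the replicas so read load is spread evenly.
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, sql: str) -> str:
        """Return the name of the server that should execute `sql`."""
        statement = sql.lstrip().upper()
        is_read = statement.startswith(self.READ_PREFIXES)
        # SELECT ... FOR UPDATE takes row locks, so it must run on the primary.
        if is_read and "FOR UPDATE" not in statement and self._cycle is not None:
            return next(self._cycle)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders WHERE id = 7"))  # replica-1
print(router.route("UPDATE orders SET status = 'paid'"))  # primary-db
print(router.route("select name from customers"))         # replica-2
```

One design caveat: with asynchronous replication a replica can lag behind the primary, so a read that must see a write the same session just made may also need to go to the primary.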

That is a lot of options to think about in MySQL. I don’t have the same flexibility in SQL Server, although some implementations do read-write splitting with load balancers at the application level rather than in the database. I am highlighting some of the things that are not available in SQL Server but can be used to the fullest in MySQL. Then again, many things that are so good to use in SQL Server are missing in MySQL.

Some of the ones I badly miss in MySQL are:

  • Profiler
  • Graphical execution plan
  • Indexes with included (INCLUDE) columns
  • Free, easy-to-use monitoring tools for the OS and hardware
  • Multiple query plan algorithms
  • Proper documentation (implementation or bugs)

This post shares the top few things I am seeing in MySQL and the few things I miss when I think of SQL Server. I would call it a home-sick post.

I will keep writing as I learn new things, and I will definitely compare them with SQL Server features.

Happy reading.

CDIs-Non Functional Requirements-Other few May 21, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Design, Integration, Security.

In the CDIs-Non Functional Requirements series I covered NFRs like Performance, Scalability, Availability, Reliability, and Utilization Benchmarks that are useful for building a CDI system. In this post I will talk about less well-known but important NFRs: Maintainability, Configurability (extensibility) and Security. I wouldn’t strictly call these NFRs, but asking questions on these topics will give you insight into how the system is expected to behave from a business and technical perspective. Also, these don’t work in silos; they link back to some of the NFRs in the list above.

Maintainability

  1. What is the maintenance/release cycle?

    This question gives us an idea of the release practices followed by the business and technical teams. Each release cycle may mean a change in the source system, which may affect downstream applications. The more release cycles there are, the more difficult it is to maintain the code base. To avoid long-term overhead, the system, the data model and the low-level design of the ETLs should be built carefully, assuming that these changes will be constant and frequent.

  2. How frequently do source or target structures change? What is the probability of such change?

    This point is related to the first question but elicits information one level deeper by asking “if there are more maintenance cycles, what changes happen at the source and what is expected in the target?”. If the changes are constant, frequent and less complex, then the data model and the ETLs have to be configurable to accommodate ‘certain’ changes in the source. Configurability comes with a rider: a trade-off against other NFRs such as performance. Changes in the data source could affect the performance of the ETL, and sometimes the SLA laid down by the business can’t be met.

Having said this, the next NFR is closely related to Maintainability.

Configurability

The questions under this topic are likely to be challenging for both the business and technical teams. Not everyone is sure what should be configurable and what shouldn’t, based on the changes the business expects at the source-system level. You will get answers like “not sure”, “maybe”, “quite possible in the near future” or “probably” the source will change, and what will change remains an open question. Providing an appropriate solution at the different layers then becomes a daunting task for the technical team.

  1. How many different/divergent sources (different source formats) are expected to be supported?

    The answer to this question helps in understanding which source formats (csv, tsv, xml, etc.) have to be supported. If the formats differ a lot, alternate design practices could be implemented on the target to provide extensibility across all formats.

  2. What kind of enhancements/changes to source formats are expected to come in?

    The answer helps in deciding whether to build abstract transformations or reusable mappings.

  3. What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?

    This tells you how often source formats change, and whether the changes come from existing sources or from new ones. Again, it helps in deciding between abstract transformations and reusable mappings.
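To illustrate the trade-off between abstract transformations and reusable mappings, here is a minimal Python sketch, assuming flat csv, tsv and xml sources. The reader registry, the `load` function and the column names are hypothetical and not a feature of any particular ETL tool: a new source format means adding one reader, and a new source with different column names means adding one mapping, without touching the load logic.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Record = Dict[str, str]


def read_delimited(text: str, delimiter: str) -> List[Record]:
    # csv.DictReader uses the first row as the column names.
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_xml(text: str) -> List[Record]:
    # Assumes a flat layout: <rows><row><col>value</col>...</row></rows>
    root = ET.fromstring(text)
    return [{child.tag: (child.text or "") for child in row} for row in root]


# Hypothetical registry: supporting a new source format means adding one entry here.
READERS: Dict[str, Callable[[str], List[Record]]] = {
    "csv": lambda text: read_delimited(text, ","),
    "tsv": lambda text: read_delimited(text, "\t"),
    "xml": read_xml,
}


def load(text: str, fmt: str, mapping: Dict[str, str]) -> List[Record]:
    """Parse `text` with the reader for `fmt`, then rename source columns to target columns."""
    rows = READERS[fmt](text)
    # Reusable mapping: source column name -> target (data hub) column name.
    return [{target: row.get(source, "") for source, target in mapping.items()} for row in rows]


crm_mapping = {"cust_name": "customer_name", "mail": "email"}
print(load("cust_name,mail\nAsha,asha@example.com\n", "csv", crm_mapping))
# [{'customer_name': 'Asha', 'email': 'asha@example.com'}]
```

The flexibility has a cost, as noted under Maintainability: every extra layer of indirection is something the ETL has to evaluate per row, which can matter when the SLA is tight.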

The last NFR is Security, which is usually the last to be considered in any system architecture and design, yet it is among the most important.

Security

In a CDI we are dealing with sensitive customer information and transaction details. It is important to understand how the business treats this type of data, and how the security compliance team views data that is gathered from different source systems and consolidated in a single place, the “Data hub”. The questions below cover data protection rather than the access levels or privileges of different users.

  1. Are there any special security requirements (such as data encryption or privacy) applicable?

    The answer to this question is usually “no”, but certain fields brought in from CRM and ERP systems need to be protected from misuse in case of a security breach. It is a good idea to explain this question with a real scenario, and then decide whether to enable data or database encryption.

  2. Are there any logging and auditing requirements?

    This is usually the least required, since the data from different systems is mostly massaged and made available in reporting format through a different data sink. A discussion here helps in deciding whether security should be handled at the reporting level (by enabling different security features) rather than in the high-volume data processes.
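For the sensitive CRM and ERP fields mentioned above, a field-level alternative to full database encryption is to mask or pseudonymize those columns before they land in the Data hub. The Python sketch below shows that technique (it is masking plus keyed hashing, not encryption). The field names and the `protect` helper are hypothetical, and the key is assumed to come from a proper secret store rather than from code.

```python
import hashlib
import hmac
from typing import Dict, Iterable

Record = Dict[str, str]


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed SHA-256 hash: the same input always gives the same token, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record: Record, masked: Iterable[str], hashed: Iterable[str], key: bytes) -> Record:
    """Return a copy of `record` with sensitive fields masked or pseudonymized."""
    out = dict(record)
    for field in masked:
        if field in out:
            out[field] = mask_value(out[field])
    for field in hashed:
        if field in out:
            out[field] = pseudonymize(out[field], key)
    return out


# Assumption: in a real system the key is fetched from a secret store, never hard-coded.
demo_key = b"demo-key"
row = {"name": "Asha", "card_no": "4111111111111111", "national_id": "ABC123"}
safe = protect(row, masked=["card_no"], hashed=["national_id"], key=demo_key)
print(safe["card_no"])           # ************1111
print(len(safe["national_id"]))  # 64
```

Masking keeps a value readable enough for support staff, while pseudonymizing keeps it usable as a join key; which one fits each field is exactly the kind of decision the real scenario discussion should settle.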

I hope these posts on NFRs for CDIs help you architect and design a Data Hub system that is highly available, scalable, high-performing and reliable.

Cheers!

Goal set... what next? January 8, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration.

The last post in this series covered the Vision, Mission and Goal of building the CDI solution. Here I am going to list a few top rules to be adhered to when building the architecture and design.

The solution should be built around these standards:

  1. Data Hub or Customer Data Integration (CDI) is a step towards Master Data Management (MDM), so the solution should include as many features as possible to accommodate an MDM implementation in the future.
  2. The solution should have a “single truth” of customer information, which means data from different data sources should be cleansed and consolidated.
  3. The architecture of the CDI should not miss any customer or customer-centric attributes. Note that I have introduced a new term, “customer-centric”, and it’s deliberate, because a CDI is not relevant if the solution is not customer-centric.
  4. The solution and design should meet the pre-defined NFRs (non-functional requirements) and, of course, the FRs (functional requirements), which go without saying.
  5. The solution should be built so that downstream activities such as design and development can be realized.

I felt it was worth bringing up this last point to emphasize that the solution has to be built and realized. In such large and complex implementations this path is often lost, and the pinch is felt later during production deployments.
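Standard 2, the “single truth” of customer information, is the most technical of these rules, so here is a minimal Python sketch of what cleansing and consolidation could look like. It assumes a normalized email address is the match key and that the newest non-empty value wins for each attribute (a simple survivorship rule). Both are illustrative assumptions; real CDI matching usually combines several attributes and fuzzy logic. `SourceRecord` and `consolidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceRecord:
    source: str   # e.g. "CRM" or "ERP" (hypothetical labels)
    email: str
    name: str
    phone: str
    updated: int  # last-updated timestamp; larger means newer


def normalize_email(email: str) -> str:
    # Cleansing step: trim whitespace and lower-case so variants of the same address match.
    return email.strip().lower()


def consolidate(records: List[SourceRecord]) -> Dict[str, Dict[str, str]]:
    """Build one golden record per customer, keyed by the normalized email."""
    groups: Dict[str, List[SourceRecord]] = {}
    for rec in records:
        groups.setdefault(normalize_email(rec.email), []).append(rec)

    golden: Dict[str, Dict[str, str]] = {}
    for key, recs in groups.items():
        # Survivorship rule (assumed): for each attribute, the newest non-empty value wins.
        newest_first = sorted(recs, key=lambda r: r.updated, reverse=True)
        merged: Dict[str, str] = {"email": key}
        for attr in ("name", "phone"):
            merged[attr] = next(
                (getattr(r, attr).strip() for r in newest_first if getattr(r, attr).strip()), ""
            )
        golden[key] = merged
    return golden


rows = [
    SourceRecord("CRM", "Asha@Example.com ", "Asha R", "", 100),
    SourceRecord("ERP", "asha@example.com", "Asha Rao", "555-0100", 90),
]
print(consolidate(rows))
# {'asha@example.com': {'email': 'asha@example.com', 'name': 'Asha R', 'phone': '555-0100'}}
```

Keeping the match key and the survivorship rule in separate, small functions is deliberate: both are business decisions that tend to change, and a later MDM implementation (standard 1) would likely replace them.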

So we now have high-level standards to adhere to when building the architecture and design. In my next post I will cover the principles and decisions.

Cheers and Enjoy.

Who am I – DBA or Database Architect or Data Architect? January 4, 2013

Posted by msrviking in Architecture, Career.

This post is not about the output of a command. There is a command in Unix, whoami, that shows the user you are logged on as. I am trying to run the same command in the context of my work.

Who am I? I have always been trying to figure out whether I am a Database Administrator, a Database Architect or a Data Architect. Which one of these is the ultimate goal? What should I do to become the best among the best in one of them? What will all this mean in practice?

All these questions are high level and need a lot of thinking, self-reflection, resolutions and action items: try, implement, evaluate and mark complete. I don’t think this is going to happen in a year or two, or even a little more. It all depends on how quickly one wants to reach that goal. Okay, now back to my question.

As of today, I think I was a Database Administrator (DBA) on SQL Server and Oracle (a bit, at least), and then a DBA-turned-Database Architect. I believe I am still on the path to becoming a complete Database Architect. To get there I have set the following criteria for myself, based on personal experience, interactions, and following industry experts and idols.

A Database Architect is a complete individual if he/she has:

  1. Expertise in one or two RDBMS or non-RDBMS technologies
  2. Expertise in one business domain
  3. Expertise in transactional and analytical systems
  4. Expertise in infrastructure management and architecture
  5. Expertise in application architecture and a few domain-specific products
  6. And finally, breadth of knowledge in all these areas, with depth in 1-4

So if one becomes a complete DB Architect, what next? Surely the traits and experience of the DB Architect role lead to the Data Architect, who is the Enterprise Information Management Architect. Don’t ask me how I came up with the name EDIMA, and I haven’t checked the web to see whether I am infringing on someone’s already proclaimed term EDIMA.

An EDIM Architect is the person who has the vision of how information in the enterprise flows into and out of the business systems. The Architect decides which technologies and products to adopt over the short and long term, and also decides how, and what type of, data models have to be built. That is all I can think of for now as the ultimate goal and its roles and responsibilities. I am sure more could be added, and if anyone knows better, please feel free to comment.

Here is a link that neatly describes the Data Architect, Database Architect and Database Administrator roles.

Cheers and Enjoy.