CDIs – Non-Functional Requirements – A Few Others May 21, 2013

Posted by msrviking in Architecture, Business Intelligence, Data Integration, Design, Integration, Security.

In the CDIs – Non-Functional Requirements series I have covered NFRs such as Performance, Scalability, Availability, Reliability, and Utilization Benchmarks that are useful when building a CDI system. In this post I shall talk about less known but important NFRs: Maintainability, Configurability, and Security. I wouldn't strictly term these NFRs, but questions on these topics will give you an insight into how the system is expected to behave from a business and technical perspective. Also, they don't work in silos; they link back to some of the NFRs in the list above.

Maintainability

  1. What is the maintenance/release cycle?

    This question gives us an idea of the release-cycle practices followed by the business and technical teams. Each release cycle usually means a change in the source system, which may ripple into downstream applications. The more frequent the release cycles, the more difficult the job of maintaining the code base. To avoid long-term overheads, the system, the data model, and the low-level design of the ETLs should be built on the assumption that such changes will be constant and frequent.

  2. How frequently do source or target structures change? What is the probability of such change?

    This point is related to the first question, but elicits information one level deeper by asking, “if there are frequent maintenance cycles, what are those changes at the source, and what are they expected to be in the target?” If the changes are constant, frequent, and not too complex, then the data model and the ETLs have to be configurable enough to accommodate ‘certain’ changes in the source. That configurability comes with a rider and a trade-off against other NFRs such as performance: changes in the data source can degrade ETL performance, and sometimes the business-defined SLA can no longer be met.

Having said this, the next NFR is closely related to Maintainability.

Configurability

The answers to the queries under this topic tend to be challenging for the business and technical teams. Not everyone is sure what should be configurable and what shouldn't, because it depends on the changes the business expects at the source-system level. One typically gets answers like “not sure”, “maybe”, “quite possible in the near future”, or “probably” the source will change, while what exactly will change remains an open question. Providing an appropriate solution at the different layers is then a daunting task for the technical team.

  1. How many different/divergent sources (different source formats) are expected to be supported?

    The answer to this question helps in understanding which source formats (CSV, TSV, XML, etc.) have to be supported. If the formats diverge widely, alternate design practices can be implemented on the target side to provide extensibility across all formats.

  2. What kind of enhancements/changes to source formats are expected to come in?

    An answer to this point helps in deciding whether to build abstract transformations or reusable mappings.

  3. What is the probability of getting new sources added? How frequently does the source format change? How divergent will the new source formats be?

    This information tells us how often source formats change, and whether the change comes from existing sources or from new ones. Again, it helps in deciding between abstract transformations and reusable mappings.

The last NFR is Security, which is usually the last to be considered in any system architecture and design, yet one of the most important.

Security

In the case of CDI we are dealing with sensitive customer information and transaction details. It is important to understand how the business treats this type of data, and how the security and compliance teams view data being gathered from different source systems and consolidated in a single place – the “Data Hub”. The questions below cover data protection rather than the access levels or privileges of different users.

  1. Are there any special security requirements (such as data encryption or privacy) applicable?

    The answer to this question is usually “no”, but certain fields brought in from CRM and ERP systems need to be protected from misuse in case of a security breach. It is best to walk through this question with a real scenario, and only then decide whether data or database encryption should be enabled.

  2. Are there any logging and auditing requirements?

    This is rarely required, since the data from the different systems is mostly massaged and made available in reporting form through a separate data sink. A discussion here helps in deciding whether security should be handled at the reporting level (by enabling its security features) rather than inside the heavy data-processing jobs.

I hope these posts on NFRs for CDIs help you in architecting and designing a Data Hub system that is highly available, scalable, high-performing, and reliable.

Cheers!

SSAS named instance February 2, 2010

Posted by msrviking in Security.

Here is what I have learnt today while configuring a static port for a named instance of AS for one of my clients.

As per Microsoft's security best-practice guidelines, the SQL Server Browser service can be disabled, in which case a static port is needed to connect to a named instance of Analysis Services. It was fun doing this, although there are good steps in BOL and a few other links.

However, for the benefit of having this stored in a knowledge repository, I decided to list the steps clearly here.

1) Go to SQL Server Configuration Manager and, under the named instance's SQL Server Network Configuration (Protocols), open the TCP/IP properties.

2) On the Protocol tab, set “Listen All” to “No”.

3) On the IP Addresses tab, for IP1 change “TCP Dynamic Ports” from “0” to blank and set “Enabled” to “Yes”. Also don't forget to put a value in “TCP Port”.

4) For IP2 (and the remaining IP entries), set “Enabled” to “No”; under IPAll, change “TCP Dynamic Ports” from “0” to blank and set “TCP Port” to the port number you want to listen on.

5) Before restarting the service, ensure that the same port number is set in the Analysis Services server properties page.

At the end, restart the service, and the Analysis Services named instance will be up and running, accepting connections on your defined port without the SQL Server Browser service running.
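With the static port in place, clients reach the named instance as host:port rather than host\instance. A hypothetical example (the server name MYSERVER and port 2383 are assumptions for illustration, not values from the steps above):

```
Server name (SSMS):       MYSERVER:2383
Connection string (AMO):  Provider=MSOLAP;Data Source=MYSERVER:2383;Integrated Security=SSPI;
```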

Some of the links that gave me leads are

http://dinesql.blogspot.com/2009/08/connecting-to-analysis-services-with.html
http://blogs.msdn.com/sqlserverfaq/archive/2009/09/08/sql-server-analysis-services-port-sql-2005-2008.aspx
http://support.microsoft.com/kb/823938
http://msdn.microsoft.com/en-us/library/cc966526.aspx

I hope this quick post helps. Enjoy!

SQL Server connectivity issues November 18, 2009

Posted by msrviking in Security.

Strangely, for the past few days people from different projects have been approaching me to find solutions for SQL Server connectivity and login-failure issues. To my surprise this is happening all of a sudden, uh, maybe the teams suddenly realized that I exist to help :). I don't mind helping, and I did get the issues fixed. How did I do it? Well, I know a few of the blogs on my blogroll that give insider information on all these kinds of issues.

I thought I should share the links I referred to while finding the solutions. Although the links are about troubleshooting SQL Server 2005, they definitely apply to SQL Server 2008 as well.

SQL Server 2005 Connectivity Issue Troubleshoot – Part I
SQL Server 2005 Connectivity Issue Troubleshoot – Part II
SQL Server 2005 Connectivity Issue Troubleshoot – Part III
Understanding “login failed” (Error 18456) error messages in SQL Server 2005

I hope these links give you enough information too!

Cheers.

First post on WordPress – Database Encryption October 6, 2008

Posted by msrviking in Security.

Alright, this is my first post on wordpress.com and I am a little excited. Okay, let me leave my feelings behind and come to the topic – Database Encryption.

One of my clients came up with a question on SQL Server 2005 database encryption. Although I haven't worked extensively (exhaustively!) in designing encryption policies for databases, I gathered my thoughts and put this together for him. The context was set before I could get back, and it came in the form of questions.

When data is stored in encrypted form in columns of different data types and lengths (except image values):

1) What will be the performance impact of encryption and decryption of these columns? Should all the columns be encrypted?
2) What methodology of encryption should be used?
3) Would performance be impacted having the service layer calling the stored procedure that would do encryption and decryption of the values?
4) What should be compromised for columns that have encryption enabled (Performance or Security)?
5) Are there any options or solutions to get Performance and Security together?
6) What are the guidelines to backup and restore the encrypted keys?

Before answering the questions, I want to note that some of the points below are contentious; keeping in mind the requirements we agreed upon, this is what I could think of.

–       What is Database Encryption and what does this mean to us?

Database encryption protects data from internal and external attacks. However, database encryption isn't a one-size-fits-all solution: we need to evaluate what data needs protection, where the encryption should be performed (e.g., in the database, or in the application layer above the database), and how much performance we are willing to trade for increased security. We may also have no choice in these factors if we must encrypt data to comply with regulatory standards.

–       What level of encryption should be implemented?

With only cell-level encryption available, and with the hierarchical encryption and key-management infrastructure for encrypting data, we have three choices of implementation.

·         Certificates (quoted from BOL)

A public key certificate, usually just called a certificate, is a digitally-signed statement that binds the value of a public key to the identity of the person, device, or service that holds the corresponding private key.

·         Asymmetric keys (quoted from BOL)

An asymmetric key is made up of a private key and the corresponding public key. Each key can decrypt data encrypted by the other. Asymmetric encryption and decryption are relatively resource-intensive, but provide a higher level of security than symmetric encryption.

·         Symmetric keys (quoted from BOL)

A symmetric key is one key that is used for both encryption and decryption. Encryption and decryption by using a symmetric key is fast, and suitable for routine use with sensitive data in the database.
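The hierarchy above can be sketched in T-SQL. This is a minimal sketch with hypothetical names (CustomerCert, CustomerSymKey) and a placeholder password, not the client's actual setup:

```sql
-- Database master key: the root of the per-database key hierarchy.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!MasterKeyPassw0rd';

-- Certificate protected by the master key.
CREATE CERTIFICATE CustomerCert
    WITH SUBJECT = 'Certificate protecting customer data keys';

-- Symmetric key protected by the certificate; fast enough for routine use.
CREATE SYMMETRIC KEY CustomerSymKey
    WITH ALGORITHM = AES_128
    ENCRYPTION BY CERTIFICATE CustomerCert;
```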

With all these points in mind I moved on to the next level of questions based on our discussion on the call. The context, again: when data is stored in encrypted form in columns of different data types and lengths (except image values),

–       What will be the performance impact of encryption and decryption of these columns? Should all the columns be encrypted?

Encryption changes the value stored in the column, so the column's data type becomes varbinary, and the value must be cast back to the original type after decryption. Because of this manual recast, none of the automatic techniques used by the SQL Server query optimizer will work on these columns, and encryption/decryption becomes a CPU-intensive operation. The performance impact is felt significantly if the system already has high CPU load and high I/O.

Also, the larger the data type length of the encrypted column, the poorer the system performs while decrypting those column values. Here is a link (you have probably already had a look at this article) which talks about how much data is stored in each data type.

With these points in mind we would need to identify columns that need to be encrypted based on type data (sensitivity) or based on data type defined for the column.

In short, my answer to the above questions is: performance is certainly impacted, depending on the encryption method used and on the data type and length defined for the columns, and unfortunately I couldn't quantify by how much. The above-mentioned link gives a first insight into the performance issues that arise. And definitely not all columns should be encrypted; instead, columns should be encrypted based on the sensitivity of the information, and if all columns must be encrypted, we need to justify the data types and column lengths to obtain the best performance, or at least not worsen the performance problems.
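The varbinary round-trip described above looks roughly like this in T-SQL. Table and column names (dbo.Customer, SSN, SSNEncrypted) and the key names are hypothetical, reusing the key hierarchy from the earlier discussion:

```sql
-- Open the key so EncryptByKey/DecryptByKey can use it in this session.
OPEN SYMMETRIC KEY CustomerSymKey DECRYPTION BY CERTIFICATE CustomerCert;

-- Encrypt on write: the value is stored as varbinary, not varchar.
UPDATE dbo.Customer
SET SSNEncrypted = EncryptByKey(Key_GUID('CustomerSymKey'), SSN);

-- Decrypt on read: DecryptByKey returns varbinary, so cast back manually.
SELECT CAST(DecryptByKey(SSNEncrypted) AS varchar(11)) AS SSN
FROM dbo.Customer;

CLOSE SYMMETRIC KEY CustomerSymKey;
```

The explicit CAST at read time is exactly the manual recast that defeats the optimizer's automatic techniques on that column.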

–       What methodology of encryption should be used?

Going by the points discussed above and the results from the link, the most obvious choice is encryption and decryption using symmetric keys.

–       Would performance be impacted having the service layer calling the stored procedure that would do encryption and decryption of the values?

My answer is that the performance impact could be higher or lower depending on where we embed the logic that identifies which key to use. If we use a uniform key across all column types, then we don't have to worry about a service-layer performance overhead.

To dig into this, let's say we create multiple symmetric keys for different types of columns: for name columns (First, Last) a symmetric key FSymKey, and for columns like SSN and Health Insurance Number a symmetric key SSymKey, each key having its own encryption mechanism. During decryption we would then use

/*To decrypt data for name columns*/

OPEN SYMMETRIC KEY FSymKey…… or OPEN SYMMETRIC KEY SSymKey ……

If the logic for choosing the decryption key for a specific column type is not embedded in a stored procedure but in the service layer, the overhead of identifying which key to use could become a performance hurdle. There is a catch, though: if, for the sake of performance, we move the logic into a stored procedure, we may weaken security by exposing the key names and other details. A solution exists for that as well: we can encrypt the stored procedure to protect the symmetric key names used against specific columns, but that in turn becomes an overhead for the development and maintenance of encrypted stored procedures, encrypted keys, and so on. We should weigh all of this carefully before implementing any of these options to gain performance and/or security.
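A sketch of the stored-procedure option, with hypothetical object names (dbo.GetCustomerName, NameCert, the encrypted name columns). WITH ENCRYPTION obfuscates the procedure text so the key names aren't casually readable, though it is obfuscation rather than strong protection:

```sql
CREATE PROCEDURE dbo.GetCustomerName
    @CustomerId int
WITH ENCRYPTION  -- hides the procedure text, and with it the key names
AS
BEGIN
    -- Key selection lives here, not in the service layer.
    OPEN SYMMETRIC KEY FSymKey DECRYPTION BY CERTIFICATE NameCert;

    SELECT CAST(DecryptByKey(FirstNameEncrypted) AS nvarchar(50)) AS FirstName,
           CAST(DecryptByKey(LastNameEncrypted)  AS nvarchar(50)) AS LastName
    FROM dbo.Customer
    WHERE CustomerId = @CustomerId;

    CLOSE SYMMETRIC KEY FSymKey;
END;
```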

–       What should be compromised for columns that have encryption enabled (Performance or Security)? Are there any options or solutions to get Performance and Security together?

I have clubbed the questions 4 and 5 for ease of answering.

Well, the first question is tricky because we need to make a trade-off between performance and security. We have to identify which columns could cause performance issues, on what basis we decide that a column will be a performance overhead, whether we really need to encrypt the columns that cause those issues, and if yes, what data type and length should be defined for them.

What if we are unable to compromise on data type, security, or length; what other alternatives are there for regaining performance? That answers the second question: use MACs (Message Authentication Codes). There are a few articles (article 1, article 2 and article 3) that give an insight into how to gain performance by using Message Authentication Codes and by indexing encrypted data. We could also use SQLCLR procedures (although I haven't tested this) to gain better encryption/decryption performance for columns with large data types.
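The MAC idea can be sketched as follows: store a keyed hash of the plaintext alongside the encrypted value, so equality searches can seek on an index over the hash without decrypting every row. All names and the salt value here are hypothetical illustrations, not the approach from the linked articles verbatim:

```sql
-- Add a MAC column next to the encrypted one.
ALTER TABLE dbo.Customer ADD SSNMac varbinary(20);

-- Populate it with a salted hash of the plaintext (salt kept secret).
UPDATE dbo.Customer
SET SSNMac = HashBytes('SHA1', 'S3cretSalt' + SSN);

CREATE INDEX IX_Customer_SSNMac ON dbo.Customer (SSNMac);

-- Equality lookup: hash the search value the same way and seek on the index,
-- decrypting only the matching row(s).
SELECT CustomerId
FROM dbo.Customer
WHERE SSNMac = HashBytes('SHA1', 'S3cretSalt' + '123-45-6789');
```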

–       What are the guidelines to backup and restore the encrypted keys?

There is a good article on how to back up and restore databases that have objects with encrypted columns. IMHO, we should stick to the methodology defined in that article for efficient management of a database that uses encryption.
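At a minimum, the key hierarchy itself should be backed up. A sketch with hypothetical file paths and passwords; note that symmetric keys cannot be backed up directly in SQL Server 2005, so they would need to be recreated deterministically (via KEY_SOURCE and IDENTITY_VALUE) if they must be reproducible:

```sql
-- Back up the database master key.
BACKUP MASTER KEY TO FILE = 'C:\Backup\MasterKey.bak'
    ENCRYPTION BY PASSWORD = 'Str0ng!BackupPassw0rd';

-- Back up the certificate together with its private key.
BACKUP CERTIFICATE CustomerCert
    TO FILE = 'C:\Backup\CustomerCert.cer'
    WITH PRIVATE KEY (
        FILE = 'C:\Backup\CustomerCert.pvk',
        ENCRYPTION BY PASSWORD = 'Str0ng!PvkPassw0rd');
```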

I hope this post helps you all, and do let me know your comments. I would like to thank Laurentiu, Raul, and Rob Garrison for sharing their experienced thoughts on their blogs, from which I benefited greatly.

Test your SQL Server security – SQL Injection! May 5, 2008

Posted by msrviking in Security.

Denis (oops, did I get his name wrong?) has shared a link on SQL Server security testing. I haven't tried all the tests listed on the page, but thought I should share it with you all.

Click here to read more,
http://www.owasp.org/index.php/Testing_for_SQL_Server