
SSIS for Extraction & Loading (EL) only September 2, 2013

Posted by msrviking in Business Intelligence, Data Integration, DBA Rant, Design, Heterogeneous, Integration, Integration Services.

There was a series of earlier posts on the different ways to pull data from Oracle to SQL Server, where I shared how to port data through Replication, CDC for Oracle, and a few other options such as SSIS. I thought I would start a series of posts on how we went about implementing SSIS as the way to bring data from Oracle to SQL Server.

The major drivers (business and non-functional) for picking the solution were:

  • The destination database on SQL Server should be a replica of the source
  • No business rules or transformations needed to be implemented when the data was pulled
  • The extraction and loading should perform optimally, with no overhead or bottlenecks on either the source or the destination

The destination system was referred to as an ODS, but it certainly isn't the same as the Operational Data Store of a DWH system. Sadly, that naming had already been adopted by the shop where we implemented the solution, so you will see me using the word ODS for the sake of explaining the implementation; apologies for perpetuating the incorrect usage. You will also see a few more conventions and standards being skewed in the implementation, along with my comments on how each could have been handled better.

So the ODS was to hold data used by downstream business analytics applications, with the intent of providing self-service BI for the business users. The ODS was to be an exact copy of the source, in both schema and data. The job was to pull data from the source without performance overhead or failures, and, not to forget, without any transformation of the data. In the end we used SSIS purely as an Extraction & Loading (EL) tool instead of a full ETL (Extraction, Transformation and Loading) tool.

This sounds simple, eh? But beware, it wasn't, because the design had to stay simple while addressing all of the above factors. The SSIS package(s) had to handle these non-functional requirements:

  • The package should be configurable to pick up and pull data from any set date
  • The package should be re-runnable from the point of failure
  • The package should be configurable to address performance needs: package execution time should be low, the source should not be overloaded, and the destination should run at full efficiency while data is loading
  • The package should be configurable to pull data from the source in preset chunks, based either on a period (days/months/years) or on a number of records per period
  • The package should have the option to flag any dependent packages to run or not run during initial and incremental loads
  • The package should have a defensive design to handle a bunch of different types of errors
  • The package should have error logging and handling at the package level and at the record level
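The first four requirements (configurable start date, restart from the point of failure, and chunked pulls) boil down to a watermark-driven loop. Here is a minimal sketch of that logic in Python rather than an actual SSIS package; the names `load_chunk` and `chunked_extract` are hypothetical stand-ins for a Data Flow Task and the control flow around it.

```python
from datetime import date, timedelta

def chunked_extract(start, end, chunk_days, load_chunk, watermark=None):
    """Pull [start, end) in fixed-size date chunks, resuming from a saved
    watermark so that a failed run restarts at the last completed chunk."""
    current = watermark or start          # re-runnable from the point of failure
    while current < end:
        upper = min(current + timedelta(days=chunk_days), end)
        load_chunk(current, upper)        # stands in for the Data Flow Task
        current = upper                   # persist this as the new watermark

# Example: a 10-day window pulled in 4-day chunks, resuming from a watermark.
pulled = []
chunked_extract(date(2013, 1, 1), date(2013, 1, 11), 4,
                lambda lo, hi: pulled.append((lo, hi)),
                watermark=date(2013, 1, 5))
# Only the chunks at or after the watermark are pulled.
```

In the real package the watermark would live in a configuration table and be updated inside the same transaction as the load, so a crash never skips or repeats a chunk.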

In toto, this solution was more about how to design and implement for the non-functional requirements, leaving the functional side of the data alone. This type of implementation is half-blind, and I will talk more about the cons we had to face when I get into the details of each implementation.

A post after a few weeks' gap, and one with learnings and summaries on how it could have been done better.


SQL Server 2012 CDC for Oracle July 5, 2013

Posted by msrviking in Architecture, Change Data Capture, Configuration, Data Integration, Heterogeneous, Integration, Integration Services, Oracle.

Sometime in June 2013, I had to work on an alternative to Replication for replicating data from Oracle to SQL Server 2012. The primary requirements were:

  1. There should be no performance impact on the Oracle instance, and CDC should work seamlessly with a RAC setup that has multiple log file locations, reading DML changes regardless of where the transaction happens (on the different nodes of the Oracle RAC)
  2. No additional objects (temporary tables, triggers) should be created in the Oracle instance, unlike what happens in a Replication setup
  3. DDL changes on the source should be manageable with appropriate configuration

Keeping all this in mind, I brought up CDC for Oracle in SQL Server 2012 as the next best option and built a POC. As a result of the POC, I documented all the steps, or rather the implementation picture, for my future reference. Today I decided to share it with you all so that you too can benefit from the implementation.

So here are the excerpts from the document. Sorry, I couldn't convert the document into blog style, though.

Cheers and Enjoy.

Change Data Capture (CDC)

Overview

Change Data Capture (CDC) is a way of tracking and capturing changed data on a real-time basis.

Data changes caused by DML operations such as insert, update, and delete are logged into the log files of the source database. These log files serve as input to the capture process, which runs asynchronously. The capture process reads the log files and adds the change information to the tracking tables associated with the respective source tables.

The below diagram (as represented in MSDN) summarizes the process of change data capture.


The data captured in the tracking tables is consumed by SQL Server Integration Services to refresh the end business tables in an Operational Data Store (ODS).

Internals of SQL Server 2012 CDC for Oracle

The capability of change data capture is to read data from data sources and synchronize the data changes incrementally from source to target tables, in real time.

The Change Data Capture for Oracle setup has these components:

Oracle Database Instance

The data changes happening on the Oracle source tables are captured into redo log files, and these redo log files are archived for reading by Oracle LogMiner.

Oracle LogMiner can read the archived redo log files only if, on the Oracle database:

  1. Archive log mode is enabled
  2. Supplemental logging is enabled

CDC Service & CDC for Oracle Designer

SQL Server 2012 has a new feature called SQL Server 2012 CDC for Oracle, available as part of Service Pack 1 (SP1). The CDC feature comes with two installables as part of SP1; these files install the CDC Service Configuration and Oracle CDC Designer Configuration utilities.

Console for CDC Service

A prerequisite before creating a CDC service is a system database called MSXDBCDC on the SQL Server instance. The database is created by executing the Prepare SQL Server step in the CDC service configuration console. The database consists of tables that store the configured databases, the services, and trace (diagnostics) information.

All CDC operations use the CDC service to synchronize data, and the service is created as a Windows service. The CDC Service management console is used to configure the CDC service; however, CDC services can be configured and run on any server other than the ODS or Integration Services (IS) systems. More than one CDC service can also run on a dedicated system.

Designer console for CDC for Oracle

A CDC for Oracle instance is configured under a CDC service, and multiple CDC instances can be configured against an Oracle database using the CDC Designer console.

A CDC instance for Oracle

  1. Has a connection string to the Oracle instance, configured against the scan IP name or a VIP name
  2. Has multiple source tables (with their respective primary keys) with selected capture columns. Each source table with capture columns has a corresponding supplemental logging script, which is executed on the Oracle instance. Supplemental logging on the table causes the data changes on those columns to be captured in the redo log files
  3. Creates a CDC-enabled database on SQL Server
  4. Has the option of enabling advanced properties such as cdc_stop_on_breaking_schema_changes or trace. These properties help in managing the instance behavior upon a schema change in the source table, and provide additional diagnostics for troubleshooting issues
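For a feel of what the per-table supplemental logging script in step 2 does, it generally takes the form of an ALTER TABLE ... ADD SUPPLEMENTAL LOG GROUP statement over the capture columns. The sketch below builds such a statement in Python purely for illustration; the CDC designer generates the real script for you, its exact form may differ, and the table and column names here are made up.

```python
def supplemental_log_ddl(table, columns, group=None):
    """Build an Oracle supplemental-log-group DDL statement for the
    capture columns of one source table (illustrative form only)."""
    group = group or f"{table}_LOG_GROUP"   # hypothetical naming convention
    cols = ", ".join(columns)
    return (f"ALTER TABLE {table} "
            f"ADD SUPPLEMENTAL LOG GROUP {group} ({cols}) ALWAYS")

ddl = supplemental_log_ddl("DEPT", ["DEPTNO", "DNAME", "LOC"])
print(ddl)
```

The ALWAYS clause makes Oracle log the before-images of these columns for every update, which is what lets LogMiner return complete change rows to the CDC service.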

SQL Server

    CDC enabled database

A CDC instance creates a CDC-enabled database, and each such database has multiple mirror tables, capture instances (tables representing each source table), and system tables that provide information on the change capture objects, the columns, the DDL history, and the health of the CDC instance.

The CDC-enabled database is created with a case-sensitive collation by default; this can be changed by modifying the database deployment script created by the CDC instance designer.

Mirror tables

Each source table in Oracle has a corresponding mirror table in the CDC-enabled database in SQL Server. The mirror table is a copy of the source table, except that the data types and lengths differ in SQL Server. Any schema change on the source table is verified by the CDC process by checking it against the schema of the mirror table.

The mirror tables are always empty and should not be used as business tables. The mirror table is identified with a specific schema owner and is not created under the default dbo schema.

     A sample name of mirror table is CDC_POC.CDC_POC_DEPT.
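Since the CDC process validates source schema changes against the mirror table, the check essentially amounts to comparing two column lists. Here is a minimal, hedged sketch of that comparison in Python; the schemas are made up, and the real product also compares types and lengths, which this simplified version ignores.

```python
def schema_diff(source_cols, mirror_cols):
    """Return the columns added to or dropped from the source table,
    relative to its mirror table (types ignored in this simplified check)."""
    added = [c for c in source_cols if c not in mirror_cols]
    dropped = [c for c in mirror_cols if c not in source_cols]
    return added, dropped

source = ["DEPTNO", "DNAME", "LOC", "REGION"]   # hypothetical altered Oracle table
mirror = ["DEPTNO", "DNAME", "LOC"]             # columns of CDC_POC.CDC_POC_DEPT
print(schema_diff(source, mirror))
```

A non-empty diff is what triggers the cdc_stop_on_breaking_schema_changes behavior described later in the Schema changes section.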

Capture instances

Each source table has a capture instance, created as a system table. The CDC process populates the capture instances with the data that changed at the source. These tables have columns that represent the source (or mirror) table, plus columns that store the start Log Sequence Number (LSN), a sequence value, the type of operation, and an update mask.

Here is the sample structure of a capture instance:

Column Name       Purpose of the column
__$start_lsn      LSN associated with the commit transaction of the change
__$seqval         Sequence or order of the rows that the transaction affected
__$operation      The type of DML operation: 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values)
__$update_mask    Bit mask of the column ordinals that changed

The capture instance always has the schema identifier cdc, and its name ends with _CT.

A sample name looks like cdc.CDC_POC_DEPT_CT.

The capture instance holds the DML changes committed on the source table. This table provides the inputs (the data that changed, the columns whose values changed, and the type of change) for the SSIS packages that refresh the business tables.

The capture instance can grow in size based on the changes captured from the source. This table can be purged on a regular basis to keep the table and database sizes within standards.
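Downstream code can consume the columns above directly. As a hedged sketch (not part of the product), the snippet below decodes the __$operation code and tests column ordinals against the __$update_mask bit mask, assuming the mask is a byte string with ordinal 1 in the least significant bit, which mirrors how sys.fn_cdc_is_bit_set interprets it; the change row itself is made up.

```python
OPERATIONS = {1: "delete", 2: "insert",
              3: "update (old values)", 4: "update (new values)"}

def is_bit_set(ordinal, mask):
    """Check whether the column at 1-based `ordinal` changed, treating the
    mask as a big-endian byte string with ordinal 1 in the lowest bit
    (an assumption mirroring sys.fn_cdc_is_bit_set)."""
    value = int.from_bytes(mask, "big")
    return bool((value >> (ordinal - 1)) & 1)

row = {"__$operation": 4, "__$update_mask": b"\x05"}   # made-up change row
changed = [o for o in range(1, 4) if is_bit_set(o, row["__$update_mask"])]
print(OPERATIONS[row["__$operation"]], changed)
```

A mask of 0x05 is binary 101, so the columns at ordinals 1 and 3 changed; with operation 4 the row carries their new values.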

Business tables

The business tables contain the data used by the business applications. A business table can be created with any standard naming convention and in any other database; it need not be in the CDC-enabled database, and that database can be hosted on any other SQL Server instance.

The schema structure of the business table can be the same as the source or mirror table, but the data types can vary. If the data types differ, the SSIS packages should perform the appropriate data type conversions.

The business table is initially populated through SSIS as a full load, and then incremental loads are applied to refresh the data. The incremental loads fetch the data changes from the capture instance tables and perform insert or merge operations on the end tables.
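The incremental refresh described above boils down to replaying each net change against the business table, keyed on the primary key. Here is a hedged, in-memory sketch of that merge logic (not the actual SSIS data flow), using the 1/2/4 operation codes from the capture instance; rows with operation 3 (old values) carry no new state and are skipped.

```python
def apply_net_changes(business, changes, key="id"):
    """Replay capture-instance changes onto a business table held as a dict
    keyed by primary key: 1 = delete, 2 = insert, 4 = update (new values)."""
    for change in changes:
        op, row = change["__$operation"], change["row"]
        if op == 1:
            business.pop(row[key], None)   # delete, tolerant of missing rows
        elif op in (2, 4):                 # insert, or update carrying new values
            business[row[key]] = row
        # op == 3 (update, old values) is informational only and is skipped
    return business

table = {10: {"id": 10, "name": "ACCOUNTING"}, 20: {"id": 20, "name": "SALES"}}
apply_net_changes(table, [
    {"__$operation": 4, "row": {"id": 20, "name": "MARKETING"}},  # update
    {"__$operation": 1, "row": {"id": 10}},                        # delete
    {"__$operation": 2, "row": {"id": 30, "name": "IT"}},          # insert
])
print(sorted(table))
```

In the real package this replay is what the CDC Splitter plus OLE DB Destination/Command components perform against the ODS tables.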

Overall process

    The overall CDC process is summarized in the below diagram.


The CDC service fetches the data changes from the Oracle source table. It connects to the Oracle database instance and invokes Oracle LogMiner, requesting the data changes that have been committed on a source table. LogMiner fetches the data from the archived redo logs and returns the changes for the columns that have supplemental logging enabled.

The CDC service then populates the changed data into the capture instances, along with the LSN, the data that changed in each column, and the type of operation.

The CDC service uses the CDC instance information to connect to an Oracle instance and fetch the data changes in a particular source table for the specified columns.

Implementation Scenarios

Data Synchronization – Connecting to Load Balancer

The below diagram summarizes a scenario where the CDC for Oracle instance is connected to Oracle RAC through the scan IP name.


Oracle Cluster

The Oracle instance is in RAC configuration mode, where Node 1 (xxxxx01.xxxx.net) and Node 2 (xxxxx02.xxxx.net) are load balanced and connected using a scan IP. The two nodes share the same storage area, and the redo log files generated by these nodes are archived and stored in file locations in the storage area. There are multiple file locations, each created based on the day, month, and year, and each folder can hold multiple archived redo files.

SQL Server

The SQL Server system hosts CDC service and CDC Designer for Oracle tools.

The CDC service is created using the CDC service management console and is run as a Windows service under the Local System account.

The CDC for Oracle instance is created using the CDC Designer console. The CDC instance connects to the Oracle instance using the scan IP name xxx-rac01-scan.xxxx.net. The CDC process reads the archived logs using LogMiner and writes the captured data changes into the capture instances in the CDC-enabled database on SQL Server.

The mirror tables are empty and are never populated, either during the initial full load or during the data refreshes through incremental loads.

    ODS

The business tables are created in the operational data store and are populated with a full load from the source the first time. The data changes captured through the CDC process are then applied to these tables through incremental loads.

    SSIS

SQL Server Integration Services (SSIS) runs on a different system. The purpose of SSIS is to perform the initial full load into the business tables and, once CDC synchronization is set up, to run the data refreshes on the business tables in the ODS.

There are two packages, one performing the full load and one the incremental load. The below summarizes the components used, the purpose of each component, and the overall process executed by each package.

Initial Load package

The package is executed only once, as a full load, when CDC is set up or re-configured. It deletes the data in the destination business table and populates the latest data from the Oracle source table using the Oracle Connector. After the data is populated from Oracle into the SQL Server business table in the ODS, the system table cdc_state is updated with an LSN value that is referenced as the next starting point for reading data changes from the capture instances.

  • Execute SQL Task: deletes the data from the business table before the full load
  • Data Flow Task: populates data from the Oracle source table into the business table using the Oracle connector from Attunity
  • CDC Control Task: sets an LSN value in the cdc_state table (a system table created when the package is executed) as a bookmark. This value is referenced by the incremental package for reading changed data

Incremental Load package

The package is executed after the initial full load, and can be scheduled to fetch the data changes and refresh the end business tables in the ODS.

  • CDC Control Task: reads the last saved LSN value in cdc_state and marks the start of fetching data from the capture instance. The CDC Source component marks the processing range of the data based on the last saved LSN value
  • Data Flow Task: reads the capture instance for All or Net data changes using the CDC Source component, starting from the last LSN value saved in cdc_state. The CDC Splitter component splits the rows read from the CDC Source into the three DML operations and directs them accordingly: inserts go to an OLE DB Destination, and updates and deletes go to an OLE DB Command
  • CDC Control Task: marks the end of the read and the state of reading and populating the business tables. After the data refresh completes on the destination table, the process is marked as completed and the latest LSN value is saved into cdc_state


Data Synchronization – Connecting to RAC Nodes

The below diagram summarizes a scenario where the CDC for Oracle instances are connected to the Oracle RAC nodes individually through their VIP names.


This section describes the approach adopted to verify that SQL Server 2012 CDC for Oracle works with individual Oracle RAC nodes and with multiple redo log files in different locations.

The first and second diagrams above show the Oracle CDC instances connected to Node 1 (xxxxx01.xxxx.net) through the VIP name xxxxx01.xxxx.net:1521/orcl and to Node 2 (xxxxx02.xxxx.net) through the VIP name xxxxx02.xxxx.net:1521/orcl, respectively. CDC for Oracle reads the changed data committed independently on each node.

As part of verification the following steps were executed

On Node 1 or Node 2:

  1. Logged on to the Oracle nodes using the connection string uname/pwd@xxxxx01.xxxx.net:1521/orcl or uname/pwd@xxxxx02.xxxx.net:1521/orcl
  2. Ran a SELECT query against a sample table under the same schema and captured the existing data in the table for later validation
  3. Ran DML (UPDATE, DELETE, INSERT) statements on each node and committed the transactions
  4. Validated the DML changes done on each node by running SELECT statements
  5. Verified that the CDC instances were capturing the changed data by running SELECT statements on the capture instances

The result was that the CDC instances (configured against each node) read the DML changes done on each node, and the changes were captured in the capture instances on SQL Server.

This result confirms that CDC for Oracle works in a RAC environment, reading data regardless of where the transactions are committed. It also confirms that CDC for Oracle instances capture data changes that are logged into multiple archived redo log files in multiple locations.

The key to this conclusion is that the CDC process fetches the data changes by calling Oracle LogMiner, which in turn reads the multiple archived redo log files in their different locations.

The captured data in the capture instances is applied to the business tables using the incremental load package described in the SSIS section above.

Schema changes

This section explains the DDL statements that impact the Oracle CDC instances, and the steps to apply the changes on the destination database.

SQL Server 2012 CDC for Oracle does not propagate schema changes on Oracle tables to SQL Server. However, enabling an advanced property named cdc_stop_on_breaking_schema_changes helps in capturing either accurate and consistent data or the latest data. This property can be set for each CDC instance under the Advanced Properties tab.

The values that can be set for this property are 0 and 1. In the event of a schema change on the Oracle source table, when cdc_stop_on_breaking_schema_changes is:

  1. Set to 1, data capture is stopped; this helps in capturing consistent data, with exceptions in a few scenarios
  2. Set to 0, the corresponding capture instance in SQL Server is dropped; this helps in capturing the latest data

If schema changes are pertinent on the source, and these changes should be propagated to the capture instance in SQL Server and to the business table, then a series of steps has to be followed.

The below summarizes the different DDL statements, their impact on the CDC instance capture process, and the way to mitigate each schema change.

Adding a new column to the source table

Impact:
  • If cdc_stop_on_breaking_schema_changes is set to 0, the capture instance table is dropped
  • If cdc_stop_on_breaking_schema_changes is set to 1, CDC keeps capturing the data for the existing columns, ignoring the new column

Action steps (when the property is set to 1 and the schema change is to be applied):

  1. Back up the existing CT table and the cdc_state table
  2. Drop the source table from the CDC instance
  3. Add the altered source table to the CDC instance
  4. Import the historical data from the backed-up CT table into the new capture instance
  5. Modify the SSIS package to capture the new column, for the initial and incremental loads from the last LSN

Dropping an existing column from the source table

Impact:
  • If cdc_stop_on_breaking_schema_changes is set to 0, the capture instance table is dropped
  • If cdc_stop_on_breaking_schema_changes is set to 1, CDC stops data capture with an error

Note: A column can be dropped on the source table only if the supplemental logging on that table is dropped first.

Action steps (when the property is set to 1 and the schema change is to be applied):

  1. Drop the supplemental logging on the source table
  2. Drop the column on the source table
  3. Back up the existing CT table and the cdc_state table
  4. Drop the source table from the CDC instance
  5. Add the altered source table to the CDC instance
  6. Import the historical data from the backed-up CT table into the new capture instance
  7. Modify the SSIS package to reflect the dropped column, for the initial and incremental loads from the last LSN

Renaming an existing column on the source table

Impact:
  • If cdc_stop_on_breaking_schema_changes is set to 0, the capture instance table is dropped
  • If cdc_stop_on_breaking_schema_changes is set to 1, CDC stops data capture with an error

Action steps (when the property is set to 1 and the schema change is to be applied):

  1. Back up the existing CT table and the cdc_state table
  2. Drop the source table from the CDC instance
  3. Add the altered source table to the CDC instance
  4. Import the historical data from the backed-up CT table into the new capture instance
  5. Modify the SSIS package to reflect the renamed column, for the initial and incremental loads from the last LSN

Changing the data type or data type length on the source table

Impact:
  • If cdc_stop_on_breaking_schema_changes is set to 0 and the data type is changed (e.g., from int to varchar), the capture instance table is dropped
  • If cdc_stop_on_breaking_schema_changes is set to 1, then:
    1. If the data type change is, e.g., from varchar to int, a NULL value is captured, although both row operation states (3 and 4) are logged
    2. If only the data type length is changed, only row operation state 3 is captured

Action steps (when the property is set to 1 and the schema change is to be applied):

  1. Back up the existing CT table and the cdc_state table
  2. Drop the source table from the CDC instance
  3. Add the altered source table to the CDC instance
  4. Import the historical data from the backed-up CT table into the new capture instance
  5. Modify the SSIS package to reflect the changed column, for the initial and incremental loads from the last LSN

A schema change on the source table should be applied to the business tables through this series of steps in order to keep capturing accurate and consistent data.

Oracle to SQL Server Replication-Design considerations May 21, 2013

Posted by msrviking in Configuration, Data Integration, Design, Heterogeneous, Integration, Replication.

I think I have covered all the best points for configuring transactional replication between Oracle and SQL Server in my previous posts. In this post I would like to give my two cents on the design considerations and limitations of Oracle publishers that I had to keep in mind during this exercise. There is a neat list of other points published in this TechNet article.

  1. Configuration of the Oracle Publisher with options

    Per the above article, I learnt that Oracle publishing can be done using either the Oracle Gateway or the Oracle Complete option. The key difference between the two is that Oracle Gateway provides better performance but doesn't allow multiple transactional publications of a single published (or to-be-published) table, whereas the Oracle Complete option allows the table to be published in several publications, at lower performance.

    I tried digging through Google to see how these two options work as described, because I didn't try to find the difference on the setup I built. I didn't find any reliable sources that discuss the difference in detail, but a post by Richard Douglas (Editor in Chief of SQLServerPedia) gives a short description: "Gateway is for high performance replication solutions. Complete is when you want to transform your data or to do row filtering". I might have to try it to see what that means in practice.

    I remember seeing row filtering in a transactional replication using the Oracle Gateway option, and I am yet to find out how this differs when Oracle Complete is chosen. For now I shall park this limitation for future tries and share my findings.

  2. The next point (only a few objects are supported) is quite obvious, because Oracle and SQL Server have few similarities at the object implementation level. So,

    What are the objects that are supported?

    1. Tables
    2. Index-organized tables
    3. Indexes
    4. Materialized views (as tables)

    What are those objects that cannot be replicated, but can exist on published tables?

    1. Domain-based indexes
    2. Function-based indexes
    3. Defaults
    4. Check constraints
    5. Foreign keys
    6. Storage options (tablespaces, clusters, etc.)

    What are the objects that cannot be replicated at all?

    1. Nested tables
    2. Views
    3. Packages, package bodies, procedures, and triggers
    4. Queues
    5. Sequences
    6. Synonyms

    Okay, with this list we know what can and cannot be replicated. I picked the list from the above post, but was hunting to find out why we can't replicate these objects. One simple reason that keeps coming to my mind is that some of the objects in the "cannot" list don't have equivalents in SQL Server. So aren't there workarounds for these objects? For example, packages bundle programmed objects such as package variables, cursors, and a few others, which have to be converted through a specific process, and that process can't be Replication. That exercise would be a migration of objects from Oracle to SQL Server, and surely can't be part of a Replication topology. Does this apply to simple objects too? I don't have a technical reason to explain that either, so I'll conclude with my two cents. Meanwhile I shall see if I can find better "reasons".

    Here is a link from Microsoft on what it means to convert some of the above listed objects. This was one of the reasons that helped me conclude, logically, that this is part of a migration from Oracle to SQL Server and can't be part of Replication at a conceptual or architectural level.

Since my setup was only to test the new Oracle Publisher, I didn't have to keep tabs on the other design considerations, but I am sure I will have to deal with them too when this moves to full deployment mode rather than a POC.

I hope this short post helps you folks out there, and happy reading.

The usual ask – please feel free to share your thoughts or comments.

Cheers!

Oracle to SQL Server data synchronization–What are the other ways? May 17, 2013

Posted by msrviking in Architecture, Configuration, Data Integration, Heterogeneous, Integration, Integration Services, Replication.

In this series I have been talking about environment setup, configuring the box with the appropriate drivers, setting up the Oracle Publisher, and then a short post on Transactional Replication. While doing this I had my share of issues in setting up and configuring the whole workflow, so a parallel thread was running in my head asking only one question: "Is this the only way to synchronize data from Oracle?"

The only other option that looked viable was SSIS, and with SSIS 2012 a lot more features have come in to help the situation. I haven't worked with SSIS CDC for Oracle per the below links, but intend to do so sometime.

Here are a few links I found when I Googled around on this topic:

1. Microsoft SQL Server 2012 Change Data Capture for Oracle by Attunity Topics (http://social.technet.microsoft.com/wiki/contents/articles/7644.microsoft-sql-server-2012-change-data-capture-for-oracle-by-attunity-topics.aspx)

2. CDC for Oracle in SQL Server 2012 (http://blogs.msdn.com/b/mattm/archive/2012/03/26/cdc-for-oracle-in-sql-server-2012.aspx)

3. CDC in SSIS for SQL Server 2012 (http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/)

4. Installing Microsoft SQL Server 2012 Change Data Capture for Oracle by Attunity (http://social.technet.microsoft.com/wiki/contents/articles/7647.installing-microsoft-sql-server-2012-change-data-capture-for-oracle-by-attunity.aspx)

5. Change Data Capture (CDC) for Oracle video now available (http://www.mattmasson.com/2012/09/change-data-capture-cdc-for-oracle-video-now-available/)

6. SQL Server 2012 CDC for Oracle – a Review of One Implementation (http://blogs.msdn.com/b/dfurman/archive/2013/04/15/sql-server-2012-cdc-for-oracle-a-review-of-one-implementation.aspx)

7. Troubleshoot CDC Instance Errors in Microsoft Change Data Capture for Oracle by Attunity (http://social.technet.microsoft.com/wiki/contents/articles/7642.troubleshoot-cdc-instance-errors-in-microsoft-change-data-capture-for-oracle-by-attunity.aspx)

Although I say that SSIS is the best option, yet to be tested, I wanted to list the differences between implementations using Replication and SSIS. So I built a quick comparison between Replication and SSIS, starting from the perspectives of DDL and performance; this is work in progress. As of now I am unable to judge a timeline for closing this comparison with the other factors included. Nevertheless, I am thinking this should help you folks too, just in case.

Feature category: DDL / Schema changes

Feature:
  • New column addition
  • Change in mapped (source) column name
  • Change in data type of source column
  • Drop column in the source table

SSIS: Yes (customized coding and SMO)
Replication: No

Notes:
  • SSIS – SMO-based programming can help detect changes at the source and remap them at the destination
  • Replication – the subscriber has to be dropped/deleted, and the publisher should add the new table or regenerate the snapshot in the case of an Oracle publisher

Feature category: Performance

Feature: On source table or system
SSIS: Impacted
Replication: Impacted

Notes:
  • Replication – SQL Server replication creates triggers on the published tables to capture data changes at row level; such triggers can be an overhead on the published Oracle tables
  • SSIS – the impact on the source table is comparatively less than replication, since no triggers are created to track data changes; this can be achieved using CDC for Oracle with SSIS 2012

Feature: On destination table or system
SSIS: Impacted
Replication: Impacted

Notes:
  • Replication – in the case of large data changes (inserts and updates), transactional replication can become a performance bottleneck on the destination/subscriber tables, because the DML statements cause lock escalation on those tables; another option for large data changes is snapshot replication
  • SSIS – the impact is less and can be controlled through batch operations

Feature category: Synchronization

Feature: Real-time
SSIS: Supported
Replication: Supported

Notes:
  • Replication – transactional replication achieves near real-time data synchronization between Oracle and SQL Server with very minimal latency; however, large data changes can impact performance
  • SSIS – with CDC for Oracle using SSIS 2012, real-time synchronization is achievable without performance overhead, since no triggers are created on the Oracle tables; CDC creates the change tables in SQL Server without extra configuration or maintenance, and the captured changes are applied by SQL Server through those CDC tables
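To make the SMO/metadata point above concrete, here is a minimal sketch of how new or dropped columns could be detected at the destination by comparing against a staged copy of the source metadata. The table and column names are hypothetical — the actual implementation drove this comparison through SMO rather than plain queries.

```sql
-- Hypothetical: source column metadata staged into dbo.SourceColumns
-- (TableName, ColumnName, DataType) by a separate extract step.

-- Columns present at the source but missing at the destination (new columns)
SELECT s.TableName, s.ColumnName
FROM   dbo.SourceColumns s
LEFT JOIN INFORMATION_SCHEMA.COLUMNS d
       ON d.TABLE_NAME = s.TableName AND d.COLUMN_NAME = s.ColumnName
WHERE  d.COLUMN_NAME IS NULL;

-- Columns present at the destination but no longer at the source (dropped columns)
SELECT d.TABLE_NAME, d.COLUMN_NAME
FROM   INFORMATION_SCHEMA.COLUMNS d
LEFT JOIN dbo.SourceColumns s
       ON s.TableName = d.TABLE_NAME AND s.ColumnName = d.COLUMN_NAME
WHERE  s.ColumnName IS NULL
  AND  d.TABLE_SCHEMA = 'dbo';
```

Either result set being non-empty would signal that the destination mapping needs to be regenerated.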

Please feel free to share your thoughts or comments. Happy reading.

Cheers!

Oracle to SQL Server Replication – Configure Transactional Replication w/ Pull Subscription May 16, 2013

Posted by msrviking in Configuration, Data Integration, Heterogeneous, Integration, Replication.
Tags: , , , , , ,
add a comment

This post is going to be short, because there is no real difference I could share in how to configure transactional replication in SQL Server against an Oracle publisher.

Instead, let me share a few links, well explained by other SQL Server experts, on how to configure SQL Server transactional replication; the only difference is that you have to pick the already-configured Oracle publisher in the publication list.

So here are those for your reference and as guidance.

The last link was my source for working on the design, implementation and troubleshooting of the Oracle publisher – SQL Server transactional replication with pull subscription. In a few of the concluding posts of this series I shall talk about the design, deployment and issues I faced.

Happy reading.

Cheers!

Oracle to SQL Server Replication-Configure Oracle Publisher May 15, 2013

Posted by msrviking in Configuration, Data Integration, Heterogeneous, Integration, Replication.
Tags: , , , , ,
add a comment

In today’s post, as part of this series, I will share the steps I diligently followed to configure the Oracle publication. As a first step, certain pre-requisites have to be addressed on the SQL Server box and the Oracle instance.

1. Installing Oracle client and OLE DB provider on SQL Server system

I am not too much of an Oracle guy, but I have a fair deal of experience in identifying the annoying issues that arise when connecting SQL Server and Oracle. So make sure the Oracle client and OLE DB provider are installed appropriately on the SQL Server box.

I didn’t play much with changing environment variable paths or tnsnames.ora; I let the installation use its default paths, and at the end I wanted SQL*Plus to let me connect to the Oracle instance from the box.

So did it happen for me? Yes, after a bit of a tussle I did manage to connect to Oracle, and that was the first relieving moment, because I knew that if I crossed this first major hurdle the rest (configuring the publisher and replication) would be easy steps. Here is the snapshot of SQL*Plus that gave me a bit of a smile; the next one (the Oracle publication) was waiting.
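Once SQL*Plus connects, a quick sanity query against a standard Oracle dynamic view confirms which instance and version you have actually reached:

```sql
-- Run from SQL*Plus after connecting; v$version is a standard Oracle view
SELECT banner FROM v$version;
```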


There is one more tiny part to be done on the SQL Server box: set read and execute permissions on the Oracle (client and provider) folders for the service account that runs SQL Server, for obvious and known reasons.

2. Create Replication administrative user, schema

I used the commands from this blog post; it has decent inline comments on what each command does and what it expects from you when you run it on the Oracle instance.

Remember to log on using sysdba privileges (per the recommendation); in my case I managed to get hold of the sys login of the Oracle instance. The script also has a bunch of commands that need to be executed on the SQL Server instance, to configure the Oracle publication, distributor and replication with a pull subscription.

I didn’t run all those commands because I wanted to try the GUI way of configuring the publication. So feel free to pick your path; however, make sure you follow each step of configuring the replication administrative user, creating the user schema, and granting the appropriate permissions. If you miss any, you may have trouble making the Oracle side of the publication work properly, and I am sure you will be clueless for a little while and will have to revisit the steps you executed.

This step was not difficult and upon completion I know I was getting into the world of troubleshooting SQL Server to Oracle connectivity.

I shall paste the steps and commands below for your quick reference; they are copied from the above link.

  • Logon to ORACLE database as SYSDBA and follow the below steps
    • Create Tablespace to Store SQL Server Replication Metadata Objects

<Sample Script>

CREATE TABLESPACE <SQL_REPLUSER_TBL>
  DATAFILE '<Path>\<SQL_REPLUSER_TBL>.dbf'
  SIZE 512M
  AUTOEXTEND ON NEXT 512M
  MAXSIZE UNLIMITED;

</Sample Script>

  • Create the SQLReplUser on the Oracle database and make the tablespace created in the step above the user’s default tablespace

<Sample Script>

--***************************************************************************
-- Copyright (c) 2003 Microsoft Corporation
-- File: oracleadmin.sql
--
-- Purpose: PL/SQL script to create a database user with the required
-- permissions to administer SQL Server publishing for an Oracle database.
--
--   &&ReplLogin         == Replication user login
--   &&ReplPassword      == Replication user password
--   &&DefaultTablespace == Tablespace that will serve as the default
--                          tablespace for the replication user.
--
-- The replication user will be authorized to allocate UNLIMITED space
-- on the default tablespace, which must already exist.
--
-- Notes:
-- This script must be run from an Oracle login having the authorization to
-- create a new user and grant unlimited tablespace on any existing
-- tablespace. The login must also be able to grant to the newly created
-- login the following authorizations:
--   create public synonym
--   drop public synonym
--   create sequence
--   create procedure
--   create session
--   create table
--   create view
--
-- Additionally, the following privilege is also required for transactional
-- publications:
--   create any trigger
--
-- All of the privileges may be granted through a role, with the exception
-- of create table, create view, and create any trigger. These must be
-- granted explicitly to the replication user login. In this script, all
-- grants are made explicitly to the replication user.
--
-- In addition to these general grants, a table owner must explicitly grant
-- select authorization to the replication user on a table before the table
-- can be published.
--***************************************************************************

ACCEPT ReplLogin CHAR PROMPT 'User to create for replication: ';
ACCEPT ReplPassword CHAR PROMPT 'Replication user password: ' HIDE;
ACCEPT DefaultTableSpace CHAR DEFAULT 'SYSTEM' PROMPT 'Default tablespace: ';

-- Create the replication user account
CREATE USER &&ReplLogin IDENTIFIED BY &&ReplPassword
  DEFAULT TABLESPACE &&DefaultTablespace
  QUOTA UNLIMITED ON &&DefaultTablespace;

-- It is recommended that only the required grants be granted to this user.
-- The following 5 privileges are granted explicitly, but could be granted
-- through a role.
GRANT CREATE PUBLIC SYNONYM TO &&ReplLogin;
GRANT DROP PUBLIC SYNONYM TO &&ReplLogin;
GRANT CREATE SEQUENCE TO &&ReplLogin;
GRANT CREATE PROCEDURE TO &&ReplLogin;
GRANT CREATE SESSION TO &&ReplLogin;

-- The following privileges must be granted explicitly to the replication user.
GRANT CREATE TABLE TO &&ReplLogin;
GRANT CREATE VIEW TO &&ReplLogin;

-- The replication user login needs to be able to create a tracking trigger
-- on any table that is to be published in a transactional publication. The
-- CREATE ANY privilege is used to obtain the authorization to create these
-- triggers. To replicate a table, the table owner must additionally grant
-- explicit select authorization on the table to the replication user.
-- NOTE: CREATE ANY TRIGGER is not required for snapshot publications.
GRANT CREATE ANY TRIGGER TO &&ReplLogin;

EXIT

</Sample Script>

  • Grant the Select privilege on the table you would like to publish to the <SQLReplUser> created in the step above

<Sample Script>

-- Execute the below statement from SQL*Plus
GRANT SELECT ON <Owner>.<Table> TO <SQLReplUser>;

</Sample Script>
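As a quick check that no grant was missed, you can verify from SQL*Plus (logged on with sufficient privileges) that the select grant landed; DBA_TAB_PRIVS is a standard Oracle data dictionary view. The grantee name here is the placeholder from the script above.

```sql
-- Confirm the replication user can select the tables to be published
SELECT grantee, owner, table_name, privilege
FROM   dba_tab_privs
WHERE  grantee = '<SQLREPLUSER>';
```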

3. Configuring Oracle Publication

This is the one I did multiple times, and when I say multiple times, it is for these reasons:

  • Connecting to an Oracle instance is a very demanding process, and at any time it can bounce back with a surprise failure even though the instructions to configure the Oracle client or OLE DB provider were followed
  • The connectivity errors raised by Oracle through the OLE DB provider get swallowed by SQL Server. What does that mean? Whether it is a logon failure, an Oracle instance/service id resolution failure, or a user schema access issue — all of these surface as one generic error, even though a bunch of distinct ones are defined on MSDN.

So watch out for what type of error you could be getting, and for what reason; it is almost like reading the errors through your inner sense, huh. That was a tough ask, so I asked a teammate to build a small .NET program using the OLE DB provider; whatever I did in the wizard I imitated in the code, and that way I was able to find the cause of the errors and make things work quicker. Thanks to the teammate for pitching in at the right time to fix some “hidden errors”.

Here are the snapshots you would see when configuring the Oracle publication.

  1. Configure Distributor


    After a few screens, which are default for configuring a distributor with any type of replication, you will end with this step. I opted to have the SQL Server box, which was supposed to be the subscriber, act as the distributor.


  2. Configure Oracle Publication




    From here you will see a configured Oracle publisher with the name mentioned as the server instance. This is one of the most important steps, as it confirms that the Oracle publication is configured on SQL Server. I was relieved after configuring the publisher, and it didn’t come that easy on the first attempt.

    I hope all this helps in configuring the Oracle publisher; the next step is transactional replication with a pull subscription. I shall share the details of configuring transactional replication against the Oracle publication in upcoming posts; however, there is nothing new once the publisher is done, and it is similar to a SQL Server publication.

    I also intend to share some important tips on the design considerations, troubleshooting and performance of Oracle publications.

    Until then, happy reading.

    Cheers!

Oracle to SQL Server Replication-A bit of insight May 13, 2013

Posted by msrviking in Configuration, Data Integration, Heterogeneous, Integration, Replication.
Tags: , , , , , ,
add a comment

In the last couple of posts in this series, I shared an approach and the appropriate pre-requisites for setting up Oracle publisher based transactional replication. Today I shall talk about how different it is to replicate data between SQL Server and Oracle.

Just before I started working on this assignment, I recollected several past experiences configuring SQL Server replication in all its topologies, and based on those I safely presumed it wouldn’t be that different with an Oracle publisher. Well, I was partly right: SQL Server replication setup was, and is, easy because the wizard helps one configure the publisher, distributor and replication quickly. The same held for this assignment, except that I first had to establish a successful connection with Oracle and then let SQL Server do its job of configuring the publisher, distributor and transactional replication (pull subscription).

The difficult part was having the appropriate Oracle client and OLE DB drivers installed; please refer to my earlier post, which has links to the downloads from Oracle Technology Network (OTN).

I would like to share a few internals of an Oracle publication from SQL Server; without these, the significance of configuring such a publication wouldn’t come through.

1. Configuring Oracle Publisher is a new feature starting from SQL Server 2005, and works with Oracle 9i onwards.

Before SQL Server 2005, I remember well that we used SQL Server 2000 DTS packages to synchronize data between SQL Server and Oracle, and this approach got replaced by Oracle publications using replication, which is nice. The DTS package approach wasn’t straightforward: we needed a few staging tables to capture the DML changes from the Oracle (source) tables, and these changes were propagated into the SQL Server tables through an UPSERT method. This worked well, but it came with a whole bunch of problems initially, before the system settled down with the types of data changes that needed to be captured. With this new feature in SQL Server 2005, a similar setup happens, but everything is handled by SQL Server, and that’s why I said it is pretty neat and nice.
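For illustration, the staging-table UPSERT from those DTS days boiled down to a pattern like the following. The table and column names are hypothetical, not the actual schema; and since SQL Server 2000 had no MERGE statement, the pattern was a separate UPDATE followed by an INSERT.

```sql
-- Apply staged Oracle changes to the SQL Server target (hypothetical names)
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2
FROM   dbo.TargetTable t
JOIN   dbo.StagingTable s ON s.KeyCol = t.KeyCol;

INSERT INTO dbo.TargetTable (KeyCol, Col1, Col2)
SELECT s.KeyCol, s.Col1, s.Col2
FROM   dbo.StagingTable s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.TargetTable t WHERE t.KeyCol = s.KeyCol);
```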

2. Data from Oracle could be synched up with SQL Server either through Snapshot Replication or Transactional Replication.

Please note the difference between these two types of replication when configured with an Oracle publisher. At the configuration and working level it is almost the same as with a SQL Server publisher, yet I thought I would put up a couple of notable points for quick reference.

– Snapshot replication takes a snapshot of the published tables and applies it as a whole at the subscription end. Simple, eh? This is run by a Snapshot Agent job, which gathers the table data in the form of schema scripts and publishes it for subscriptions to consume. No “change tracking triggers” are created on the publisher tables.

– Transactional replication tracks the change in each row of the published tables through “triggers” created on those tables. The triggers capture the DML changes and push the data into change tracking tables; the Log Reader Agent reads and moves the changed data to the subscribers through an intermediary distribution database.

So transactional replication creates database triggers on the published tables, and the data changes are tracked in change tracking tables on the Oracle end. The changed data is then read into the distribution database by the Log Reader Agent running on SQL Server. This link has the details of how the whole process happens, but let me share a snapshot of the tables that are created in the Oracle database.
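Conceptually, a tracking trigger of this kind looks something like the sketch below. This is an illustration of the mechanism only, with hypothetical object names; the actual triggers and change tables are generated by SQL Server replication and differ in structure.

```sql
-- Illustrative only: a row-level trigger pushing DML changes into a tracking table
CREATE OR REPLACE TRIGGER scott.trg_emp_track
AFTER INSERT OR UPDATE OR DELETE ON scott.emp
FOR EACH ROW
BEGIN
  IF INSERTING OR UPDATING THEN
    INSERT INTO scott.emp_changes (empno, change_type, change_time)
    VALUES (:NEW.empno, CASE WHEN INSERTING THEN 'I' ELSE 'U' END, SYSDATE);
  ELSE
    INSERT INTO scott.emp_changes (empno, change_type, change_time)
    VALUES (:OLD.empno, 'D', SYSDATE);
  END IF;
END;
/
```

The Log Reader Agent then drains such change tables into the distribution database, as described above.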

Here are few snapshots of the system tables, change tracking tables and the triggers created when Oracle Publisher is configured.

Figure.1.

Replication tables created on Oracle schema

ScreenShot003

Figure.2.

Change Tracking Tables to track changes of data on published tables

ScreenShot004

Figure.3.

Trigger that is created on the publisher table

ScreenShot002

With this, I am thinking you should be well briefed on what happens when an Oracle publisher is configured with transactional replication. Of course, I didn’t talk much about snapshot replication or share any pics of it, because I couldn’t set one up during this assignment.

In the next post I will walk you through the configuration of the Oracle publisher and distributor.

Until then happy reading, and please feel free to comment.

Cheers!

Oracle to SQL Server Replication-Pre-Requisites May 10, 2013

Posted by msrviking in Configuration, Heterogeneous, Integration, Replication.
Tags: , , , , ,
2 comments

In the first post on this topic I shared my tiny plan, or approach, for getting Oracle to SQL Server replication done. Today I shall talk a bit about why I listed those steps, and also share the links to the forums and blog posts I used to work on each step.

1. Environment setup

This was the first step, where I had to decide what I could have running on the box at the OS level, and the hardware configuration I would need to run an instance of Oracle and an instance of SQL Server. Here are the specs I chose for bringing up this demo-able replication setup quickly.

  • A virtual machine. This is the fastest and best way to set up POCs for understanding the internals and sharing them with teams.
  • A virtual machine with enough memory, CPU power and hard disk space. What configuration would that be? I knew I wasn’t demonstrating anything about performance while data is replicated between the systems; this was a simple walk-through to understand the pain of getting replication done from Oracle to SQL Server. Here are the specs,
    • Memory /RAM > 4GB
    • CPU > Dual core processor
    • Hard Disk space > 150GB
  • The VM with an appropriate OS. While deciding on these specs I thought it didn’t make sense to configure replication between Oracle and SQL Server running on the same box, and that too on Windows. So there was a little change of course on what should run where: I picked Windows Server 2008 R2 to run SQL Server 2012 with SP1, and put Oracle on a Linux box. The challenge then was to find a box running Linux with Oracle on top of it. I got some help because I had one such box, with Oracle on Linux, running for another customer, so I just had to use it after a few requests to the team.

Here is the link that tells you about the software and hardware requirements to run SQL Server 2012 SP1 Developer Edition (this edition has the replication features). We all know about this link; nevertheless, I thought sharing it wouldn’t cost me anything and could help someone.

2. Installation of Oracle

As mentioned earlier, I wanted the setup with Oracle on Linux and an “Oracle edition that supports the replication setup”. Take note of the quoted phrase: by it I mean Oracle Standard/Enterprise Edition, which includes the Oracle Gateway option; this option is not available in Oracle Express Edition. Oracle Gateway needs to be installed separately, without any extra license, either on the same box as the Oracle instance or on a different box. I didn’t want to get into too many installations for a quick POC, hence I chose an Oracle Enterprise Edition instance that was already up and running in a Linux 64-bit environment. Made my job precise and easy, eh!

All that said, I shall still share the links on the Oracle editions, and the license information and feature availability per edition.

Oracle Standard Edition (SE), and Enterprise Edition (EE) features availability

http://docs.oracle.com/cd/E11882_01/license.112/e10594/editions.htm#CJACGHEB

Oracle Express Edition (EE)

http://docs.oracle.com/cd/B25329_01/doc/license.102/b25456/toc.htm#BABJIJCJ

Another point I realized, although it probably isn’t of great importance, is that Oracle Express Edition is available only as 32-bit on Windows; there is no 64-bit build. I checked and verified that no such installation exists. Of course, I didn’t have to bother much because I dropped the idea of running Oracle XE on Windows in favor of an EE, 64-bit on Linux. Nevertheless, in case you are hunting for one for whatever reason, I am thinking this should help you stop the mad hunt. Here is the link that confirms there is no 64-bit build.

https://forums.oracle.com/forums/thread.jspa?threadID=2277728&tstart=0

3. Installation of SQL Server

This was the simplest part because I have done it a zillion times before, except that I had to be choosy in the features list when installing the SQL Server instance on Windows Server 2008 R2 64-bit. The feature I had to make sure was available was Replication, and that too was done with ease. The pre-requisites were taken care of when I chose the OS and hardware for this piece of work, but make sure you have the latest service pack. I haven’t seen anything break when I tried with SQL Server 2012 without an SP, though.
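A quick way to confirm the edition and service-pack level after installation is the standard SERVERPROPERTY function:

```sql
-- Confirm edition, version and service-pack level of the installed instance
SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ServicePackLevel;
```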

4. Configuration of Oracle publisher and SQL Server Transactional Replication – Subscriber

Under this heading I won’t write about how the configuration was done but the intent is to list down the pre-requisites.

  • What is needed before Oracle publisher is configured?
  • What is needed before the Oracle publisher is configured?
    • The Oracle Database Client (Oracle Database 11g Release 2 Client) on the Windows box where the SQL Server 2012 instance is running; you can get it from here. Pay attention to the bit version you download — trust me, this is the most important step in establishing a connection from SQL Server on Windows to Oracle on Linux. If you mess up or miss this step for whatever reason, you will later realize it was a simple step and a bad mistake.
    • The 64-bit Data Access Components installed on the SQL Server box. You can get the installable from here; choose the right one for your environment. I picked 64-bit ODAC 11.2 Release 5 (11.2.0.3.20) for Windows x64.
  • What is needed for the SQL Server transactional replication subscriber setup? Primarily, everything done in the first step is good enough to establish a connection with Oracle, and then configuring a publication with a pull subscription is quick.

I shall share the finer details under each of the configuration steps; the intent of this post was to tell you what should be installed before you start putting the Oracle publisher and the SQL Server transactional replication subscriber in place.

In the end, this link – Configuring an Oracle Publisher – talks about the same steps I have put here, but falls short of pointing to the right places for the appropriate downloads.

I hope these help, and please feel free to comment.

Cheers!

Oracle to SQL Server Replication – First steps May 6, 2013

Posted by msrviking in Configuration, Heterogeneous, Integration, Replication.
Tags: , , , ,
1 comment so far

There was a request in my shop to set up transactional replication on SQL Server 2012 with Oracle as the publisher. A known setup for SQL Server to SQL Server, but not so straightforward when Oracle is the publisher. The setup had to be such that nothing would be done on the Oracle end; everything should be configured from SQL Server. Sounds good and easy, eh! That was my first thought, but after this exercise I realized it definitely isn’t that straightforward: it needs a bit of planning and careful installation of dependent features, and in the end it helped me refresh replication setup knowledge I hadn’t applied in the last few years.

In this post I shall share the steps to configure transactional replication on SQL Server 2012, with the publisher being Oracle 11g Enterprise Edition running on Linux.

As mentioned earlier, I first had to think through what steps are involved and what other features or dependent components I need to install before configuring replication. I will try my best to post a few images of any important installation step; otherwise it would be mostly text.

Here is the high-level activity list I put together for the sake of clarity and to know where I stand. It helped me judge how much time it would take to configure the whole setup, whatever in-depth experience I have. A bit of planning and picking the right steps (an approach) did help me avoid some rework.

  1. Environment setup
  2. Installation of Oracle
  3. Installation of SQL Server
  4. Configuration of Oracle Publisher
  5. Configuration of SQL Server transactional replication-subscriber

I shall share a few thoughts on some of the above steps, because I realized everything requires a few pre-requisites. For example, the environment is the box needed to run both SQL Server and Oracle; then come installing the appropriate version and edition of Oracle or SQL Server, the drivers and client network/connectivity tools, connecting to Oracle through SQL Server replication, and publishing the tables. It all sounds easy and familiar, but things go bad and take a few hours more than expected, however much you plan and jot down your points.

I shall write details in the next post as part of this series, so stay tuned.

Cheers.