Sql to oracle replication

  • you don't use CDC instead of ETL - CDC IS a method that can be used as part of a ETL Process

  • Thanks! That's what I thought but I wanted to make sure if i'ts not any different. Would CDC method works better for Very large database(2TB)?  Does Informatica handles better Vs SSIS?

  • frederico_fonseca wrote:

    Admingod wrote:

    Thanks! Has anyone has any thoughts on Change Data Capture.  The CDC function is, probably similar to the transactional replication.  It is not the same, but it has this feature each transaction is represented in the CDC tables.

    https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/replicate-track-change-data-capture-always-on-availability?view=sql-server-ver15

    CDC that is what I use for one of my systems (90 tables that need to be replicated to Oracle).

    It will work but with its own issues if the tables under CDC are subject to DDL changes - and all tables will require a Primary Key to be set - without one CDC won't work

    rough setup is

     

      <li style="list-style-type: none;">

    • 1 Configuration table to hold timestamp last time ETL started processing changes

     

      <li style="list-style-type: none;">

    • 1 SP per table to identify changes

     

      <li style="list-style-type: none;">

    • 1 SSIS package per table - Using Attunity Driver (with its own issues due to differences between how Oracle treats empty strings vs SQL Server (Oracle sets to null, SQL sets to empty string (not null!!!))
        <li style="list-style-type: none;">

      • full load

      <li style="list-style-type: none;">

    • incremental
        <li style="list-style-type: none;">

      • Staging table in Oracle

      <li style="list-style-type: none;">

    • update/insert/delete Execute SQL in Oracle

      (Note - due to differences related to empty strings Load using Attunity driver must sometimes be set to Array load instead of Fast Load - although slower it allows the load to retain trailing spaces)

     

    Depending on exact setup and volume of data using CT (Change Tracking) may also be an option - and with less impact on DML operations - but harder to determine the values that changes as it will only tell you the PK that changed and what columns.

    an alternative to the above is having a column on all tables that gets set for any modification to the record (through triggers) so that you always know which records have changed since last ETL.

    Not to hijack this thread but I have a similar question that I don't know the answer to (yet).  Do you know if CDC can be used to push from AWS RDS to an on-premise instance and vice-versa or it there something even better for that type of thing?  I'm asking because I'm just starting to cut my teeth on AWS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • CDC can be used in AWS RDS - https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.CommonDBATasks.CDC.html

    but CDC does not push anything - just helps a ETL job to identify what has changed.

  • Admingod wrote:

    Thanks! That's what I thought but I wanted to make sure if i'ts not any different. Would CDC method works better for Very large database(2TB)?  Does Informatica handles better Vs SSIS?

    any ETL tool works well - after you learn how to use it according to your needs.

    As for what is better it again depends on the NEEDS - you need to do a analysis of what you need, then investigate the different tools on the market for their functionality, their cost, their support (both from the vendor and from third parties, availability of technicians to use the tool and so on.

    but this is work you have to do - I can tell you X is better they Y but on your case it could be the other way around if your needs are different than mine.

    So instead of always asking around for other people opinion I advise that you start reading and thinking for yourself and only after you do some ground work analysis and have a matrix of pros/cons of different products and if you then have specific problems questions you can ask - otherwise you are asking us to do your work for you without you paying us to do it.

     

  • frederico_fonseca wrote:

    CDC can be used in AWS RDS - https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.SQLServer.CommonDBATasks.CDC.html

    but CDC does not push anything - just helps a ETL job to identify what has changed.

    Got it.  Thank you, good Sir!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you sir! My understanding is CDC falls under asynchronous process and will not impact the current transactions Vs trigger which falls as synchronous process. Since CDC uses log reader to read the changes from the Log. If the SQL server is configured for AG group, I was thinking to enable the CDC on secondary replica instead of primary replica. Would you see any issues with this setup? Would Availability group and CDC works together well since they rely mostly on log. I have seen problems when I configured replication with DB mirroring. Again, I understand this needs to be tested, but checking to see if anyone has ever came across? How about Partitioned tables work for CDC? Thanks in Advance.

  • Any thoughts on Qlik or Qlik connector for replication from sql to oracle. It doesn't seems to work on prem is that for a cloud? Do you agree?

  • I started working on CDC from sql to oracle(destination). However, it might take some time for me to get access to oracle. So I thought enable the CDC on source(SQL) and start loading the data using ETL to SQL destination instead of Oracle until I get the access to oracle. Do you think this is good way to proceed or do you see any issues with this path forward?

  • Since we are on availability group. I don't think I can able to enable CDC on secondary replica since its read only copy.  Other than primary would you suggest any alternate option to enable CDC? Would enabling on production will impact any performance? Please advise?

  • There are two parts to this conundrum:

    • identify when a row has changed
    • extract the data into a format that Oracle can process

    There are many ways to identify when a row has changed and depends on your system. These include Change Tracking, CDC, datestamps and many more. I would suggest that you choose the simplest method that works for you. Steer clear of CDC unless you need the functionality as it can have a negative impact on s system, especially where there are a large number of changes in a short space of time.

    ETL can use this to perform the extract.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply