SQLServerCentral
Best Approach to Archive DB
Author
Message
Posted Thursday, March 21, 2013 12:07 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:13 AM
Points: 35,262, Visits: 31,745
david.alcock (3/21/2013)
SSIS is normally utilised when the copy/move is complex, if it involves data transformations, validations etc.


I'll go ahead and disagree with that. I've built some killer ETL systems that would log on to FTP, download only what was needed, mark what was downloaded, import the data, transform it, validate it, and merge it with the final tables, all using just T-SQL, including some rather crazy "don't know how many columns there are in the import" functionality using data imported from Double-Click.net.

ETL is just not that difficult. It's OK to use SSIS for this type of stuff, but I haven't actually found a need for it. I've even been able to do parallel loads without it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1433953
Posted Thursday, March 21, 2013 1:23 PM


SSCertifiable


Group: General Forum Members
Last Login: Friday, September 26, 2014 7:33 AM
Points: 7,107, Visits: 12,660
In this case SSIS is a good choice regardless of the complexity of the transformation, because the source and destination databases are going to be on different instances.

If you want to stay in T-SQL, your choices become Linked Servers or xp_cmdshell. Personally, I would continue down the path of implementing an SSIS Package to do the archiving.

My task is mentioned below:

Copy all data from the source to the destination DB.
Every month, only the updated or new data needs to be transferred to the destination.



Will SSIS help in this?

In order to retrieve 'only the updated or new data' you will need to know something about each row that you can use to derive whether it was updated or new. And what about deletes?

Typical columns used to determine new or modified rows include 'created date' or 'last updated date' columns. Do you have one or both of those, and are they reliable?
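A minimal T-SQL sketch of the 'last updated date' watermark approach (all table, column, and control-table names here are hypothetical, not from the thread):

```sql
-- Hypothetical sketch: incremental copy driven by a LastUpdatedDate watermark.
DECLARE @LastRun datetime;

-- Watermark recorded by the previous run (stored in a control table)
SELECT @LastRun = LastRunDate
FROM dbo.ETLControl
WHERE ProcessName = 'MonthlyArchive';

MERGE ArchiveDB.dbo.DestTable AS tgt
USING (
    SELECT ID, Col1, Col2, LastUpdatedDate
    FROM dbo.SourceTable
    WHERE LastUpdatedDate > @LastRun          -- only new or modified rows
) AS src
    ON tgt.ID = src.ID
WHEN MATCHED THEN
    UPDATE SET Col1 = src.Col1, Col2 = src.Col2, LastUpdatedDate = src.LastUpdatedDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Col1, Col2, LastUpdatedDate)
    VALUES (src.ID, src.Col1, src.Col2, src.LastUpdatedDate);

-- Advance the watermark for the next run
UPDATE dbo.ETLControl
SET LastRunDate = GETDATE()
WHERE ProcessName = 'MonthlyArchive';
```

Note that this catches inserts and updates but says nothing about deletes, which is exactly the gap raised above. (Across instances, the MERGE target would have to be reached via a linked server or the copy done by an external tool.)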

A rock solid way to do this, although it introduces more DBA work, is to leverage either Change Tracking or Change Data Capture. Those two features work at the database-engine-level to track which rows have changed.
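For illustration, a hedged sketch of what enabling and querying Change Tracking looks like (database, table, and column names are placeholders; the tracked table must have a primary key):

```sql
-- Hypothetical sketch: using Change Tracking to find changed rows.
ALTER DATABASE SourceDB
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 35 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.SourceTable ENABLE CHANGE_TRACKING;

-- Later, retrieve everything changed since the version saved by the last run:
DECLARE @LastVersion bigint = 0;   -- normally read from a control table

SELECT ct.SYS_CHANGE_OPERATION,    -- I = insert, U = update, D = delete
       st.*
FROM CHANGETABLE(CHANGES dbo.SourceTable, @LastVersion) AS ct
LEFT JOIN dbo.SourceTable AS st ON st.ID = ct.ID
ORDER BY ct.SYS_CHANGE_VERSION;

-- Save the current version as the watermark for the next run:
SELECT CHANGE_TRACKING_CURRENT_VERSION();
```

Deletes show up with SYS_CHANGE_OPERATION = 'D' and no matching source row (hence the LEFT JOIN), which is how this approach also answers the "what about deletes?" question.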


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1433999
Posted Friday, March 22, 2013 5:05 AM
SSC Veteran


Group: General Forum Members
Last Login: Wednesday, March 12, 2014 12:13 AM
Points: 258, Visits: 690
I am stuck at the last part.

Please refer to the link below:

http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/75331/

I have everything working up to the section "Let's continue building the incremental load."

I did build the incremental load, but new rows are not getting transferred.

Please help.
************************************
Every Dog has a Tail !!!!!
Post #1434199
Posted Friday, March 22, 2013 10:50 AM


SSC Veteran


Group: General Forum Members
Last Login: Wednesday, September 24, 2014 6:41 AM
Points: 217, Visits: 865
Jeff Moden (3/21/2013)
david.alcock (3/21/2013)
SSIS is normally utilised when the copy/move is complex, if it involves data transformations, validations etc.


I'll go ahead and disagree with that. I've built some killer ETL systems that would log on to FTP, download only what was needed, mark what was downloaded, import the data, transform it, validate it, and merge it with the final tables, all using just T-SQL, including some rather crazy "don't know how many columns there are in the import" functionality using data imported from Double-Click.net.

ETL is just not that difficult. It's OK to use SSIS for this type of stuff, but I haven't actually found a need for it. I've even been able to do parallel loads without it.


When a guy like Jeff starts off by saying 'I'm going to disagree with that', you know you're in trouble; well, I get that feeling of dread anyway!

Of course you're absolutely right; you can implement all kinds of magic using T-SQL. My suggestion was based more on the requirement here than on a generalisation: if it involves some level of complexity, then SSIS would be a way of implementing the process rather simply (with a Lookup task or something similar).



'Only he who wanders finds new paths'
Post #1434396
Posted Saturday, March 23, 2013 6:51 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:13 AM
Points: 35,262, Visits: 31,745
david.alcock (3/22/2013)
Jeff Moden (3/21/2013)
david.alcock (3/21/2013)
SSIS is normally utilised when the copy/move is complex, if it involves data transformations, validations etc.


I'll go ahead and disagree with that. I've built some killer ETL systems that would log on to FTP, download only what was needed, mark what was downloaded, import the data, transform it, validate it, and merge it with the final tables, all using just T-SQL, including some rather crazy "don't know how many columns there are in the import" functionality using data imported from Double-Click.net.

ETL is just not that difficult. It's OK to use SSIS for this type of stuff, but I haven't actually found a need for it. I've even been able to do parallel loads without it.


When a guy like Jeff starts off by saying 'I'm going to disagree with that', you know you're in trouble; well, I get that feeling of dread anyway!

Of course you're absolutely right; you can implement all kinds of magic using T-SQL. My suggestion was based more on the requirement here than on a generalisation: if it involves some level of complexity, then SSIS would be a way of implementing the process rather simply (with a Lookup task or something similar).



Heh... sorry, David. I didn't intend to make anyone nervous. Like you, I was making a mostly general statement. I'm just amazed at how many people make it seem like you have to use something like SSIS to do even the simplest of I/O tasks. Perhaps the greatest irony of all is that many people will call a stored procedure from SSIS to accomplish their SSIS task, but will still insist that SSIS is the only way to go for "complex" things.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1434637
Posted Saturday, March 23, 2013 7:24 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:13 AM
Points: 35,262, Visits: 31,745
opc.three (3/21/2013)
In this case SSIS is a good choice regardless of the complexity of the transformation, because the source and destination databases are going to be on different instances.

If you want to stay in T-SQL, your choices become Linked Servers or xp_cmdshell. Personally, I would continue down the path of implementing an SSIS Package to do the archiving.


I have to both agree and disagree there. It's no more difficult to set up a linked server than it is to set up an extra connection in SSIS. As far as importing data from a non-database source on another server goes, there's no need for either a linked server or xp_CmdShell. A simple shared drive and trusted connections, along with BULK INSERT, will do fine for what most people want or need to do.
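As a sketch of that approach (the UNC path, file layout, and staging table are hypothetical, not from the thread):

```sql
-- Hypothetical sketch: importing a file from a network share with BULK INSERT.
BULK INSERT dbo.StagingTable
FROM '\\FileServer\ImportShare\extract.csv'
WITH (
    FIRSTROW        = 2,        -- skip a header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK                     -- enables minimal logging when other conditions are met
);
```

The account doing the read (the SQL Server service account, or the caller when the share is accessed under Windows authentication) needs read permission on the share.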

On the subject of xp_CmdShell: if someone gets the heebie-jeebies at the mere mention of xp_CmdShell or a trusted OPENQUERY that uses ACE drivers, then SSIS is definitely the way to go. Since we've been on that subject before, I'll remind folks that xp_CmdShell isn't a security risk. Bad security is the security risk, because any hacker that gets into your system with "SA" privs is going to change your career by using xp_CmdShell whether you have it turned off or not.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1434642
Posted Saturday, March 23, 2013 8:06 PM


SSCertifiable


Group: General Forum Members
Last Login: Friday, September 26, 2014 7:33 AM
Points: 7,107, Visits: 12,660
Right: BULK INSERT, or sp_OA, or SQLCLR, and maybe some other things that could be initiated from the server that I didn't bother mentioning. I still disagree with you on the point of 'xp_cmdshell is not a security risk', but that's OK. For me, SSIS would bridge the 'instance gap' and allow us to move data across the network without initiating file-system access from within the database engine, a "win" from my perspective.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1434651
Posted Sunday, March 24, 2013 11:56 AM
SSC-Addicted


Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
I'm in the middle of an archiving project right now. The customer has a single database instance on a shared server with a major host and isn't running Enterprise Edition, so this limited my options greatly.

My basic concept is to export the data to be archived to a separate table with a constraint on the last updated date column that covers the time period to be archived. Then, I add the table to a partitioned view that can be queried without knowing which base table is involved.
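A minimal sketch of that pattern, with illustrative names (one monthly table shown; a similarly defined dbo.Archive_2013_01 is assumed to exist):

```sql
-- Hypothetical sketch: per-period archive tables with CHECK constraints on the
-- last-updated-date column, unioned together in a partitioned view.
CREATE TABLE dbo.Archive_2013_02 (
    ID              int      NOT NULL PRIMARY KEY,
    LastUpdatedDate datetime NOT NULL
        CONSTRAINT CK_Archive_2013_02
        CHECK (LastUpdatedDate >= '20130201' AND LastUpdatedDate < '20130301')
    -- ...remaining columns...
);
GO
CREATE VIEW dbo.ArchiveView
AS
SELECT ID, LastUpdatedDate FROM dbo.Archive_2013_01
UNION ALL
SELECT ID, LastUpdatedDate FROM dbo.Archive_2013_02;
GO
-- Queries that filter on the constrained column let the optimizer skip
-- the base tables whose CHECK constraints cannot match:
SELECT * FROM dbo.ArchiveView WHERE LastUpdatedDate >= '20130210';
```

Callers query the view without needing to know which base table holds the rows.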

To make matters more complicated, the archival data must be "split" so that any personal data is in one encrypted base table and related information that has no personally identifiable information goes into a second table. The tables are related by a uniqueidentifier that only exists in the two archive tables. Once archived, the underlying tables are read-only. The data in the view can be queried via a stored procedure that has the month and year as parameters.

The final step after creating the archive and moving over the data is to delete the rows from the production tables. This step is not to be taken lightly! So when I create the archive tables, I set a flag on the production rows to mark them as only virtually "deleted." Then I compare the two new archive tables against the rows that have been "virtually" deleted and make sure that the numbers and data all add up. Only then are the original rows physically deleted. All of this is done within a transaction so that if something goes haywire it will get rolled back.
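That verify-then-delete step might look something like this sketch (table and flag names are hypothetical; THROW requires SQL Server 2012 or later, with RAISERROR as the older-version equivalent):

```sql
-- Hypothetical sketch: verify the archive matches the flagged rows before
-- physically deleting, all inside one transaction.
BEGIN TRY
    BEGIN TRANSACTION;

    DECLARE @Flagged int, @Archived int;
    SELECT @Flagged  = COUNT(*) FROM dbo.ProductionTable WHERE IsArchived = 1;
    SELECT @Archived = COUNT(*) FROM dbo.ArchiveTable;

    IF @Flagged <> @Archived
        THROW 50000, 'Archive row count does not match flagged rows; aborting.', 1;

    DELETE FROM dbo.ProductionTable WHERE IsArchived = 1;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;   -- surface the error after rolling back
END CATCH;
```

A row count alone is a weak check; comparing checksums or doing an EXCEPT of the two row sets, as the data comparison above implies, is a stronger one.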

 
Post #1434688
Posted Sunday, March 24, 2013 1:31 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 9:13 AM
Points: 35,262, Visits: 31,745
Steven Willis (3/24/2013)
I'm in the middle of an archiving project right now. The customer has a single database instance on a shared server with a major host and isn't running enterprise edition so this limited my options greatly.

My basic concept is to export the data to be archived to a separate table with a constraint on the last updated date column that covers the time period to be archived. Then, I add the table to a partitioned view that can be queried without knowing which base table is involved.

To make matters more complicated, the archival data must be "split" so that any personal data is in one encrypted base table and related information that has no personally identifiable information goes into a second table. The tables are related by a uniqueidentifier that only exists in the two archive tables. Once archived, the underlying tables are read-only. The data in the view can be queried via a stored procedure that has the month and year as parameters.

The final step after creating the archive and moving over the data is to delete the rows from the production tables. This step is not to be taken lightly! So when I create the archive tables I create a flag on the production tables that they are only virtually "deleted." Then I compare the two new archive tables against the rows that have been "virtually" deleted and make sure that the numbers and data all add up. Only then are the original rows physically deleted. All of this is done within a transaction so that if something goes haywire it will get rolled back.

 


Good documentary on what you do for archiving. I do have to ask, though... you mention the split for personal information and that you (if I'm reading this correctly) then encrypt the personal data and store it in a separate table. My question would be: why isn't that data encrypted in the original table to begin with?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1434695
Posted Sunday, March 24, 2013 6:17 PM
SSC-Addicted


Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
Jeff Moden (3/24/2013)
Steven Willis (3/24/2013)
I'm in the middle of an archiving project right now. The customer has a single database instance on a shared server with a major host and isn't running enterprise edition so this limited my options greatly.

My basic concept is to export the data to be archived to a separate table with a constraint on the last updated date column that covers the time period to be archived. Then, I add the table to a partitioned view that can be queried without knowing which base table is involved.

To make matters more complicated, the archival data must be "split" so that any personal data is in one encrypted base table and related information that has no personally identifiable information goes into a second table. The tables are related by a uniqueidentifier that only exists in the two archive tables. Once archived, the underlying tables are read-only. The data in the view can be queried via a stored procedure that has the month and year as parameters.

The final step after creating the archive and moving over the data is to delete the rows from the production tables. This step is not to be taken lightly! So when I create the archive tables I create a flag on the production tables that they are only virtually "deleted." Then I compare the two new archive tables against the rows that have been "virtually" deleted and make sure that the numbers and data all add up. Only then are the original rows physically deleted. All of this is done within a transaction so that if something goes haywire it will get rolled back.

 

Good documentary on what you do for archiving. I do have to ask though.... you mention the split for personal information and that you (if I'm reading this correctly) then encrypt the personal data and store it in a separate table. My question would be, why isn't that data encrypted in the original table to begin with?

The sensitive data in the original table is encrypted (or at least it will be). This client has been collecting data for his health-care-related business and "suddenly" realized he might have a problem with HIPAA compliance.

However, in addition to the personal health information (PHI), there is a second set of data which consists of more than a hundred questions, all of which are yes/no or have some scalar-valued response that could never be directly associated with an individual unless someone also had access to the encrypted table which has the uniqueidentifier FK. By separating this second set of data from the PHI, access control for reporting and analysis can be granted to users who don't need to see who the data was originally connected to. Also, performance is enhanced because it isn't necessary to encrypt the data. (Hmmmm... maybe I could take all those question responses, concatenate them into a delimited string, store them in a single column, and use DelimitedSplit8K to query them... just kidding!)

If this client had the resources to do so, it would certainly be best practice to put this "cleaned" data on a completely different server, or at least a different database instance, so that if the tables were compromised there would still be several degrees of separation from the sensitive PHI. Working in a shared hosting environment on non-Enterprise Edition instances without full sysadmin privileges requires a lot of workarounds. There have been cases when my only SQL access was via a host company's interface--yuk! But that is mostly the SQL world I live in.

Another purpose for separating the data is the poor query performance resulting from the original design of the client's database tables. The table holding the PHI and the form question responses is one ridiculously wide table with over 150 columns. Unfortunately, I have no control over that since the data comes from a third-party application, so the best that I could do is manipulate the data myself. With all but the active and most recent data moved to the archive tables, the performance of the existing wide table is significantly improved. If a search of the archive is required, it would only be on the encrypted table with the PHI (by users with proper credentials, of course). That encrypted PHI archive table only has about 15 columns, and by creating a partitioned view by month the query performance on searches goes from something like 30 seconds down to 1 second. If the user then wants to see the details for one individual row, a separate query will retrieve the detail data. But then it is only pulling in the one related row and not querying 25,000 rows x 150 columns. For all of this, I had to write the search query procedures so that the PHI will only be displayed encrypted except for a search column (such as name), or be completely unencrypted, depending on the user's credentials.

 
Post #1434714