December 12, 2011 at 12:08 pm
Hello,
I have a SOURCE Server where Machine SNAPSHOTS are generated every second with a SNAPSHOT_ID. I am staging this SNAPSHOT table on our production server using a Linked Server.
I used many approaches for staging this data. Usually we find the MAX(SNAPSHOT_ID) on our staging area and get the SNAPSHOT IDs greater than the MAX from the Source server. However there are cases where we miss some SNAPSHOTS records.
For instance GPS signal on machine creates a SNAPSHOT and goes into a DELAY period. Then other GPS signals on other machine create SNAPSHOTS with an Incremental SNAPSHOT ID. When that previous Snapshot records comes out of the DELAY it is written into a table. Thus when we have the MAX logic when loose this records.
Lately I came to know about the SQL Server 2008 MERGE statement. I tried this on our table . It works very well. My only concern is my SNAPSHOT tables is Real time in nature where every second many records are added. When I use MERGE statement , it takes a lot of time to INSERT new records. Also, I suppose as the days pass by the Time would increase as the Millions of Millions records would be inserted into the table.
Merge MACHINE_SNAPSHOT as target
using (select * from [SSServer].SourceDB.dbo.MACHINE_SNAPSHOT) as source
on (Target.MACHINE_SNAPSHOT_ID = source.MACHINE_SNAPSHOT_ID)
/* when matched then update set target.MACHINE_SNAPSHOT_ID =source.MACHINE_SNAPSHOT_ID*/
when not matched by target then
insert(MACHINE_SNAPSHOT_ID, TIME_START, TIME_END, MACHINE_ID,) values
( source.MACHINE_SNAPSHOT_ID, source.TIME_START, source.TIME_END, source.MACHINE_ID);
Is there any we can restrict records on the TARGET side based on Current date at source and Target for comparison. This way it would not have to do a Cartesian product of Millions X Millions records. i.e. TIME_START > GETUTCDATE() - 2 and TIME_START < GETUTCDATE
Something like below
Merge (Select * from MACHINE_SNAPSHOT where TIME_START > GETUTCDATE() - 2 and TIME_START < GETUTCDATE ) as target
using (select * from [SSServer].SourceDB.dbo.MACHINE_SNAPSHOT where > GETUTCDATE() - 2 and TIME_START < GETUTCDATE
) as source
This does not work
I would appreciate any other approach or suggestions.
Thanks!!!
December 12, 2011 at 1:17 pm
Are you merely inserting new records or are you updating existing records at the same time?
If just inserting new records, I'd be more inclined to operate off a timestamp (either datetime or rowversion).
I'm not clear on how you're using snapshots for this. Snapshots are just a point-in-time of the data state. If you update records at 3:00:00.000 and you take a snapshot at 3:15 and another snapshot at 3:30, both will have the 3:00 data, unless it changed between 3:15 and 3:30. What they won't have is data changed after 3:15 and 3:30. Thus, I don't see how "missing one" would matter, unless you're updating data constantly (not just inserting, but also updating), and you somehow are using the snapshots to log the updates. Which again doesn't make much sense to me, since you could just use the snapshot itself as the log.
So I must be missing some detail. I'm sure you're doing what you need to, and I'm just not envisioning it correctly. Can you clarify a bit? If so, I can probably help better.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply