T-SQL Data Processing


Peter E. Kierstead
Comments posted to this topic are about the item T-SQL Data Processing



PeteK
I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.
GSquared
I may be missing something here, but why use a While loop for this kind of thing? Do a set-based merge of all the update data, then upsert it into the master table. Two steps, very simple, very clean, no While loop.
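
For illustration, here is a minimal sketch of the two-step idea, not the article's code. It assumes the update data has already been merged into a single staging table, that PartyId is the match key, and that incoming source bits should be OR-ed into the existing Source value. The temp tables #Master and #Updates and their sample rows are hypothetical. It avoids MERGE so it also runs on SQL 2005.

```sql
-- Hypothetical tables: #Master stands in for the master table,
-- #Updates for the already-merged update data.
CREATE TABLE #Master  (PartyId int PRIMARY KEY, FirstName varchar(50), LastName varchar(50), Source tinyint);
CREATE TABLE #Updates (PartyId int PRIMARY KEY, FirstName varchar(50), LastName varchar(50), Source tinyint);

INSERT INTO #Master  SELECT 1, 'Ann', 'Lee', 1 UNION ALL SELECT 2, 'Bob', 'Ray', 2;
INSERT INTO #Updates SELECT 2, 'Bob', 'Ray', 4 UNION ALL SELECT 3, 'Cy',  'Orr', 1;

-- Step 1: update rows that already exist, OR-ing the new source bits into Source.
UPDATE m
SET    FirstName = u.FirstName,
       LastName  = u.LastName,
       Source    = m.Source | u.Source
FROM   #Master  AS m
JOIN   #Updates AS u ON u.PartyId = m.PartyId;

-- Step 2: insert rows that are not in the master yet.
INSERT INTO #Master (PartyId, FirstName, LastName, Source)
SELECT u.PartyId, u.FirstName, u.LastName, u.Source
FROM   #Updates AS u
WHERE  NOT EXISTS (SELECT 1 FROM #Master AS m WHERE m.PartyId = u.PartyId);

SELECT PartyId, FirstName, LastName, Source FROM #Master ORDER BY PartyId;

DROP TABLE #Master;
DROP TABLE #Updates;
```

With these sample rows, PartyId 2 ends up with Source 6 (2 | 4) and PartyId 3 is added as a new row. Real dedup rules based on fuzzy matching would need a different join condition than a simple key match.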

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Jerry Hung
I don't like dedupe, let me tell you that :)

But I can't imagine how this WHILE loop would perform when comparing millions of rows.
I had issues comparing 8,000 inserts against 400K records using set-based operations (still acceptable if I separate them by country, etc.).

So I started looking into SSIS Fuzzy Grouping and Fuzzy Lookup.

Fuzzy Grouping - It's pretty cool: it dedupes within, say, the Master table without any T-SQL work. It takes a while, but it even gives a confidence score.

Fuzzy Lookup - I am still working on it; supposedly I can look up those 8,000 inserts in the 400K-row Master table.

SQLServerNewbie

MCITP: Database Administrator SQL Server 2005
Peter E. Kierstead
I'm not a fan of dedup either, and didn't expect a hearty round of approbation from this article...
I've found that you should ALWAYS consider your alternatives when it comes to SQL programming. Things frequently don't "work" as we think they "should".

I use this code to dedup databases of over 1 billion (yep, the B word) rows, a quarter of which (~250,000,000) are duplicates of some kind (exact or based on some "fuzzy" logic). One database processes in about 3 days, the other in about 18 hours (one has a larger record size than the other).

This represented a significant reduction in processing times for both these databases over the previous methodology, which was written as a hybrid of external and internal (SQL w/cursors) code; and it's all done in SQL.

I am constantly looking for better processing techniques that perform the required functions AND run quicker than existing procedures. So far, this is it.

As for SSIS, I will look into it, having not used it before.



Jacob Luebbers
Small suggestion: maybe I missed a reference to this, but wouldn't the Bit datatype be perfectly suited to the Data Source column (one bit column per source)? There would be no need for explicit bitwise operations, it achieves the same thing you're doing against a TinyInt under the covers, and the resulting code would be a little more human-readable...?
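
A rough sketch of what that layout might look like, purely as an illustration. The table #Party and the column names FromSource1 to FromSource3 are made up for the example and do not come from the article.

```sql
-- Hypothetical table: one bit column per source instead of a packed TinyInt.
CREATE TABLE #Party
(
    PartyId     int NOT NULL,
    FromSource1 bit NOT NULL DEFAULT 0,
    FromSource2 bit NOT NULL DEFAULT 0,
    FromSource3 bit NOT NULL DEFAULT 0
);

INSERT INTO #Party (PartyId, FromSource1, FromSource3) SELECT 1, 1, 1;
INSERT INTO #Party (PartyId, FromSource2)              SELECT 2, 1;

-- No mask needed: the column name says which source is being tested.
SELECT PartyId FROM #Party WHERE FromSource3 = 1;

DROP TABLE #Party;
```

One caveat with this design: MAX does not accept a bit operand, so collapsing duplicate rows with GROUP BY would need something like MAX(CAST(FromSource3 AS tinyint)).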

Regards,

Jacob
Peter E. Kierstead
This process was designed on a SQL 2000 system and ported to SQL 2005 with minimal changes. When I refactor it for SQL 2005 that would be the way to go.



Frances L
SELECT
    PartyId,
    FirstName,
    LastName,
    CASE WHEN Source&1<>0 THEN 1 ELSE 0 END AS [Source1],
    CASE WHEN Source&2<>0 THEN 2 ELSE 0 END AS [Source2],
    CASE WHEN Source&4<>0 THEN 4 ELSE 0 END AS [Source3],
    AddDate,
    ModDate
FROM dbo.ExistingMaster

Could you please tell me what Case when Source&1<>0 then 1 else 0 End is for in this code?
Thx.
Peter E. Kierstead
Our current database represents the data source as a TinyInt column containing bit values that indicate the source type: 1=Source1, 2=Source2, 4=Source3, etc., where the decimal numbers 1, 2, and 4 represent the binary bits 00000001, 00000010, and 00000100, respectively.

To process these values within the constraints of SQL (which has no aggregate OR function), I break the TinyInt into 3 separate columns, Source1, Source2, and Source3 (as seen in the code). The 3 CASE expressions you've identified serve this purpose.
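
To make the idea concrete, here is a small sketch of how splitting the bits can stand in for an aggregate OR. It is an assumption about how the pieces fit together, not the article's actual code; the temp table #Dupes and its rows are invented for the example.

```sql
-- Hypothetical sample: three duplicate rows for party 1, one row for party 2.
CREATE TABLE #Dupes (PartyId int NOT NULL, Source tinyint NOT NULL);
INSERT INTO #Dupes SELECT 1, 1 UNION ALL SELECT 1, 5 UNION ALL SELECT 1, 2 UNION ALL SELECT 2, 4;

-- Split each bit into its own value, take MAX per bit (which acts as OR across the group),
-- then add the three results back together into a single TinyInt-style value.
SELECT PartyId,
       MAX(CASE WHEN Source & 1 <> 0 THEN 1 ELSE 0 END)
     + MAX(CASE WHEN Source & 2 <> 0 THEN 2 ELSE 0 END)
     + MAX(CASE WHEN Source & 4 <> 0 THEN 4 ELSE 0 END) AS Source
FROM   #Dupes
GROUP BY PartyId
ORDER BY PartyId;

DROP TABLE #Dupes;
```

For party 1 the sources 1, 5, and 2 combine to 7, and party 2 stays at 4. A plain SUM(Source) would give 8 for party 1 because bit 1 appears in two rows, which is why each bit is reduced with MAX before the values are added.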



Frances L
Source is the column name. I still do not understand Source&1 or Source&2 here.
Peter E. Kierstead
Source is the name of a TinyInt column in the table.

Source&4<>0 is a method of determining a binary bit's value, in this case the 3rd bit from the right.

Source&4 is a bitwise AND operation using 4 as the mask, so if:

Source = 00000111 (decimal 7)
opcode = &
(mask) = 00000100 (decimal 4)
--------------------------------
yields   00000100 (decimal 4)

which is not equal to zero, so the CASE expression for Source3 would return 4.
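
The same arithmetic can be checked directly in a query window. This is only a sketch: the variable @Source is made up for the demonstration, and the Source4 column (mask 8) is a hypothetical fourth source added to show a bit that is not set.

```sql
DECLARE @Source tinyint;
SET @Source = 7;  -- binary 00000111

SELECT @Source & 4                                  AS MaskedValue, -- 4: the 3rd bit from the right is set
       CASE WHEN @Source & 4 <> 0 THEN 4 ELSE 0 END AS Source3,     -- 4
       CASE WHEN @Source & 8 <> 0 THEN 8 ELSE 0 END AS Source4;     -- 0: the 4th bit is not set in 7
```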


