Manipulating data in a 48 million record table
Posted Friday, February 22, 2013 5:21 AM


I have the following query which basically gets data from a 48 million record table, performs various things using SQL CLR C# functions then puts the results into a new table.

Initially I set the code to run on the entire table but after 3 hours the code failed due to some dodgy data in one of the columns! So now I want to perform this in chunks of 1 million records at a time.

However, as the Loading_B2C_keys_ table has more and more data, the insert is getting slower and slower!

I've tried adding a non-clustered index on the ID column on both tables but it's not having much effect? Is there a better way to do this?

By the way, the ID column appears to contain a mixture of integers and GUIDs. Should I create a new Identity column on the big table and use that as my ID? Would that help?

INSERT INTO dbo.Loading_B2C_keys_ (
    ID,
    mkNameKey,
    mkName1,
    mkName2,
    mkName3,
    mkNormalizedName,
    mkOrganizationKey,
    mkNormalizedOrganization,
    mkOrgName1,
    mkOrgName2,
    mkorgName3,
    mkPostIn,
    mkPostOut,
    mkPhoneticStreet,
    mkPremise,
    mkPhoneticTown,
    mkEmailAddress)
SELECT TOP (1000000)
    ID,
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_GetLastWord(SURNAME)) + LEFT(FIRSTNAME, 1),
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_GetLastWord(SURNAME)),
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_GetFirstWord(FIRSTNAME)),
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_GetFirstWord(SECOND_INIT_NAME)),
    UPPER(dbo.clrFn_GetLastWord(SURNAME)) + ',' + UPPER(dbo.clrFn_GetFirstWord(FIRSTNAME)) + ',' + UPPER(dbo.clrFn_GetFirstWord(SECOND_INIT_NAME)),
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    dbo.clrFn_SplitUKPostCode(POSTCODE, 2),
    dbo.clrFn_SplitUKPostCode(POSTCODE, 1),
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_RemoveDigits(ISNULL(AD1, '') + ' ' + ISNULL(AD2, ''))),
    dbo.clrFn_GetDigits(ISNULL(AD1, '') + ' ' + ISNULL(AD2, '')),
    dbo.clrFn_DoubleMetaphone(dbo.clrFn_RemoveDigits(TOWN)),
    Email
FROM dbo.Loading_B2C AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.Loading_B2C_keys_ AS b WHERE a.ID = b.ID)



---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1422996
Posted Friday, February 22, 2013 6:47 AM
Please check this code:
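-- Session 1
-- Assumes old_table and new_table both have an integer id column.
declare
    @from_id int = 0
   ,@to_id int = 0
   ,@rowcnt int = -1

-- Global temp table, so a second session can watch the progress.
create table ##tmp_move_date (id int primary key, from_id int, rowcnt int)
insert into ##tmp_move_date (id, from_id, rowcnt) values (1, 0, 0)
select @from_id = from_id from ##tmp_move_date where id = 1

while (@rowcnt <> 0)
begin
    -- Upper bound of the next chunk: the highest id among the next 1,000,000 rows.
    -- When no rows are left, @to_id keeps its old value and the insert moves nothing.
    select @to_id = isnull(max(id), @to_id)
    from (select top (1000000) id from old_table where id >= @from_id order by id) as chunk

    begin tran
        insert into new_table <fields>
        select <fields> from old_table where id between @from_id and @to_id

        set @rowcnt = @@ROWCOUNT
        set @from_id = @to_id + 1

        -- Record where the next chunk starts and the running total of rows moved.
        update ##tmp_move_date
        set from_id = @from_id,
            rowcnt = rowcnt + @rowcnt
        where id = 1
    commit
end

-- Session 2: Monitoring
select * from ##tmp_move_date with (nolock)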

Post #1423029
Posted Friday, February 22, 2013 6:53 AM


Abu Dina (2/22/2013)

By the way, the ID column appears to contain a mixture of integers and GUIDs. Should I create a new Identity column on the big table and use that as my ID? Would that help?


yes... and I would do the same on the destination table ... and make it a clustered index because you want to make sure all the inserts go at the end of the table.

Also, if you had an identity ID on the big table you could change the "WHERE NOT EXISTS (SELECT 1 FROM dbo.Loading_B2C_keys_ as b WHERE a.ID = b.ID)" in your inserts to "WHERE ID > @NextChunkID" instead. You would increment @NextChunkID by 1000000 (or whatever your top(N) is ) in each batch loop.
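
As a rough sketch of that batch loop (not tested against the real tables): it assumes the big table has been copied to a hypothetical dbo.Loading_B2C_seq with an integer identity column called RowID, and that the ID column on the destination table is an int. Only a few of the key columns are shown. Each batch is bounded by an ID range rather than TOP (N), so adding the batch size to @NextChunkID stays correct even if the identity values have gaps.

```sql
-- Assumptions: dbo.Loading_B2C_seq and RowID are hypothetical names;
-- dbo.Loading_B2C_keys_.ID is assumed to be an int column.
DECLARE @BatchSize   int = 1000000;
DECLARE @NextChunkID int = 0;
DECLARE @MaxID       int;

SELECT @MaxID = MAX(RowID) FROM dbo.Loading_B2C_seq;

-- If the source table is empty, @MaxID is NULL and the loop body never runs.
WHILE @NextChunkID < @MaxID
BEGIN
    INSERT INTO dbo.Loading_B2C_keys_ (ID, mkPostIn, mkPostOut, mkEmailAddress)
    SELECT RowID,
           dbo.clrFn_SplitUKPostCode(POSTCODE, 2),
           dbo.clrFn_SplitUKPostCode(POSTCODE, 1),
           Email
    FROM dbo.Loading_B2C_seq
    WHERE RowID >  @NextChunkID
      AND RowID <= @NextChunkID + @BatchSize;

    -- Move the window forward; no lookup against the destination table is needed.
    SET @NextChunkID = @NextChunkID + @BatchSize;
END;
```

Because each batch is its own statement, a failure on bad data only rolls back the current batch, and the loop can be restarted from the last value of @NextChunkID.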

If I were doing this though, I would consider doing the transformations and writing the rows to a CSV file, then bulk inserting the end results back into a table. I think that would be faster than the way you are doing it.
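
The load step of that approach might look something like the sketch below. The file path is hypothetical, and the file is assumed to have one field per column of the destination table, in the same order. A pipe is used as the field delimiter instead of a comma because mkNormalizedName itself contains commas; that assumes the data contains no pipe characters.

```sql
-- Assumptions: the path is hypothetical and must be readable by the SQL Server service;
-- the file layout matches the columns of dbo.Loading_B2C_keys_ exactly.
BULK INSERT dbo.Loading_B2C_keys_
FROM 'C:\load\Loading_B2C_keys.txt'
WITH (
    FIELDTERMINATOR = '|',        -- pipe, since the normalized name holds commas
    ROWTERMINATOR   = '\n',       -- adjust to match how the file was written
    BATCHSIZE       = 1000000,    -- commit every million rows
    TABLOCK                       -- table lock for the duration of the load
);
```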




The probability of survival is inversely proportional to the angle of arrival.
Post #1423034
Posted Friday, February 22, 2013 6:55 AM


Thanks for this suggestion.

I managed to find a workaround for this. The original ID column was a varchar column with a mixture of INTs and GUIDs (that's how the data was supplied), so I created a new table, added an IDENTITY column and indexed it. I am now using this new ID in my EXISTS clause.

After inserting 10 million rows as an initial load I can now do 10k rows in 2 seconds.
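
For anyone following along, one way to build such a table is sketched below. The names dbo.Loading_B2C_seq and RowID are made up for illustration, and the index is shown as clustered, which is a choice rather than a requirement. It relies on the source table having no identity column of its own, which holds here since the supplied ID is a varchar.

```sql
-- Assumptions: dbo.Loading_B2C_seq, RowID and the index name are hypothetical.
-- Copy the big table, adding a sequential integer key alongside the original columns.
SELECT IDENTITY(int, 1, 1) AS RowID,
       a.*
INTO dbo.Loading_B2C_seq
FROM dbo.Loading_B2C AS a;

-- Index the new key so range and EXISTS lookups on it are cheap.
CREATE UNIQUE CLUSTERED INDEX CIX_Loading_B2C_seq_RowID
    ON dbo.Loading_B2C_seq (RowID);
```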


Post #1423039
Posted Friday, February 22, 2013 6:58 AM


sturner (2/22/2013)
Abu Dina (2/22/2013)

By the way, the ID column appears to contain a mixture of integers and GUIDs. Should I create a new Identity column on the big table and use that as my ID? Would that help?


yes... and I would do the same on the destination table ... and make it a clustered index because you want to make sure all the inserts go at the end of the table.

Also, if you had an identity ID on the big table you could change the "WHERE NOT EXISTS (SELECT 1 FROM dbo.Loading_B2C_keys_ as b WHERE a.ID = b.ID)" in your inserts to "WHERE ID > @NextChunkID" instead. You would increment @NextChunkID by 1000000 (or whatever your top(N) is ) in each batch loop.


heh.. I saw your reply come in as I was about to post my previous one!

The new identity column seems to have done the trick. Although I'm still using the exists. Will try your method to see if I can get better performance.


Post #1423042