Performance Question

  • Dear all,

    I have a database with a table containing 100 million rows. At the moment there is no

    partitioning; one column called code, with a varchar(10) data type, is the primary key.

    Then I have a stored procedure with a parameter for how many codes I want to create.

    In a while loop it looks up the code table to check whether the code already exists; if not,

    it inserts the code into the code table.

    Does anyone have an idea how to improve the performance? If I run the stored procedure

    for 1 million rows it takes 15 minutes, which is too long.

    Thanks for all replies

    Thorsten

  • Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Here is the required information:

    CREATE TABLE [dbo].[Codes](

    [Code] [char](10) NOT NULL,

    CONSTRAINT [PK_Codes] PRIMARY KEY CLUSTERED

    (

    [Code] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    ) ON [PRIMARY]

    -------------------------------------------------------------------------------------------------

    THE STORED PROCEDURE

    ALTER PROCEDURE [dbo].[CreateCodes]
        -- Add the parameters for the stored procedure here
        @amount bigint
    AS
    BEGIN
        SET NOCOUNT ON

        declare @tempcode as nvarchar(100)
        declare @counter as bigint

        set @counter = 1

        while @counter <= @amount
        begin
            set @tempcode = (SELECT [dbo].[RNGCharacterMask] (8 ,'34679CDFGHJKLNPRTVXY') )

            if ((select count(code) from codes where code = @tempcode) = 0)
            begin
                insert into codes (code) values (@tempcode)
                set @counter = @counter + 1
                continue
            end
            else
                continue
        end

        set nocount off
    END

    -----------------------------------------------------------------------------------------------

    The stored procedure calls a user-defined function which generates the codes, but that is not

    the reason why the performance is so bad.

    In addition I have attached the execution plan as a txt file. I think most of the time is consumed by the insert statement, but take a look and give me your feedback.

  • What are you trying to achieve here? What is this function supposed to do and why?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hey,

    In the database we have codes which will be printed on packages.

    We have two things to do:

    First, we have to generate new codes at certain intervals. Generating means between 10 and 50 million new codes.

    The other thing is to query the database to get a code which is not yet used; the "used" state will be added in the future.
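    For illustration only, and assuming a Used bit column is added to dbo.Codes later (it is not in the posted table), one unused code could be claimed and returned in a single statement along these lines:

    -- Hypothetical sketch: assumes a future bit column dbo.Codes.Used (not in the
    -- current table). TOP (1) with READPAST lets concurrent callers each take a
    -- different unused code, and OUTPUT returns the code that was claimed.
    UPDATE TOP (1) dbo.Codes WITH (ROWLOCK, READPAST)
    SET    Used = 1
    OUTPUT inserted.Code
    WHERE  Used = 0;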

    I hope that clarifies your question.

    greetings

  • Can you post the code for that function? Running a function repeatedly in a loop is not the fastest way of doing things, but if I put the function into a set-based insert, it'll still be run repeatedly.
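    For illustration, a set-based pass might look roughly like this (a sketch only; dbo.RNGCharacterMask and its arguments are taken from the procedure posted above, and the loop repeats because duplicate codes are discarded):

    -- Sketch: generate candidate codes in batches instead of one row at a time.
    -- The UDF is still invoked for every candidate row, and fewer than the
    -- requested number may survive the duplicate checks per pass, so the outer
    -- loop keeps going until @amount new rows have actually been inserted.
    DECLARE @amount bigint, @inserted bigint;
    SET @amount   = 1000000;
    SET @inserted = 0;

    WHILE @inserted < @amount
    BEGIN
        ;WITH N AS
        (
            SELECT TOP (@amount - @inserted)
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
            FROM master.sys.columns c1
            CROSS JOIN master.sys.columns c2
            CROSS JOIN master.sys.columns c3
        )
        INSERT INTO dbo.Codes (Code)
        SELECT DISTINCT src.NewCode                       -- avoid duplicates within the batch
        FROM (
            SELECT dbo.RNGCharacterMask(8, '34679CDFGHJKLNPRTVXY') AS NewCode
            FROM N
        ) AS src
        WHERE NOT EXISTS (SELECT * FROM dbo.Codes c WHERE c.Code = src.NewCode);

        SET @inserted = @inserted + @@ROWCOUNT;
    END;

    -- Note: verify that each row really gets a distinct code; if the engine
    -- evaluates the constant-argument UDF only once per statement, this
    -- degenerates to one new code per pass and a row-by-row approach is no worse.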

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I cannot give you the code of the function,

    for security reasons.

    All I can say is that the function generates a code using a cryptographic security provider.

    But I have one problem: I have to ensure that exactly

    the requested number of codes is generated.

    I hope you understand.

  • I can possibly write a set-based alternative, but it's not going to perform much better. The function will still have to be called a couple million times and that's going to cost.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I have a gut-feel that this is not the best approach. What uses these codes and why can a code not be generated when it's needed rather than a few million getting generated upfront?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Scalar UDFs are generally quite slow and should be avoided.

    Try to isolate the issue: what is the performance like if you make it a sequential count?

    Also, I notice that code is used as the clustered primary key; it could be page splitting on the inserts.
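    For illustration, such an isolation test might look like this (a sketch only, using a padded sequential value in place of the real UDF):

    -- Rough isolation test: keep the same row-by-row loop but replace the UDF
    -- call with a sequential value. If this finishes much faster, the UDF
    -- dominates the 15 minutes; if not, the per-row lookup/insert is the problem.
    DECLARE @counter bigint, @amount bigint, @tempcode char(10);
    SET @counter = 1;
    SET @amount  = 1000000;

    WHILE @counter <= @amount
    BEGIN
        -- zero-padded sequential "code" instead of dbo.RNGCharacterMask
        SET @tempcode = RIGHT('0000000000' + CONVERT(varchar(10), @counter), 10);

        IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE Code = @tempcode)
            INSERT INTO dbo.Codes (Code) VALUES (@tempcode);

        SET @counter = @counter + 1;
    END;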



    Clear Sky SQL
    My Blog

  • Dave Ballantyne (7/2/2009)


    Scalar UDFs are generally quite slow and should be avoided.

    Try to isolate the issue: what is the performance like if you make it a sequential count?

    Also, I notice that code is used as the clustered primary key; it could be page splitting on the inserts.

    Ouch, you can pretty much be sure there are page splits happening in a million row insert.

    Like Gail has said, without knowing what the code is in the function there really isn't a way to make it faster because the UDF will still run a million times.

    Can you at least give us some pseudocode for the UDF like:

    Take the second parameter passed in.

    Loop through it by character, multiplying the ASCII code by the first parameter; if the result is > 255, subtract 255, and place that character in the 10-character string.
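    Purely as an invented illustration of that level of detail (the name and logic below are made up, not the real function):

    -- Entirely invented example of a "faked" UDF in the style being asked for.
    -- This is NOT the real RNGCharacterMask; it only mirrors the pseudocode above.
    CREATE FUNCTION dbo.FakeCharacterMask (@factor int, @alphabet varchar(20))
    RETURNS char(10)
    AS
    BEGIN
        DECLARE @i int, @a int, @out varchar(10);
        SET @i   = 1;
        SET @out = '';

        WHILE @i <= 10
        BEGIN
            SET @a = ASCII(SUBSTRING(@alphabet, @i, 1)) * @factor;
            WHILE @a > 255 SET @a = @a - 255;   -- keep the value in CHAR() range
            SET @out = @out + CHAR(@a);
            SET @i = @i + 1;
        END;

        RETURN @out;
    END;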

  • You said you can't post the code for the UDF, but could you post code for the UDF with a faked process? It just has to show us how it processes the data passed without showing us the actual details. I hope that makes some sense.

  • scziege (7/2/2009)


    The stored procedure calls a user-defined function which generates the code. But this is not

    the reason why the performance is so bad.

    I would encourage you to test that. The script below will allow you to see how expensive the function call is. It is set up to call a test function that doubles a number. The function is executed one million times per run, and there are five runs. You get summary performance statistics at the end. There are comments to explain what it is doing, and where to plug your private function in.

    Paul

    use tempdb;
    go

    -- Holds numbers 1 to 1 million
    create table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]
    (
        N BIGINT PRIMARY KEY CLUSTERED
    );

    -- Generate numbers
    with Numbers (N)
    as (
        select top (1000000)
            ROW_NUMBER() over (order by (select null))
        from master.sys.columns C1, master.sys.columns C2, master.sys.columns C3
    )
    insert [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446] with (tablockx) (N)
    select N
    from Numbers;
    go

    -- Test function - just doubles the input
    create function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (@number bigint) returns bigint with schemabinding as begin return @number * 2; end;
    go

    -- Clear ad-hoc sql plans (warning: server-wide, run on test only!)
    dbcc freesystemcache(N'SQL Plans');
    go

    -- Run function one million times - substitute your function here!
    declare @Bitbucket BIGINT;

    select @Bitbucket = dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71] (N)
    from [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
    go 5

    -- Show test results
    select statement_text = SUBSTRING([text], statement_start_offset / 2, (case statement_end_offset when -1 then DATALENGTH(text) else statement_end_offset end) / 2),
        sample_size = execution_count,
        [avg elapsed time µs] = total_elapsed_time / execution_count,
        [avg elapsed time sec] = CONVERT(DEC(9, 3), total_elapsed_time / execution_count / 1000000.0),
        [avg cpu time µs] = total_worker_time / execution_count,
        [avg cpu time sec] = CONVERT(DEC(9, 3), total_worker_time / execution_count / 1000000.0),
        [avg logical reads] = total_logical_reads / execution_count
    from sys.dm_exec_query_stats as qs
    cross apply sys.dm_exec_sql_text (qs.[sql_handle]) as st
    where st.[text] like '%@BitBucket%[#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446]%'
    and st.[text] not like '%dm_exec_query_stats%';
    go

    -- Tidy up
    drop table [#418BC2CC-48A9-4BCF-9E18-B8FB7BE9D446];
    drop function dbo.[sfn_7125CD1E-CF5A-4386-B82C-CF52AC756A71];
    go

  • scziege (7/2/2009)


    In addition I add the execution plan as txt file. I think the most performance is consumed by the insert statement. But take a look and give me your feedback.

    You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).

    For example, the call to your function ([Expr1000] = Scalar Operator([NestleDECodes].[dbo].[RNGCharacterMask]((8),N'34679CDFGHJKLNPRTVXY'))) is costed at 0.0000001 units (0%) since the server has no way to know how expensive the function call is in reality.

    Anyhow, as I hope you will now be able to test for yourself, the function call is probably over 99% of the cost of a real execution run.

    A few small points:

    IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE code = @tempcode)

    ...is frequently more efficient than...

    IF (SELECT count(code) FROM codes WHERE code = @tempcode) = 0

    You should be careful to match data types. That lookup on code = @tempcode is less efficient than it could be due to an implicit conversion:

    CONVERT_IMPLICIT(nchar(10),[NestleDECodes].[dbo].[Codes].[Code],0)=[@tempcode]

    (The Code column is defined as CHAR(10) not NCHAR(10))

    The SET NOCOUNT OFF at the end of the procedure is pointless. SET NOCOUNT reverts to the setting in the outer scope when the procedure returns anyway.
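    Put together, the check-and-insert part of the loop could look like this (a sketch only; the counter handling and the rest of the procedure stay as they are):

    -- Sketch with the points above applied: the variable type matches the column,
    -- so there is no CONVERT_IMPLICIT on the clustered key lookup, and EXISTS
    -- stops at the first match instead of counting.
    DECLARE @tempcode char(10);   -- char(10), same as dbo.Codes.Code

    SET @tempcode = dbo.RNGCharacterMask(8, '34679CDFGHJKLNPRTVXY');

    IF NOT EXISTS (SELECT * FROM dbo.Codes WHERE Code = @tempcode)
        INSERT INTO dbo.Codes (Code) VALUES (@tempcode);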

    Paul

  • Paul White (7/2/2009)


    You should disregard the estimated cost percentages when looking at plans - they are wildly inaccurate much of the time (for good reasons).

    And they are always wildly inaccurate when there are scalar UDFs involved, because SQL estimates 0% for the udf execution, regardless of what it does.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
