How to use a CSV as parameter in a query

  • Henk Schreij (11/10/2016)


    Are you able to change the parameter to a table-parameter?

    I wish 🙂

I get the parameters from an outside source: the people who build the website.

I have some influence on what they pass on, but they proposed a comma-separated list of integers (or a space-separated list, an XML list, etc.).

    I'll give it a try again.

    @celko/jeff:

I tried the XML solution (I transformed the CSV into XML), and the performance is odd:

98% of the cost is in the XML reader, 1% in the XML path filter, and 1% in the SQL execution (a join of 10 small tables).

The total execution time is 660 ms (the first time with a new combination of parameters) or 15 ms (when the parameters are used again).

    Not so happy with it.

It's company policy that the DB is used only for storing the tables (with FKs, including cascades and defaults), but no stored procedures or functions.

I'll check whether they will make an exception for the DelimitedSplit8K function: I expect much better performance.

@celko: I have read your articles. Very good at explaining and quite readable, but ...

I feel a bit bullied by the way you describe people who try to unfold a CSV string.

I'm not as stupid as you make out there; I know something about normalization.

If you are never getting more than 5-6 values, then I recommend not using XML but rather DelimitedSplit8K, or the logic from that function as Alan B demonstrated previously. The main reason for using XML is that it can handle sizes much greater than the DelimitedSplit8K function can.

    😎

    There is another option , which is to use 0 padded fixed length entries, i.e.

    0000100002000030000400005

which requires far fewer iterations for splitting and is therefore much, much faster than any other splitting method. Would that be an option?
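For illustration, the producing side could build such a fixed-width string like this (a minimal sketch; the 4-character entry width and the @Values table are assumptions for the example, not something from the thread):

```sql
-- Minimal sketch: encode a list of integers as 0-padded,
-- 4-character fixed-width entries (assumed width of 4).
DECLARE @Values TABLE (N INT NOT NULL);
INSERT INTO @Values (N) VALUES (1),(2),(3),(4),(5);

-- RIGHT('000' + value, 4) left-pads each integer to 4 characters;
-- FOR XML PATH('') concatenates the entries in order (digits only,
-- so no XML escaping can occur).
DECLARE @ParamStr VARCHAR(8000) =
    (SELECT RIGHT('000' + CONVERT(VARCHAR(4), V.N), 4)
     FROM @Values V
     ORDER BY V.N
     FOR XML PATH(''));

SELECT @ParamStr;  -- '00010002000300040005'
```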

Further to the fixed-width method, here is an example, about 100x more efficient than XML and roughly 4-8x more efficient than DelimitedSplit8K.

    😎

I have proven many times that generic functions and algorithms are slower than purpose-built ones; this scenario is a good example of that rule.

DECLARE @PARAMSTR VARCHAR(8000) = '0001000200030004000500060007000800090010';

;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0)) X(N))
, NUMS(N) AS
(
    SELECT 0 UNION ALL
    SELECT TOP((LEN(@PARAMSTR) / 4) - 1) ROW_NUMBER() OVER (ORDER BY @@VERSION) * 4
    FROM T T1, T T2, T T3
)
SELECT
    CONVERT(INT, SUBSTRING(@PARAMSTR, NM.N + 1, 4), 0)
FROM NUMS NM;

  • Henk Schreij (11/10/2016)


    It's the company policy that the DB is used for storing the tables (with FK's including cascades, defaults). But no stored procedures or functions.

Yowch! I did some part-time work for a company like that once. They were plagued by performance problems, and every time I told them which query it was coming from the GUI, they said it was the ORM and they couldn't do anything about it. They wallowed through a couple of years of customer complaints about having to buy monster hardware to support their app before finally seeing the light when it comes to stored procedures and the like.

I tried the XML solution (I transformed the CSV into XML), and the performance is odd:

98% of the cost is in the XML reader, 1% in the XML path filter, and 1% in the SQL execution (a join of 10 small tables).

The total execution time is 660 ms (the first time with a new combination of parameters) or 15 ms (when the parameters are used again).

    Not so happy with it.

    That's been a lot like my experience with XML in general, as well.

I'll check whether they will make an exception for the DelimitedSplit8K function: I expect much better performance.

In most cases, it blows the doors off the XML method. Be aware that if you modify it to use one of the MAX datatypes, it will run twice as slow. If you need to split one of the MAX datatypes on a regular basis, I recommend that someone write a well-written splitter in one of the .NET languages and turn it into a CLR function. Be advised, though, that a lot of well-meaning folks flub that pretty badly. They miss things like what leading, trailing, and adjacent delimiters are supposed to mean. To me, leading/trailing delimiters mean starting/ending with an empty element, and adjacent delimiters mean the element between them is also empty. A couple of ready-to-use functionality tests are included in the header of the DelimitedSplit8K function.
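A minimal sketch of those delimiter semantics, assuming the published DelimitedSplit8K result shape (ItemNumber, Item); the sample string is made up for illustration:

```sql
-- Leading, trailing, and adjacent delimiters should all yield empty elements.
-- Assumes dbo.DelimitedSplit8K is installed and returns (ItemNumber, Item).
SELECT S.ItemNumber, S.Item
FROM dbo.DelimitedSplit8K(',A,,B,', ',') S
ORDER BY S.ItemNumber;
-- Expected: 5 rows -> '', 'A', '', 'B', ''
```

A splitter that drops those empty elements silently changes the positions of the remaining values, which is exactly the kind of bug the functionality tests in the function header are meant to catch.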

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (11/10/2016)


    Further on the fixed width method, here is an example, 100 x more efficient than XML and roughly 4-8 x more efficient than DelimitedSplit8K.

    😎

I have proven many times that generic functions and algorithms are slower than purpose-built ones; this scenario is a good example of that rule.

    Have regularly done both myself and 100% agree with both notions.

I'll add that the fixed-field format for files also forces people to do it right and, except for truncation on their end, makes it really easy to detect formatting and transmission errors, not to mention being nasty fast.
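For example, a truncated or malformed fixed-width string is trivially detectable with a length and digit check (a minimal sketch; the 4-character entry width is an assumption carried over from the earlier example):

```sql
-- With 4-character fixed-width entries, any valid string has a length
-- that is a multiple of 4 and contains only digits.
DECLARE @ParamStr VARCHAR(8000) = '000100020003000';  -- truncated on purpose

IF LEN(@ParamStr) % 4 <> 0 OR @ParamStr LIKE '%[^0-9]%'
    RAISERROR('Malformed fixed-width parameter string.', 16, 1);
```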

    --Jeff Moden



  • Jeff Moden (11/10/2016)


    Eirikur Eiriksson (11/10/2016)


Further to the fixed-width method, here is an example, about 100x more efficient than XML and roughly 4-8x more efficient than DelimitedSplit8K.

    😎

I have proven many times that generic functions and algorithms are slower than purpose-built ones; this scenario is a good example of that rule.

    Have regularly done both myself and 100% agree with both notions.

I'll add that the fixed-field format for files also forces people to do it right and, except for truncation on their end, makes it really easy to detect formatting and transmission errors, not to mention being nasty fast.

    Now I wonder where I got this habit from....;-)

    😎

As always, the proof is in the pudding:

USE TEEST;
GO
SET NOCOUNT ON;

DECLARE @PARAMSTR VARCHAR(8000) = REPLICATE(CONVERT(VARCHAR(8000),'0001000200030004000500060007000800090010',0),800);
DECLARE @PARAMSTX VARCHAR(8000) = REPLICATE(CONVERT(VARCHAR(8000),'1,2,3,4,5,6,7,8,9,10,',0),800);
DECLARE @PARAXML  XML           = REPLICATE(CONVERT(VARCHAR(8000),'<X>1</X><X>2</X><X>3</X><X>4</X><X>5</X>',0),800);

DECLARE @INT_BUCKET INT = 0;
DECLARE @CHR_BUCKET VARCHAR(8000) = '';
DECLARE @timer TABLE (T_TEXT VARCHAR(100) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));

INSERT INTO @timer(T_TEXT) VALUES ('FIXED');
;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0)) X(N))
, NUMS(N) AS
(
    SELECT 0 UNION ALL
    SELECT TOP((LEN(@PARAMSTR) / 4) - 1) ROW_NUMBER() OVER (ORDER BY @@VERSION) * 4
    FROM T T1, T T2, T T3
)
SELECT
    @INT_BUCKET = CONVERT(INT, SUBSTRING(@PARAMSTR, NM.N + 1, 4), 0)
FROM NUMS NM;
INSERT INTO @timer(T_TEXT) VALUES ('FIXED');

INSERT INTO @timer(T_TEXT) VALUES ('DELIMITED');
SELECT
    @CHR_BUCKET = X.Item
FROM dbo.DelimitedSplit8K(@PARAMSTX, CHAR(44)) X;
INSERT INTO @timer(T_TEXT) VALUES ('DELIMITED');

INSERT INTO @timer(T_TEXT) VALUES ('XML');
SELECT
    @INT_BUCKET = P.DATA.value('(./text())[1]','INT')
FROM @PARAXML.nodes('/X') P(DATA);
INSERT INTO @timer(T_TEXT) VALUES ('XML');

SELECT
    T.T_TEXT
   ,DATEDIFF(MICROSECOND, MIN(T.T_TS), MAX(T.T_TS)) AS DURATION
FROM @timer T
GROUP BY T.T_TEXT
ORDER BY DURATION;

Results on a W10 Tablet (4-core Atom); durations are in microseconds:

T_TEXT      DURATION
----------- ---------
FIXED           1004
XML            15629
DELIMITED     108780

Only have access to my tablet toy at the moment; I would like to see the results of this test from a proper PC/laptop, as the toy is a CPU-starved, SD-card-storage thingy 🙂
