Row number with Pivot

  • Hi all,

    Need some expert help!. Have a pivot that I use to transpose rows to columns (dynamic pivot). Works fine except performance with large datasets sucks.

    The table (tbl_forms_data below) has over 1.3M rows that I need to pivot on which is so slow it's timing out all the time. I thought using row_number would only fetch a limited dataset but it doesn't make any difference to the performance.

    I think the inner select/pivot is trawling through the entire 1M rows and hence the poor performance. The query is below (obviously trimmed for clarity).

    thanks

    -----------------

    select * from(select *

    , ROW_NUMBER() OVER (ORDER BY formdate desc) rowid

    from

    (select

    tf.formid,

    formdate,

    answer,

    'field-' + cast(tff.filtertype as varchar(10)) + '-' + cast(tff.id as varchar(10)) as columnname

    from tbl_forms tf

    inner join tbl_forms_data tfd on tf.formid = tfd.formid

    inner join tbl_forms_fields tff on ( tff.id = tfd.qid and tff.templateid = tf.templateid )

    where tf.templateid = 1234

    ) as x

    pivot ( max(answer) for columnname in ([field-0-1],[APPTIMI-0-2],

    [field-0-3],[field-0-4],[field-0-5],[field-0-6],

    [field-0-18])) p

    ) as forms

    WHERE rowid BETWEEN 1 and 50

  • Yes, you will be seeing that as the WHERE clause relates to the output of the whole PIVOT, rather than the data which is being pivoted. Put that WHERE clause into the source data for the PIVOT instead.

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • As Thomas said. Something like this:

    select *

    from (

    SELECT

    tf.formid,

    ??.formdate,

    ??.answer,

    'field-' + cast(tff.filtertype as varchar(10)) + '-' + cast(tff.id as varchar(10)) as columnname

    FROM (

    SELECT *,

    rowid = ROW_NUMBER() OVER (ORDER BY formdate desc)

    FROM tbl_forms

    ) tf

    inner join tbl_forms_data tfd on tf.formid = tfd.formid

    inner join tbl_forms_fields tff on ( tff.id = tfd.qid and tff.templateid = tf.templateid )

    where tf.templateid = 1234

    AND tf.rowid BETWEEN 1 and 50

    ) as x

    pivot (

    max(answer) for columnname in ([field-0-1],[APPTIMI-0-2],[field-0-3],[field-0-4],[field-0-5],[field-0-6],[field-0-18])

    ) p

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris/Thomas

    Thanks for the replies.

    Chris - Understand your solution however the row id in it comes up in the 30K range (even if I select from 0 to 50. It's obviously taking the row_number from the source rows before the pivot and I can be pivoting 3 columns or 30 into each row.

    results from the query:

    Row Id Formid formdate etc....

    32433 4544 xxx

    32478 4665 xxx

    33104 5210 xxx

    Thanks

  • kevin 26531 (6/10/2016)


    Chris/Thomas

    Thanks for the replies.

    Chris - Understand your solution however the row id in it comes up in the 30K range (even if I select from 0 to 50. It's obviously taking the row_number from the source rows before the pivot and I can be pivoting 3 columns or 30 into each row.

    results from the query:

    Row Id Formid formdate etc....

    32433 4544 xxx

    32478 4665 xxx

    33104 5210 xxx

    Thanks

    Could be the placement of the filter. Using TOP() for convenience:

    FROM (

    SELECT TOP(50) *

    FROM tbl_forms

    WHERE templateid = 1234

    ORDER BY formdate desc

    ) tf

    If it isn't clear at this point, then I'd recommend posting up some readily-consumable sample data.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Ok been playing with this a lot this morning and I think I've fixed the issue with the row_number. Added a where templateid = xxx for the tbl_forms. Now at least it's producing the correct row number.

    Performance is way way better than before in that's not trying to pull in the complete dataset. Only issue I have now that I can no longer order by one of the dynamic fields as the ROW_NUMBER() OVER (ORDER BY [field-0-1] desc) spits out an error with Field-0-1 as an invalid column. I'm guessing it's because the pivot is not done at that stage and therefore the pivot is not available at this point.

    PS. Pivots really do hurt my brain.

    select *

    from (

    SELECT

    tf.formid,

    tf.formdate,

    tfd.answer,

    'field-' + cast(tff.filtertype as varchar(10)) + '-' + cast(tff.id as varchar(10)) as columnname

    FROM (

    SELECT *,

    rowid = ROW_NUMBER() OVER (ORDER BY formdate desc)

    FROM tbl_forms

    -----> where templateid = 1234 <------- This fixes the row number issue

    ) tf

    inner join tbl_forms_data tfd on tf.formid = tfd.formid

    inner join tbl_forms_fields tff on ( tff.id = tfd.qid and tff.templateid = tf.templateid )

    where tf.templateid = 1234

    AND tf.rowid BETWEEN 1 and 50

    ) as x

    pivot (

    max(answer) for columnname in ([field-0-1],[field-0-2],[field-0-3],[field-0-4],[field-0-5],[field-0-6],[field-0-18])

    ) p

  • If you don't like the PIVOT operator, you could use a cross tab query instead:

    SELECT TOP(50)

    tf.formid,

    tf.formdate,

    x.*

    FROM tbl_forms tf

    CROSS APPLY (

    SELECT

    [field-0-1] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 1 THEN tfd.answer ELSE 0 END),

    [field-0-2] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 2 THEN tfd.answer ELSE 0 END),

    [field-0-3] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 3 THEN tfd.answer ELSE 0 END),

    [field-0-4] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 4 THEN tfd.answer ELSE 0 END),

    [field-0-5] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 5 THEN tfd.answer ELSE 0 END),

    [field-0-6] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 6 THEN tfd.answer ELSE 0 END),

    [field-0-18] = MAX(CASE WHEN tff.filtertype = 0 AND tff.id = 18 THEN tfd.answer ELSE 0 END)

    FROM tbl_forms_data tfd

    INNER JOIN tbl_forms_fields tff

    ON tff.id = tfd.qid

    AND tff.templateid = tf.templateid -- outer reference

    WHERE tfd.formid = tf.formid -- outer reference

    ) x

    WHERE tf.templateid = 1234

    ORDER BY tf.formdate DESC

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks Chris but it's a dynamic query. There could be 10 columns (after the pivot) or 100 so I can't hardcode fields into the script.

    Had thought about just creating a table with the columns, tell the customer the max is 100 fields/columns and if they only use 10 then waste the other 90.

    thanks

  • kevin 26531 (6/10/2016)


    Thanks Chris but it's a dynamic query. There could be 10 columns (after the pivot) or 100 so I can't hardcode fields into the script.

    Had thought about just creating a table with the columns, tell the customer the max is 100 fields/columns and if they only use 10 then waste the other 90.

    thanks

    You can easily convert Chris' query into dynamic code.

    DECLARE @SQL nvarchar(MAX)

    SELECT @SQL =

    N'SELECT TOP(50)

    tf.formid,

    tf.formdate,

    x.*

    FROM tbl_forms tf

    CROSS APPLY (

    SELECT' + NCHAR(10) +

    STUFF(( SELECT CHAR(9) + ',[field-' + CAST( filtertype AS varchar(10))+ '-' + CAST( id AS varchar(10))

    + '] = MAX(CASE WHEN tff.filtertype = ' + CAST( filtertype AS varchar(10)) + ' AND tff.id = ' + CAST( id AS varchar(10))

    + ' THEN tfd.answer ELSE 0 END) ' + CHAR(10)

    FROM tbl_forms_fields

    FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(max)'), 2, 1, ' ') +

    N'FROM tbl_forms_data tfd

    INNER JOIN tbl_forms_fields tff

    ON tff.id = tfd.qid

    AND tff.templateid = tf.templateid -- outer reference

    WHERE tfd.formid = tf.formid -- outer reference

    ) x

    WHERE tf.templateid = 1234

    ORDER BY tf.formdate DESC; '

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Nice one, Luis 😎

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • For some reason I can't get this to pivot the results - It runs but just lists the data in a column. I'll play with it and see where it takes me.

    Thanks everyone for the help. Learnt a lot.

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply