Weird sorting based on CSV requirement

  • Hi SSCians, I have got a weird requirement from my client. Although I have figured out a way to do it, I would like to know whether it can be done faster.

    I have simplified the problem and removed other factors at play. This exercise has to be done for around half a million rows and the output would have around 500 rows.

    -Record table has recordid and recordvalue

    -Recordid could have multiple recordvalues

    -Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)

    -The csv (col2) has to be sorted within and the output should be sorted based on col2

    -Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.

    --================================================

    --Problem:

    --0. Record table has recordid and recordvalue

    --1. Recordid could have multiple recordvalues

    --2. Need to get output as recordid (col1) and corresponding recordvalues as csv (col2)

    --3. The csv (col2) has to be sorted within and the output should be sorted based on col2

    --4. Need a row number based on this sorted output. Multiple instances of this sort should retain the same order.

    --================================================

    if object_id('tempdb..#test') is not null drop table #test

    create table #test(recordid int, recordvalue char(2))

    --================================================

    insert into #test

    select 1, 'b6'

    union

    select 1, 'a4'

    union

    select 1, 'c1'

    union

    select 2, 'a2'

    union

    select 2, 'b7'

    union

    select 3, 'a1'

    union

    select 3, 'b3'

    select * from #test

    --================================================

    if object_id('tempdb..#testresult') is not null drop table #testresult

    create table #testresult(recordid int, recordvalue varchar(50), rownum int identity(1,1))

    --================================================

    --This is where I do the churning. Is there a better way to do this?

    insert into #testresult

    select recordid

    ,stuff((select ', ' + recordvalue from #test where recordid = t.recordid order by recordvalue for xml path('')),1,2,'') [value]

    from #test t

    group by recordid

    order by [value], recordid

    select * from #testresult

    --================================================

    drop table #test

    drop table #testresult

    --================================================
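    For reference, here is the output the script above should produce for the sample data, worked by hand: the three CSVs compare as 'a1, b3' < 'a2, b7' < 'a4, b6, c1', and the identity-based rownum follows the ORDER BY on the INSERT ... SELECT, so the rows land in that order.

    ```sql
    -- Expected #testresult for the sample data above (worked by hand):
    -- recordid   recordvalue   rownum
    -- 3          a1, b3        1
    -- 2          a2, b7        2
    -- 1          a4, b6, c1    3
    ```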

    Thanks in advance.

    -- arjun

    https://sqlroadie.com/

  • Arjun

    I think I'd write a simple SSIS package. There are tasks in SSIS that are specially designed to manipulate text files.

    John

  • John, I am not reading from text files, and SSIS is out of scope. This is for a sort of search engine, and all the data is in the DB.

    - arjun

    https://sqlroadie.com/

  • I tend to use FOR XML to concatenate values together. In some cases, you can also use the quirky update.

    Here is the FOR XML version.

    if object_id('tempdb..#test') is not null drop table #test

    create table #test(recordid int, recordvalue char(2))

    --================================================

    insert into #test

    select 1, 'b6'

    union

    select 1, 'a4'

    union

    select 1, 'c1'

    union

    select 2, 'a2'

    union

    select 2, 'b7'

    union

    select 3, 'a1'

    union

    select 3, 'b3'

    WITH Records AS (

    SELECT DISTINCT

    RecordID

    , ( SELECT

    CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END

    + RecordValue

    FROM #Test as sub

    WHERE sub.RecordID = main.RecordID

    ORDER BY RecordValue

    FOR XML PATH('')

    ) AS RecordValues

    FROM #Test AS main

    )

    SELECT RecordID, RecordValues, Row_Number() OVER(ORDER BY RecordValues)

    FROM Records

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks Drew. I will compare the execution times and post the results.

    - arjun

    https://sqlroadie.com/

  • Actually, I was thinking about this last night, and because of the number of records you may want to batch the processing. Here's one example; adjust the TOP ... PERCENT to process more or fewer records per batch. You may want to SET ROWCOUNT instead of using a TOP PERCENT.

    if object_id('tempdb..#test') is not null drop table #test

    create table #test(recordid int, recordvalue char(2))

    --================================================

    insert into #test

    select 1, 'b6'

    union

    select 1, 'a4'

    union

    select 1, 'c1'

    union

    select 2, 'a2'

    union

    select 2, 'b7'

    union

    select 3, 'a1'

    union

    select 3, 'b3'

    DECLARE @test2 TABLE (

    RecordID int PRIMARY KEY CLUSTERED

    , RecordValues varchar(50) NULL

    );

    INSERT @test2(RecordID)

    SELECT DISTINCT RecordID

    FROM #test

    WHILE @@ROWCOUNT > 0

    UPDATE main

    SET RecordValues = (

    SELECT

    CASE WHEN RecordValue = Min(RecordValue) OVER( PARTITION BY RecordID ) THEN '' ELSE ', ' END

    + RecordValue

    FROM #Test as sub

    WHERE sub.RecordID = main.RecordID

    ORDER BY RecordValue

    FOR XML PATH('')

    )

    FROM @test2 AS main

    WHERE RecordID IN (

    SELECT TOP 20 PERCENT RecordID

    FROM @test2

    WHERE RecordValues IS NULL

    )

    SELECT RecordID, RecordValues, Row_Number() OVER(ORDER BY RecordValues)

    FROM @test2

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/27/2010)


    You may want to SET ROWCOUNT instead of using a TOP PERCENT.

    SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (8/27/2010)


    drew.allen (8/27/2010)


    You may want to SET ROWCOUNT instead of using a TOP PERCENT.

    SET ROWCOUNT is on the deprecated list. Use TOP (n) instead.

    I only suggested SET ROWCOUNT because I didn't want to use a subquery to specify the TOP clause and I got an error message when I put the TOP clause in the main UPDATE query. It turns out that the parens are required in the UPDATE clause and I had left them off.
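    For concreteness, the TOP form might look like the sketch below. The batch size of 1000 is illustrative, and it assumes a @test2 staging table like the one in the batch example, pre-populated with the distinct RecordIDs and NULL RecordValues.

    ```sql
    -- Sketch only: batching with TOP (n) instead of SET ROWCOUNT.
    WHILE @@ROWCOUNT > 0
    UPDATE TOP (1000) main              -- the parentheses are required here
    SET RecordValues = (
        SELECT CASE WHEN RecordValue = MIN(RecordValue) OVER (PARTITION BY RecordID)
                    THEN '' ELSE ', ' END + RecordValue
        FROM #test AS sub
        WHERE sub.RecordID = main.RecordID
        ORDER BY RecordValue
        FOR XML PATH('')
    )
    FROM @test2 AS main
    WHERE main.RecordValues IS NULL;
    ```

    With TOP in the UPDATE itself, the IN (SELECT TOP 20 PERCENT ...) subquery is no longer needed.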

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew, thanks for the inputs. If you check my initial post, I had also used for xml to do the concatenation. So, there isn't much difference between your query and mine, except for the row_number part.

    I can't use batch processing as I need the entire set of recordids in one go. This is only a small part of the query, and I have to do some other operations with this result. Anyway, nice to hear from you.

    Thanks Wayne. Point noted.

    - arjun

    https://sqlroadie.com/

  • Limit the number of RecordIDs you run the correlated subquery for.

    As it stands, you run the subquery first, for every RecordID, and only group afterwards.

    INSERT #TestResult

    SELECT t.RecordID,

    STUFF(f.Value, 1, 2, '') AS Value

    FROM (

    SELECT RecordID

    FROM #Test

    GROUP BY RecordID

    ) AS t

    CROSS APPLY (

    SELECT ', ' + x.RecordValue

    FROM #Test AS x

    WHERE x.RecordID = t.RecordID

    ORDER BY ', ' + x.RecordValue

    FOR XML PATH('')

    ) AS f(Value)


    N 56°04'39.16"
    E 12°55'05.25"

  • Arjun Sivadasan (8/30/2010)


    I can't use batch processing as I need the entire set of recordids at one go. This is only a small part of the query and I have do some other operations with this result. Anyway, nice to hear from you.

    This doesn't prevent you from using batch processing, it just limits the scope of the batch. I was suggesting using batches in this one particular step of the overall process, not running the whole process on small batches.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • OK Drew, somehow I don't think I can do that. I will explain the simplified logic of the query.

    There are records, and records consist of fields. Each record will have different values for the fields.

    User can pick a field from a field-list and perform a search.

    The logic of the query is:

    1. Find the correct version of records based on date and other parameters.

    2. Sort the recordids based on the field selected by the user. --This is the part I had posted.

    3. Generate a row number according to the sort.

    4. Based on the page accessed by the user, filter for the row number range.

    5. Get all fields and other info related to the filtered records.

    So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.

    If my understanding is wrong, please correct me; but I think I cannot limit the scope of the batch in step 2.
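    For what it's worth, steps 3 and 4 above can be sketched as a single query over the full result of step 2. The @PageNumber and @PageSize parameters and the #testresult table are illustrative stand-ins.

    ```sql
    -- Sketch of steps 3-4: number the full sorted set, then filter one page.
    DECLARE @PageNumber int = 2, @PageSize int = 50;

    WITH Numbered AS (
        SELECT recordid, recordvalue,
               ROW_NUMBER() OVER (ORDER BY recordvalue, recordid) AS rownum
        FROM #testresult
    )
    SELECT recordid, recordvalue, rownum
    FROM Numbered
    WHERE rownum BETWEEN (@PageNumber - 1) * @PageSize + 1
                     AND @PageNumber * @PageSize;
    ```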

    - arjun

    https://sqlroadie.com/

  • Arjun Sivadasan (8/31/2010)


    So, as you can see, I need the entire list of records to generate the row number. If I take only a percent of the effective records, my row numbering will not be the same.

    Look back at the post where I suggested using batches and you will see that I have already addressed this concern. The WHILE loop batches the records to process your second step, and the last SELECT statement includes all of the records to get the row numbers for your third step. So it is indeed possible to use batches in one step and then use the entire set in a subsequent step.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Yes, I get the point now. Sorry for acting like a dumbo. I will try it out and post the results. Thanks for being patient Drew.

    - arjun

    https://sqlroadie.com/

  • Hi SwePeso, I will try the query out and let you know if I am able to get better performance out of it. Thanks for the reply.

    - arjun

    https://sqlroadie.com/
