Row_Number() in SSIS?

  • Charmer (8/14/2012)


    Matt Miller (#4) (8/14/2012)[/b

    How much of a drag does the ROW_NUMBER() aspect put on query execution? It sounds like you already have a fairly hefty process.

    i am not able to get what exactly you are asking Matt....i mean this sentence "How much of a drag does the ROW_NUMBER() aspect put on query execution?"...could you brief me , please?

    Meaning - how much faster is your query if you simply comment out the ROW_NUMBER() OVER ... line. In the grand scheme of things I don't think that's the "big" thing slowing you down (given the other items slowing down your query).

    If turning it off doesn't yield a noticeable improvement, I'd consider NOT turning it off unless there is some other compelling reason you want to do the numbering outside of SSIS. The script component is really intended in SSIS to handle those scenarios where the source doesn't have a way to generate the sequencing for you: you happen to have one who can so be sure to keep the most efficient option.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Lynn Pettis (8/14/2012)


    Charmer (8/14/2012)


    Lynn Pettis (8/14/2012)


    I'd like to see the query, tables and indexes, and the actual execution plan of the query. Never know, there may be some tuning that can be done.

    with t as (

    select

    C.IncidentIDRef AS incidentidref

    ,A.Location,C.INADR#,A.GASNAM

    , ROW_NUMBER() over (PARTITION by sc.Code, C.IncidentIDRef order by ltrim(rtrim(A.GAINCD)) desc) AS ADDRSEQNO

    , GASTR# AS GASTR#

    , b.Code AS GASDIR

    , D.Code AS GASSUF

    , H.Code AS GASDR2

    , GASQLF AS GASQLF

    , E.Code AS GACDIR

    , GACNAM AS GACNAM

    , G.Code AS GACSUF

    , sc.Code

    from PLINCD C

    JOIN SrcCodeMap sc

    on LTRIM(RTRIM(INORI#)) = sc.SrcCodeValueRef and sc.CodeID = 100

    left outer JOIN PSGADR A

    ON INADR# = A.GAADR# and (ltrim(rtrim(GAINCD)) = 'y' or LTRIM(rtrim(gacinc)) = 'y')

    join GloucMidPolice.dbo.SrcIncident si

    on C.ININC# = si.IncidentIDRef and sc.Code = si.JID)

    Select

    IncidentIDRef

    , ADDRSEQNO

    , isnull(Location, 'Address: ' + cast(INADR# as varchar) + ' does not exist in Database') Location

    , LTRIM(RTRIM(GASTR#))

    , LTRIM(RTRIM(GASDIR))

    , LEFT(LTRIM(RTRIM(COALESCE(GASNAM,LOCATION,'No Address Available'))),30) as GASNAM

    , LTRIM(RTRIM(GASSUF))

    , LTRIM(RTRIM(GASDR2))

    , LTRIM(RTRIM(GASQLF))

    , LTRIM(RTRIM(GACDIR))

    , LTRIM(RTRIM(GACNAM))

    , LTRIM(RTRIM(GACSUF))

    , t.Code

    from t where addrseqno = 1

    No index has been created so far...

    Indexes won't help your query as it is. You are using ltrim(rtrim()) function pairs on the columns in your join criteria which is forcing SQL Server to do table/clustered index scans on those tables.

    The options here are to 1) clean up the source data so that the ltrim(rtrim()) of the columns is not needed, or 2) load the data from the source tables into temporary (staging) tables cleaning up the data as you do that so that proper indexing of the staging tables can benefit the query.

    The problem is that most of the columns from the source database is declared as char datatype.....so even if i cleared the source records, eventually it will be having the respective spaces since it is declared as char datatype....so that's why i use TRIM during pushing the records into my staging table....

    i knew that TRIM in SQL might reduce the performance and even it was once told to me only by you, Lynn....

    but i don't know how to handle this situation...

    Thanks,
    Charmer

  • If you can't clean the data in the source tables prior to using it, stage the data in temporary tables, cleaning the data as you load it. After loading the data into staging tables, make sure those tables are properly indexed for your query and pull the data from there without using all those ltrim(rtrim())'s.

  • Lynn Pettis (8/17/2012)


    If you can't clean the data in the source tables prior to using it, stage the data in temporary tables, cleaning the data as you load it. After loading the data into staging tables, make sure those tables are properly indexed for your query and pull the data from there without using all those ltrim(rtrim())'s.

    Actually Lynn, i am trying to push into my staging table only....but pushing into staging table itself got difficulties....tha's why i came to you and to you all

    Thanks,
    Charmer

  • If you mean that your query is loading a staging table, that isn't what I am talking about. I am talking about copying the data from the source tables to staging tables that mirror the source tables. The difference being that you perform the cleanup on the data as you move it. Then your query uses those tables, properly indexed.

  • Lynn Pettis (8/17/2012)


    If you mean that your query is loading a staging table, that isn't what I am talking about. I am talking about copying the data from the source tables to staging tables that mirror the source tables. The difference being that you perform the cleanup on the data as you move it. Then your query uses those tables, properly indexed.

    Lynn, you mean mirroring the whole database and modify as like as we want and get used to work with that..?

    and fYI, this is not only the query that i am getting struggled...so i want to mirror more or like whole database....but it is around 100 GB of size....

    Thanks,
    Charmer

  • Charmer (8/17/2012)


    Lynn Pettis (8/17/2012)


    If you mean that your query is loading a staging table, that isn't what I am talking about. I am talking about copying the data from the source tables to staging tables that mirror the source tables. The difference being that you perform the cleanup on the data as you move it. Then your query uses those tables, properly indexed.

    Lynn, you mean mirroring the whole database and modify as like as we want and get used to work with that..?

    No, not mirroring the database. If I counted correctly your query uses 4 tables. Create 4 staging tables that are exact copies of the those source tables including column names, data types, etc. You then copy the data from the source tables to these staging tables cleaning up the data so that you don't need the LTRIM(RTRIM()) pairs in your main query.

    Make sure that you properly index these staging tables to maximize performnace for your query and have your query read from those tables instead of the original source tables.

  • Lynn Pettis (8/17/2012)


    Charmer (8/17/2012)


    Lynn Pettis (8/17/2012)


    If you mean that your query is loading a staging table, that isn't what I am talking about. I am talking about copying the data from the source tables to staging tables that mirror the source tables. The difference being that you perform the cleanup on the data as you move it. Then your query uses those tables, properly indexed.

    Lynn, you mean mirroring the whole database and modify as like as we want and get used to work with that..?

    No, not mirroring the database. If I counted correctly your query uses 4 tables. Create 4 staging tables that are exact copies of the those source tables including column names, data types, etc. You then copy the data from the source tables to these staging tables cleaning up the data so that you don't need the LTRIM(RTRIM()) pairs in your main query.

    Make sure that you properly index these staging tables to maximize performnace for your query and have your query read from those tables instead of the original source tables.

    Lynn, i am just curious..... In case if we use exact copies of data types.....still we will not be able to clear the unwanted spaces right? because most of the columns are declared as char datatype....that is the main reason i am suffering with trims when i drag the data into staging table...

    Thanks,
    Charmer

  • Charmer (8/17/2012)


    Lynn Pettis (8/17/2012)


    Charmer (8/17/2012)


    Lynn Pettis (8/17/2012)


    If you mean that your query is loading a staging table, that isn't what I am talking about. I am talking about copying the data from the source tables to staging tables that mirror the source tables. The difference being that you perform the cleanup on the data as you move it. Then your query uses those tables, properly indexed.

    Lynn, you mean mirroring the whole database and modify as like as we want and get used to work with that..?

    No, not mirroring the database. If I counted correctly your query uses 4 tables. Create 4 staging tables that are exact copies of the those source tables including column names, data types, etc. You then copy the data from the source tables to these staging tables cleaning up the data so that you don't need the LTRIM(RTRIM()) pairs in your main query.

    Make sure that you properly index these staging tables to maximize performnace for your query and have your query read from those tables instead of the original source tables.

    Lynn, i am just curious..... In case if we use exact copies of data types.....still we will not be able to clear the unwanted spaces right? because most of the columns are declared as char datatype....that is the main reason i am suffering with trims when i drag the data into staging table...

    I am sorry, but what part of cleaning the data from source to stage are you missing? While copying the data you remove the leading and trailing spaces from the data. This means when the data lands in the staging tables, you no longer have to worry about the leading and trailing spaces.

  • Lynn, what about trailing spaces.....? they are char data type .....so if data is less than the exact length, automatically the reamining length will occupied with empty spaces right? ...then i must use trim...don't i have to?

    Thanks,
    Charmer

  • Charmer (8/17/2012)


    Lynn, what about trailing spaces.....? they are char data type .....so if data is less than the exact length, automatically the reamining length will occupied with empty spaces right? ...then i must use trim...don't i have to?

    If that is a concern, then use varchar/nvarchar data types in the staging tables. Not a big deal unless the trailing spaces are significant for any reason.

  • Lynn Pettis (8/17/2012)


    Charmer (8/17/2012)


    Lynn, what about trailing spaces.....? they are char data type .....so if data is less than the exact length, automatically the reamining length will occupied with empty spaces right? ...then i must use trim...don't i have to?

    If that is a concern, then use varchar/nvarchar data types in the staging tables. Not a big deal unless the trailing spaces are significant for any reason.

    yes Lynn, that's what i am doing....since source is char datatype, i use Trim and pushing into staging table having varchar data type....

    Thanks,
    Charmer

  • Hi

    I have blogged about ranking over partitions in SSIS here (using a code transform)

    http://paultebraak.wordpress.com/2013/02/25/rank-partitioning-in-etl-using-ssis/

    Regards,

    Paul

  • ptebraak (12/12/2013)


    Hi

    I have blogged about ranking over partitions in SSIS here (using a code transform)

    http://paultebraak.wordpress.com/2013/02/25/rank-partitioning-in-etl-using-ssis/

    Regards,

    Paul

    Thank you ptebraak....I will check this one

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply