How to handle a very large dataset

  • NineIron

    SSChampion

    Points: 12515

    I need to find the most recent post date for all the invoices in my table. There are millions of records. Any thoughts?

    select INV_NUM,
           POST_DT
    from (
        select INV_NUM,
               POST_DT,
               row_number() over(partition by INV_NUM order by POST_DT desc) as RowNum
        from IDX_INCOME
        where GRP__2 = '7'
    ) b
    where b.RowNum = 1
    and INV_NUM = '0'

  • Joe Torre

    SSChampion

    Points: 10222

    Without a CREATE TABLE statement and sample data in the form of INSERT statements from you, I can only answer very generically:

    select non_aggregated_column, max(date_column) MaxDate
    from tbl
    group by non_aggregated_column;
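
    Applied to the table from the original question, the generic pattern above might look like the sketch below. It assumes that the highest POST_DT value is also the most recent one, which holds for a real date type or for text stored in a sortable form such as yyyymmdd. Nothing in the thread confirms that format.

    ```sql
    -- One row per invoice with its latest post date.
    -- Assumption: MAX(POST_DT) returns the most recent date, i.e. POST_DT is a
    -- date type or sortable text. With other text formats the result is wrong.
    SELECT INV_NUM,
           MAX(POST_DT) AS MaxPostDt
    FROM IDX_INCOME
    WHERE GRP__2 = '7'
    GROUP BY INV_NUM;
    ```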

  • NineIron

    SSChampion

    Points: 12515

    I know how to find the most recent records, see my query, but it’s wicked slow because of the number of records. I would have to send a few million rows of sample data in order to get the same impact.

  • Phil Parkin

    SSC Guru

    Points: 243186

    NineIron - Monday, February 26, 2018 10:30 AM

    I know how to find the most recent records, see my query, but it's wicked slow because of the number of records. I would have to send a few million rows of sample data in order to get the same impact.

    Provide an actual execution plan, please.


    Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

    Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.

  • NineIron

    SSChampion

    Points: 12515

    Pardon my ignorance but, how do I copy then paste the execution plan?

  • Phil Parkin

    SSC Guru

    Points: 243186

    NineIron - Monday, February 26, 2018 11:24 AM

    Pardon my ignorance but, how do I copy then paste the execution plan?

    Right click / Save Execution Plan As … pick your filename & then attach.
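
    As an alternative to the right-click route, the actual plan can also be captured from a query window with SET STATISTICS XML. The sketch below reuses the query from the first post; in SSMS, clicking the returned XML opens the plan, which can then be saved as a .sqlplan file and attached.

    ```sql
    SET STATISTICS XML ON;   -- return the actual plan as an extra XML result set
    GO

    SELECT INV_NUM, POST_DT
    FROM (
        SELECT INV_NUM, POST_DT,
               ROW_NUMBER() OVER (PARTITION BY INV_NUM ORDER BY POST_DT DESC) AS RowNum
        FROM IDX_INCOME
        WHERE GRP__2 = '7'
    ) AS b
    WHERE b.RowNum = 1
      AND INV_NUM = '0';
    GO

    SET STATISTICS XML OFF;  -- stop returning plans for later statements
    GO
    ```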



  • ChrisM@home

    SSC-Insane

    Points: 24260

    The optimiser will do this anyway, but if you simplify your query, the index requirements become clearer:

    SELECT MAX(POST_DT)
    FROM IDX_INCOME
    WHERE GRP__2 = '7'
       AND INV_NUM = '0'

    If you don’t already have an index on GRP__2 and INV_NUM which also includes POST_DT in the KEY or INCLUDE part, then you might need one.
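
    A minimal sketch of such an index is shown below. The index name is made up for illustration, and POST_DT is placed in the INCLUDE part here; adding it as a third key column is the other option mentioned above.

    ```sql
    -- Hypothetical index name. The key covers both WHERE columns, and POST_DT is
    -- included so that MAX(POST_DT) can be answered from the index alone.
    CREATE NONCLUSTERED INDEX IX_IDX_INCOME_GRP2_INVNUM
        ON IDX_INCOME (GRP__2, INV_NUM)
        INCLUDE (POST_DT);
    ```

    With both filter columns in the key, the optimiser should be able to seek straight to one GRP__2 / INV_NUM combination. One caveat: wide character key columns can run into the nonclustered index key size limit (900 bytes before SQL Server 2016, 1700 bytes from 2016 onward), so check the column definitions first.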

      

     


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.

    Understanding and using APPLY, (I) and (II), Paul White

    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop, Jeff Moden

  • NineIron

    SSChampion

    Points: 12515

    See attached.

  • Jeff Moden

    SSC Guru

    Points: 993381

    NineIron - Monday, February 26, 2018 12:00 PM

    See attached.

    The query in that execution plan is quite a bit different than what you posted.

    In the execution plan, you’re doing a convert on the invoice date.  What is the datatype of that date column?

    You’re also searching for an invoice balance of ‘0’, which is a string rather than a numeric so please identify the datatype of the invoice column, as well.

    The only thing that may make this faster is an index on the WHERE criteria and, even then, it may result in an index scan simply because it needs to do a scan to enumerate the rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • ChrisM@home

    SSC-Insane

    Points: 24260

    NineIron - Monday, February 26, 2018 12:00 PM

    See attached.

    “idx_income” is an odd name for a heap! Why don’t you have a clustered index? What’s the purpose of this table? What’s its daily/weekly cycle of changes?
    And what Jeff said too – this is wildly different from your trivial original query.



  • NineIron

    SSChampion

    Points: 12515

    I apologize for the confusion. I’m trying to get some financial data to tie out and I can’t get out of this rat hole. The data is indexed on MRN, INV_NUM, and POST_DT. The data type of all of the columns is nvarchar(255). It’s a pain to work with this table but that’s what I’m stuck with.
    Thanx for your help. I’m going to schedule this stuff to run off hours so the time it takes won’t impact the user.

  • Jackie Lowery

    Default port

    Points: 1416

    I’m a bit of a noob, but couldn’t he just create computed columns that convert the data to proper types, then create an index on those computed columns and query them?

    Technet article:  https://technet.microsoft.com/en-us/library/ms191250(v=sql.105).aspx
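
    A minimal sketch of that idea follows. The computed column name, the index name, and style 120 (text stored as yyyy-mm-dd hh:mi:ss) are all assumptions, since the thread never says how the dates are formatted. An explicit style is used because SQL Server only allows an index on a computed column whose expression is deterministic, and a string-to-datetime CONVERT needs a deterministic style for that.

    ```sql
    -- Assumption: POST_DT text looks like '2018-02-26 10:30:00' (style 120).
    -- Any row that does not convert will make the index creation fail.
    ALTER TABLE IDX_INCOME
        ADD POST_DT_Typed AS CONVERT(datetime, POST_DT, 120);
    GO

    -- Hypothetical index name; INV_NUM leads so each invoice's dates sit together.
    CREATE NONCLUSTERED INDEX IX_IDX_INCOME_INVNUM_POSTDT
        ON IDX_INCOME (INV_NUM, POST_DT_Typed);
    GO

    -- Query the typed column so the new index can be used.
    SELECT INV_NUM,
           MAX(POST_DT_Typed) AS MaxPostDt
    FROM IDX_INCOME
    GROUP BY INV_NUM;
    ```

    Indexes on computed columns also require specific session SET options (for example ANSI_NULLS and QUOTED_IDENTIFIER ON), so it is worth checking those before trying this on a shared table.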

  • MMartin1

    One Orange Chip

    Points: 27375

    Any chance that in your script you can insert the contents of that generic table into a temp table of your creation, with proper data types and indexes that you create? If your data is not huge and you do it on the same machine then that may be a good approach.
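
    One way this could look is sketched below. The temp table name, the index name, and the date style (120) are assumptions made for illustration; only the source table and column names come from the thread.

    ```sql
    -- Copy just the needed rows into a temp table with a real date type.
    -- TRY_CONVERT (SQL Server 2012+) yields NULL for text that does not convert,
    -- instead of raising an error. Style 120 is an assumed format.
    SELECT INV_NUM,
           TRY_CONVERT(datetime, POST_DT, 120) AS POST_DT
    INTO #Income
    FROM IDX_INCOME
    WHERE GRP__2 = '7';

    -- Hypothetical index name; orders the copy by invoice, then by date.
    CREATE CLUSTERED INDEX CIX_Income ON #Income (INV_NUM, POST_DT);

    SELECT INV_NUM,
           MAX(POST_DT) AS MaxPostDt
    FROM #Income
    GROUP BY INV_NUM;
    ```

    Rows where the conversion returns NULL are worth inspecting separately, since they point at dates stored in a different format.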

    ----------------------------------------------------
    How to post forum questions to get the best help

  • NineIron

    SSChampion

    Points: 12515

    I got some more information from the owner of the table and was able to reduce the number of records and stick them in a temporary table. Then, join the temp table to the other stuff. Now, seconds instead of 3 minutes. Thanx for all the input.

  • Jeff Moden

    SSC Guru

    Points: 993381

    NineIron - Thursday, March 1, 2018 4:07 AM

    I got some more information from the owner of the table and was able to reduce the number of records and stick them in a temporary table. Then, join the temp table to the other stuff. Now, seconds instead of 3 minutes. Thanx for all the input.

    Thanks for the feedback.  “Pre-aggregation” and “Divide’n’Conquer” are frequently all that’s needed to make monsters behave.  People make the mistake of thinking that “Set Based” means “All in one query” and nothing could be further from the truth.
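
    As a rough illustration of pre-aggregating first and joining second, consider the sketch below. #OtherStuff is an empty stand-in for "the other stuff" mentioned in the previous post, and its columns are invented; the sketch also assumes MAX(POST_DT) sorts correctly for the stored format.

    ```sql
    -- Stand-in for the rest of the query's data (hypothetical table and columns).
    CREATE TABLE #OtherStuff (
        INV_NUM nvarchar(255) NOT NULL,
        Amount  decimal(18, 2) NULL
    );

    -- Step 1: pre-aggregate the large table down to one row per invoice.
    SELECT INV_NUM,
           MAX(POST_DT) AS MaxPostDt
    INTO #LatestPost
    FROM IDX_INCOME
    WHERE GRP__2 = '7'
    GROUP BY INV_NUM;

    -- Step 2: join the small pre-aggregated result to everything else.
    SELECT o.INV_NUM,
           o.Amount,
           lp.MaxPostDt
    FROM #OtherStuff AS o
    JOIN #LatestPost AS lp
        ON lp.INV_NUM = o.INV_NUM;
    ```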

