A quick search without an index

  • Alex Chamchourine

    Right there with Babe

    Points: 783

    Comments posted to this topic are about the item A quick search without an index

  • heb1014

    Hall of Fame

    Points: 3788

    Thanks for sharing!  A successful developer is one who is skilled in the requisite tools AND is confident is his/her creative ability.  A developer is a creator.

     

  • Jeff Moden

    SSC Guru

    Points: 995608

    5 Stars for realizing that you were stuck in a box and then for thinking outside the box.  I wonder how many people would have actually thought of using a Binary Search like this? Nice job!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jbowers

    Old Hand

    Points: 363

    Brilliant - thanks for sharing! That's a great example of creatively using an existing index rather than adding another index to the table. I'll definitely add the binary search technique to my SQL toolkit.

  • skeleton567

    SSCertifiable

    Points: 5002

    Alex, just a couple things from an old-timer for you to consider.

    First, if you are serious about that number of rows, you need to fix that.  What are you doing, documenting the stars in the Milky Way?  In my active years as a DBA I was always advocating for archiving data.  You need to consider how old the history is and how often you need to access it, versus the time and cost to store it online, back it up,  etc.

    At least be sure it is NOT in a database with transactional data, preferably on a different server.  Better, move anything more than the last year or so to something like a dis-mountable NAS drive and put it 'in the basement'.  Break the data up into unique tables such as by year,  get it organized and indexed and ready to access.  Create summary data which might even suffice for lots of demands. On the rare occasion you need to access it, you can plug it in and query it at will with simple SQL code, instead of spending the time doing complex stuff just to look at it.  In the old days of removable 'disk packs', we had archive disks that we could drop in and fire up as needed.  Today I use a NAS device with internal drives that idle down when not active, and I'm thinking about getting a second one.  I can archive my data to that, and it's great for backups too so they're not taking space on my active server.

    It very easy for companies to get carried away with demands for accessible data, and it's up us to figure out how it gets done.  Those of us who have been around for decades had to learn all the tricks because budgets were lots smaller then, both for hardware and personnel.  Even back when we had to use 1/2 inch magnetic tape, we did weekly tape-to-tape merges of detail into archived data and only kept summary data online.

    We even traded storage with other companies close by where we kept copies of our data archives for security and protection from disasters.  Was that the original version of 'the cloud'?  Before that, several of us would actually carry home backups and archives for off-site storage.

    I have applications that use historical data that is as old as 35 years, with a small quantity of data going all the way back to 1944 but I only keep the most recent 5 or 6 years on my SQL Server, and the rest is on a RAID NAS device that I can access if and when I need to.

    So, instead of indexing, consider summarizing and archiving.

    Rick

    I can see clearly now....it's 2020!

  • Jeff Moden

    SSC Guru

    Points: 995608

    BTW... You already know this but you didn't write it quite right in the article.  To determine the max number of iterations in a binary search, you have to do a bit more that just taking the LOG of the number of values.  The full formula is CEILING(LOG(N)/LOG(2)) where "N" is the number of rows in a gap-less situation or MAX(ID)-MIN(ID)+1 for something like an IDENTITY column.

    Additional kudos for explaining that the IDENTITY column is probably not gap-less and what you did to compensate.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 995608

    Just a bit of caution for people reading this... if your DATETIME column only contains whole dates (no times) and you tell it to search for a date with no time or a 00:00:00.000 time, it'll find the correct ID.  If, however, you give it any time larger than 12:00:00.000, it'll find the next day.  I've not tested if you tell it to find a whole date that's not there but it seems like it will find whatever row contains the value closest to the target value.  That means that it might give you some extra rows you hadn't anticipated and that's my point...  once you've used a script like this, you need to double check the data you retrieve.

    None of this is a fault of this code.  It's must one of the devils in the data that you have to watch for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ghislain

    Grasshopper

    Points: 17

    I believe the problem with time larger than 12:00:00.000 that picks the next day is because of the last part of the script that selects the closest neighbor.

  • Jeff Moden

    SSC Guru

    Points: 995608

    Yep... I got that.  It's not a problem if you know about it.  I just wanted to be sure that people knew about it.  Perhaps stripping the time would eliminate that problem if one were looking for whole date ranges, which I believe would be the case for the original task presented in the article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply