What is meant by set based programming?

  • Phil Parkin (8/14/2012)


    patrickmcginnis59 (8/14/2012)


    Shadab Shah (8/12/2012)


    Jeff Moden (8/10/2012)


    Shadab Shah (8/9/2012)


    Hi all,

    I have come across various section on this website where it is mention that SQL Server is a set based programming language.

    My understanding about it is that : Set based means that the operations are perform on a set of values but i am not able to relate this defination with operations being perform with SQL Server.

    I am from C language background and i don't see much difference in applying logic over here but still these are early days for me in SQL Server, so i think that i am missing something or unable to graps the concept.

    Perhaps the easiest way to make this clear is to use an actual example of code. If you're game, post the code that you would use if someone told you they simply wanted a single column result set from 1 to a million and we'll discuss whatever you come up with.

    Hi,

    Let me consider a example over here as

    Select subject1 + subject2 + subject3 As TotalMarks from Student where class= 'ComputerScience'

    As per my understanding this operation would be perform row by row. The first row with the class ComputerScience is selected and then the other and then it goes on. so does that means that the above SQL Statement does not represent what we called as set based operation.

    I think after you answer this question my understanding for set based programming would be more clear.Thanks

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

    I'd like to see some evidence to back up these assertions.

    Perhaps something like this:

    1. Code up a single INSERT statement that inserts 1000 records in one set based operation.

    2. Split the code into 1000 INSERT statements that each insert 1 record at a time.

    3. Finally, write a loop that executes an INSERT 1,000 times (1 record each time).

    I've done 1 vs. 2 and the results are tremendously in favor of the single INSERT. I think this lends evidence of the overhead for each executed SQL statement.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Eugene Elutin (8/14/2012)


    ...

    That said - if I gave you a file or a table with 10,000,000 invoices and I needed to get a balance by customer, I guarantee you SQL server will tear the doors off of anything procedural you might come up with.

    I could probably beat it with assembler just by virtue of fitting in level 1 cache

    ...

    Assembler? Try!

    I will not be surprised that your assembler code will be slower, until you are really good in it.

    Just a loop wouldn't be good enough. You will need to code some "bubbled" loops to get it faster even in assembler. And doing it in pure assembler (even for intel-like cpu's) will not be a simple task.

    So, stick to SQL 😉

    To be fair, we'd start from a file, because if the contest is aggregating from a table, assembly wins because I'll pick a table definition that consists of putting the values into assembly language constant operands and I just don't know if Microsoft is going to beat that implementation.

    So from a file, I could focus on bulk loading to an in memory hash or array for sorting, I don't know if it would help aggregating while building a hash then sorting for output, or sorting a key and another pass to accumulate, but I could try both (or anything) and go with the best results. No transactions, no concurrency control, no catalog. I just don't know if you're right on this one, but then again I haven't actually coded it.

    Anyways, getting off topic, I found my T-SQL testing helpful in understanding my SQL usage and thought I would share, but I didn't mean to upset anybody. My apologies for cluttering the thread!

    (edited for defective quoting)

  • patrickmcginnis59 (8/14/2012)


    The routines are simple enought that I also tried perl and vba but couldn't find anything slower than T-SQL. Obviously testing a c version would be sort of silly. Any counterpoints or criticisms are clearly welcome, but otherwise I'm going to continue in my belief that T-SQL is just not that fast a programming environment.

    DECLARE @X INT

    DECLARE @Y INT

    DECLARE @Z INT

    SET @X = 1

    SET @Z = 1

    WHILE @X < 20000

    BEGIN

    SET @Y = 1

    WHILE @Y < 20000

    BEGIN

    SET @Y = @Y + 1

    SET @Z = (@Z * 2) + 1

    IF @Z > 50000

    BEGIN

    SET @Z = @Z - 50000

    END

    END

    SET @X = @X + 1

    END

    SELECT @Z

    OK, let's try some proper SQL approach.

    Set based one.

    DECLARE @Z int

    DECLARE @StartTime datetime

    SET @Z = 1

    SET @StartTime = GETDATE()

    SELECT @Z = ((@Z * 2) + 1)%50000

    FROM DBA.dbo.Tally AS X

    INNER JOIN DBA.dbo.Tally AS Y ON Y.N > 0 AND Y.N < 20000

    WHERE X.N > 0 AND X.N < 20000

    -- Table Tally contains sequential numbers from 0 to as many as you need it.

    SELECT @Z, CONVERT(varchar(20), GETDATE() - @StartTime, 114) TimeSpent

    How does it compare?

    _____________
    Code for TallyGenerator

  • ...

    To be fair, we'd start from a file, because if the contest is aggregating from a table, assembly wins because I'll pick a table definition that consists of putting the values into assembly language constant operands and I just don't know if Microsoft is going to beat that implementation.

    ...

    Even if table definition is just an array of constants, you are not going to beat SQL to perform the discussed (calculating balance per customer) by loading every value into defined registers - as it's as simple as that.

    To do A(X[1 to n]),y[1 to m]) -> A(X,1)[1 to n] = SUM(Y[1 to m)) in assembler is a heck of task. I'm not saying it's impossible, but it's complex and as I said before, simple loop (even through constant values) highly unlikely will perform faster than SQL...

    😎

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Sergiy,

    I ran the test with a little modification

    DECLARE @Z int

    DECLARE @StartTime datetime

    SET @Z = 1

    SET @StartTime = GETDATE();

    SELECT @Z = ((@Z * 2) + 1)%50000

    FROM #Tally AS X

    Cross JOIN #Tally AS Y

    WHERE X.N > 0 AND X.N <= 20000

    -- Table Tally contains sequential numbers from 0 to as many as you need it.

    SELECT @Z, CONVERT(varchar(20), GETDATE() - @StartTime, 114) TimeSpent

    It ran in ~3m 43s. Though the #tally has exactly 20000 rows in it so the cross join works in the same manner, in total its doing around 400 million iterations, however still an improvement on the nested loop example that was given.

    SQL Server will never compete in terms of speed with a linear program loop, it doesnt need to, as thats not what its designed to do, however that doesnt mean it cannot out perform linear programming in terms of Data retrieval.

    Have a linear programming language scan a 20,000 row table and join to another 20,000 row table looking for a set of data and SQL server will more than likely out perform it.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

Viewing 5 posts - 31 through 34 (of 34 total)

You must be logged in to reply to this topic. Login to reply