O(n) , O(n log n), O(n^2)

  • Hi All,

    I have recently read the below article.

    http://www.sqlservercentral.com/blogs/philfactor/archive/2009/02/16/be-careful-with-string-concatenations-in-sql-server-with-big-strings.aspx

    They used some concepts which i couldn't understand.

    1) O(n) means ?

    2) O(n^2) means ?

    3) O(n log n) means ?

    4) What is mean naive linear string concatenation?

    http://stackoverflow.com/questions/859562/whats-the-best-way-to-recursively-reverse-a-string-in-java

    RBarryYoung said:

    It has long been known that naive linear string concatenation is an O(n^2) operation (more specifically, it's triangular). That simple fact is one of the big reasons that .NET has the StringBuilder class (which uses the strings->array->mass_concatenate trick that peter alludes to).

    As it happens, I spent quite a lot of time last month investigating this problem and potential solutions and let me just say: it's tough in SQL. Here are the problems:

    1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

    2) The "array & mass-concatenate" trick is not available to T-SQL, not because SQL doesn't have arrays (tables serve the same purpose), but because AFAIK, there is no function that can take a variable "collection" of strings (table, array, whatever) and produce an output string.

    3) The "pre-allocate and Stuff" trick popular with mutable strings is not workable in T-SQL because the STUFF() function in T_SQL is NOT like the function of the same name in some general purpose languages: the T-SQL STUFF() is an RHS (right-hand side) function and NOT an LHS (left-hand side) function. AFAIK, there is no function in SQL that can (physically) write into a pre-existing string.

    That all said, I did eventually find a way to do it. Not in the O(n) time (linear) achievable in most general purpose languages, but I could get it down to O(n*Log(n)) time which is still a huge improvement. Unfortunately, I do not have it together in a presentable form and it would take me some time to do so (a day or so), but I can get it to you if you want.

    February 18, 2009 12:32 PM

    RBarryyoung,

    1) What do you mean by "buffer-extension"?

    2) Can you explain about the formula you used?

    karthik

  • karthikeyan (6/8/2009)


    1) O(n) means ?

    2) O(n^2) means ?

    3) O(n log n) means ?

    Surely you covered the Big O notation for algorithmic (time) complexity at university? It's a fundamental of Comp Sci theory.

    Try these

    http://en.wikipedia.org/wiki/Computational_complexity_theory

    http://en.wikipedia.org/wiki/Big_O_notation

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yes Gail...I have studied...But i don't remember the functionality...

    I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...

    If you or anybody gave the short notes about it, I will review it once again with some understanding....instead of looking it blindly...

    Apart from that, I have some more questions....

    1) What is mean by pseudo cursor?

    "pre-allocate and Stuff" trick popular with mutable strings

    2) what this trick will do?

    3) What RHS (right-hand side) function denotes exactly?

    I believe the way it operates is from right side....ex: substring()

    right?

    4) What LHS (Left-hand side) function denotes exactly?

    I believe the way it operates is from left side....ex: right()

    right?

    OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:

    5) what are all the method available apart from linear? What is mean by linear method?

    how to know whether my code use linear method?

    aaah....lot of doubts.....

    karthik

  • karthikeyan (6/8/2009)


    They used some concepts which i couldn't understand.

    1) O(n) means ?

    2) O(n^2) means ?

    3) O(n log n) means ?

    4) What is mean naive linear string concatenation?

    For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

    For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:

    --NOTE: This is an example. I dont really use While loops in SQL.

    Declare @i As int

    Declare @a As varchar(MAX)

    Select @i = 1, @a = ''

    WHILE @i = Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

    However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

    2) Can you explain about the formula you used?

    Which formula? I use a lot of them.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • karthikeyan (6/8/2009)


    Yes Gail...I have studied...But i don't remember the functionality...

    I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...

    Ok, explain what you do know of Big O notation, since you said you did study it, and what exactly is confusing you and we'll try and fill in from there.

    3) What RHS (right-hand side) function denotes exactly?

    In what context?

    4) What LHS (Left-hand side) function denotes exactly?

    In what context?

    OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:

    5) what are all the method available apart from linear? What is mean by linear method?

    how to know whether my code use linear method?

    Related to Big O notation. An algorithm is linear, or has linear time complexity if the time to process a number of inputs grows linearly with the number of inputs. It would be described as O(n)

    There's lots of complexities other than linear. Logarithmic, geometric, polynomial, exponential, etc.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • karthikeyan (6/8/2009)


    1) What is mean by pseudo cursor?

    Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:

    Declare @STR varchar(MAX)

    Set @STR = ''

    Select @STR = @STR + ',' + [name] From sys.columns

    Print @STR

    "pre-allocate and Stuff" trick popular with mutable strings

    2) what this trick will do?

    We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

    (Heh. I was re-reading this and just realized that I got my "Left" and "Right" reversed! :blush: Yikes!)

    4) What LHS (Left-hand side) RHS (Right-hand side) function denotes exactly?

    LeftRight-hand Functions appear leftright of the assignment operator ("="):

    Set @STR = UPPER('Some text.')

    These are normal functions, and AFAIK, in T-SQL, all functions are LRHS.

    3) What RLHS (rightleft-hand side) function denotes exactly?

    RightLeft-hand Side functions appear on the rightleft-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RLHS:

    STUFF(@str, offset, len) = 'foo'

    This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and the RHS STUFF() function in T-SQL is that the LHS version does not return anything, it actually does write over the characters of the @STR variable.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (6/8/2009)


    karthikeyan (6/8/2009)


    They used some concepts which i couldn't understand.

    1) O(n) means ?

    2) O(n^2) means ?

    3) O(n log n) means ?

    4) What is mean naive linear string concatenation?

    For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

    For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:

    --NOTE: This is an example. I dont really use While loops in SQL.

    Declare @i As int

    Declare @a As varchar(MAX)

    Select @i = 1, @a = ''

    WHILE @i = Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

    However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

    2) Can you explain about the formula you used?

    Which formula? I use a lot of them.

    RBarryyoung and Gail,

    Big O notation means the steps to complete a problem. Right?

    Still i have lot of questions.

    1) what do you mean by O(n^2) problem? Pls don't mistake me if i am asking this question again.

    still I am not getting this formula clearly.

    2) Big O notation means the steps to complete a problem. we denote it as O(n). But what is the difference between O(n) and O(n^2) ?

    This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

    I believe if i understand the above one, i can idenitify the difference between them. But the above one is confusing me...

    say for example i have row like below

    1

    2

    3

    4

    5

    i would like to append the above values as

    12345

    so

    Declare @STR varchar(5)

    select @STR = ''

    select @STR = @STR + no from mystring

    step 1: '' + '1'

    step 2: '1' + '2'

    step 3: '12' + '3'

    step 4: '123' + '4'

    step 5: '1234' + '5'

    so if i apply ((N^2) + N)/2 formula

    N = 5

    ((5^2) + 5)/2 ( select ((5^2) + 5)/2 gave 6, but i thiink it is wrong as per mathematics. since ^ is considered as POWER in maths. Right ? )

    ((5^2) + 5)/2 equivalent to select (Power(5,2) + 5)/2. Right?

    5^2 = 5*5 = 25

    25+5 = 30/2 = 15. Right?

    2) Can you explain about the formula you used?

    Which formula? I use a lot of them.

    {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

    Can you explain it?

    karthik

  • 1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem,

    If it supports the "buffer-extension" trick,then why it doesn't solve O(n^2) problem. again , i need to know exactly about O(n^2).

    can you explain it with example? so that i can remember it through my career.

    Does this issue solved in sql2008?

    aaah...i am confusing myself....i think....

    karthik

  • karthikeyan (6/9/2009)


    Big O notation means the steps to complete a problem. Right?

    No. Not at all.

    Big O notation describes the time complexity of an algorithm. That is describes how the time required to solve an algorithm will increase as the number of inputs (n) increases.

    A linear algorithm (O(n)) is one where the time required to solve the algorithm increases linearly with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 2 minutes to solve and 5000 inputs will take 5 minutes to solve.

    A quadratic algorithm (O(n2) is one where the time required to solve the algorithm increases quadratically with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 4 minutes to solve and 5000 inputs will take 25 minutes to solve. (roughly)

    I'm curious. Where and when did you do your MCA? Where and when did you do your undergrad Comp Sci degree?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • RBarryYoung (6/8/2009)


    karthikeyan (6/8/2009)


    1) What is mean by pseudo cursor?

    Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:

    Declare @STR varchar(MAX)

    Set @STR = ''

    Select @STR = @STR + ',' + [name] From sys.columns

    Print @STR

    "pre-allocate and Stuff" trick popular with mutable strings

    2) what this trick will do?

    We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

    4) What LHS (Left-hand side) function denotes exactly?

    Left-hand Functions appear left of the assignment operator ("="):

    Set @STR = UPPER('Some text.')

    These are normal functions, and AFAIK, in T-SQL, all functions are LHS.

    3) What RHS (right-hand side) function denotes exactly?

    Right-hand Side functions appear on the right-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RHS:

    STUFF(@str, offset, len) = 'foo'

    This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and th LHS STUFF() function in T-SQL is that the RHS version does not return anything, it actually does write over the characters of the @STR variable.

    So we should avoid pseudo cursor. Right?

    1) It will be treated as internal cursor. i.e it will perform looping internally.

    But comparing to external cursor, it will be very fasy. Right?

    2) is it a good habit to use pseudo cursor in sql programming? if not, what is the workaround for this?

    karthik

  • Gail,

    i expected this question from you. Since i studied these concepts just to pass the exams, beacuse that time i don't know the real time usage and it will be used in our programming skill, I forgot the concepts.

    Once i got the job, then only i realized that all i studied will be applied in the work...anyway hereafter i never forgot this chapters.:)..Thanks for teaching me....Thank you Gail 🙂

    karthik

  • Hi

    I'm sure Barry & Gail will give full answers, but this interpretation may help.

    Don't think of O(n) / O(n^2) as proper mathematical formulas that can be solved. Think of them as a convenient shorthand description for a problem type.

    So O(n) describes a linear problem - as n increases, the solution increases proportionally to n as in a straight line graph

    O(n^2) is essentially a quadratic problem - as n increases, the solution increases proportionally to n-squared - or some function of n-squared.

    O(log n) means the solution increase proportional to log n

    I may be totally wrong or this could be too simplistic, but thats what I understood from the Wiki article and various other researching.

  • karthikeyan (6/9/2009)


    Gail,

    i expected this question from you. Since i studied these concepts just to pass the exams, beacuse that time i don't know the real time usage and it will be used in our programming skill, I forgot the concepts.

    Once i got the job, then only i realized that all i studied will be applied in the work...anyway hereafter i never forgot this chapters.:)..Thanks for teaching me....Thank you Gail 🙂

    I have an idea, how about actually answering Gail's questions about where and when you went to school for your MCA and BS degrees instead of just this fluff.

  • Tom Brown (6/9/2009)


    I may be totally wrong or this could be too simplistic, but thats what I understood from the Wiki article and various other researching.

    That's pretty much the core of it, yes.

    There's a whole tonne of theory as to analysing algorithms to work out their complexity but I don't recall any of the details of that any more. It's way over most people's heads anyway and more for maths/CS researchers.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • karthikeyan (6/9/2009)


    RBarryyoung and Gail,

    Big O notation means the steps to complete a problem. Right?

    Sort of. Big-O notation categorizes an algorithm's complexity in terms of the dominant term (without any constant factors) of its execution run-time as a function of the length of its input data("N"), as N approaches infinity. It is a way of talking about the efficiency of an algorithm apart from the efficiency of any particular implementation of that algorithm.

    But then you would know that if you had read the Wikipedia articles.

    1) what do you mean by O(n^2) problem? Pls don't mistake me if i am asking this question again.

    You would know this if you had done what we asked. I will not respond to this question again.

    EDIT: fixed formatting mistakes...

    2) Big O notation means the steps to complete a problem. we denote it as O(n). But what is the difference between O(n) and O(n^2) ?

    This is incorrect. Please read the Wikipedia articles.

    Declare @STR varchar(5)

    select @STR = ''

    select @STR = @STR + no from mystring

    step 1: '' + '1'

    step 2: '1' + '2'

    step 3: '12' + '3'

    step 4: '123' + '4'

    step 5: '1234' + '5'

    so if i apply ((N^2) + N)/2 formula

    N = 5

    ...

    5^2 = 5*5 = 25

    25+5 = 30/2 = 15. Right?

    Yes. Now count the total number of characters in quotes in your steps 1 through 5 above, what do you get?

    2) Can you explain about the formula you used?

    Which formula? I use a lot of them.

    {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

    Can you explain it?

    It is explained by the text that preceeded it. It is a formula that calculates the execution time of string concatenation that has been improved by being able to use buffer-extension part of the time. "k" is a number between 0 and 1 that expresses how often the buffer-extension trick can be used.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply