RBarryYoung said: It has long been known that naive linear string concatenation is an O(n^2) operation (more specifically, it's triangular). That simple fact is one of the big reasons that .NET has the StringBuilder class (which uses the strings->array->mass_concatenate trick that peter alludes to). As it happens, I spent quite a lot of time last month investigating this problem and potential solutions and let me just say: it's tough in SQL. Here are the problems: 1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2). 2) The "array & mass-concatenate" trick is not available to T-SQL, not because SQL doesn't have arrays (tables serve the same purpose), but because AFAIK, there is no function that can take a variable "collection" of strings (table, array, whatever) and produce an output string. 3) The "pre-allocate and Stuff" trick popular with mutable strings is not workable in T-SQL because the STUFF() function in T_SQL is NOT like the function of the same name in some general purpose languages: the T-SQL STUFF() is an RHS (right-hand side) function and NOT an LHS (left-hand side) function. AFAIK, there is no function in SQL that can (physically) write into a pre-existing string. That all said, I did eventually find a way to do it. Not in the O(n) time (linear) achievable in most general purpose languages, but I could get it down to O(n*Log(n)) time which is still a huge improvement. Unfortunately, I do not have it together in a presentable form and it would take me some time to do so (a day or so), but I can get it to you if you want. February 18, 2009 12:32 PM

RBarryyoung,

1) What do you mean by "buffer-extension"? 2) Can you explain about the formula you used?

Gail Shaw Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci) SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter We stand on the bridge and no one may pass

Group: General Forum Members Last Login: Tuesday, May 19, 2015 9:30 PM Points: 2,034,Visits: 2,555

Yes Gail...I have studied...But i don't remember the functionality...

I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...

If you or anybody gave the short notes about it, I will review it once again with some understanding....instead of looking it blindly...

Apart from that, I have some more questions....

1) What is mean by pseudo cursor?

"pre-allocate and Stuff" trick popular with mutable strings

2) what this trick will do?

3) What RHS (right-hand side) function denotes exactly?

I believe the way it operates is from right side....ex: substring() right?

4) What LHS (Left-hand side) function denotes exactly?

I believe the way it operates is from left side....ex: right() right?

OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:

5) what are all the method available apart from linear? What is mean by linear method? how to know whether my code use linear method?

Group: General Forum Members Last Login: Friday, June 12, 2015 8:42 AM Points: 9,294,Visits: 9,496

karthikeyan (6/8/2009)They used some concepts which i couldn't understand.

1) O(n) means ? 2) O(n^2) means ? 3) O(n log n) means ? 4) What is mean naive linear string concatenation?

For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:

--NOTE: This is an example. I dont really use While loops in SQL. Declare @i As int Declare @A As varchar(MAX) Select @i = 1, @A = ''

WHILE @i <= (Select Count(*) From MyStrings) BEGIN Select @A = @A + Select B From MyStrings Where ID = @i END

The problem with this approach is that you have to re-copy @A every time through the loop and @A is getting larger every time. So even if our strings are all 1 character, first we have to copy 1 character, then 2 characters, then 3, then 4, 5, 6, 7, ... all the way out to N where N is total number of strings we want to concatenate together.

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

RBarryYoung said: ... 1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

RBarryyoung,

1) What do you mean by "buffer-extension"?

The "buffer-extension trick" is referring to a low-level (Assembler or C) trick that takes advantage of that fact that in most OS's strings often have large gaps of unallocated memory after them. Thus if you want to add the string B$ onto the end of A$, then normally you would have to allocate a new buffer whose length is >= Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

Group: General Forum Members Last Login: Today @ 2:04 PM Points: 44,006,Visits: 41,402

karthikeyan (6/8/2009)Yes Gail...I have studied...But i don't remember the functionality... I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...

Ok, explain what you do know of Big O notation, since you said you did study it, and what exactly is confusing you and we'll try and fill in from there.

3) What RHS (right-hand side) function denotes exactly?

In what context?

4) What LHS (Left-hand side) function denotes exactly?

In what context?

OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:

5) what are all the method available apart from linear? What is mean by linear method? how to know whether my code use linear method?

Related to Big O notation. An algorithm is linear, or has linear time complexity if the time to process a number of inputs grows linearly with the number of inputs. It would be described as O(n) There's lots of complexities other than linear. Logarithmic, geometric, polynomial, exponential, etc.

Gail Shaw Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci) SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter We stand on the bridge and no one may pass

Group: General Forum Members Last Login: Friday, June 12, 2015 8:42 AM Points: 9,294,Visits: 9,496

karthikeyan (6/8/2009)1) What is mean by pseudo cursor?

Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:

Declare @str varchar(MAX) Set @str = '' Select @str = @str + ',' + [name] From sys.columns Print @str

"pre-allocate and Stuff" trick popular with mutable strings

2) what this trick will do?

We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

(Heh. I was re-reading this and just realized that I got my "Left" and "Right" reversed! Yikes!)

4) What LHS (Left-hand side)RHS (Right-hand side) function denotes exactly?

LeftRight-hand Functions appear leftright of the assignment operator ("="):

Set @str = UPPER('Some text.')

These are normal functions, and AFAIK, in T-SQL, all functions are LRHS.

3) What RLHS (rightleft-hand side) function denotes exactly?

RightLeft-hand Side functions appear on the rightleft-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RLHS:

STUFF(@str, offset, len) = 'foo'

This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and the RHS STUFF() function in T-SQL is that the LHS version does not return anything, it actually does write over the characters of the @str variable.

Group: General Forum Members Last Login: Tuesday, May 19, 2015 9:30 PM Points: 2,034,Visits: 2,555

RBarryYoung (6/8/2009)

karthikeyan (6/8/2009)They used some concepts which i couldn't understand.

1) O(n) means ? 2) O(n^2) means ? 3) O(n log n) means ? 4) What is mean naive linear string concatenation?

For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:

--NOTE: This is an example. I dont really use While loops in SQL. Declare @i As int Declare @A As varchar(MAX) Select @i = 1, @A = ''

WHILE @i <= (Select Count(*) From MyStrings) BEGIN Select @A = @A + Select B From MyStrings Where ID = @i END

The problem with this approach is that you have to re-copy @A every time through the loop and @A is getting larger every time. So even if our strings are all 1 character, first we have to copy 1 character, then 2 characters, then 3, then 4, 5, 6, 7, ... all the way out to N where N is total number of strings we want to concatenate together.

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

RBarryYoung said: ... 1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

RBarryyoung,

1) What do you mean by "buffer-extension"?

The "buffer-extension trick" is referring to a low-level (Assembler or C) trick that takes advantage of that fact that in most OS's strings often have large gaps of unallocated memory after them. Thus if you want to add the string B$ onto the end of A$, then normally you would have to allocate a new buffer whose length is >= Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

2) Can you explain about the formula you used?

Which formula? I use a lot of them.

RBarryyoung and Gail,

Big O notation means the steps to complete a problem. Right?

Still i have lot of questions.

1) what do you mean by O(n^2) problem? Pls don't mistake me if i am asking this question again.

still I am not getting this formula clearly.

2) Big O notation means the steps to complete a problem. we denote it as O(n). But what is the difference between O(n) and O(n^2) ?

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

I believe if i understand the above one, i can idenitify the difference between them. But the above one is confusing me...

Group: General Forum Members Last Login: Tuesday, May 19, 2015 9:30 PM Points: 2,034,Visits: 2,555

1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem,

If it supports the "buffer-extension" trick,then why it doesn't solve O(n^2) problem. again , i need to know exactly about O(n^2).

can you explain it with example? so that i can remember it through my career.

Group: General Forum Members Last Login: Today @ 2:04 PM Points: 44,006,Visits: 41,402

karthikeyan (6/9/2009)Big O notation means the steps to complete a problem. Right?

No. Not at all.

Big O notation describes the time complexity of an algorithm. That is describes how the time required to solve an algorithm will increase as the number of inputs (n) increases.

A linear algorithm (O(n)) is one where the time required to solve the algorithm increases linearly with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 2 minutes to solve and 5000 inputs will take 5 minutes to solve.

A quadratic algorithm (O(n^{2}) is one where the time required to solve the algorithm increases quadratically with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 4 minutes to solve and 5000 inputs will take 25 minutes to solve. (roughly)

I'm curious. Where and when did you do your MCA? Where and when did you do your undergrad Comp Sci degree?

Gail Shaw Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci) SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter We stand on the bridge and no one may pass

Group: General Forum Members Last Login: Tuesday, May 19, 2015 9:30 PM Points: 2,034,Visits: 2,555

RBarryYoung (6/8/2009)

karthikeyan (6/8/2009)1) What is mean by pseudo cursor?

Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:

Declare @str varchar(MAX) Set @str = '' Select @str = @str + ',' + [name] From sys.columns Print @str

"pre-allocate and Stuff" trick popular with mutable strings

2) what this trick will do?

We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

4) What LHS (Left-hand side) function denotes exactly?

Left-hand Functions appear left of the assignment operator ("="):

Set @str = UPPER('Some text.')

These are normal functions, and AFAIK, in T-SQL, all functions are LHS.

3) What RHS (right-hand side) function denotes exactly?

Right-hand Side functions appear on the right-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RHS:

STUFF(@str, offset, len) = 'foo'

This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and th LHS STUFF() function in T-SQL is that the RHS version does not return anything, it actually does write over the characters of the @str variable.

So we should avoid pseudo cursor. Right? 1) It will be treated as internal cursor. i.e it will perform looping internally. But comparing to external cursor, it will be very fasy. Right? 2) is it a good habit to use pseudo cursor in sql programming? if not, what is the workaround for this?

You cannot post new topics. You cannot post topic replies. You cannot post new polls. You cannot post replies to polls. You cannot edit your own topics. You cannot delete your own topics. You cannot edit other topics. You cannot delete other topics. You cannot edit your own posts. You cannot edit other posts. You cannot delete your own posts. You cannot delete other posts. You cannot post events. You cannot edit your own events. You cannot edit other events.

You cannot delete your own events. You cannot delete other events. You cannot send private messages. You cannot send emails. You may read topics. You cannot rate topics. You cannot vote within polls. You cannot upload attachments. You may download attachments. You cannot post HTML code. You cannot edit HTML code. You cannot post IFCode. You cannot post JavaScript. You cannot post EmotIcons. You cannot post or upload images.