Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

O(n) , O(n log n), O(n^2) Expand / Collapse
Author
Message
Posted Monday, June 8, 2009 5:46 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 27, 2014 12:52 AM
Points: 2,031, Visits: 2,535
Hi All,

I have recently read the below article.

http://www.sqlservercentral.com/blogs/philfactor/archive/2009/02/16/be-careful-with-string-concatenations-in-sql-server-with-big-strings.aspx

They used some concepts which i couldn't understand.

1) O(n) means ?
2) O(n^2) means ?
3) O(n log n) means ?
4) What is mean naive linear string concatenation?

http://stackoverflow.com/questions/859562/whats-the-best-way-to-recursively-reverse-a-string-in-java


RBarryYoung said:
It has long been known that naive linear string concatenation is an O(n^2) operation (more specifically, it's triangular). That simple fact is one of the big reasons that .NET has the StringBuilder class (which uses the strings->array->mass_concatenate trick that peter alludes to).
As it happens, I spent quite a lot of time last month investigating this problem and potential solutions and let me just say: it's tough in SQL. Here are the problems:
1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).
2) The "array & mass-concatenate" trick is not available to T-SQL, not because SQL doesn't have arrays (tables serve the same purpose), but because AFAIK, there is no function that can take a variable "collection" of strings (table, array, whatever) and produce an output string.
3) The "pre-allocate and Stuff" trick popular with mutable strings is not workable in T-SQL because the STUFF() function in T_SQL is NOT like the function of the same name in some general purpose languages: the T-SQL STUFF() is an RHS (right-hand side) function and NOT an LHS (left-hand side) function. AFAIK, there is no function in SQL that can (physically) write into a pre-existing string.
That all said, I did eventually find a way to do it. Not in the O(n) time (linear) achievable in most general purpose languages, but I could get it down to O(n*Log(n)) time which is still a huge improvement. Unfortunately, I do not have it together in a presentable form and it would take me some time to do so (a day or so), but I can get it to you if you want.
February 18, 2009 12:32 PM


RBarryyoung,

1) What do you mean by "buffer-extension"?
2) Can you explain about the formula you used?


karthik
Post #730611
Posted Monday, June 8, 2009 5:53 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 40,632, Visits: 37,094
karthikeyan (6/8/2009)

1) O(n) means ?
2) O(n^2) means ?
3) O(n log n) means ?


Surely you covered the Big O notation for algorithmic (time) complexity at university? It's a fundamental of Comp Sci theory.

Try these
http://en.wikipedia.org/wiki/Computational_complexity_theory
http://en.wikipedia.org/wiki/Big_O_notation



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #730616
Posted Monday, June 8, 2009 7:07 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 27, 2014 12:52 AM
Points: 2,031, Visits: 2,535
Yes Gail...I have studied...But i don't remember the functionality...

I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...


If you or anybody gave the short notes about it, I will review it once again with some understanding....instead of looking it blindly...

Apart from that, I have some more questions....

1) What is mean by pseudo cursor?

"pre-allocate and Stuff" trick popular with mutable strings


2) what this trick will do?

3) What RHS (right-hand side) function denotes exactly?

I believe the way it operates is from right side....ex: substring()
right?

4) What LHS (Left-hand side) function denotes exactly?

I believe the way it operates is from left side....ex: right()
right?

OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:


5) what are all the method available apart from linear? What is mean by linear method?
how to know whether my code use linear method?

aaah....lot of doubts.....



karthik
Post #730657
Posted Monday, June 8, 2009 8:35 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, December 4, 2014 7:52 AM
Points: 9,294, Visits: 9,495
karthikeyan (6/8/2009)
They used some concepts which i couldn't understand.

1) O(n) means ?
2) O(n^2) means ?
3) O(n log n) means ?
4) What is mean naive linear string concatenation?

For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:
--NOTE: This is an example.  I dont really use While loops in SQL.
Declare @i As int
Declare @A As varchar(MAX)
Select @i = 1, @A = ''

WHILE @i <= (Select Count(*) From MyStrings)
BEGIN
Select @A = @A + Select B From MyStrings Where ID = @i
END

The problem with this approach is that you have to re-copy @A every time through the loop and @A is getting larger every time. So even if our strings are all 1 character, first we have to copy 1 character, then 2 characters, then 3, then 4, 5, 6, 7, ... all the way out to N where N is total number of strings we want to concatenate together.

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

RBarryYoung said: ...
1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).


RBarryyoung,

1) What do you mean by "buffer-extension"?

The "buffer-extension trick" is referring to a low-level (Assembler or C) trick that takes advantage of that fact that in most OS's strings often have large gaps of unallocated memory after them. Thus if you want to add the string B$ onto the end of A$, then normally you would have to allocate a new buffer whose length is >= Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

2) Can you explain about the formula you used?

Which formula? I use a lot of them.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #730713
Posted Monday, June 8, 2009 10:25 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 40,632, Visits: 37,094
karthikeyan (6/8/2009)
Yes Gail...I have studied...But i don't remember the functionality...
I have reviewed the above mentioed URL, Since there are lot of formula's are there,it is confusing...


Ok, explain what you do know of Big O notation, since you said you did study it, and what exactly is confusing you and we'll try and fill in from there.

3) What RHS (right-hand side) function denotes exactly?


In what context?

4) What LHS (Left-hand side) function denotes exactly?


In what context?

OK, since I couldn't get a copy of Moby Dick, I made my own test data. Here is your test rig Phil, with my test data and my technique, followed by your original approach. Needless to say, my method is nearly linear, but the simple approach grows geometrically:


5) what are all the method available apart from linear? What is mean by linear method?
how to know whether my code use linear method?


Related to Big O notation. An algorithm is linear, or has linear time complexity if the time to process a number of inputs grows linearly with the number of inputs. It would be described as O(n)
There's lots of complexities other than linear. Logarithmic, geometric, polynomial, exponential, etc.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #730797
Posted Monday, June 8, 2009 2:13 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, December 4, 2014 7:52 AM
Points: 9,294, Visits: 9,495
karthikeyan (6/8/2009)
1) What is mean by pseudo cursor?
Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:
Declare @str varchar(MAX)
Set @str = ''
Select @str = @str + ',' + [name] From sys.columns
Print @str

"pre-allocate and Stuff" trick popular with mutable strings


2) what this trick will do?
We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

(Heh. I was re-reading this and just realized that I got my "Left" and "Right" reversed! Yikes!)

4) What LHS (Left-hand side) RHS (Right-hand side) function denotes exactly?

LeftRight-hand Functions appear leftright of the assignment operator ("="):
 Set @str = UPPER('Some text.')

These are normal functions, and AFAIK, in T-SQL, all functions are LRHS.

3) What RLHS (rightleft-hand side) function denotes exactly?

RightLeft-hand Side functions appear on the rightleft-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RLHS:
 STUFF(@str, offset, len) = 'foo'

This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and the RHS STUFF() function in T-SQL is that the LHS version does not return anything, it actually does write over the characters of the @str variable.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #730960
Posted Tuesday, June 9, 2009 4:12 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 27, 2014 12:52 AM
Points: 2,031, Visits: 2,535
RBarryYoung (6/8/2009)
karthikeyan (6/8/2009)
They used some concepts which i couldn't understand.

1) O(n) means ?
2) O(n^2) means ?
3) O(n log n) means ?
4) What is mean naive linear string concatenation?

For (1-3), see the Wikipedia articles that Gail referenced. Big-O notation is simple once you understand it, but a little confusing to explain so I'm deferring to the experts.

For (4), "naive linear string concatenation" is just the normal way of concatentating many strings together by appending each one onto the end of an accumulated output string:
--NOTE: This is an example.  I dont really use While loops in SQL.
Declare @i As int
Declare @A As varchar(MAX)
Select @i = 1, @A = ''

WHILE @i <= (Select Count(*) From MyStrings)
BEGIN
Select @A = @A + Select B From MyStrings Where ID = @i
END

The problem with this approach is that you have to re-copy @A every time through the loop and @A is getting larger every time. So even if our strings are all 1 character, first we have to copy 1 character, then 2 characters, then 3, then 4, 5, 6, 7, ... all the way out to N where N is total number of strings we want to concatenate together.

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

RBarryYoung said: ...
1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem, it just partially alleviates it at the lower end because some percentage(k) of appends can be extensions instead of creating a new string. Effectively it changes the {(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).


RBarryyoung,

1) What do you mean by "buffer-extension"?

The "buffer-extension trick" is referring to a low-level (Assembler or C) trick that takes advantage of that fact that in most OS's strings often have large gaps of unallocated memory after them. Thus if you want to add the string B$ onto the end of A$, then normally you would have to allocate a new buffer whose length is >= Len(A$) + Len(B$), then copy all of A$ inot it, then copy B$ in after A$ and then re-point A$ to the new buff (and deallocate the old). Besides all of the memory allocation overhead, this is an O(n) operation where n = Len(A$)+Len(B$).

However, if A$ just happens to have an unused portion of memory after it whose size is >= Len(B$) then, instead you can just copy B$ into that unused memory after A$ and extend A$'s length value to inlcude the appended characters.

2) Can you explain about the formula you used?

Which formula? I use a lot of them.


RBarryyoung and Gail,

Big O notation means the steps to complete a problem. Right?

Still i have lot of questions.

1) what do you mean by O(n^2) problem? Pls don't mistake me if i am asking this question again.

still I am not getting this formula clearly.

2) Big O notation means the steps to complete a problem. we denote it as O(n). But what is the difference between O(n) and O(n^2) ?

This totals to ((N^2) + N)/2 which is complexity Order N^2 (or just O(N^2)).

I believe if i understand the above one, i can idenitify the difference between them. But the above one is confusing me...

say for example i have row like below

1
2
3
4
5

i would like to append the above values as
12345

so

Declare @str varchar(5)
select @str = ''
select @str = @str + no from mystring
step 1: '' + '1'
step 2: '1' + '2'
step 3: '12' + '3'
step 4: '123' + '4'
step 5: '1234' + '5'

so if i apply ((N^2) + N)/2 formula

N = 5
((5^2) + 5)/2 ( select ((5^2) + 5)/2 gave 6, but i thiink it is wrong as per mathematics. since ^ is considered as POWER in maths. Right ? )

((5^2) + 5)/2 equivalent to select (Power(5,2) + 5)/2. Right?

5^2 = 5*5 = 25
25+5 = 30/2 = 15. Right?

2) Can you explain about the formula you used?

Which formula? I use a lot of them.


{(n)*(n+1)/2} cost of the naive implementation to {n +(n)*(n-1)/(2*k)} where "k" is that percentage. It's better, but it's still O(n^2).

Can you explain it?


karthik
Post #731289
Posted Tuesday, June 9, 2009 4:22 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 27, 2014 12:52 AM
Points: 2,031, Visits: 2,535
1) SQL Server apparently (based on my tests) already does the "buffer-extension" trick available to mutable strings. Unfortunately, the Extension trick does NOT solve the O(n^2) problem,


If it supports the "buffer-extension" trick,then why it doesn't solve O(n^2) problem. again , i need to know exactly about O(n^2).

can you explain it with example? so that i can remember it through my career.

Does this issue solved in sql2008?

aaah...i am confusing myself....i think....




karthik
Post #731295
Posted Tuesday, June 9, 2009 4:24 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 9:22 AM
Points: 40,632, Visits: 37,094
karthikeyan (6/9/2009)
Big O notation means the steps to complete a problem. Right?

No. Not at all.

Big O notation describes the time complexity of an algorithm. That is describes how the time required to solve an algorithm will increase as the number of inputs (n) increases.

A linear algorithm (O(n)) is one where the time required to solve the algorithm increases linearly with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 2 minutes to solve and 5000 inputs will take 5 minutes to solve.

A quadratic algorithm (O(n2) is one where the time required to solve the algorithm increases quadratically with the number of inputs. So if 1000 inputs takes 1 minute to solve, 2000 inputs will take 4 minutes to solve and 5000 inputs will take 25 minutes to solve. (roughly)



I'm curious. Where and when did you do your MCA? Where and when did you do your undergrad Comp Sci degree?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #731297
Posted Tuesday, June 9, 2009 4:24 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, November 27, 2014 12:52 AM
Points: 2,031, Visits: 2,535
RBarryYoung (6/8/2009)
karthikeyan (6/8/2009)
1) What is mean by pseudo cursor?
Pusedocursor is a name that some people (including me) use for the SQL Server (and Sybase too, I believe) trick of being able to assign both to and from variables for every row of a resultset. The typical example is the simple string concatenation trick that we often use:
Declare @str varchar(MAX)
Set @str = ''
Select @str = @str + ',' + [name] From sys.columns
Print @str

"pre-allocate and Stuff" trick popular with mutable strings


2) what this trick will do?
We used to call this "string mapping" back in the old days. Instead of reallocating the output string with every iteration, you estimate the likely size of the output string and pre-allocate the output string with that much blank space and use a LenPtr to keep track of how much of this buffer you have used. Then for each string you "MAP" or "STUFF" the current source string into the next available space after LenPtr and then update LenPtr to the new end. This is linear time ( O(n) ) because you only have to touch each input character once.

4) What LHS (Left-hand side) function denotes exactly?

Left-hand Functions appear left of the assignment operator ("="):
 Set @str = UPPER('Some text.')

These are normal functions, and AFAIK, in T-SQL, all functions are LHS.

3) What RHS (right-hand side) function denotes exactly?

Right-hand Side functions appear on the right-hand side and they usually do "special" things having to do with addressing the output property or variable. AFAIK, T-SQL does not have any, but in some languages, STUFF is a RHS:
 STUFF(@str, offset, len) = 'foo'

This example would overwrite the output string (@str) with the input string starting at 'offset' for 'len' characters. The difference between this and th LHS STUFF() function in T-SQL is that the RHS version does not return anything, it actually does write over the characters of the @str variable.


So we should avoid pseudo cursor. Right?
1) It will be treated as internal cursor. i.e it will perform looping internally.
But comparing to external cursor, it will be very fasy. Right?
2) is it a good habit to use pseudo cursor in sql programming? if not, what is the workaround for this?



karthik
Post #731298
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse