Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

complicated sql syntax - looking for explanation for working query Expand / Collapse
Author
Message
Posted Sunday, May 4, 2014 7:10 PM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, August 25, 2014 11:12 AM
Points: 631, Visits: 1,470
About six months ago, one of the regular SSC forum experts gave me this query to use to build a Time lookup table. It works great and I use it to join it to another table to create space holders for any minute of the day for which the other table doesn't have any values.
I'd like to understand the syntax of this borrowed query, not because anyone is asking me to explain it as the Time table exists and I can rebuild it anytime, but because it bothers me that I don't understand something I borrowed (I think from dwain) to accomplish a task:

;WITH e1(n) AS(
SELECT * FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(n)
),
e2(n) AS(
SELECT e1.n FROM e1, e1 x
),
e4(n) AS(
SELECT e2.n FROM e2, e2 x
),
e8(n) AS(
SELECT e4.n FROM e4, e4 x
),
cteTally(n) AS(
SELECT TOP 6307204 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1
FROM e8
),
Test(n) AS(
SELECT DATEADD( minute, n, DATEADD( YEAR, -2, GETDATE()))
FROM cteTally)
select DATEADD(mi,datediff(mi,0,n),0)as cte_start_date
INTO Sandbox.dbo.Time
FROM Test
WHERE n <= DATEADD( YEAR, 4, GETDATE())



I have two questions before I think I will understand:
(1) how do 'e1 x', 'e2 x', and 'e4 x' accomplish the task of ballooning the dataset in respective prior ctes by exponent of 10? This syntax is totally unfamiliar to me.
(2) what is 'ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1' accomplishing?
Post #1567381
Posted Sunday, May 4, 2014 7:46 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,994, Visits: 31,513
KoldCoffee (5/4/2014)
I'd like to understand the syntax of this borrowed query, not because anyone is asking me to explain it as the Time table exists and I can rebuild it anytime, but because it bothers me that I don't understand something I borrowed (I think from dwain) to accomplish a task:


I wish there were more people like you. I'm working on your good questions now.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1567386
Posted Sunday, May 4, 2014 8:12 PM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, August 25, 2014 11:12 AM
Points: 631, Visits: 1,470
ok, looking forward to it and thanks. I see it is not so easy.
Post #1567395
Posted Sunday, May 4, 2014 9:15 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 2:37 PM
Points: 23,292, Visits: 32,021
Here is a quick explanation using comments. Expect a much more detailed explanation from Jeff, but hopefully this helps get you closer to understanding what it does and how. Also, I did make a few minor changes.

One question up front, is this supposed to generate 6 years of values?


WITH
e1(n) AS (SELECT * FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(n)), -- generate a dynamic table with 10 rows
e2(n) AS (SELECT a.n FROM e1 a, e1 x), -- cross join e1 with itself to generate a dynamic table with 100 rows
e4(n) AS (SELECT a.n FROM e2 a, e2 x), -- cross join e2 with itself to generate a dynamic table with 10,000 rows
e8(n) AS (SELECT a.n FROM e4 a, e4 x), -- cross join e4 with itself to generate a dynamic table with 100,000,000 rows
cteTally(n) AS (SELECT TOP 6307204 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 FROM e8),
-- select the first 6,307,204 rows of e8 and number them sequentially starting with 0 (the -1) to 6307203 (6307204 - 1)
Test(n) AS (SELECT DATEADD( minute, n, DATEADD( YEAR, -2, GETDATE())) FROM cteTally)
-- Starting with the current date and time minus 2 years, add n minutes to the datetime
select
DATEADD(mi,datediff(mi,0,n),0)as cte_start_date -- drop (or zero) the seconds from the date time value
INTO
Sandbox.dbo.Time
FROM
Test
WHERE
n <= DATEADD( YEAR, 4, GETDATE()) -- don't think this works as expected, see question above.





Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1567400
Posted Sunday, May 4, 2014 9:25 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 2:26 PM
Points: 1,731, Visits: 4,558
KoldCoffee (5/4/2014)

(2) what is 'ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1' accomplishing?

Leaving the explanation of the Cartesian Product to Jeff, here is a quick answer to the second question:

The ROW_NUMBER function generates a dense numerical sequence starting at 1. The " - 1" simply shifts it to start at 0 (zero).
It requires an ORDER BY within the OVER clause, "(SELECT NULL)" tells the server to ignore any order or rather, not to sort it.
Post #1567403
Posted Sunday, May 4, 2014 9:59 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:11 AM
Points: 36,994, Visits: 31,513
Lynn Pettis (5/4/2014)
Expect a much more detailed explanation from Jeff...


You know me all too well, ol' friend.

Ok... here we go, KoldCoffee...

No insult intended… I'm going to start at the very bottom because I don't know what you do or do not know and I want to make sure that you understand, especially since you asked.

First, what does this do?

 SELECT 1 
UNION ALL
SELECT 1
;

That's right. It returns two rows with a "1" on each row.

-----------
1
1

(2 row(s) affected)


The reason why it returned 2 rows instead of just one unique "1" is because a UNION ALL was used instead of just a UNION which has an implicit "distinct" I it.

The problem is, there's no title on the column for us to refer to the data by. Let's fix that right now. We could add the column name in all sorts of ugly ways but let's treat the query as if it were a table by adding it to the FROM clause of a new SELECT, giving it an alias as if it were a table, and then externally declare the column name. Like this…

 SELECT d.N
FROM (
SELECT 1
UNION ALL
SELECT 1
) d (N)
;

See the "d (N)" thingy? The "d" is an alias for the result set that we're treating as if it were a table and the "(N) defines what we want to call the column. Here's the result…

N
-----------
1
1

Now, fully functional queries that return their own result sets are called "derived tables" (that's why I used the "d" alias here… anything could have been used but "d" reminds me that it's a "derived table" for this demo) or "inline views". What's really cool as of 2005 is that we can move such things into a "top down" ordering to make them more readable, a little easier to use, and a bit more flexible for other things. We can move them into a structure known as a "Common Table Expression" or "CTE", which is also classified as a "derived table" and "inline view". We basically just have to move the query from the FROM clause to the CTE and then reference the CTE from the outer SELECT. Like this…

WITH
d (N) AS
(
SELECT 1
UNION ALL
SELECT 1
)
SELECT N
FROM d
;

The next thing to do is to add more rows. Let's start with just 10. We'll also use those rows in the outer query to start counting using ROW_NUMBER(). Like this…

WITH
d (N) AS
( --=== This returns 10 rows of all 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
)
SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM d
;

Here's the result…

N
--------------------
0
1
2
3
4
5
6
7
8
9

A couple of things to notice here. Although ROW_NUMBER() requires an ORDER BY clause, we don't actually care what the ORDER BY is actually based on especially since all the rows from the "d" CTE have a "1" in them. So it's customary (and sometimes faster) to order by a constant. ROW_NUMBER() doesn't allow a constant in the ORDER BY so we trick it by embedding a SELECT statement that has the constant of NULL.

ROW_NUMBER() always starts at the number "1". If we want this sequence of 10 numbers to start at "0" instead of "1", one way to pull it off is to subtract 1 from every number that ROW_NUMBER() makes. That's what the purpose of the "-1" is (although I'd have done it in a different manner).

As a bit of a side bar, the 10 SELECT UNION ALLs can be replaced with a single VALUES function like that found in the code you posted on SQL SERVER 2008 and above. It doesn't make things any faster but it does cut down on the clutter in the query.

Time for another question.. What does this do?

  SELECT 1
FROM (SELECT 1 UNION ALL SELECT 1) d1 (N)
CROSS JOIN (SELECT 1 UNION ALL SELECT 1) d2 (N)
;

If we look at the code, it has two "derived tables" in it and each "derived table" has two rows in it. We're doing a CROSS JOIN (also known as a "Cartesian Product") between the two "derived tables". That means for every row in one of the "derived tables", we'll return all of the rows of the other "derived table". Think of it as us multiplying the number of rows in the "derived tables" to come up with a "Product" or answer to the multiplication problem.
Since there are 2 rows in each of the "derived tables", our result set will contain 2*2 or 4 rows of "1's". Like this…

-----------
1
1
1
1

If we had 10 rows in each "derived table", we would have ended up with 10*10 or 100 rows each containing a "1".

Note the we can use the NON-ANSI version of a CROSS JOIN, which is nothing more than listing the 2 derived tables in the FROM clause separated only by a comma and no criteria. Like this…

SELECT 1
FROM (SELECT 1 UNION ALL SELECT 1) d1 (N),
(SELECT 1 UNION ALL SELECT 1) d2 (N)
;

Shifting gears a bit, the "engineering notation" for "10" is 1*10^1. The "engineering notation" for "100" is 1*10^2. The exponent tells how many times "10" has been multiplied by itself.

Let's apply all of that to our CTE so we can count to 100 using CTEs.

WITH
E1(N) AS
( --=== This returns 10 rows of all 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b) --CROSS JOIN gives us 10*10 or 100 rows
SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM E2
;

Here, we change the name of the first CTE to represent the power of 10 that it produces. The shorthand for "engineering notation" for 1*10^1 is 1E1. We just dropped the leading 1 because it's superfluous for us.

The neat thing about CTEs is that, unlike "derived tables" in a FROM clause, we can refer to them in other "derived tables". So, our second CTE (E2), creates another "derived table" by doing a CROSS JOIN on the first (E1) "derived table" (CTE) against itself and that will return 100 rows. The "a" and "b" are just different aliases that don't mean anything but are necessary or SQL Server will complain about referencing the same "derived table" more than once.

Another name for one CTE referring to another CTE is called "Cascading CTEs". You probably won't find that particular name in a book anywhere. We made up that name right here on SSC.

The final SELECT was modified to take its input from the E2 CTE and returns the values of 0 to 99.

What if we only wanted to count to 53 and didn't want to change the code for either the E1 or E2 CTEs? That's where the TOP function comes in. Like this…

WITH
E1(N) AS
( --=== This returns 10 rows of all 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b) --Cross join gives us 10*10 or 100 rows
SELECT TOP (53) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM E2
;

Which 53 numbers will it return? The ORDER BY of the ROW_NUMBER() takes care of that. It'll return the numbers 1 through 53. Since we're subtracting 1 from each number, the final result set will be 0 through 52, in this case. We could use a variable instead of a hard-coded "53" to make this cascading, counting CTE even more useful.

Now… what if we wanted a really big number like the number 6307204 that was used in the original code you posted? The answer is that we just need more CROSS JOINs in the form of additional cascading CTEs. The really cool part is that if we multiply the number of rows in E2 (which is 100 rows) times itself, we don't just multiply by 10. No, we're multiplying by 100 now. A CROSS JOIN on E2 will give use E4 or 100*100 which is the same as 1*10^4 or 10000. Why E4? If you add exponents (E2+E2), you're actually multiplying. E2 is 100 with 2 zeros after a 1. 10000 is a 1 with 4 zeros after it. We just doubled the number of zeros.

Once we have E4, it's simple to see how to get to E8, which will return E4*E4 or 10000*10000 or a 1 with 8 zeros after it. That’s 100,000,000 or a 100 MILLION!

Here's the code for that. Looks almost identical to the original code.

WITH
E1(N) AS
( --=== This returns 10 rows of all 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --CROSS JOIN gives us 10*10 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --CROSS JOIN gives us 100*100 or 10000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) --CROSS JOIN gives us 10000*10000 or 100000000 rows
SELECT TOP (53) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
FROM E8
;

Notice that each new CTE refers to the previous CTE in the FROM clause and that the final SELECT refers to the last CTE.

What the ROW_NUMBER() thing is doing is that it is using the "presence of rows" from the last CTE as a "cursor" or "loop". It's not even using the "1's" from the CTE's, just the "presence of rows". This is what R. Barry Young refers to as a "pseudo cursor". It uses the very, very fast machine language "loops" that SQL Server does behind the scenes instead of an explicit loop. ROW_NUMBER() simply counts by "1's" for each row that it encounters.

Like I said, there are some optimizations that could be made here. For one (no pun intended) we can UNION ALL an explicit "0" to make counting faster by not having to subtract 1 for each count. For larger numbers, this can add up to quite a time savings.

Another performance improvement is to limit the number of "nested loops" behind the scenes. We can do that simply by doing more CROSS JOINs in the E2 CTE and cascading that effect "down". Notice that the E2 CTE has been updated to 4 CROSS JOINs and renamed to be E4.

Like this…

WITH
E1(N) AS
( --=== This returns 10 rows of all 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1
),
E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), -- 10*10*10*10 or 10000 rows
E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 10000*10000 or 100000000 rows
SELECT 0 UNION ALL --Removes the need to subtract 1 from every count
SELECT TOP (53-1) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E8
;

Last but certainly not least, the cascading CTEs are "cascading" in appearance only. They all become part of the overall query. What that means is that they don't count up to 100 Million (in this case) and then apply the TOP (). The TOP is expressed across ALL of the CTE's until it's satisfied as to the number of rows. If the TOP() was only "3", then only the first 3 SELECT UNION ALLs would execute. E4 and E8 would be present but they would do "0" rows.

Lemme know if you have any other questions about this wonderful tool first created by Itzik Ben-Gan and improved over time not only by Itizik, but by others on this forum.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1567405
Posted Monday, May 5, 2014 9:03 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, May 7, 2014 1:46 PM
Points: 40, Visits: 83
While it won't change the functionality of anything consider altering
SELECT e1.n FROM e1, e1 x

to

SELECT e1.n
FROM e1
CROSS JOIN e1 x

As this makes it more clear a Cartesian join is requested.
Post #1567555
Posted Monday, May 5, 2014 9:45 AM
SSChasing Mays

SSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing MaysSSChasing Mays

Group: General Forum Members
Last Login: Monday, August 25, 2014 11:12 AM
Points: 631, Visits: 1,470
OK folks, just so you know, I've been reading and re-reading these replies, especially Jeff Moden's to make sure everything is completely clear to me. Then, after that if I still have questions I just can't figure out I'll post. It's really wonderful you are here helping me.
At this point, the recurrent question lurking in my mind is 'why do we have to count from zero' instead of '1' referring to 'SELECT N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1
'...but sometimes these things become clear after the 5th reading etc. (think tortoise and the hare)
Post #1567574
Posted Monday, May 5, 2014 10:56 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 7:44 AM
Points: 12,910, Visits: 32,015
the starting at zero vs starting at one is because the function is adding time elements to generate your matrix of data
SELECT DATEADD( minute, n, DATEADD( YEAR, -2, GETDATE())) 

if it started at 1, your starting time would be 12:01 instead of 12:00.

you could still use a 1-based tally table to generate the desired data, but then you have to make the calculation take that extra, initial minute into consideration;

SELECT DATEADD( minute, n -1, DATEADD( YEAR, -2, GETDATE())) 


so in situations where you are fiddling with dates, it's prettier to manipulate the Tally/numbers you generate, instead of the functions using the generated values.


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #1567628
Posted Monday, May 5, 2014 3:23 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:25 PM
Points: 2,044, Visits: 3,059
I suggest making your code as self-documenting as possible; then you need fewer additional comments. Naturally that means avoiding variable "names" like "E", "N" and so on. Example for this code below.

Of course you may prefer to adjust the CROSS JOINs for efficiency and/or adjust the names, but make the names meaningful. Because tally tables are quite common, I keep the first two names the same, but the final use of the tally table should have a column name that identifies specifically what the data is.

;WITH
cteDigits AS (
SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally AS (
SELECT
[1000000s].digit * 1000000 + [100000s].digit * 100000 + [10000s].digit * 10000 +
[1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit AS minutes_increment
FROM cteDigits [1s]
CROSS JOIN cteDigits [10s]
CROSS JOIN cteDigits [100s]
CROSS JOIN cteDigits [1000s]
CROSS JOIN cteDigits [10000s]
CROSS JOIN cteDigits [100000s]
CROSS JOIN cteDigits [1000000s]
WHERE
[1000000s].digit * 1000000 + [100000s].digit * 100000 + [10000s].digit * 10000 +
[1000s].digit * 1000 + [100s].digit * 100 + [10s].digit * 10 + [1s].digit
--limit rows generated to 6 yrs' worth of minutes
<= DATEDIFF(MINUTE, '19000101', '19060101')
),
cteTime_by_minute AS (
SELECT DATEADD(MINUTE, minutes_increment, DATEADD(YEAR, -2, DATEADD(MINUTE, DATEDIFF(MINUTE, 0, GETDATE()), 0)))
AS time_by_minute
FROM cteTally t
)
SELECT time_by_minute
INTO Sandbox.dbo.Time
FROM cteTime_by_minute


Edit: Added "INTO Sandbox.dbo.Time".


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1567722
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse