Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Unique runs Expand / Collapse
Author
Message
Posted Wednesday, August 29, 2012 7:45 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, December 20, 2012 1:03 PM
Points: 265, Visits: 589
I am looking for a query that will produce a list of unique runs for the following data, where EvDate is unique and only one EID is associated with a date. The following is a sample from the original data set, which as about 300K records.


CREATE TABLE tempTable (EvDate date NOT NULL, EID int NOT NULL,
CONSTRAINT PKtemp PRIMARY KEY CLUSTERED (EvDate ASC, EID ASC))
GO
CREATE UNIQUE NONCLUSTERED INDEX IXtemp ON tempTable (EvDate DESC)
GO

INSERT INTO tempTable (EvDate,EID)
SELECT '02/27/2012',1
UNION ALL
SELECT '03/05/2012',4
UNION ALL
SELECT '03/12/2012',5
UNION ALL
SELECT '03/19/2012',3
UNION ALL
SELECT '03/26/2012',2
UNION ALL
SELECT '04/02/2012',7
UNION ALL
SELECT '04/09/2012',1
UNION ALL
SELECT '04/16/2012',4
UNION ALL
SELECT '04/23/2012',9
UNION ALL
SELECT '04/30/2012',8
UNION ALL
SELECT '05/07/2012',7
UNION ALL
SELECT '05/14/2012',5
UNION ALL
SELECT '05/21/2012',3
UNION ALL
SELECT '05/28/2012',2
UNION ALL
SELECT '05/30/2012',2
UNION ALL
SELECT '06/04/2012',2
UNION ALL
SELECT '06/11/2012',4
UNION ALL
SELECT '06/18/2012',5
UNION ALL
SELECT '06/25/2012',6
UNION ALL
SELECT '07/02/2012',1
UNION ALL
SELECT '07/09/2012',3
UNION ALL
SELECT '07/16/2012',8
UNION ALL
SELECT '07/23/2012',3
UNION ALL
SELECT '07/30/2012',6
UNION ALL
SELECT '08/06/2012',5
UNION ALL
SELECT '08/13/2012',4
UNION ALL
SELECT '08/20/2012',1
UNION ALL
SELECT '08/27/2012',1


Is there a way I can find a count of the number of unique EIDs in a row, summarized by start and end dates as shown below.

UniqueRuns is the count of consecutively different EIDs. From 2/27/2012 to 4/02/2012 every EID is unique for those 6 dates. On 4/9/2012, the EID is 1, which is a repeat from 2/27/2012. So the count resets for the next unique run. That run ends after 8 dates on 5/28/2012. New run begins on 5/30/2012, but ends on that day becuase the same EID=2 is for the next date. A new run begins. So on.

I want to find the longest unique runs in a format as shown below. Ties are sorted by EndDate.

StartDate	EndDate  	UniqueRuns
04/09/2012 05/28/2012 8
06/04/2012 07/16/2012 7
02/27/2012 04/02/2012 6
07/23/2012 08/20/2012 5
05/30/12 05/30/12 1
08/27/2012 08/27/2012 1

Unlike the traditional islands and gaps problems based on dates, here it's based on the uniqueness of the EIDs.

I'm stumped as to how I can convert this into a traditional islands-and-gaps problem. I've not looked at any of the new T-SQL functions in SQL Server 2012 for something like this.

Thanks for any insights.
Post #1351978
Posted Wednesday, August 29, 2012 11:13 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 12:45 AM
Points: 3,614, Visits: 5,227
First of all, and a good thing for you because I suck at them, I don't think this is a gaps and islands problem. It looks more like a cumulative totals problem, so I propose it be handled with a quirky update. You'll need to add a helper column to your table and reverse the primary key/index clustering.

CREATE TABLE #tempTable (EvDate date NOT NULL, EID int NOT NULL
,helpergroupno INT,
CONSTRAINT PKtemp PRIMARY KEY NONCLUSTERED (EvDate ASC, EID ASC))

CREATE UNIQUE CLUSTERED INDEX IXtemp ON #tempTable (EvDate DESC )


Now add in your setup data and then run this:

DECLARE @tempstr VARCHAR(100) = ''
,@tempno INT = 1

UPDATE #tempTable
SET @tempstr = CASE
WHEN CHARINDEX('[' + CAST(EID AS VARCHAR) + ']', @tempstr) = 0
THEN @tempstr + '[' + CAST(EID AS VARCHAR) + ']'
ELSE '[' + CAST(EID AS VARCHAR) + ']' END
,@tempno = helpergroupno = CASE
WHEN @tempstr = '[' + CAST(EID AS VARCHAR) + ']'
THEN @tempno + 1
ELSE @tempno END
OPTION(MAXDOP 1)

SELECT startdate=MIN(EvDate), enddate=MAX(EvDate), [Count of EIDs]=COUNT(*)
FROM #tempTable
GROUP BY helpergroupno

DROP TABLE #tempTable


Results are:

startdate	enddate		Count of EIDs
2012-02-27 2012-04-02 6
2012-04-09 2012-05-28 8
2012-05-30 2012-05-30 1
2012-06-04 2012-07-16 7
2012-07-23 2012-08-20 5
2012-08-27 2012-08-27 1



Please advise if this is correct.

Edit: Corrected the code above to handle EID > 9 and added this explanation:

I realized that my suggestion that this is a cumulative sums problem may be a bit enigmatic, so let me say that it is except that instead of accumulating totals you're accumulating events.

If you're unclear as to how it works you can do this:
1. Add a second (VARCHAR(100)) helper column and dump the current value of @tempstr into it as I have done with @tempno in the row that follows.
2. SELECT * from the table without grouping.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1352005
Posted Thursday, August 30, 2012 1:28 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, December 20, 2012 1:03 PM
Points: 265, Visits: 589
Dwain, works perfectly. Thanks very very much.

First off, I never thought about quirky update though I used it before.

Second, your code ran in 5.1 seconds for 300K records on SQL 2012. Impressive.

--S.J.
Post #1352037
Posted Thursday, August 30, 2012 4:38 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062

...
...
,@tempno = helpergroupno = CASE


...


Please note, the "three-parts" SET is not reliable. You better to split it to two of "two-part" SET's.


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1352091
Posted Thursday, August 30, 2012 4:56 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 12:45 AM
Points: 3,614, Visits: 5,227
sjsubscribe (8/30/2012)
Dwain, works perfectly. Thanks very very much.

First off, I never thought about quirky update though I used it before.

Second, your code ran in 5.1 seconds for 300K records on SQL 2012. Impressive.

--S.J.


SJ - Glad it worked for you.

I must say, you led me down the garden path suggesting it was gaps and islands. Maybe it is but I couldn't figure it from that perspective. Once I realized QU would work, it didn't take long to get it working.

Eugene - I'm not sure I understand what you mean. I mean I get the part saying that the 3 part SET might not be reliable. But can you demonstrate a working solution using a 2 part SET?



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1352103
Posted Thursday, August 30, 2012 5:01 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062
Search for "OMG" In J.M. article:
http://www.sqlservercentral.com/articles/T-SQL/68467/
It does explain the problem in details and gives good example.





_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1352110
Posted Thursday, August 30, 2012 6:00 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 12:45 AM
Points: 3,614, Visits: 5,227
Eugene Elutin (8/30/2012)
Search for "OMG" In J.M. article:
http://www.sqlservercentral.com/articles/T-SQL/68467/
It does explain the problem in details and gives good example.



Ah yes, the bible on this subject. It appears Jeff has revised it since the last time I read it. New rules including the part about the 2 part update. So I'll need to brush up.

For now though, it seems to have worked for SJ but s/he should probably take a look at the article also and at least add the TABLOCK hint if this is being done in something other than a temp table (refer to the OMG and Rules section) and it is not just a one off requirement.

Thanks for letting me know about this Eugene!



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1352130
Posted Thursday, August 30, 2012 6:39 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062
No probs.
Actually it was never safe to use three part set in quirky update.
If your read BoL abnout UPDATE you will find that:


Setting Variables and Columns

Variable names can be used in UPDATE statements to show the old and new values affected, but this should be used only when the UPDATE statement affects a single record. If the UPDATE statement affects multiple records, to return the old and new values for each record, use the OUTPUT clause.


http://msdn.microsoft.com/en-us/library/ms177523(v=sql.90).aspx
Unfortunately, explicit explanation of the above behaviour/restriction was removed from BoL for SQL2008 and above, but this behaviour/restriction is still in place.

So, as you can see it's quite expected that it will not work for quirky update.
I never used "three-part" SET in quirky update myself due to the simple reason: Quirky update method is older than "three part" SET" feature of SQL Server (I think it was introduced in SQL2K), so I always used two set lines with standard two-parts set's.





_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1352157
Posted Thursday, August 30, 2012 4:55 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 7:30 PM
Points: 36,766, Visits: 31,222
Eugene Elutin (8/30/2012)

...
...
,@tempno = helpergroupno = CASE


...


Please note, the "three-parts" SET is not reliable. You better to split it to two of "two-part" SET's.


That's absolutely not true. It's a documented feature and it works just fine. What may not be "reliable" is people's understanding of it thanks to the terrible explanation that MS included in Books Online.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1352528
Posted Thursday, August 30, 2012 5:08 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 7:30 PM
Points: 36,766, Visits: 31,222
To clarify, I'm just taking exception to when people say that the 3 part update isn't "reliable" and it b oils down to what the semantics of what someone considers the word "unreliable" to actually mean.

To me, "unreliable" means that it should never be trusted to work. As I said throughout the article on running totals, one of the rules is that you first have to make it all work. Once you get it working, then it will work reliably.

It's like taking into consideration the problems that people have documented about the relatively new MERGE statement. It sometimes does strange things that cause the code to not work as expected. Does that make it "unreliable"? Perhaps. I like to think of it as it being very reliable in giving you problems if you use it a certain way and very reliable in giving you correct answers if you use it in another.

To quote the article...
It's just one option that happens to work well for running totals and running counts.


Interpret that as meaning that the 3 part update is reliable for "normal" Quirky Update running totals and counts. If you do something strange with it, then it will be "unreliable".


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1352533
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse