Update Gaps
Posted Thursday, December 6, 2012 11:27 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 12:17 AM
Points: 36,800, Visits: 31,261
oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

You two were quick too!

Thanks again!


The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1393871
Posted Friday, December 7, 2012 12:29 AM


Hall of Fame


Group: General Forum Members
Last Login: 2 days ago @ 5:45 PM
Points: 3,617, Visits: 5,237
Jeff Moden (12/6/2012)
oralinque (12/6/2012)
dwain.c & Jeff Moden, your solutions worked great! Thank you both so much!

Jeff - "Data smears" sounds good to me. Regarding the "clear requirements and readily consumable data" comment, I think it's just common courtesy. I mean, if you're asking people to help you, why not make it easier for them to help you.

You two were quick too!

Thanks again!


The "quiick" comes from you making it so easy. Thank you again for your comments. I wish everyone would figure that out.


+1 to that!



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1393896
Posted Friday, December 7, 2012 2:12 AM


SSCrazy


Group: General Forum Members
Last Login: Yesterday @ 3:30 AM
Points: 2,422, Visits: 7,444
Jeff Moden (12/6/2012)
I don't know what others call this type of problem, but I call them "data smears" because it's like smearing strips of color down from one wet paint spot to another.


I thought I was the only one who referred to this as a smear or smudge (I've used both to describe this type of solution in the past). Now you have me wondering if I picked up the term from you!



Not a DBA, just trying to learn

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better, quicker answers on SQL Server performance related questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/



If you litter your database queries with nolock query hints, are you aware of the side effects?
Try reading a few of these links...

(*) Missing rows with nolock
(*) Allocation order scans with nolock
(*) Consistency issues with nolock
(*) Transient Corruption Errors in SQL Server error log caused by nolock
(*) Dirty reads, read errors, reading rows twice and missing rows with nolock


LinkedIn | Blog coming soon (for sufficiently large values of "soon")!
Post #1393927
Posted Friday, December 7, 2012 8:42 AM
SSC Journeyman


Group: General Forum Members
Last Login: Monday, July 15, 2013 2:52 PM
Points: 89, Visits: 385
Jeff Moden (12/6/2012)
dwain.c (12/6/2012)
Jeff Moden (12/6/2012)

I don't believe the following won't be quite as fast as a Quirky Update...


Easy for you to say...


Heh... suffering from a NEC (Not Enough Coffee) problem on that one, for sure. I meant to say that "I don't believe that the following will be quite as fast as a Quirky Update" but got my tangle all toungled up.


Ha! I don't care how you speak as long as you're fluent in T-SQL.

Thanks to everyone who replied. I've added a few new tools to my toolbox.
Post #1394088
Posted Friday, December 7, 2012 11:35 AM


SSCommitted


Group: General Forum Members
Last Login: Yesterday @ 7:51 AM
Points: 1,945, Visits: 2,864
>> Here's the SQL to create the table and populate it with data. <<

Do you really want FLOAT? How will you handle the rounding errors and the slowness of floating-point math without special hardware? The data element names are vague; pressure of what? NULL is the right choice for a missing reading, because zero is a real value for a reading on a busted meter, an empty pipe, or whatever. I will guess that two decimal places are good enough, rather than the 1.79E+308 range you have now. You also need to start using the ANSI/ISO Standard INSERT INTO syntax. Here is my re-write:

DROP TABLE Machine_Readings;

CREATE TABLE Machine_Readings
(machine_id INTEGER NOT NULL,
sample_date DATE NOT NULL,
PRIMARY KEY (machine_id, sample_date),
something_pressure DECIMAL (8,2) DEFAULT 0
CHECK (something_pressure >= 0.00));

INSERT INTO Machine_Readings
VALUES (60, '2012-10-02', 0.00), (60, '2012-10-03', 300.00),
(60, '2012-10-04', 300.00), (60, '2012-10-05', NULL),
(60, '2012-10-06', NULL), (60, '2012-10-07', NULL),
(60, '2012-10-08', 330.00), (60, '2012-10-09', NULL),
(60, '2012-10-10', 300.00), (97, '2012-10-02', 380.00),
(97, '2012-10-03', NULL), (97, '2012-10-04', NULL),
(97, '2012-10-05', 350.00), (97, '2012-10-06', 350.00),
(97, '2012-10-07', NULL), (97, '2012-10-08', NULL),
(97, '2012-10-09', 400.00), (97, '2012-10-10', NULL);

>> What I need to do is fill in the zero values with the value that precedes it (grouping by machine_id). Non-zero values do not change. <<

SELECT MR1.machine_id, MR1.sample_date,
       COALESCE (MR1.something_pressure,
          (SELECT MR2.something_pressure
             FROM Machine_Readings AS MR2
            WHERE MR2.machine_id = MR1.machine_id
              AND MR2.sample_date =
                  (SELECT MAX (MR3.sample_date) -- most recent prior date with a reading
                     FROM Machine_Readings AS MR3
                    WHERE MR3.machine_id = MR1.machine_id
                      AND MR3.sample_date < MR1.sample_date
                      AND MR3.something_pressure IS NOT NULL)))
          AS something_pressure
  FROM Machine_Readings AS MR1;

The idea is that we look at each row. If its pressure reading IS NOT NULL, then we use it for this row. If its pressure reading IS NULL, then we use a scalar subquery to find the most recent prior non-NULL pressure reading for this machine.
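For anyone verifying their results, here is the fill this query should produce for machine 97 in the sample data (hand-derived from the INSERTs above, so do check it against an actual run):

-- sample_date   stored value   returned value
-- 2012-10-02    380.00         380.00
-- 2012-10-03    NULL           380.00  <- carried forward
-- 2012-10-04    NULL           380.00
-- 2012-10-05    350.00         350.00
-- 2012-10-06    350.00         350.00
-- 2012-10-07    NULL           350.00  <- most recent prior reading, not the MAX value
-- 2012-10-08    NULL           350.00
-- 2012-10-09    400.00         400.00
-- 2012-10-10    NULL           400.00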

I did another version of this recently, but I cannot find it right now.
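As a side note on the FLOAT comment above, here is a minimal stand-alone sketch (my illustration, not from the thread) of the drift that repeated floating-point addition produces and that exact DECIMAL arithmetic avoids:

DECLARE @f FLOAT, @d DECIMAL(18,2), @i INT;
SELECT @f = 0, @d = 0, @i = 0;   -- SET-style init keeps this 2005-friendly

WHILE @i < 10000
BEGIN
    SET @f = @f + 0.1e0;  -- 0.1 has no exact binary representation
    SET @d = @d + 0.10;   -- DECIMAL arithmetic is exact at this scale
    SET @i = @i + 1;
END;

SELECT @f AS float_total,    -- drifts slightly away from 1000
       @d AS decimal_total;  -- exactly 1000.00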


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1394180
Posted Friday, December 7, 2012 12:14 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 12:17 AM
Points: 36,800, Visits: 31,261
Cadavre (12/7/2012)
Jeff Moden (12/6/2012)
I don't know what others call this type of problem, but I call them "data smears" because it's like smearing strips of color down from one wet paint spot to another.


I thought I was the only one who referred to this as a smear or smudge (I've used both to describe this type of solution in the past). Now you have me wondering if I picked up the term from you!


Dunno... I've been using the term on these forums for a long time. Might simply be that great minds think alike.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1394193
Posted Friday, December 7, 2012 9:15 PM
SSC Journeyman


Group: General Forum Members
Last Login: Monday, July 15, 2013 2:52 PM
Points: 89, Visits: 385
CELKO (12/7/2012)
>> Here's the SQL to create the table and populate it with data. <<

Do you really want FLOAT? How will you handle the rounding errors and the slowness of floating-point math without special hardware? The data element names are vague; pressure of what? NULL is the right choice for a missing reading, because zero is a real value for a reading on a busted meter, an empty pipe, or whatever. I will guess that two decimal places are good enough, rather than the 1.79E+308 range you have now. You also need to start using the ANSI/ISO Standard INSERT INTO syntax.


This is an inherited database, and I am not allowed to change the schema except to add new objects. There are many things I would LOVE to change. The zero values are a result of a bulk load, and not real-time pressure values. The column names are not the actual schema column names, and I don't think it matters for what I was needing help with. The pressure column does allow for NULL values, however, there is a default value of 0. I assume when they loaded the data, they must not have explicitly populated that column.

Regarding your ANSI/ISO INSERT INTO comment, unless I'm mistaken, your suggestion only works on 2008 or higher. The DATE data type is also not a valid 2005 data type. Most people here are probably on 2012 at least for testing purposes, but I wanted to make it as generic as possible.
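For reference, a 2005-compatible equivalent of the multi-row VALUES list is the old UNION ALL SELECT pattern; here is a sketch against CELKO's Machine_Readings table above (on 2005 the DATE column would also have to become DATETIME):

INSERT INTO Machine_Readings (machine_id, sample_date, something_pressure)
SELECT 60, '2012-10-02', 0.00
UNION ALL SELECT 60, '2012-10-03', 300.00
UNION ALL SELECT 60, '2012-10-04', 300.00
UNION ALL SELECT 60, '2012-10-05', NULL;
-- ...and so on for the remaining rows.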

Thanks for your suggestions.
Post #1394300
Posted Saturday, December 8, 2012 9:13 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 12:17 AM
Points: 36,800, Visits: 31,261
Sam S Kolli (12/6/2012)
Let me say beforehand that Jeff Moden's solution is better than this one for the specific task you asked for (on my laptop, my solution is 85% of the batch cost, whereas Jeff's solution is 15%).


Oh... we have to be careful here, Sam. My query is based on a "Triangular Join" and it relies on just the right index, or it can be hundreds and sometimes thousands of times slower and more resource-intensive than a While Loop. It turns out that this solution worked perfectly because of the indexing and the data available in the columns, but such is not always the case.
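To put a number on that caution: a triangular join compares every row against all earlier rows for its machine, so the work grows as roughly N * (N + 1) / 2 row touches per machine. A quick hypothetical way to count those touches against the Machine_Readings table posted earlier:

-- For N rows per machine this returns about N * (N + 1) / 2 touches,
-- which is the hidden work behind the correlated <= subquery.
SELECT MR1.machine_id, COUNT(*) AS row_touches
  FROM Machine_Readings AS MR1
  JOIN Machine_Readings AS MR2
    ON MR2.machine_id = MR1.machine_id
   AND MR2.sample_date <= MR1.sample_date
 GROUP BY MR1.machine_id;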

Also, a word of caution: the costs you find in execution plans, especially the "% of batch" cost, are fickle lovers and they can lie like you wouldn't believe. While such costs are helpful in finding potential problems, they should never be used to make the "final decision".

For example, here's some very simple code to build 30 years' worth of dates using two different methods. The details, of course, are in the comments in the code.

/****************************************************************************************
Purpose:
This code demonstrates that the estimated and actual execution plans in SQL Server can
be 100% INCORRECT and that the execution plan should only be relied on to provide hints
as to what may be wrong with a query rather than an absolute indication. This code runs
in SQL Server 2005 only.

The code creates 30 years worth of dates starting with 2000-01-01 using two different
methods. The first method uses a recursive CTE and the second method uses a "Tally"
table. The output of each method is directed to a "throw-away" variable to take
display delays out of the picture.

Please check both the actual and estimated execution plans and compare the % of batch.

Please see the following article on how to build a Tally table and how they can be used
to replace certain While Loops.
http://www.sqlservercentral.com/articles/T-SQL/62867/
****************************************************************************************/
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name with-held)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @BitBucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2000-01-01'

;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 30, @DateVal)
)
select @BitBucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)

--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO

--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @BitBucket DATETIME --Holds display output so display times aren't measured.

--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2000-01-01'

SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
@BitBucket = @StartDate-1+t.N
FROM dbo.Tally t
ORDER BY N

--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO

Here are the execution plans for the code. Notice that the % of batch is 0% for the Recursive CTE and it's 100% for the Tally Table method. One would look at this and insist that Recursive CTEs are much better than the Tally Table method for doing such a thing.



But, if we look at the actual performance of the code, we can see that the % of batch figure is 100% incorrect. The Tally Table code is actually MUCH faster than the Recursive CTE.

========== Recursive method ==========

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 391 ms, elapsed time = 449 ms.
==========================================================================================
========== Tally table method ==========

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Tally'. Scan count 1, logical reads 20, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 27 ms.
==========================================================================================


Try your code and measure it for performance. Also, be careful about using SET STATISTICS. It can also lie (not in this case, though). See the following article about how SET STATISTICS can lie about performance.
http://www.sqlservercentral.com/articles/T-SQL/91724/


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems


  Post Attachments 
ExecutionPlanLies.gif (66 views, 20.68 KB)
Post #1394374
Posted Sunday, December 9, 2012 12:34 AM


SSC-Insane


Group: General Forum Members
Last Login: Yesterday @ 5:09 PM
Points: 23,089, Visits: 31,635
Modified my code to update the table:


CREATE TABLE dbo.UpdateGaps(
MachineID int NOT NULL,
RecordedDate datetime NOT NULL,
Pressure float NULL,
CONSTRAINT PK_UpdateGaps PRIMARY KEY CLUSTERED (MachineID,RecordedDate)
)

INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-02', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-03', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-04', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-05', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-06', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-08', 330);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-09', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (60, '2012-10-10', 300);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-02', 380);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-03', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-04', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-05', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-06', 350);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-07', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-08', 0);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-09', 400);
INSERT dbo.UpdateGaps (MachineID, RecordedDate, Pressure) VALUES (97, '2012-10-10', 0);
GO
select * from dbo.UpdateGaps;
go

with Base0Data as (
    -- Number zero rows and non-zero rows separately per machine, in date order.
    select
        MachineID,
        RecordedDate,
        Pressure,
        rn = row_number() over (partition by MachineID, case when Pressure = 0 then 0 else 1 end order by RecordedDate)
    from
        dbo.UpdateGaps
)
,BaseData as (
    -- RecordedDate minus rn is constant across a run of consecutive zero dates,
    -- so GrpDate tags each "island" of zeros. rn1 renumbers the rows within an
    -- island; it must be ordered by RecordedDate (ordering by the constant
    -- GrpDate expression would make the numbering arbitrary).
    select
        MachineID,
        RecordedDate,
        Pressure,
        GrpDate = dateadd(dd, -rn, RecordedDate),
        rn,
        rn1 = row_number() over (partition by MachineID, dateadd(dd, -rn, RecordedDate) order by RecordedDate)
    from
        Base0Data
)
--select * from BaseData where Pressure = 0
--select
--    bd1.MachineID,
--    bd1.RecordedDate,
--    Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end
update up set
    -- For a zero row, RecordedDate - rn1 is the date just before its island
    -- began, i.e. the most recent non-zero reading; non-zero rows keep their value.
    Pressure = case when bd1.Pressure = 0 then coalesce(bd2.Pressure, bd1.Pressure) else bd1.Pressure end
from
    dbo.UpdateGaps up
    inner join BaseData bd1
        on up.MachineID = bd1.MachineID and up.RecordedDate = bd1.RecordedDate
    left outer join BaseData bd2
        on bd1.MachineID = bd2.MachineID
        and dateadd(dd, -bd1.rn1, bd1.RecordedDate) = bd2.RecordedDate;
go

select * from dbo.UpdateGaps;
go

drop table dbo.UpdateGaps;
go
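For readers new to this technique, the dateadd(dd, -rn, RecordedDate) expression is the classic "gaps and islands" grouping trick. A hand-worked trace of machine 97's zero rows (values taken from the sample data above) shows how it tags each run of consecutive zero dates:

-- RecordedDate  rn (zeros only)  GrpDate = RecordedDate - rn
-- 2012-10-03    1                2012-10-02  \_ one island of zeros
-- 2012-10-04    2                2012-10-02  /
-- 2012-10-07    3                2012-10-04  \_ a second island
-- 2012-10-08    4                2012-10-04  /
-- 2012-10-10    5                2012-10-05     a third island
--
-- Within each island, rn1 restarts at 1, so RecordedDate - rn1 is always
-- the date just before the island began: the last non-zero reading.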





Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1394383
Posted Sunday, December 9, 2012 11:09 AM


SSCommitted


Group: General Forum Members
Last Login: Yesterday @ 7:51 AM
Points: 1,945, Visits: 2,864
oralinque (12/7/2012)
Regarding your ANSI/ISO INSERT INTO comment, unless I'm mistaken, your suggestion only works on 2008 or higher. The DATE data type is also not a valid 2005 data type. Most people here are probably on 2012 at least for testing purposes, but I wanted to make it as generic as possible.


Yep, but the DML is valid back to the SQL-92 Standard.


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #1394405