July 26, 2018 at 1:35 pm
First off thanks to all those who share their expertise on this forum. Apology #1: I am really in SQL Server 2005. An upgrade is in the works so a more preferred solution in 2016 would be a bonus. Apology #2: I have to obfuscate my data so much that it becomes very generic but hopefully I have fully demonstrated the problem.
This is really a two-part question. I have a vendor who provides data on a daily basis about certain entities. I take this data and follow business rules to generate my own calculated value.
1) If the vendor fails to provide a data point on the most recent day I want to use the most recent non-null value for that entity. Any of my datapoints can be null and this most recent non-null value logic needs to apply to any of them(DataPoint1,DataPoint2, and DataPoint3). This article seems to be state of the art but I don't believe it will not work in 2005. https://www.itprotoday.com/software-development/last-non-null-puzzle
2) Given that the value of my calculation can be sourced from any of the datapoints on any of the dates I want to be able to track which data point provided the value and from which day.
Thanks if you can help.IF OBJECT_ID('dbo.SourceData', 'U') IS NOT NULL DROP TABLE dbo.SourceData;
CREATE TABLE dbo.SourceData
(
    EntityID INT,
    ReportingDate DATETIME,
    EntityType CHAR(1),
    DataPoint1 DECIMAL(5,1), 
    DataPoint2 DECIMAL(5,1), 
    DataPoint3 DECIMAL(5,1) 
)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-01','A',1.3,2.0,3.0)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-01','B',3,3,3)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-01','C',2.1,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-01','C',2.6,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-01','C',2.6,2.6,2.7)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-02','A',NULL,2.0,3.0)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-02','B',3,3,3)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-02','C',2.1,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-02','C',2.6,2.3,2.7)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-02','C',2.6,2.6,NULL)
--This demonstrates my calculation but the expected results should have one record per EntityID
SELECT 
CASE EntityType 
WHEN 'A' THEN DataPoint1 
WHEN 'B' THEN COALESCE(DataPoint1,DataPoint2) 
ELSE    
    CASE WHEN DataPoint1 > DataPoint2 THEN DataPoint1 ELSE DataPoint3 END
END
AS MyCalculation, 
* 
FROM dbo.SourceData
--Expected results
SELECT '1.3' AS MyCalculation,'1' AS EntityID, '2018-07-01' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '3.0'AS MyCalculation,'2' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '2.7'AS MyCalculation,'3' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint3' AS SourceDatapoint UNION ALL
SELECT '2.6'AS MyCalculation,'4' AS EntityID,'2018-07-02' AS SourceDate,'DataPoint1' AS SourceDatapoint UNION ALL
SELECT '2.7'AS MyCalculation,'5' AS EntityID,'2018-07-01' AS SourceDate,'DataPoint3' AS SourceDatapoint 
July 26, 2018 at 2:04 pm
Probably something like this?
WITH CTE AS(
  SELECT 
   CASE 
    WHEN EntityType = 'A' THEN DataPoint1 
    WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2) 
    WHEN DataPoint1 > DataPoint2 THEN DataPoint1 
    ELSE DataPoint3 
   END
   AS MyCalculation, 
   CASE 
    WHEN EntityType = 'A' THEN 'DataPoint1' 
    WHEN EntityType = 'B' AND Datapoint1 IS NOT NULL THEN 'DataPoint1'
    WHEN EntityType = 'B' AND Datapoint1 IS NULL THEN 'DataPoint2'
    WHEN DataPoint1 > DataPoint2 THEN 'DataPoint1'
    ELSE 'DataPoint3' 
   END AS SourceDataPoint,
   ROW_NUMBER() OVER(PARTITION BY EntityID ORDER BY ReportingDate DESC) AS rn,
   EntityID,
   ReportingDate AS SourceDate
  FROM dbo.SourceData
  WHERE CASE 
    WHEN EntityType = 'A' THEN DataPoint1 
    WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2) 
    WHEN DataPoint1 > DataPoint2 THEN DataPoint1 
    ELSE DataPoint3 
   END IS NOT NULL
)
SELECT CTE.MyCalculation,
  CTE.EntityID,
  CTE.SourceDate,
  CTE.SourceDataPoint
FROM CTE 
WHERE CTE.rn = 1;
July 26, 2018 at 4:20 pm
Very nice, thank you. Putting aside the most recent non-null record issue I started toying with the below. The advantage being that I don't have too have a second expanded calculation which may simplify matters if the logic gets deep. Definitely kludgey since I don't have a ways to distribute unique identifiers across these 3 fields: DataPoint1ID, DataPoint2ID and DataPoint3ID.
I expect I'll go with your solution but I thought I'd share my thought experiment. Thanks again.
IF OBJECT_ID('dbo.SourceData', 'U') IS NOT NULL DROP TABLE dbo.SourceData;
CREATE TABLE dbo.SourceData
(
    EntityID INT,
    ReportingDate DATETIME,
    EntityType CHAR(1),
    DataPoint1 DECIMAL(5,1), 
    DataPoint1ID INT, 
    DataPoint2 DECIMAL(5,1), 
    DataPoint2ID INT, 
    DataPoint3 DECIMAL(5,1) , 
    DataPoint3ID INT 
)
INSERT INTO dbo.SourceData VALUES (1,'2018-07-01','A',1.3,1,2.0,11,3.0,21)
INSERT INTO dbo.SourceData VALUES (2,'2018-07-01','B',3,2,3,12,3,22)
INSERT INTO dbo.SourceData VALUES (3,'2018-07-01','C',2.1,3,2.3,13,2.7,23)
INSERT INTO dbo.SourceData VALUES (4,'2018-07-01','C',2.6,4,2.3,14,2.7,24)
INSERT INTO dbo.SourceData VALUES (5,'2018-07-01','C',2.6,5,2.6,15,2.7,25)
--Luis's solution
 SELECT 
  CASE 
  WHEN EntityType = 'A' THEN DataPoint1 
  WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2) 
  WHEN DataPoint1 > DataPoint2 THEN DataPoint1 
  ELSE DataPoint3 
  END
  AS MyCalculation, 
  CASE 
  WHEN EntityType = 'A' THEN 'DataPoint1' 
  WHEN EntityType = 'B' AND Datapoint1 IS NOT NULL THEN 'DataPoint1'
  WHEN EntityType = 'B' AND Datapoint1 IS NULL THEN 'DataPoint2'
  WHEN DataPoint1 > DataPoint2 THEN 'DataPoint1'
  ELSE 'DataPoint3' 
  END AS SourceDataPoint,
  EntityID,
  ReportingDate AS SourceDate
 FROM dbo.SourceData
 WHERE CASE 
  WHEN EntityType = 'A' THEN DataPoint1 
  WHEN EntityType = 'B' THEN COALESCE(DataPoint1,DataPoint2) 
  WHEN DataPoint1 > DataPoint2 THEN DataPoint1 
  ELSE DataPoint3 
  END IS NOT NULL
IF OBJECT_ID('tempdb..#DataPointSourceIDS') IS NOT NULL DROP TABLE #DataPointSourceIDS
IF OBJECT_ID('tempdb..#SourceIDS') IS NOT NULL DROP TABLE #SourceIDS
SELECT 
*
INTO #DataPointSourceIDS
FROM
(
    SELECT 
            Datapoint1 AS DataPoint,
            DataPoint1ID AS DataPointID,
            ReportingDate AS SourceDate,
            'Datapoint1' AS SourceDatePoint
        FROM dbo.SourceData
        UNION ALL 
        SELECT 
                Datapoint2,
                DataPoint2ID,
            ReportingDate AS SourceDate,
            'Datapoint2' AS SourceDatePoint
        FROM dbo.SourceData 
        UNION ALL 
        SELECT
            Datapoint3,
            DataPoint3ID,
            ReportingDate AS SourceDate,
            'Datapoint3' AS SourceDatePoint
        FROM dbo.SourceData 
) DPSIDS
SELECT 
CASE EntityType 
WHEN 'A' THEN DataPoint1ID 
WHEN 'B' THEN COALESCE(DataPoint1ID,DataPoint2ID) 
ELSE    
    CASE WHEN DataPoint1 > DataPoint2 THEN DataPoint1ID ELSE DataPoint3ID END
END
AS MyCalculationID,
*
INTO #SourceIDS
FROM dbo.SourceData 
SELECT 
--SID.MyCalculationID,
(SELECT DataPoint FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS MyCalculation,
(SELECT SourceDatePoint FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS SourceDataPoint,
EntityID,
(SELECT SourceDate FROM #DataPointSourceIDS WHERE DataPointID = SID.MyCalculationID) AS SourceDate
 FROM #SourceIDS SID
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply