March 1, 2016 at 3:21 pm
Performance is my over-riding concern, as I have a heavily recursive CTE that is the engine room of my application. This CTE may have to work on 4 columns of up to 7 million rows. Queries can take over 30 minutes with just 1.2 million rows.
To reduce the waiting time of most users, I save minimalist 2 column tables that capture the results of these long-running CTE recursion queries into a separate "cache" database. Thus 1000s of these two column tables, some that may contain nearly 7 million rows collect in this database. I have to maintain concurrency with the base data in the CTE specific database as new records are added. At the moment I keep and upkeep the same data in both databases.
Is there a better way to do all of this? Should I just keep all the data in one database? If so, how will my long-running CTEs interact with lots of interruptions, and won't all users then have to wait more, as I would suspect (based on what I know about queueing theory)? Clearly, the caching is a major pain, especially since the content must often be updated when new individuals are added to the records. Ideally, the CTE would just run a whole lot faster.
Recursive CTE:
USE [relationship]
GO
/****** Object: StoredProcedure [home].[proportion] Script Date: 2/24/2016 9:51:22 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: Me
-- Create date: today
-- Description: cte with normalized tbls
-- =============================================
ALTER PROCEDURE [home].[proportion]
-- Add the parameters for the stored procedure here
@id int
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Insert statements for procedure here
DELETE FROM relationship.home.persistedTemp WHERE originator = @id;
WITH PctCTE(id, percent, kind)
AS
(SELECT individual, CONVERT(DECIMAL(28,25),100.0) AS percent, model
FROM relationship.home.relationshipRecords constituent
WHERE constituent.individual = @id
UNION ALL
SELECT derivation.individual, CONVERT(DECIMAL(28,25),constituent.percent/2.0), derivation.model
FROM relationship.home.relationshipRecords AS derivation
INNER JOIN PctCTE AS constituent
ON (constituent.kind = 'M' AND
(derivation.inputKindB = constituent.id))
OR
(NOT constituent.kind = 'M' AND
(derivation.inputKindA = constituent.id))
),
mergeCTE(i, p)
AS
(SELECT id, SUM(percent)
FROM PctCTE
GROUP BY id
)
INSERT INTO relationship.home.persistedTemp
SELECT @id, tmp.i, tmp.p, w.yr
FROM mergeCTE AS tmp
LEFT JOIN relationship.home.beginning w
ON tmp.i = w.individual;
DELETE FROM relationship.home.persistedTemp WHERE originator = @id AND i = @id
END
Note: inputKindA may be used to create many individuals (1000s), inputKindB may be used to create less individuals (up to 20), and both often create no individuals.
The database associated with the CTE is called relationship and contains the following tables:
myids
record_id char(6) PK FK
reg nchar(10) PK
name nvarchar(60)
individual int FK U1
main int
FK_myids_names
IX_myids_1 individual(ASC) Unique
PK_myids record_id (ASC), reg (ASC) Unique Clustered
names
individual int PK {I}
name nvarchar(60)
PK_names individual (ASC) Unique Clustered
relationshipRecords
individual int FK
inputKindA int {I}
inputKindB int I2
model char(1) I3
FK_relationshipRecords_names
IX_relationshipRecords_B inputKindB (ASC)
IX_relationshipRecords_mdl model (ASC)
IX_relationshipRecords_id individual (ASC) Unique
IX_relationshipRecords_A inputKindA (ASC) Clustered
beginning
individual int FK
yr smallint {I}
color nvarchar(50)
FK_beginning_names
IX_beginning_culr color (ASC)
IX_beginning_id individual (ASC) Unique
IX_beginning_yr yr (ASC) Clustered
persistedTemp
originator int
i int
p decimal(28,25)
y smallint
IX_persistedTemp originator (ASC) Clustered
and record_log
record_id char(6) PK
version_id char(3)
cntry_id char(2)
record_nm nvarchar(255)
notes nvarchar(max)
updtd int
base_yr smallint
user_id nvarchar(255)
dflt_id char(15)
dflt_nm nvarchar(255)
css_ref nvarchar(255)
url_ref nvarchar(255)
bt_link nvarchar(150)
PK_record_log record_id (ASC) Unique Clustered
Finally, the database that stores the cached long-query results has the following tables:
hinfo, which holds redundant data from the names, relationshipRecords, and beginning tables above:
individual int PK U1
reg nvarchar(255)
name nvarchar(60) U1
kinds nvarchar(121)
year smallint {I}
color nvarchar(50)
model char(1)
PK_hinfo individual (ASC) Unique
IX_hinfo year (ASC) Clustered
IX_nameorder individual (ASC), name (ASC) Unique
srch_log
individual int PK FK
last_id int
last_dt datetime2(7)
hit_cnt int
lapse int
saved bit
PK_srch_log individual (ASC) Unique Clustered
and mostly this database stores many of the following type of tables:
p000001
i int PK
p decimal(28,25) I1
PK_p000001 i (ASC) Unique
IX_p000001 p (ASC) Clustered
I am at the beginning stages of a revision of my web application and the above structure is likely to change a bit in light of what I learned the first time around.
March 1, 2016 at 9:01 pm
This is nearly identical to another post. I recommend keeping any answers together on that other post, which is located at the following thread...
http://www.sqlservercentral.com/Forums/Topic1765710-3077-1.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply