Help with best approach to large number of records

  • The application that I'm designing the database is for tracking hits for individuals of which they can mine statistics. If I have 5000 web site owners tracking 30,000 hits a month....well, you get the picture. I end up with a VLDB.

    I'm wondering if I should store all the hits in a single table or create a separate table for each individual user [to speed up lookup times]?

    My concern is that with the amount of records that will need to be looked up for any given user, lookup times may get very long.

    Would SQL Server have a problem with a database with 5000 tables?

    Is there a way to really tweak out a tables indexes for high speed lookups?

    Is there an Ideal approach to this situation?

    Any insight to this would be very greatly apprecicated. TIA

    Jeffrey V. Lemire

  • quote:


    Is there a way to really tweak out a tables indexes for high speed lookups?

    Is there an Ideal approach to this situation?


    Generally, you should try a normalized solution before attempting to denormalize a db design. Possibly, you could create a table housing the raw data...then a separate table keyed on site that stores some calculated fields (total hits, etc...) A well indexed table with narrow primary keys should provide the speed you require. How is hit data stored, meaning, how are you planning to key the data...site id and what else?

  • perhaps instead of creating one table per web site owner you could create one table per month and let all the owners search the same table.

    Views might help you get over the multiple table problem...

  • Totally agree you should build it right, then if perf is an issue look for solutions. Keep the table narrow as possible. You might consider summarizing the data daily or monthly to a different table unless they truly need detail for the entire period.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Yes, do it right, then tune. In fact you can summarize with indexed views and still keep your table 'correct', but watch out for blocking in the indexed view when doing inserts or updates.

    If blocking becomes an issue you could (perhaps) be inspired by below example, which is somewhat theoretical, difficult to program and was made most for illustrative purposes, but never the less works perfectly.

    Just copy below text into Query Analyzer and read carefully. (hope it is not to long, haven't figured out how to attach SQL scripts)

    /* 2002/11/26/JKJ Below illustrates the use of SQL Server Indexed Views,

    with no blocking on insertion of new non posted rows.

    Generally, any creation of indexed views/materialized views will introduce heavy blocking

    because updating data included in the precalculated aggregate needs to block. But often

    we have a usage (pattern?) in which some data are very static and some data are very

    dynamic. If we know the pattern, we can design the aggregate to contain only the data

    generally NOT updated, whereas heavily updated data are not including in the aggregate.

    This will allow us to use a pre-aggregated value for static data, summed with an

    'on the fly' generated aggregate for dynamic data. Latter simply to avoid blocking on

    the update of the pre-aggregate itself.

    In below example this is implemented using an isPosted Column (which is kind of speculative)

    Netto, this simple implementation will NOT block inserts of non posted transactions,

    but will block when transactions are posted, because then the aggregated sum for

    posted transactions will need to be updated.

    So tons of inserts of non posted transactions can be handled with no blocking, but when

    transactions are eventually posted, they will block (which is kind of by nature). So

    until the transactions are reflected in the pre-aggregated sum they have high concurrency.

    If the posting is delayed until the very end of all posting transactions, this is still

    a high concurrency solution.

    */

    /* Clean up */

    if exists

    (SELECT * FROM INFORMATION_SCHEMA.TABLES

    WHERE table_name = 'ledgerAccountBalanceView'

    AND table_type = 'VIEW')

    DROP VIEW ledgerAccountBalanceView

    go

    if exists

    (SELECT * FROM INFORMATION_SCHEMA.TABLES

    WHERE table_name = 'ledgerBalanceTransactions'

    AND table_type = 'BASE TABLE')

    DROP TABLE ledgerBalanceTransactions

    go

    /* Create a silly table, which both include historic (mostly posted) transactions

    and a few non posted transactions (Nature of Accounting systems should mean that

    in time, most transactions are posted)

    */

    CREATE TABLE ledgerBalanceTransactions

    (ledgerAccount VARCHAR(10),

    amount DECIMAL(20,4) NOT NULL,

    datePosted DATETIME,

    isPosted SMALLINT DEFAULT 0

    )

    go

    /* Make a lot of posted transactions */

    DECLARE @i INT

    DECLARE @j-2 INT

    SET @i = 1000

    SET NOCOUNT ON

    WHILE (@i > 0)

    BEGIN

    SET @j-2 = 100

    WHILE (@j > 0)

    BEGIN

    INSERT INTO LedgerBalanceTransactions

    (ledgerAccount, amount, datePosted, isPosted)

    VALUES

    (CAST( @i%10 AS VARCHAR(10)), @i % 10, {ts '2001-01-01 00:00:00.0'}, 1)

    SET @j-2 = @j-2 - 1

    END

    SET @i = @i -1

    END

    SET NOCOUNT OFF

    go

    /* Make a few not posted transactions */

    DECLARE @i INT

    SET @i = 100

    SET NOCOUNT ON

    WHILE (@i > 0)

    BEGIN

    INSERT INTO ledgerBalanceTransactions

    (ledgerAccount, amount, datePosted, isPosted)

    VALUES

    (CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 0)

    SET @i = @i -1

    END

    SET NOCOUNT OFF

    go

    /* Lets index the table */

    -- DROP INDEX ledgerBalanceTransactions.ledgerBalanceTransactionsIdx

    CREATE CLUSTERED INDEX ledgerBalanceTransactionsIdx ON

    ledgerBalanceTransactions(ledgerAccount)

    CREATE NONCLUSTERED INDEX ledgerBalanceTransactionsPostIdx ON

    ledgerBalanceTransactions(isPosted)

    go

    /* Lets find the sum of all posted transactions per ledgerAccount */

    select sum(amount), ledgerAccount

    from ledgerbalancetransactions

    where isPosted = 1

    group by ledgerAccount

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 535, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 250 ms, elapsed time = 254 ms.

    /* Lets find the sum of all non posted transactions per ledgerAccount */

    select sum(amount), ledgerAccount

    from ledgerbalancetransactions

    where isPosted = 0

    group by ledgerAccount

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 1 ms, elapsed time = 1 ms.

    /* Lets find the grand sum of all transactions per ledgerAccount */

    select sum(amount), ledgerAccount

    from ledgerbalancetransactions

    group by ledgerAccount

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 535, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 101 ms, elapsed time = 101 ms.

    /* Lets make a View for the SUM(amount) on all posted transaction */

    CREATE VIEW ledgerAccountBalanceView

    WITH SCHEMABINDING

    AS

    SELECT ledgerAccount, SUM(amount) AS amountSumPosted, COUNT_BIG(amount) ledgerCount, isPosted

    FROM dbo.ledgerBalanceTransactions

    WHERE isPosted = 1

    GROUP BY ledgerAccount, isPosted

    go

    /* Lets materialize the view into a clustered index */

    CREATE UNIQUE CLUSTERED INDEX accountBalanceViewIndex ON

    ledgerAccountBalanceView( ledgerAccount )

    go

    /* Select sum(ammount) per ledgerAccount from the View for all posted transactions */

    SELECT amountSumPosted, LedgerAccount, ledgerCount FROM ledgerAccountBalanceView

    go

    Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

    /* Lets find the sum of all posted transactions per ledgerAccount */

    select sum(amount), ledgerAccount

    from ledgerbalancetransactions

    where isPosted = 1

    group by ledgerAccount

    go

    Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

    /* Select sum(amount) per ledgeraccount for all non posted transactions */

    SELECT sum(amount) as amountSumNotPosted, ledgerAccount FROM ledgerBalanceTransactions

    WHERE isPosted = 0

    GROUP BY ledgerAccount

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

    /* Lets use the aggregated data to get sum of posted amounts from aggregated view plus

    the sum of non posted amounts from the transaction table itself. We need the derived

    table openT, because we want to GROUP before the join, otherwise the view sum will be

    summed per duplicate ledgerAccount row in transactions */

    SELECT SUM(openT.amount+closedT.amountSumPosted), closedT.ledgerAccount

    FROM ledgerAccountBalanceView AS closedT

    JOIN

    (SELECT SUM(amount) AS amount, ledgerAccount, isPosted

    FROM ledgerBalanceTransactions

    WHERE isPosted = 0

    GROUP BY ledgerAccount, isPosted) AS openT

    ON closedT.ledgerAccount = openT.ledgerAccount

    AND closedT.isPosted = 1 AND openT.isPosted = 0

    GROUP BY closedT.ledgerAccount

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.

    Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

    /* As not all accounts will have both a historic and a current transaction, the query

    must UNION ledgerAccounts in both View and Table + ledgerAccounts only in View +

    ledgerAccount only in View */

    -- Below is the sum of open and closed transactions in both View and Table

    SELECT SUM(openT.amount+closedT.amountSumPosted), closedT.ledgerAccount

    FROM ledgerAccountBalanceView AS closedT

    JOIN

    (SELECT SUM(amount) AS amount, ledgerAccount, isPosted

    FROM ledgerBalanceTransactions

    WHERE isPosted = 0

    GROUP BY ledgerAccount, isPosted) AS openT

    ON closedT.ledgerAccount = openT.ledgerAccount

    AND closedT.isPosted = 1 AND openT.isPosted = 0

    GROUP BY closedT.ledgerAccount

    UNION ALL

    -- Below is the sum of (closed) transactions only in the View

    SELECT SUM( closedT.amountSumPosted ), closedT.ledgerAccount

    FROM ledgerAccountBalanceView AS closedT

    RIGHT OUTER JOIN ledgerBalanceTransactions AS openT

    ON closedT.ledgerAccount = openT.ledgerAccount

    AND closedT.isPosted = 1 AND openT.ledgerAccount IS NULL

    GROUP BY closedT.ledgerAccount

    HAVING closedT.ledgerAccount IS NOT NULL

    UNION ALL

    -- Below is the sum of (open) transactions only in the Table

    SELECT SUM( openT.amount ), openT.ledgerAccount

    FROM ledgerBalanceTransactions AS openT

    RIGHT OUTER JOIN ledgerAccountBalanceView AS closedT

    ON closedT.ledgerAccount = openT.ledgerAccount

    AND openT.isPosted = 0 AND closedT.ledgerAccount IS NULL

    GROUP BY openT.ledgerAccount

    HAVING openT.ledgerAccount IS NOT NULL

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0.

    Table 'ledgerAccountBalanceView'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms.

    /* To compare with the standard query */

    SELECT SUM(amount), ledgerAccount, count(*)

    FROM ledgerBalanceTransactions

    GROUP BY ledgerAccount

    go

    Table 'ledgerBalanceTransactions'. Scan count 1, logical reads 536, physical reads 0, read-ahead reads 0.

    SQL Server Execution Times: CPU time = 93 ms, elapsed time = 93 ms.

    /* Notice that this query does NOT block */

    BEGIN TRAN

    DECLARE @i INT

    SET @i = 10

    SET NOCOUNT ON

    WHILE (@i > 0)

    BEGIN

    INSERT INTO ledgerBalanceTransactions

    (ledgerAccount, amount, datePosted, isPosted)

    VALUES

    (CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 0)

    SET @i = @i -1

    END

    SET NOCOUNT OFF

    go

    COMMIT TRAN

    /* Whereasthis query DOES block */

    BEGIN TRAN

    DECLARE @i INT

    SET @i = 10

    SET NOCOUNT ON

    WHILE (@i > 0)

    BEGIN

    INSERT INTO ledgerBalanceTransactions

    (ledgerAccount, amount, datePosted, isPosted)

    VALUES

    (CAST( @i%10 AS VARCHAR(10)), @i % 10, NULL, 1)

    SET @i = @i -1

    END

    SET NOCOUNT OFF

    go

    COMMIT TRAN

    regards

    jensk

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply