The "Numbers" or "Tally" Table: What it is and how it replaces a loop

  • mirzafahad1409 (7/29/2010)


    Hi every1, i am extremely new to SQL.Can any1 show an example using a column instead of @parameter.

    for the fetching of data between the commas.???(spliting the data).

    email me plzz mfb1409@gmail.com

    Welcome aboard!

    Just so you know, it wouldn't be appropriate to email a solution to you because then only you will benefit. We'll post the answer right here so that anyone that reads this thread may benefit.

    Second, you've apparently not read the article which gives you a solution for such a thing. After you've actually read the article (it's important that you understand how these things work), come back and we'll show you a solution that's even faster than the method in the article. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks jeff for the speedy reply.

    First i will post my probs in details.

    I have a Excel file.With columns HB_ID,Firstname,LastName,Phone_Numbers,E-mails,Companies.(This is my "Source" File).

    Now i have to create a Destination file(Should be excel and lets name it "target").In the first sheet of the Target excel file their should be 3 columns HB_ID,FirstName,LastName and in the Sheet 2 it should have HB_ID,RecordType,Value

    Now i will post a small part of the "Source" File.

    Row 1:

    HBE_IDFIRST_NAME LAST_NAME Phone_Numbers

    HBE_C001Pierre Amarenco 33140257198|33140258726

    Emails

    amarenco@ccr.jussieu.fr|pierre.amarenco@bch.aphp.fr|pierre@amarenco.netCardiovascular

    Company

    Diseases|Neurologist

    Row 2

    HBE_C_4000569RichardAmerling(212) 420-4070|(212) 420-4117ramerling@bethisraelny.org|ramerling@usa.netDialysis|Nephrology

    "Also shown in attachment"

    Now lets divide the problems.

    Problem 1:

    i have to separate The HB_ID,FirstName,LastName and Place it in Target-Sheet-1

    Problem 2:

    U will observe.The email,Phone_Numbers Columns has multiple entries separated by a PIPE "|".I have to extract this data between the PIPES "|" and load in into the Value Column in target-Sheet-2

    Problem 3:

    U will observe that in target sheet 2 the emails,phonenumbers,companies all are clubbed into a single column.

    Problem 4:

    I have to do all this in SSIS

    My thoughts:

    For

    Problem 2:

    Can we use a Derived column and the substring to separate the data

    Problem 3:

    Can we use Pivot and Unpivot.

  • mirzafahad1409 (7/31/2010)


    Thanks jeff for the speedy reply.

    First i will post my probs in details.

    I have a Excel file.With columns HB_ID,Firstname,LastName,Phone_Numbers,E-mails,Companies.(This is my "Source" File).

    Now i have to create a Destination file(Should be excel and lets name it "target").In the first sheet of the Target excel file their should be 3 columns HB_ID,FirstName,LastName and in the Sheet 2 it should have HB_ID,RecordType,Value

    Now i will post a small part of the "Source" File.

    Row 1:

    HBE_IDFIRST_NAME LAST_NAME Phone_Numbers

    HBE_C001Pierre Amarenco 33140257198|33140258726

    Emails

    amarenco@ccr.jussieu.fr|pierre.amarenco@bch.aphp.fr|pierre@amarenco.netCardiovascular

    Company

    Diseases|Neurologist

    Row 2

    HBE_C_4000569RichardAmerling(212) 420-4070|(212) 420-4117ramerling@bethisraelny.org|ramerling@usa.netDialysis|Nephrology

    "Also shown in attachment"

    Now lets divide the problems.

    Problem 1:

    i have to separate The HB_ID,FirstName,LastName and Place it in Target-Sheet-1

    Problem 2:

    U will observe.The email,Phone_Numbers Columns has multiple entries separated by a PIPE "|".I have to extract this data between the PIPES "|" and load in into the Value Column in target-Sheet-2

    Problem 3:

    U will observe that in target sheet 2 the emails,phonenumbers,companies all are clubbed into a single column.

    Problem 4:

    I have to do all this in SSIS

    My thoughts:

    For

    Problem 2:

    Can we use a Derived column and the substring to separate the data

    Problem 3:

    Can we use Pivot and Unpivot.

    A few things to notice:

    You should start a new thread in the SSIS forum if you're looking for a SSIS solution.

    Also, please read and follow the advice given in the first article in my signature on how to provide information / sample data.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (8/1/2010)


    A few things to notice:

    You should start a new thread in the SSIS forum if you're looking for a SSIS solution.

    Also, please read and follow the advice given in the first article in my signature on how to provide information / sample data.

    I agree with what Lutz said above. You'll also get better answers more quickly because only the people that have read the article and responded on this discussion are likely to see your question and I don't even use SSIS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, will do that.

  • I just thought I'd share a head-smacking moment with you. It's been touched-on previously in this thread, but I don't think anyone's come right out and demonstrated it.

    What happened?

    This one stored procedure I was writing for an SSRS 2008 report demonstrating Column Groups (i.e.: pivot tables in the UI instead of in SQL) was generating the expected 10,000 rows in SSMS but when called by the least-privileged user in Reporting Services, or the Query Designer in BIDS, it was only returning 9 rows. Because of the "no tables" restriction imposed by security policy local to the DB it was generating a Tally table on the fly (every other proc in this DB bounces through a Linked Server to source data from the real back end DB on another physical server). The proc was just generating random data as per:

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[sprTablixTest]') AND type in (N'P', N'PC'))

    DROP PROCEDURE [dbo].[sprTablixTest]

    GO

    SET ANSI_NULLS ON

    GO

    SET ANSI_WARNINGS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE procedure [dbo].[sprTablixTest]

    AS

    BEGIN

    set nocount on

    select top 10000 identity(int, 1, 1) as [ID]

    into #TallyTable

    from syscolumns sc1, syscolumns sc2

    set nocount off

    select[ID]

    ,[Item] = replicate(char(65+floor(5*rand(321*[ID]))), 10)

    ,[Month] = convert(datetime, '2010-'+ right('0'+convert(nvarchar(10),1+floor(12*rand(654*[ID]))),2) + '-01')

    ,[Value] = round((100000*rand(987*[ID]))/100, 2)

    from #TallyTable

    order by [Item], [Month], [Value]

    drop table #TallyTable

    END -- sprTablixTest

    GO

    GRANT EXECUTE ON [dbo].[sprTablixTest] TO [TKReports]

    Jeff and others can probably see the issue already.

    So, after getting a few more coffees and working on something else for a while I came back and found the cause: I was referencing the local DB's syscolumns table instead of the master..syscolumns table. You can see the difference in the following query:

    select @@version

    select [syscolumns]=count(*) from syscolumns

    select [dbo.syscolumns]=count(*) from dbo.syscolumns

    select [sys.syscolumns]=count(*) from sys.syscolumns

    select [master..syscolumns]=count(*) from master..syscolumns

    select [master.dbo.syscolumns]=count(*) from master.dbo.syscolumns

    select [master.sys.syscolumns]=count(*) from master.sys.syscolumns

    Which as the lesser-privileged user returns:

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)

    Mar 29 2009 10:11:52

    Copyright (c) 1988-2008 Microsoft Corporation

    Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: ) (VM)

    (1 row(s) affected)

    syscolumns

    3

    (1 row(s) affected)

    dbo.syscolumns

    3

    (1 row(s) affected)

    sys.syscolumns

    3

    (1 row(s) affected)

    master..syscolumns

    11949

    (1 row(s) affected)

    master.dbo.syscolumns

    11949

    (1 row(s) affected)

    master.sys.syscolumns

    11949

    (1 row(s) affected)

    And as sa returns:

    Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)

    Mar 29 2009 10:11:52

    Copyright (c) 1988-2008 Microsoft Corporation

    Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: ) (VM)

    (1 row(s) affected)

    syscolumns

    486

    (1 row(s) affected)

    dbo.syscolumns

    486

    (1 row(s) affected)

    sys.syscolumns

    486

    (1 row(s) affected)

    master..syscolumns

    12588

    (1 row(s) affected)

    master.dbo.syscolumns

    12588

    (1 row(s) affected)

    master.sys.syscolumns

    12588

    (1 row(s) affected)

    As sa, 486 * 486 rows was no problem (236,196 > 10,000). But 3 * 3 rows does not quite get to 10,000.

    There are two morals to this story:

    1. just as if you're writing C/C++/C#/whatever code for desktop applications, test your code as an ordinary user and not just run everything as admin (sa).

    2. make sure you're using the right tables when generating Tally data.

  • AnthonyR (8/30/2010)


    I just thought I'd share a head-smacking moment with you. It's been touched-on previously in this thread, but I don't think anyone's come right out and demonstrated it.

    Thanks for sharing the problems you ran into and, especially, the solutions you came up with, Anthony

    Heh... yes, it is a problem and that's why I demonstrated by using Master.dbo.syscolumns. That was actually before I finally loaded SQL Server 2005 on my box at home. If you're using 2k5 or above, I recommend using Master.sys.All_Columns instead. If you do a full installation, it's guaranteed to have at least 4,000 rows in it.

    For those that may need to generate on-the-fly Tally tables, consider the original method that Ben-Gan published in one of his books with a little "SSC Flavor" added to it...

    --===== "Inline" CTE Driven "Tally Table” produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)

    SELECT * FROM cteTally --"action query" would go here

    For most things, though, a permanent Tally table is better (with exceptions, of course).

    I'll be back on the random number things in a minute or two...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Anthony,

    Here's a direct replacement for your sprTablixTest with three very important differences (that you may or may not want)...

    1) It creates different data every time you use it instead of the same data everytime a your proc current does.

    2) It has no need of creating an actual temp table.

    3) It only runs in 2k5 or better.

    CREATE PROCEDURE [dbo].[sprTablixTest]

    AS

    SET NOCOUNT OFF

    ;

    WITH cteTally AS

    (

    SELECT TOP (10000)

    [N] = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    )

    SELECT

    [N],

    [Item] = REPLICATE(CHAR(ABS(CHECKSUM(NEWID()))%5+65),10),

    [Month] = DATEADD(mm,ABS(CHECKSUM(NEWID()))%12,'2010'),

    [Value] = CAST(ABS(CHECKSUM(NEWID()))%100000/100.0 AS DECIMAL(9,2))

    FROM cteTally

    ORDER BY [Item], [Month], [Value]

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Using NEWID() as you have is a better way to get random data out of SQL if you don't want repeatability, but getting repeatable random-ish data out of this proc is acceptable for demonstration purposes.

    Thanks and kind regards,

    Anthony.

  • AnthonyR (8/30/2010)


    Hi Jeff,

    Using NEWID() as you have is a better way to get random data out of SQL if you don't want repeatability, but getting repeatable random-ish data out of this proc is acceptable for demonstration purposes.

    Thanks and kind regards,

    Anthony.

    Ah... Understood. Thanks for the feedback, Anthony.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, mirzafahad1409,

    To parse sophisticated text data (with multiple rules), you should use high level language like C# (it is easy to write and debug), you could embed C# into SSIS. You are doing hard work with a hammer instead of jack-hammer.

    Jeff wrote an excellent article. Two thumbs up.

    Jason
    http://dbace.us
    πŸ˜›

  • jswong05 (1/7/2011)


    Hi, mirzafahad1409,

    To parse sophisticated text data (with multiple rules), you should use high level language like C# (it is easy to write and debug), you could embed C# into SSIS. You are doing hard work with a hammer instead of jack-hammer.

    C# is a high level language? :ermm: Must be a new definition of high level language! πŸ˜€

    Tom

  • jswong05 (1/7/2011)


    Hi, mirzafahad1409,

    To parse sophisticated text data (with multiple rules), you should use high level language like C# (it is easy to write and debug), you could embed C# into SSIS. You are doing hard work with a hammer instead of jack-hammer.

    Jeff wrote an excellent article. Two thumbs up.

    Thanks for the feedback, Jason. I do have to agree with what Tom said, though... C# isn't usually considered to be a high level language. Most people I know consider it to be just a bit above assembly language and the libraries folks built for it are one of the things that make it so useful.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Great article, thank you.

    I have a quick question - is it possible to have a total column for each year if the date range spans multiple years?

  • The total (SUM) column in the example I gave is works by day and will work across many years but I suspect you mean something different. Can you be a bit more specific please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 301 through 315 (of 511 total)

You must be logged in to reply to this topic. Login to reply