Load large fact table

  • Hi All

    I need your help.
    I am working in a project to create a datawarhouse in azure sql server
    I'am trying to load about 31 millions rows in a fact table
    This the stored procedure I 've created to do this.


    CREATE PROCEDURE [Dwh].[AlimFctSurvey]

    AS
    BEGIN

        SET NOCOUNT ON;

    DECLARE @StartDate Date = (SELECT (ISNULL (MAX(LoadingDate), ( SELECT MIN (DATE) from [Dwh].[DimDate] )) ) FROM [Dwh].[FactSurvey] ) 

    INSERT INTO [Dwh].[FactSurvey] with (TABLOCK)
    ([IdDate]
    ,[IdRespondant]
    ,[IdResponse]
    ,[IdCountry]
    , IdModule
    ,[Count]
    ,[LoadingDate]
    , LineId
    , FileId
    )

    SELECT ISNULL (B.IdDate, -1)IdDate
    , ISNULL (C.IdRespondant, -1)IdRespondant
    , ISNULL (E.IdResponse, -1)IdResponse
    , ISNULL (D.IdCountry, -1)IdCountry
    , ISNULL (F.IdModule, -1)IdModule
    , [Count] = 1
    , Loadingdate = Getdate()
    , LineId
    , FileId

    FROM [Ods].[Sas_Output] A

    LEFT JOIN [Dwh].[DimDate] B ON B.IdDate = 100 * cast (A.Mois as int) + 1

    LEFT JOIN [Dwh].[DimRespondant] C ON A.[Respondant_FunctionnalKey]= C.Respondant_FunctionnalKey

    LEFT JOIN [Dwh].[DimCountry] D ON A.Country_FunctionnalKey= D.Country_FunctionnalKey       

    LEFT JOIN Dwh.DimResponse E on E.Response_FunctionnalKey = a.Response

    LEFT JOIN Dwh.DimModule F on F.Module_FunctionnalKey = a.Module 

    WHERE 
    1=1
    and A.LoadingDate > @StartDate 

    AND FILETYPE = 'MAIN'
    AND isnull ([FLAG_RA1A], 0) = 0

    AND MOIS <= '201803'

    END


    All tables in the queries have a columnstore index
    I don't know why the query runs for hours without loading the table.
    Do you have an idea ?
    Thank you for your help

  • I suspect this join condition is the problem. It has to perform the casting and + 1, and *100 on every single record in the sas_output table. The dimdate table is much smaller right? Probably better to rewrite the condition to perform all the conversions and such on B.IdDate instead of on A.Mois. Also, look up the word Sargeable.

    B.IdDate = 100 * cast (A.Mois as int) + 1

  • Rekonn - Tuesday, February 12, 2019 9:44 PM

    I suspect this join condition is the problem. It has to perform the casting and + 1, and *100 on every single record in the sas_output table. The dimdate table is much smaller right? Probably better to rewrite the condition to perform all the conversions and such on B.IdDate instead of on A.Mois. Also, look up the word Sargeable.

    B.IdDate = 100 * cast (A.Mois as int) + 1

    Thank you Rekonn.

    I make the changes and I'll come back and tell you if it works.
    Someone has an other suggestion?

  • Lidou123 - Wednesday, February 13, 2019 4:03 AM

    Rekonn - Tuesday, February 12, 2019 9:44 PM

    I suspect this join condition is the problem. It has to perform the casting and + 1, and *100 on every single record in the sas_output table. The dimdate table is much smaller right? Probably better to rewrite the condition to perform all the conversions and such on B.IdDate instead of on A.Mois. Also, look up the word Sargeable.

    B.IdDate = 100 * cast (A.Mois as int) + 1

    Thank you Rekonn.

    I make the changes and I'll come back and tell you if it works.
    Someone has an other suggestion?

    You could create a temporary table from dwh.DimDate:
    SELECT CAST((IdDate-1)/100 AS SameTypeOfColumnAsSasOutputMois) AS Mois, IdDate
      INTO #DimDate
      FROM [Dwh].[DimDate] ;

    ALTER TABLE #DimDate
      ADD CONSTRAINT PK_#DimDate PRIMARY KEY CLUSTERED (Mois, IdDate)

    Then in your join you'd just change it to this:LEFT JOIN #DimDate B ON B.Mois = A.Mois
    It should be slightly faster.

  • And the best way if you show us the execution plan of this query.

  • Lidou123 - Wednesday, February 13, 2019 4:03 AM

    Rekonn - Tuesday, February 12, 2019 9:44 PM

    I suspect this join condition is the problem. It has to perform the casting and + 1, and *100 on every single record in the sas_output table. The dimdate table is much smaller right? Probably better to rewrite the condition to perform all the conversions and such on B.IdDate instead of on A.Mois. Also, look up the word Sargeable.

    B.IdDate = 100 * cast (A.Mois as int) + 1

    Thank you Rekonn.

    I make the changes and I'll come back and tell you if it works.
    Someone has an other suggestion?

    I'm curious, did that make a difference?

  • Rekonn - Thursday, February 14, 2019 10:40 PM

    Lidou123 - Wednesday, February 13, 2019 4:03 AM

    Rekonn - Tuesday, February 12, 2019 9:44 PM

    I suspect this join condition is the problem. It has to perform the casting and + 1, and *100 on every single record in the sas_output table. The dimdate table is much smaller right? Probably better to rewrite the condition to perform all the conversions and such on B.IdDate instead of on A.Mois. Also, look up the word Sargeable.

    B.IdDate = 100 * cast (A.Mois as int) + 1

    Thank you Rekonn.

    I make the changes and I'll come back and tell you if it works.
    Someone has an other suggestion?

    I'm curious, did that make a difference?

    Hi
    I corrected some things:
    - I simplified the join with DimDate. (B.IdDate = A.IdDate)
    - I also realized that I had duplicates in DimResponse, which multiplied my lines out.
    By correcting these 2 points, I was able to populate my Fact table.

    Thank you so much.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply