loop and query CSV files in a folder using union all query to form resultant table on server

  • Hi,

    I am trying to run a UNION ALL query in SQL SERVER 2014 on multiple large CSV files - the result of which i want to get into a table in SQL Server. below is the query which works in MSAccess but not on SQL Server 2014:

    SELECT * INTO tbl_ALLCOMBINED FROM OPENROWSET

    (

    'Microsoft.JET.OLEDB.4.0' , 'Text;Database=D:\Downloads\CSV\;HDR=YES',

    'SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,

    ''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],

    IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

    FROM [DATABASE_SPAIN_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],

    min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,

    max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,

    (max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -

    min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale

    FROM [DATABASE_SPAIN_EURO.CSV] as t3

    WHERE t3.[Sales Value with Innovation] is NOT NULL

    and t3.[Sales Value with Innovation] <>0

    and t3.[Level]=''Item''

    GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]

    ) as t2

    WHERE

    t.[Level] = t2.[Level]

    and t.[Category] = t2.[Category]

    and t.[Manufacturer] = t2.[Manufacturer]

    and t.[Brand] = t2.[Brand]

    and t.[Description] = t2.[Description]

    and t.[Sales Value with Innovation] is NOT NULL

    and t.[Sales Value with Innovation] <>0

    and t2.first_week_on_sale >=1

    and t2.weeks_on_sale <=52

    UNION ALL

    SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,

    ''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],

    IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

    FROM [DATABASE_FRANCE_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],

    min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,

    max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,

    (max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -

    min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale

    FROM [DATABASE_FRANCE_EURO.CSV] as t3

    WHERE t3.[Sales Value with Innovation] is NOT NULL

    and t3.[Sales Value with Innovation] <>0

    and t3.[Level]=''Item''

    GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]

    ) as t2

    WHERE

    t.[Level] = t2.[Level]

    and t.[Category] = t2.[Category]

    and t.[Manufacturer] = t2.[Manufacturer]

    and t.[Brand] = t2.[Brand]

    and t.[Description] = t2.[Description]

    and t.[Sales Value with Innovation] is NOT NULL

    and t.[Sales Value with Innovation] <>0

    and t2.first_week_on_sale >=1

    and t2.weeks_on_sale <=52

    ')

    What i need is:

    1] to create the resultant tbl_ALLCOMBINED table

    2] transform this table using PIVOT command with following transformation as shown below:

    PAGEFIELD: set on Level = 'Item'

    COLUMNFIELD: Sale_Week (showing 1 to 52 numbers for columns)

    ROWFIELD: sCOUNTRY, sCHAR, CATEGORY, MANUFACTURER, BRAND, DESCRIPTION, EAN (in this order)

    DATAFIELD: 'Sale Value with Innovation'

    3] Can the transformed form show columnfields >255 columns i.e. if i want to show all KPI values in datafield?

    P.S: the CSV's contain the same number of columns and datatype but the columns are >100, so i dont think it will be feasible to use a stored proc to create a table specifying that number of columns.

    can anyone please help me with a solution asap?

  • Quick question, can you provide some sample data?

    😎

  • I am trying to run a UNION ALL query in SQL SERVER 2014 on multiple large CSV files - the result of which i want to get into a table in SQL Server. below is the query which works in MSAccess but not on SQL Server 2014:

    what error messages are you getting?

    what is the reasoning for doing union and calcs on the csvs?

    maybe better to import each csv to a single staging table in SQL and work from there?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • is this correct....

    UNION ALL

    SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,

    ''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],

    IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

    FROM [DATABASE_FRANCE_EURO.CSV] as t,

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • the above queries work fine in sql server now, but donot know how to automate job schedule to scan network folder for csv files and run those queries on the csv files.

    i think i should not go ahead with PIVOTING the 2 tables that are produced after running the above queries on the csv file, but rather work with those 2 tables themselves as they contain all the KPI's starting after column 'EAN'.

    BTW, as per your earlier suggestion, i am reading on cubes and think they might be the right approach - however, donot know how to create a cube from a single denormalized table containing everything, but not having a primary key.

    please advice.

  • sifar786 (7/5/2014)


    the above queries work fine in sql server now, but donot know how to automate job schedule to scan network folder for csv files and run those queries on the csv files.

    i think i should not go ahead with PIVOTING the 2 tables that are produced after running the above queries on the csv file, but rather work with those 2 tables themselves as they contain all the KPI's starting after column 'EAN'.

    BTW, as per your earlier suggestion, i am reading on cubes and think they might be the right approach - however, donot know how to create a cube from a single denormalized table containing everything, but not having a primary key.

    please advice.

    here is one method for bringing files into SQL.........

    http://www.mssqltips.com/sqlservertip/2874/loop-through-flat-files-in-sql-server-integration-services/

    in one of your previous posts you mentioned passing parameters back to a SQL sproc from excel.....

    http://sqlwithmanoj.wordpress.com/2009/09/24/create-parameterized-excel-refreshable-reports/

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thanks for the links mate. 🙂

    BTW, i am currently looking at this Cube Tutorial. But it doesnt explain how to create and deploy cubes made from a single table. In my case, the 2 tables, viz., tbl_BrandYearWise & tbl_ItemWeekwise tables.

    Any ideas on how to create cubes from a single denormalized table ?

  • sifar786 (7/6/2014)


    Thanks for the links mate. 🙂

    BTW, i am currently looking at this Cube Tutorial. But it doesnt explain how to create and deploy cubes made from a single table. In my case, the 2 tables, viz., tbl_BrandYearWise & tbl_ItemWeekwise tables.

    Any ideas on how to create cubes from a single denormalized table ?

    google says...

    http://stackoverflow.com/questions/14756846/ssas-cube-from-a-flat-table%5B/url%5D%5B/b%5D

    [b][url]http://social.msdn.microsoft.com/Forums/sqlserver/en-US/f8771384-faa4-471e-bb9f-f90971af4a0a/want-to-create-cube-using-single-table">

    http://stackoverflow.com/questions/14756846/ssas-cube-from-a-flat-table%5B/url%5D%5B/b%5D

    http://social.msdn.microsoft.com/Forums/sqlserver/en-US/f8771384-faa4-471e-bb9f-f90971af4a0a/want-to-create-cube-using-single-table

    btw....are you doing any aggregation at all...ie SUM/AVG of the KPis...or grouping by Year/Month etc....or is this just transposing week rows to columns for readability in excel?......

    also can you explain the rationale around this part of your code...

    IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),

    ((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • [font="Verdana"]Hi,

    If you see the CSV's they dont have a Sale_week column. I am adding it to the query, also the sCountry, sChar (Character language), iYear columns. Each CSV contains 3 yrs (156 weeks) data for a particular Country & Char. So there are 2 CSV's for say, FRANCE viz., DATABASE_FRANCE_EURO.CSV & DATABASE_FRANCE_GLOBAL.CSV. likewise such CSV's for other countries. Also, in each csv fields are same, however Level field contains 2 types of data, Brand level & Item level. Brand level contains consolidated summary totals of Items in each brand. Item level is the detail breakup.

    what i am doing is clubbing all the CSV data together using UNION ALL and also adding the above columns to create 2 outputs. The Brand level output - we are only looking at the years, so 3 yrs(156 weeks) as per data (iYear column). The Item level output - we want to fit the 3yrs (156 weeks) data into a 52 week window. The idea is - any Innovation (Description field or SKU) will run only for not more than 52 weeks from the week of launch. So in above SQL code, i am just applying rules:

    1] if any Innovation that starts from 1st week of launch and <=52, then it may have had some weeks in previous years. So shift it to column 52-count of sku+1 (Sale_week).

    2] else any Innovation that does not start from 1st week of launch and <=52 then just number its column sequentially. (Sale_week).

    Exceptions: if you filter on Level='Item', you will see some sku's are having >52 weeks data & 0 & blanks. So i am excluding 0 & blanks & just picking up the maximum sku (<=52).

    Outputs: dropdown Filters are on Country Category, Manufacturer, Brand and aggregation done after applying each such filter.

    Select:

    Your CountrySPAIN

    Your CategoryJUICES

    Your ManufacturerA.J. BARD

    Your BrandRUBICON

    Calculations:

    Data is grabbed from ItemWeekWise table

    Country

    Count the number of SKUs in the country with more than 13 weeks of data

    For the selected week x compute cumulative values for below KPIs for each of the SKU in the country:

    1. Sales Value with Innovation

    2. Sales Units with Innovation

    3. Weighted Distribution

    Based on the above data calculate for each SKU:

    4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

    5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

    "Find the below percentiles for each of the above 5 KPIs

    90

    50

    10"

    "Find the percent rank of the below values using the cumulative array for the respective KPI:

    Cumulative Value Sales (entered by the user)

    Cumulative Unit Sales (entered by the user)

    Weighted Distribution (entered by the user)

    Values Sales Velocity (entered by the user)

    Unit Sales Velocity (entered by the user)"

    Category

    Count the number of SKUs in the category with more than 13 weeks of data

    For the selected week x compute cumulative values for below KPIs for each of the SKU in the category:

    1. Sales Value with Innovation

    2. Sales Units with Innovation

    3. Weighted Distribution

    based on the above data calculate for each SKU:

    4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

    5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

    "Find the below percentiles for each of the above 5 KPIs

    90

    50

    10"

    "Find the percent rank of the below values using the cumulative array for the respective KPI:

    Cumulative Value Sales (entered by the user)

    Cumulative Unit Sales (entered by the user)

    Weighted Distribution (entered by the user)

    Values Sales Velocity (entered by the user)

    Unit Sales Velocity (entered by the user)"

    Manufacturer

    Count the number of SKUs in the manufacturer with more than 13 weeks of data

    For the selected week x compute cumulative values for below KPIs for each of the SKU under the manufacturer:

    1. Sales Value with Innovation

    2. Sales Units with Innovation

    3. Weighted Distribution

    based on the above data calculate for each SKU:

    Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

    4.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

    "5.Find the below percentiles for each of the above 5 KPIs

    90

    50

    10"

    "Find the percent rank of the below values using the cumulative array for the respective KPI:

    Cumulative Value Sales (entered by the user)

    Cumulative Unit Sales (entered by the user)

    Weighted Distribution (entered by the user)

    Values Sales Velocity (entered by the user)

    Unit Sales Velocity (entered by the user)"

    Brand

    Count the number of SKUs in the brand with more than 13 weeks of data

    For the selected week x compute cumulative values for below KPIs for each of the SKU under the brand:

    1. Sales Value with Innovation

    2. Sales Units with Innovation

    3. Weighted Distribution

    based on the above data calculate for each SKU:

    4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

    5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

    "Find the below percentiles for each of the above 5 KPIs

    90

    50

    10"

    "Find the percent rank of the below values using the cumulative array for the respective KPI:

    Cumulative Value Sales (entered by the user)

    Cumulative Unit Sales (entered by the user)

    Weighted Distribution (entered by the user)

    Values Sales Velocity (entered by the user)

    Unit Sales Velocity (entered by the user)"

    This is just one output sheet, there are 5 more output sheets. LOL 🙂

    Let me know if something is not clear.[/font]

  • think you are going to be busy here.....:w00t::w00t:

    there are some regular posters on here that are far more experienced than I in BI...hopefully one will pick this up as well.

    out of interest....when you have all the data together for each country...how many distinct products (descriptions) do you have?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Once i pivot the tbl_ItemWeekwise, i get around ~61,000 distinct sku's.

    if you can refer my post to anyone with BI know-how that would be faster.

    Thanks.

  • sifar786 (7/6/2014)


    [font="Verdana"]Hi,

    Outputs: dropdown Filters are on Country Category, Manufacturer, Brand and aggregation done after applying each such filter.

    Select:

    Your CountrySPAIN

    Your CategoryJUICES

    Your ManufacturerA.J. BARD

    Your BrandRUBICON

    Calculations:

    Data is grabbed from ItemWeekWise table

    Country

    Count the number of SKUs in the country with more than 13 weeks of data

    For the selected week x compute cumulative values for below KPIs for each of the SKU in the country:

    1. Sales Value with Innovation

    2. Sales Units with Innovation

    3. Weighted Distribution

    Based on the above data calculate for each SKU:

    4.Values Sales Velocity =Cumulative Sales Value with Innovation/ Cumulative Weighted Distribution

    5.Unit Sales Velocity=Cumulative Sales Units with Innovation/ Cumulative Weighted Distribution

    "Find the below percentiles for each of the above 5 KPIs

    90

    50

    10"

    "Find the percent rank of the below values using the cumulative array for the respective KPI:

    Cumulative Value Sales (entered by the user)

    Cumulative Unit Sales (entered by the user)

    Weighted Distribution (entered by the user)

    Values Sales Velocity (entered by the user)

    Unit Sales Velocity (entered by the user)"

    [/font]

    do have an example of the output you are looking to provide?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply