Loop and query CSV files in a folder using a UNION ALL query to form a resultant table on the server
Posted Friday, June 27, 2014 12:07 PM
Grasshopper

Hi,

I am trying to run a UNION ALL query in SQL Server 2014 over multiple large CSV files and load the result into a table in SQL Server. The query below works in MS Access but not in SQL Server 2014:

SELECT * INTO tbl_ALLCOMBINED FROM OPENROWSET
(
'Microsoft.JET.OLEDB.4.0' , 'Text;Database=D:\Downloads\CSV\;HDR=YES',
'SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,
''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],
IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week
FROM [DATABASE_SPAIN_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,
max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,
(max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale
FROM [DATABASE_SPAIN_EURO.CSV] as t3
WHERE t3.[Sales Value with Innovation] is NOT NULL
and t3.[Sales Value with Innovation] <>0
and t3.[Level]=''Item''
GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]
) as t2
WHERE
t.[Level] = t2.[Level]
and t.[Category] = t2.[Category]
and t.[Manufacturer] = t2.[Manufacturer]
and t.[Brand] = t2.[Brand]
and t.[Description] = t2.[Description]
and t.[Sales Value with Innovation] is NOT NULL
and t.[Sales Value with Innovation] <>0
and t2.first_week_on_sale >=1
and t2.weeks_on_sale <=52
UNION ALL
SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,
''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],
IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week
FROM [DATABASE_FRANCE_EURO.CSV] as t, ( SELECT t3.[Level],t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description],
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as first_week_on_sale,
max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) as last_week_on_sale,
(max(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)) -
min(((substring(t3.[week],3,4) - 2011) * 52) + right(t3.[week],2)))+1 as weeks_on_sale
FROM [DATABASE_FRANCE_EURO.CSV] as t3
WHERE t3.[Sales Value with Innovation] is NOT NULL
and t3.[Sales Value with Innovation] <>0
and t3.[Level]=''Item''
GROUP BY t3.[Level], t3.[Category],t3.[Manufacturer],t3.[Brand],t3.[Description]
) as t2
WHERE
t.[Level] = t2.[Level]
and t.[Category] = t2.[Category]
and t.[Manufacturer] = t2.[Manufacturer]
and t.[Brand] = t2.[Brand]
and t.[Description] = t2.[Description]
and t.[Sales Value with Innovation] is NOT NULL
and t.[Sales Value with Innovation] <>0
and t2.first_week_on_sale >=1
and t2.weeks_on_sale <=52
')
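
The week arithmetic repeated throughout the query can be read as a single "week index" counted from the start of 2011. Below is a minimal Python sketch of that expression, not part of the original query. It assumes week labels such as "W 2012 07", a hypothetical format chosen so that characters 3 to 6 hold the year and the last two characters hold the week number, which is what the substring and right calls imply. The names `week_index` and `i_year` are illustrative only.

```python
BASE_YEAR = 2011  # the constant subtracted in the query


def week_index(week_label: str) -> int:
    """Mirror ((substring(week, 3, 4) - 2011) * 52) + right(week, 2)."""
    year = int(week_label[2:6])     # SQL substring(week, 3, 4) is 1-based
    week_no = int(week_label[-2:])  # SQL right(week, 2)
    return (year - BASE_YEAR) * 52 + week_no


def i_year(week_label: str) -> int:
    """Mirror (substring(week, 3, 4)) * 1, which produces the iYEAR column."""
    return int(week_label[2:6])
```

Like the query, this treats every year as exactly 52 weeks, so under the assumed label format the years 2011 to 2013 map to indexes 1 to 156.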

What I need is:
1] To create the resultant tbl_ALLCOMBINED table.

2] To transform this table using the PIVOT command, with the following layout:
PAGEFIELD: Level = 'Item'
COLUMNFIELD: Sale_Week (columns numbered 1 to 52)
ROWFIELD: sCOUNTRY, sCHAR, CATEGORY, MANUFACTURER, BRAND, DESCRIPTION, EAN (in this order)
DATAFIELD: 'Sales Value with Innovation'

3] Can the pivoted result have more than 255 columns, e.g. if I want to show all KPI values in the data field?
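
The layout in point 2 can be prototyped outside the database before committing to a PIVOT statement. The following is a minimal Python sketch, not T-SQL. The row dictionaries and the `pivot_sale_weeks` helper are illustrative assumptions, with keys named after the columns listed above, and summing duplicate cells is an assumed choice of aggregate.

```python
from collections import defaultdict

ROW_FIELDS = ("sCOUNTRY", "sCHAR", "Category", "Manufacturer",
              "Brand", "Description", "EAN")
VALUE_FIELD = "Sales Value with Innovation"


def pivot_sale_weeks(rows, weeks=52):
    """Turn one row per (product, Sale_Week) into one row per product
    holding a list of `weeks` values; duplicate cells are summed."""
    table = defaultdict(lambda: [0.0] * weeks)
    for row in rows:
        if row["Level"] != "Item":      # page field: Level = 'Item'
            continue
        week = int(row["Sale_Week"])
        if not 1 <= week <= weeks:      # ignore weeks outside the window
            continue
        key = tuple(row[field] for field in ROW_FIELDS)
        table[key][week - 1] += float(row[VALUE_FIELD])
    return dict(table)
```

Each key of the returned dictionary is one row of the pivot (the row fields in the stated order) and each list position is one Sale_Week column.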

P.S.: The CSVs all have the same columns and data types, but there are more than 100 columns, so I don't think it is feasible to use a stored procedure that creates a table by listing every column.

Can anyone please help me with a solution as soon as possible?
Post #1586986
Posted Saturday, July 5, 2014 3:43 AM
SSCommitted

Quick question, can you provide some sample data?
Post #1589486
Posted Saturday, July 5, 2014 6:32 AM


SSCommitted

"I am trying to run a UNION ALL query in SQL Server 2014 over multiple large CSV files and load the result into a table in SQL Server. The query below works in MS Access but not in SQL Server 2014:"


What error messages are you getting?

What is the reasoning for doing the union and the calculations on the CSVs?
It may be better to import each CSV into a single staging table in SQL Server and work from there.
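
The staging-table idea can be sketched as a loop over the folder that tags every row with the country and character set taken from the file name. This is a minimal Python sketch under stated assumptions: file names follow the DATABASE_<COUNTRY>_<CHAR>.CSV pattern seen in the query, the files are comma-delimited with a header row and readable as UTF-8, and `read_folder` is a hypothetical helper. Loading the rows into SQL Server is deliberately left out.

```python
import csv
from pathlib import Path


def read_folder(folder):
    """Yield every CSV row in `folder` as a dict, tagged with its source."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".csv":
            continue
        parts = path.stem.split("_")
        if len(parts) != 3 or parts[0].upper() != "DATABASE":
            continue  # skip files that do not match the assumed pattern
        _, country, char = parts
        # utf-8-sig is an assumption; it also strips a leading BOM if present
        with path.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                row["sCOUNTRY"] = country.upper()
                row["sCHAR"] = char.upper()
                yield row
```

Because the country and character set come from the file name, each file no longer needs its own hand-written branch with literal values.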


______________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Post #1589506
Posted Saturday, July 5, 2014 6:38 AM


SSCommitted

Is this correct?


UNION ALL
SELECT t.*, (substring(t.[week],3,4))*1 as iYEAR,
''SPAIN'' as [sCOUNTRY], ''EURO'' as [sCHAR],
IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52,
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week
FROM [DATABASE_FRANCE_EURO.CSV] as t,


Post #1589508
Posted Saturday, July 5, 2014 6:50 AM
Grasshopper

The above queries work fine in SQL Server now, but I do not know how to schedule a job that scans a network folder for CSV files and runs those queries on them.

I think I should not go ahead with pivoting the 2 tables produced by running the above queries on the CSV files, but rather work with those 2 tables themselves, as they contain all the KPIs starting after column 'EAN'.

BTW, as per your earlier suggestion, I am reading up on cubes and think they might be the right approach. However, I do not know how to create a cube from a single denormalized table that contains everything but has no primary key.

Please advise.

Post #1589509
Posted Sunday, July 6, 2014 2:50 AM


SSCommitted

In reply to sifar786 (7/5/2014):

Here is one method for bringing files into SQL Server:
http://www.mssqltips.com/sqlservertip/2874/loop-through-flat-files-in-sql-server-integration-services/


In one of your previous posts you mentioned passing parameters back to a SQL stored procedure from Excel:
http://sqlwithmanoj.wordpress.com/2009/09/24/create-parameterized-excel-refreshable-reports/



Post #1589610
Posted Sunday, July 6, 2014 3:05 AM
Grasshopper

Thanks for the links, mate.

BTW, I am currently looking at this Cube Tutorial, but it doesn't explain how to create and deploy cubes built from a single table. In my case there are 2 tables: tbl_BrandYearWise and tbl_ItemWeekwise.

Any ideas on how to create cubes from a single denormalized table?

Post #1589611
Posted Sunday, July 6, 2014 3:39 AM


SSCommitted

In reply to sifar786 (7/6/2014):

Google says:


http://stackoverflow.com/questions/14756846/ssas-cube-from-a-flat-table


http://social.msdn.microsoft.com/Forums/sqlserver/en-US/f8771384-faa4-471e-bb9f-f90971af4a0a/want-to-create-cube-using-single-table

BTW, are you doing any aggregation at all, i.e. SUM/AVG of the KPIs, or grouping by year/month etc.? Or is this just transposing week rows to columns for readability in Excel?

Also, can you explain the rationale behind this part of your code?

IIf( t2.first_week_on_sale = 1 and t2.weeks_on_sale <=52, 
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + (1 + 52 - t2.weeks_on_sale),
((((substring(t.[week],3,4) - 2011) * 52) + right(t.[week],2)) - t2.first_week_on_sale) + 1 ) as Sale_Week





Post #1589612
Posted Sunday, July 6, 2014 4:36 AM
Grasshopper

Hi,

If you look at the CSVs, they don't have a Sale_Week column. I am adding it in the query, along with the sCountry, sChar (character language) and iYear columns. Each CSV contains 3 years (156 weeks) of data for a particular country and char. So there are 2 CSVs for, say, FRANCE: DATABASE_FRANCE_EURO.CSV and DATABASE_FRANCE_GLOBAL.CSV, and likewise for the other countries. The fields are the same in every CSV, but the Level field contains 2 types of data: Brand level and Item level. Brand level contains consolidated summary totals of the items in each brand. Item level is the detailed breakdown.

What I am doing is combining all the CSV data using UNION ALL and adding the above columns to create 2 outputs. For the Brand-level output we only look at the years, so 3 years (156 weeks) as per the data (iYear column). For the Item-level output we want to fit the 3 years (156 weeks) of data into a 52-week window. The idea is that any Innovation (Description field, or SKU) runs for no more than 52 weeks from its week of launch. So in the SQL code above I am just applying these rules:

1] If an Innovation has first_week_on_sale = 1 and weeks_on_sale <= 52, it may have had some weeks in previous years, so shift it to start at column 52 - (number of weeks on sale) + 1 (Sale_Week).
2] Otherwise, number its columns sequentially, starting at 1 for its first week on sale (Sale_Week).
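
The two rules map directly onto the IIf expression in the query. A minimal Python sketch of that expression follows; `sale_week` and `WINDOW` are illustrative names, and `week_idx` stands for the query's ((year - 2011) * 52) + week number value.

```python
WINDOW = 52  # number of weeks in the target window


def sale_week(week_idx, first_week_on_sale, weeks_on_sale):
    """Mirror the IIf(...) expression that produces Sale_Week."""
    offset = week_idx - first_week_on_sale
    if first_week_on_sale == 1 and weeks_on_sale <= WINDOW:
        # Rule 1: right-align the product so its last week lands on column 52.
        return offset + (1 + WINDOW - weeks_on_sale)
    # Rule 2: number the weeks sequentially from the first week on sale.
    return offset + 1
```

For example, a product on sale for 10 weeks starting at index 1 occupies columns 43 to 52, while a product launched at index 60 starts at column 1.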

Exceptions: if you filter on Level='Item', you will see that some SKUs have more than 52 weeks of data, zeros and blanks. So I am excluding zeros and blanks and keeping only SKUs with weeks_on_sale <= 52.

Outputs: dropdown filters are on Country, Category, Manufacturer and Brand, and the aggregation is done after applying each such filter.

Select:
Your Country: SPAIN
Your Category: JUICES
Your Manufacturer: A.J. BARD
Your Brand: RUBICON

Calculations:
Data is taken from the ItemWeekWise table.
Country
Count the number of SKUs in the country with more than 13 weeks of data.
For the selected week x, compute cumulative values of the following KPIs for each SKU in the country:
1. Sales Value with Innovation
2. Sales Units with Innovation
3. Weighted Distribution
Based on the above data, calculate for each SKU:
4. Values Sales Velocity = Cumulative Sales Value with Innovation / Cumulative Weighted Distribution
5. Unit Sales Velocity = Cumulative Sales Units with Innovation / Cumulative Weighted Distribution
Find the following percentiles for each of the above 5 KPIs: 90, 50, 10
Find the percent rank of the following values, using the cumulative array for the respective KPI:
Cumulative Value Sales (entered by the user)
Cumulative Unit Sales (entered by the user)
Weighted Distribution (entered by the user)
Values Sales Velocity (entered by the user)
Unit Sales Velocity (entered by the user)

Category
Count the number of SKUs in the category with more than 13 weeks of data.
For the selected week x, compute cumulative values of the following KPIs for each SKU in the category:
1. Sales Value with Innovation
2. Sales Units with Innovation
3. Weighted Distribution
Based on the above data, calculate for each SKU:
4. Values Sales Velocity = Cumulative Sales Value with Innovation / Cumulative Weighted Distribution
5. Unit Sales Velocity = Cumulative Sales Units with Innovation / Cumulative Weighted Distribution
Find the following percentiles for each of the above 5 KPIs: 90, 50, 10
Find the percent rank of the following values, using the cumulative array for the respective KPI:
Cumulative Value Sales (entered by the user)
Cumulative Unit Sales (entered by the user)
Weighted Distribution (entered by the user)
Values Sales Velocity (entered by the user)
Unit Sales Velocity (entered by the user)

Manufacturer
Count the number of SKUs under the manufacturer with more than 13 weeks of data.
For the selected week x, compute cumulative values of the following KPIs for each SKU under the manufacturer:
1. Sales Value with Innovation
2. Sales Units with Innovation
3. Weighted Distribution
Based on the above data, calculate for each SKU:
4. Values Sales Velocity = Cumulative Sales Value with Innovation / Cumulative Weighted Distribution
5. Unit Sales Velocity = Cumulative Sales Units with Innovation / Cumulative Weighted Distribution
Find the following percentiles for each of the above 5 KPIs: 90, 50, 10
Find the percent rank of the following values, using the cumulative array for the respective KPI:
Cumulative Value Sales (entered by the user)
Cumulative Unit Sales (entered by the user)
Weighted Distribution (entered by the user)
Values Sales Velocity (entered by the user)
Unit Sales Velocity (entered by the user)

Brand
Count the number of SKUs in the brand with more than 13 weeks of data.
For the selected week x, compute cumulative values of the following KPIs for each SKU under the brand:
1. Sales Value with Innovation
2. Sales Units with Innovation
3. Weighted Distribution
Based on the above data, calculate for each SKU:
4. Values Sales Velocity = Cumulative Sales Value with Innovation / Cumulative Weighted Distribution
5. Unit Sales Velocity = Cumulative Sales Units with Innovation / Cumulative Weighted Distribution
Find the following percentiles for each of the above 5 KPIs: 90, 50, 10
Find the percent rank of the following values, using the cumulative array for the respective KPI:
Cumulative Value Sales (entered by the user)
Cumulative Unit Sales (entered by the user)
Weighted Distribution (entered by the user)
Values Sales Velocity (entered by the user)
Unit Sales Velocity (entered by the user)
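
The same calculation steps repeat at every level (Country, Category, Manufacturer, Brand), so they can be sketched once. The Python below is a minimal sketch under several assumptions that the post does not confirm: "weeks of data" is read as non-zero weeks, percentiles use linear interpolation between closest ranks, and percent rank is the share of values less than or equal to the entered value. A spreadsheet's built-in percentile and percent-rank functions may use different definitions. All function names are illustrative.

```python
def cumulative(values, week_x):
    """Sum of the first `week_x` weekly values (weeks are 1-based)."""
    return sum(values[:week_x])


def velocity(cum_sales, cum_distribution):
    """Cumulative sales divided by cumulative weighted distribution."""
    return cum_sales / cum_distribution if cum_distribution else None


def percentile(data, pct):
    """Percentile with linear interpolation between closest ranks (assumed)."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("percentile of an empty list")
    pos = (len(ordered) - 1) * pct / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def percent_rank(data, value):
    """Share of values in `data` that are <= `value` (assumed definition)."""
    if not data:
        raise ValueError("percent rank of an empty list")
    return sum(1 for item in data if item <= value) / len(data)


def skus_with_enough_data(weekly_by_sku, min_weeks=13):
    """Count SKUs that have more than `min_weeks` non-zero weeks."""
    return sum(
        1 for weeks in weekly_by_sku.values()
        if sum(1 for v in weeks if v) > min_weeks
    )
```

To apply it at one level, filter the SKUs to that country, category, manufacturer or brand, compute the cumulative KPIs per SKU for week x, and pass the resulting list of per-SKU values to `percentile` and `percent_rank`.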

This is just one output sheet; there are 5 more output sheets. LOL

Let me know if something is not clear.
Post #1589617
Posted Sunday, July 6, 2014 5:15 AM


SSCommitted

I think you are going to be busy here.

There are some regular posters on here who are far more experienced in BI than I am; hopefully one of them will pick this up as well.

Out of interest, when you have all the data together for each country, how many distinct products (descriptions) do you have?


Post #1589620