Automation to find element path in xml files

  • Hi,

    I am trying to automate some manual process where i have to read some attributes and insert their element path from xml files stored in disk.

    Here i am trying to create a scenario with 2 xml files.In original scenario there can be 30+ files. All these xml files are stored at one place.

    Following is the xml structure for emp and emp1 xml files

    --emp.xml

    <?xml version="1.0" encoding="utf-8"?>

    <EmployeeDetails RunDate="2016-03-07">

    <Employees EmployeesCount="3">

    <Employee Name="Ravi" Age="26" City="Bangalore" />

    <Employee Name="Raj" Age="29" City="Delhi" />

    <Employee Name="Amit" Age="22" City="Mumbai" />

    </Employees>

    </EmployeeDetails>

    --emp1.xml

    <?xml version="1.0" encoding="utf-8"?>

    <EmployeeDocument>

    <EmployeeDetails RecordCount="3" RunDate="2016-03-07"/>

    <Employees>

    <Employee>

    <ID>"35621"</ID>

    <Name>"Ravi"</Name>

    <Age>"25"</Age>

    <City>"Bangalore"</City>

    </Employee>

    <Employee>

    <ID>"336561"</ID>

    <Name>"Raj"</Name>

    <Age>"45"</Age>

    <City>"kolkata"</City>

    </Employee>

    <Employee>

    <ID>"896561"</ID>

    <Name>"Amit"</Name>

    <Age>"34"</Age>

    <City>"chennai"</City>

    </Employee>

    </Employees>

    </EmployeeDocument>

    As you have seen both xml file structure is different.

    I have to get the RunDate,EmployeesCount and RecordCount attributes path. As you can see RunDate will be constant in all file structures.But when come to count it will be different for files here it is EmployeesCount and RecordCount.

    Manually i can identify the count name in each file, so that it can be passed as param and find the element path.

    After finding the path i want to store this into a table

    Create table #Validationrule

    (

    Filename varchar(200),

    Rulename varchar(1000),

    ElementPath varchar(4000)

    )

    The final output should be like this

    Filename Rulename ElementPath

    Emp DateValidation EmployeeDetails/@RunDate

    Emp CountValidation EmployeeDetails/Employees|EmployeeDetails/Employees/@EmployeesCount

    Emp1 DateValidation EmployeeDocument/EmployeeDetails/@RunDate

    Emp1 CountValidation EmployeeDocument/Employees|EmployeeDocument/EmployeeDetails/@RecordCount

    The attributes for which i have to find element path should pass as csv

    in the above example it should be like this

    @param varchar(max) = 'RunDate,EmployeesCount,RecordCount'

    If any further info needed pls let me know

    Thanks in advance

  • Here's some code to get you started. Note my comments:

    -- Two variables designed to emulate your XML files

    DECLARE @emp XML =

    '<?xml version="1.0" encoding="utf-8"?>

    <EmployeeDetails RunDate="2016-03-07">

    <Employees EmployeesCount="3">

    <Employee Name="Ravi" Age="26" City="Bangalore" />

    <Employee Name="Raj" Age="29" City="Delhi" />

    <Employee Name="Amit" Age="22" City="Mumbai" />

    </Employees>

    </EmployeeDetails>';

    DECLARE @emp1 XML =

    '<?xml version="1.0" encoding="utf-8"?>

    <EmployeeDocument>

    <EmployeeDetails RecordCount="3" RunDate="2016-03-07"/>

    <Employees>

    <Employee>

    <ID>"35621"</ID>

    <Name>"Ravi"</Name>

    <Age>"25"</Age>

    <City>"Bangalore"</City>

    </Employee>

    <Employee>

    <ID>"336561"</ID>

    <Name>"Raj"</Name>

    <Age>"45"</Age>

    <City>"kolkata"</City>

    </Employee>

    <Employee>

    <ID>"896561"</ID>

    <Name>"Amit"</Name>

    <Age>"34"</Age>

    <City>"chennai"</City>

    </Employee>

    </Employees>

    </EmployeeDocument>';

    -- Your table

    IF OBJECT_ID('tempdb..#Validationrule') IS NOT NULL DROP TABLE #Validationrule;

    Create table #Validationrule

    (

    Filename varchar(200),

    Rulename varchar(1000),

    ElementPath varchar(4000)

    );

    -- You'll have to decide how to iterate through the XML files

    -- ITERATION #1: Emp

    INSERT #Validationrule

    SELECT 'Emp', 'DateValidation', xRD.RD.value('(@RunDate)[1]','varchar(100)')

    FROM (VALUES (@emp))t(x)

    CROSS APPLY x.nodes('//EmployeeDetails') xRD(RD)

    UNION ALL

    SELECT 'Emp', 'CountValidation', CAST(COUNT(*) AS varchar(100))

    FROM (VALUES (@emp))t(x)

    CROSS APPLY x.nodes('//Employees/Employee') xCV(CV);

    -- ITERATION #2: Emp1

    INSERT #Validationrule

    SELECT 'Emp', 'DateValidation', xRD.RD.value('(@RunDate)[1]','varchar(100)')

    FROM (VALUES (@emp1))t(x)

    CROSS APPLY x.nodes('//EmployeeDetails') xRD(RD)

    UNION ALL

    SELECT 'Emp', 'CountValidation', CAST(COUNT(*) AS varchar(100))

    FROM (VALUES (@emp1))t(x)

    CROSS APPLY x.nodes('//Employees/Employee') xCV(CV);

    -- OUTPUT:

    SELECT * FROM #Validationrule;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • thanks Alan for giving some idea. Here what i had tried till now, it is just in initial stage

    --file path location input param

    DECLARE @Filepath varchar(max) = 'C:\Files'

    --load the file in temp table

    CREATE TABLE #File_Temp(ID int identity, Filename varchar(max))

    declare @cmd varchar(350)

    set @cmd = 'DIR /B /A-D ' + @Filepath

    INSERT INTO #File_Temp

    --exec master.dbo.xp_dirtree @Filepath,1,11,1

    exec master.dbo.xp_cmdshell @cmd

    SELECT * FROM #File_Temp

    --To get the no of file in that path for looping purpose

    DECLARE @Count int

    SELECT @Count = max(ID) From #File_Temp

    WHERE Filename IS NOT NULL

    SELECT @Count AS counts

    DECLARE @i int = 1

    DECLARE @File_xml xml

    WHILE(@i <= @Count)

    BEGIN

    DECLARE @Filename varchar(max)

    SELECT @Filename = Filename from #File_Temp

    DECLARE @path varchar(max)

    SET @path = ''''+@Filepath+'\'+@Filename+''''

    --to load the first file into xml variable

    SELECT @File_xml = CONVERT (XML, BulkColumn)

    FROM OPENROWSET (BULK @path, SINGLE_BLOB) AS XmlData;

    END

    DROP TABLE #File_Temp

    in the above code @File_xml is not loaded as due to error trying to rectifying that by writing dynamic query but still there is issue

    DECLARE @File_xml xml

    DECLARE @sql nvarchar(max)

    SET @sql =

    'SELECT CONVERT (XML, BulkColumn)

    FROM OPENROWSET (BULK' +@path+', SINGLE_BLOB) AS XmlData'

    EXEC @sql

    once the first xml file is loaded into the variable then i think the next step will be search the xml files and locate the root path of the element.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply