problem importing data from a huge xml file 1g

  • Hi,

    Here is my code it works fine with small xml files but when the xml file is of size 1g it does not work .

    Can someone help me I want to figure out a solution

    USE [SP3]

    GO

    /****** Object: StoredProcedure [dbo].[usp_user_longimportInfo_FileName_trial1] Script Date: 05/10/2010 16:09:59 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ---- Good One ---- for content

    ALTER procedure [dbo].[usp_user_longimportInfo_FileName_trial1]

    (@XMLFileName as nvarchar(100))

    as

    begin

    set @XMLFileName='c:\trial\shortperson.xml'

    create table #tempPerson ([UserID] [int] IDENTITY(1,1) NOT NULL,

    [Email] [nvarchar](50) NULL,

    [Pass] [nvarchar](20) NULL,

    [FirstName] [nvarchar](20) NULL,

    [LastName] [nvarchar](40) NULL,

    [DisplayName] [nvarchar](50) NULL,

    [Profile] [nvarchar](max) NULL,

    [DisplayEmail] [nvarchar](50) NULL,

    [CellPhone] [nvarchar](20) NULL,

    [UpdatedBy] [int] NULL,

    [UpdateDate] [datetime] NULL,

    [Deleted] [bit] NULL)

    DECLARE @TempCurrentTime datetime

    DECLARE @userid int

    Declare @UpdateDate datetime

    Declare @Email nvarchar(50)

    DECLARE @pass NVARCHAR(20)

    DECLARE @FirstName NVARCHAR(20)

    DECLARE @LastName nvarchar(40)

    Declare @DisplayName nvarchar(50)

    Declare @Organization int

    Declare @UserAccessRole int

    Declare @OrgUserID nchar(12)

    Declare @OrgPassword nvarchar(50)

    Declare @OrgUserName nvarchar(50)

    declare @mySQL nvarchar(max)

    declare @PersonXML xml

    declare @ParamDefinition nvarchar(500)

    SELECT @TempCurrentTime = GETDATE()

    select @UpdateDate=@TempCurrentTime

    -- retrieve the file content as xml

    set @mysql=N'select @PersonXML=

    CONVERT(xml, BulkColumn, 2) FROM

    OPENROWSET(Bulk '''+ @XMLFileName+''', SINGLE_BLOB ) [rowsetresults]'

    Set @ParamDefinition = '@XMLFileName nvarchar(max),@PersonXML XML out'

    Execute sp_Executesql@mySql,

    @ParamDefinition,

    @XMLFileName,

    @PersonXML out

    insert into #tempPerson (Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted)

    (SELECT TempXML.Node.value('(email)[1]','nvarchar(50)') as Email,

    ' ',

    TempXML.Node.value('(name/n/given)[1]', 'nVARCHAR(50)') as FirstName,

    TempXML.Node.value('(name/n/family)[1]', 'VARCHAR(50)') as LastName,

    TempXML.Node.value('(name/fn)[1]','nvarchar(50)') as DisplayName,

    ' ',

    TempXML.Node.value('(email)[1]','nvarchar(50)') as DisplayEmail,

    ' ',1,GETDATE(),0

    FROM @PersonXML.nodes('/enterprise/person') TempXML (Node))

    select * from #tempPerson

    insert into dbo.longPersonError

    select USERID, Email

    from(

    MERGE SP.UserTrial

    USING (SELECT Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted from #tempPerson) AS source

    ON (SP.UserTrial.Email = source.Email)

    WHEN Not MATCHED THEN

    INSERT (Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted)

    VALUES (source.Email,source.Pass,source.FirstName,source.LastName,source.DisplayName,source.[Profile],source.DisplayEmail,source.CellPhone,source.UpdatedBy,source.UpdateDate,source.deleted)

    when matched then

    UPDATE SET SP.UserTrial.Email = SP.UserTrial.Email

    OUTPUT $ACTION, deleted.userid,deleted.Email ) AS CHANGES (Action,userid,email)

    WHERE Action = 'UPDATE';

    end

  • Not 100% sure but I believe you've simply hit the wall on the XML datatype. It can only hold 2GB of normal text for the XML. If it's Unicode Text, then you only get a half of that... 1GB.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Any solution for that

  • sohairzaki (5/10/2010)


    Any solution for that

    It depends... where does the data come from and why can't it be split up a bit at the source?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I used ssis xml task xpath the original file was of the following form

    ______________

    My input file has the following format

    <enterprise>

    <person>

    <sourcedid>

    <source>111111</source>

    <id>22222</id>

    </sourcedid>

    <name>

    <fn>xxxxxxx</fn>

    <n>

    <family>yyyyy</family>

    <given>zzzzzz</given>

    </n>

    </name>

    <demographics>

    <gender>2</gender>

    </demographics>

    <email>xxxxxx@fffff.edu</email>

    <adr>

    <street>cccccc</street>

    <locality>ccccc</locality>

    <region>cccc</region>

    <pcode>ccccccc</pcode>

    </adr>

    <academics>

    <academicmajor>gggggg</academicmajor>

    <customrole>hhhhhh</customrole>

    <customrole>dddddd</customrole>

    <customrole>cccccc</customrole>

    </academics>

    </person>

    <person>

    </person>

    </enterprise>

    ______________________

    out put was in the form of

    <ResultRootNode>

    <fn> xxxx</fn><family>cccc</family><email>xxxxx<email><customrole>ggggg</customrole>

    <customrole>kkkk</customrole>

    <customrole>ffffff</customrole>

    <fn> hhhh</fn><family>jjjj</family><email>kkkk<email><customrole>ggggg</customrole>

    <customrole>sssss</customrole><customrole>wwww</customrole>

    </ResultRootNode>

    _________________

    I want my output to be in the form of

    <ResultRootNode>

    <person>

    <fn> xxxx</fn><family>cccc</family><email>xxxxx<email><customrole>ggggg</customrole>

    <customrole>kkkk</customrole>

    <customrole>ffffff</customrole>

    </person>

    <person>

    <fn> hhhh</fn><family>jjjj</family><email>kkkk<email><customrole>ggggg</customrole>

    <customrole>sssss</customrole><customrole>wwww</customrole>

    </person>

    </ResultRootNode>

    ____________________

    see attached document for more explanation

    Thanks for your help in advance

  • You never stated what the error message is or what about the large file doesn't work.

    I'm just thinking that when dealing with 1 GB sizes XML files, if you only require a much smaller amount of data from specific nodes, then query it out at the application level and then pass just that to a stored procedure, perhaps as multiple calls.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thanks

  • Another reason might be the way you actually query the data.

    Instead of going three levels deep in your xml query I'd rather use CROSS APPLY (as recommended in the other thread reagarding the very same issue).

    I have to second Eric regarding the missing error message. It would help a lot if you could post it so we can narrow down the problem.

    Sometimes the term "it does not work" simply relates to an extreme long execution time causing the query to be stopped manually... 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • sohairzaki (5/11/2010)


    I used ssis xml task xpath the original file was of the following form

    ______________

    My input file has the following format

    <enterprise>

    <person>

    <sourcedid>

    <source>111111</source>

    <id>22222</id>

    </sourcedid>

    <name>

    <fn>xxxxxxx</fn>

    <n>

    <family>yyyyy</family>

    <given>zzzzzz</given>

    </n>

    </name>

    <demographics>

    <gender>2</gender>

    </demographics>

    <email>xxxxxx@fffff.edu</email>

    <adr>

    <street>cccccc</street>

    <locality>ccccc</locality>

    <region>cccc</region>

    <pcode>ccccccc</pcode>

    </adr>

    <academics>

    <academicmajor>gggggg</academicmajor>

    <customrole>hhhhhh</customrole>

    <customrole>dddddd</customrole>

    <customrole>cccccc</customrole>

    </academics>

    </person>

    <person>

    </person>

    </enterprise>

    ______________________

    out put was in the form of

    <ResultRootNode>

    <fn> xxxx</fn><family>cccc</family><email>xxxxx<email><customrole>ggggg</customrole>

    <customrole>kkkk</customrole>

    <customrole>ffffff</customrole>

    <fn> hhhh</fn><family>jjjj</family><email>kkkk<email><customrole>ggggg</customrole>

    <customrole>sssss</customrole><customrole>wwww</customrole>

    </ResultRootNode>

    _________________

    I want my output to be in the form of

    <ResultRootNode>

    <person>

    <fn> xxxx</fn><family>cccc</family><email>xxxxx<email><customrole>ggggg</customrole>

    <customrole>kkkk</customrole>

    <customrole>ffffff</customrole>

    </person>

    <person>

    <fn> hhhh</fn><family>jjjj</family><email>kkkk<email><customrole>ggggg</customrole>

    <customrole>sssss</customrole><customrole>wwww</customrole>

    </person>

    </ResultRootNode>

    ____________________

    see attached document for more explanation

    Thanks for your help in advance

    Thanks for that but what I meant by "source" was where does the data actually come from? Who ever it is, you need to have them limit the size of their files/XML Docs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My code takes long to execute. I need help to reduce the execution time

    1) I have an xml document big one I reduced the size using xslt and ssis

    2) i used set @mysql=N'select @PersonXML=

    CONVERT(xml, BulkColumn, 2) FROM

    OPENROWSET(Bulk '''+ @XMLFileName+''', SINGLE_BLOB ) [rowsetresults]'

    to retrieve the content of the xml file to an xml variable

    3) used a temporary table then merge from temporary to target table because my target table has a unique key on email so I wanted to log the ones which were not inserted and log those duplicates in a log table

    here is my code is there a better way to make it faster

    USE [SP3]

    GO

    /****** Object: StoredProcedure [dbo].[usp_user_longimportInfo_FileName_trial1] Script Date: 05/11/2010 13:54:53 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ---- Good One ---- for content

    ALTER procedure [dbo].[usp_user_longimportInfo_FileName_trial1]

    (@XMLFileName as nvarchar(100))

    as

    begin

    set @XMLFileName='c:\trialewnewlongperson.xml'

    create table #tempPerson ([UserID] [int] IDENTITY(1,1) NOT NULL,

    [Email] [nvarchar](50) NULL,

    [Pass] [nvarchar](20) NULL,

    [FirstName] [nvarchar](20) NULL,

    [LastName] [nvarchar](40) NULL,

    [DisplayName] [nvarchar](50) NULL,

    [Profile] [nvarchar](max) NULL,

    [DisplayEmail] [nvarchar](50) NULL,

    [CellPhone] [nvarchar](20) NULL,

    [UpdatedBy] [int] NULL,

    [UpdateDate] [datetime] NULL,

    [Deleted] [bit] NULL)

    DECLARE @TempCurrentTime datetime

    DECLARE @userid int

    Declare @UpdateDate datetime

    Declare @Email nvarchar(50)

    DECLARE @pass NVARCHAR(20)

    DECLARE @FirstName NVARCHAR(20)

    DECLARE @LastName nvarchar(40)

    Declare @DisplayName nvarchar(50)

    Declare @Organization int

    Declare @UserAccessRole int

    Declare @OrgUserID nchar(12)

    Declare @OrgPassword nvarchar(50)

    Declare @OrgUserName nvarchar(50)

    declare @mySQL nvarchar(max)

    declare @PersonXML xml

    declare @ParamDefinition nvarchar(500)

    SELECT @TempCurrentTime = GETDATE()

    select @UpdateDate=@TempCurrentTime

    -- retrieve the file content as xml

    set @mysql=N'select @PersonXML=

    CONVERT(xml, BulkColumn, 2) FROM

    OPENROWSET(Bulk '''+ @XMLFileName+''', SINGLE_BLOB ) [rowsetresults]'

    Set @ParamDefinition = '@XMLFileName nvarchar(max),@PersonXML XML out'

    Execute sp_Executesql@mySql,

    @ParamDefinition,

    @XMLFileName,

    @PersonXML out

    insert into #tempPerson (Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted)

    (SELECT TempXML.Node.value('(email)[1]','nvarchar(50)') as Email,

    ' ',

    TempXML.Node.value('(firstname)[1]', 'nVARCHAR(50)') as FirstName,

    TempXML.Node.value('(lastname)[1]', 'VARCHAR(50)') as LastName,

    TempXML.Node.value('(displayname)[1]','nvarchar(50)') as DisplayName,

    ' ',

    TempXML.Node.value('(email)[1]','nvarchar(50)') as DisplayEmail,

    ' ',1,GETDATE(),0

    FROM @PersonXML.nodes('/enterprise/person') TempXML (Node))

    select * from #tempPerson

    insert into dbo.longPersonError

    select USERID, Email

    from(

    MERGE SP.UserTrial

    USING (SELECT Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted from #tempPerson) AS source

    ON (SP.UserTrial.Email = source.Email)

    WHEN Not MATCHED THEN

    INSERT (Email,Pass,FirstName,LastName,DisplayName,[Profile],DisplayEmail,CellPhone,UpdatedBy,UpdateDate,deleted)

    VALUES (source.Email,source.Pass,source.FirstName,source.LastName,source.DisplayName,source.[Profile],source.DisplayEmail,source.CellPhone,source.UpdatedBy,source.UpdateDate,source.deleted)

    when matched then

    UPDATE SET SP.UserTrial.Email = SP.UserTrial.Email

    OUTPUT $ACTION, deleted.userid,deleted.Email ) AS CHANGES (Action,userid,email)

    WHERE Action = 'UPDATE';

    end

  • where shall i use the crossapply

    thanks a lot

  • sohairzaki (5/11/2010)


    where shall i use the crossapply

    thanks a lot

    Use it like the sample I posted in one of your other threads ("Node.value and insert the result to an existing table").

    The sad part of having an OP posting the same issue in multiple threads is the OP himself/herself losing track of the numerous posts and the answers provided...

    @jeff: I need to walk the hippo... 😉



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply