Problem with an XML and complex logic?

  • Hi!

    I will be pleased if you can help me about my task.

    I have an input XML, which I have to parse in the database. There will be such XML input file everyday and its size will be around 2GB.

    Here it is and example (There are some general tags with data, which are not so important for now and I will skip them):

    '<Analytic>

    <SynthRefN>323023522058533599262474377795</SynthRefN>

    <SynthCode>SRCIBA</SynthCode>

    <TransferDescription>it is of no importance</TransferDescription>

    <AnalyticRefN>267168</AnalyticRefN>

    <Credit>

    <AnalyticAccount>

    <AccountNumber>UIO89991551400576783</AccountNumber>

    </AnalyticAccount>

    <SyntAccount>

    <AccountNumber>UIO89991551400576783</AccountNumber>

    </SyntAccount>

    <Amount>120</Amount>

    <LocalAmount>120</LocalAmount>

    <Currency>975</Currency>

    </Credit>

    <Debit>

    <AnalyticAccount>

    <AccountNumber>000011978530101011</AccountNumber>

    </AnalyticAccount>

    <SyntAccount>

    <AccountNumber>000011978530101011</AccountNumber>

    </SyntAccount>

    <Amount>120</Amount>

    <LocalAmount>120</LocalAmount>

    <Currency>975</Currency>

    </Debit>

    <PostingDate>2013-06-04</PostingDate>

    </Analytic>

    '

    I will also have a predefined configuration xml file, which will be stored in the database. It will have some rules for transformation of the input xml data.

    The main fields which we have to take attention from the Input XML are:

    <SynthCode></SynthCode>

    <Credit>

    <AnalyticAccount>

    <AccountNumber></AccountNumber>

    </AnalyticAccount>

    and

    <Debit>

    <AnalyticAccount>

    <AccountNumber></AccountNumber>

    </AnalyticAccount>

    The value for the both the credit and debit <AccountNumber></AccountNumber> sections can come in three different formats:

    1) Like this -> UIO89991551400576783. Starting always with 3 Symbols (not numbers) and with a fixed length. Always!

    2) Like this -> 000011978530101011. Fixed length and only with numbers.

    3) Like this -> 1197. This we can consider as a template (I will explain later).

    The main idea is that we should extract some info in the form of fields from this <AccountNumber></AccountNumber> sections. I will explain how.

    If we have the first format of the <AccountNumber></AccountNumber> -> UIO89991551400576783 (rules are the same for both the Credit and Debit sections), we have to get the needed information as fields from other database through openquery()...

    If we have the second format of the <AccountNumber></AccountNumber> -> 000011978530101011, the needed fields are hidden inside this number and the rules are:

    select

    SUBSTRING(000011978530101011, 1, 6) as field1,

    SUBSTRING(000011978530101011, 7, 3) as field2,

    SUBSTRING(000011978530101011, 10, 4) as field3,

    SUBSTRING(000011978530101011, 14, 2) as field4,

    SUBSTRING(000011978530101011, 16, 3) as field5

    If we have the third format <AccountNumber></AccountNumber> -> a short number like 1197, we have to extract the fields from the other section. To explain: In the first two formats of the data, we do the same for the Credit and for the Debit sections.

    In the third format, if (for example) the Credit section <AccountNumber></AccountNumber> value is of the second format 000011978530101011, and the Debit section <AccountNumber></AccountNumber> value is of the third format like 1197, we have to get the information for the Debit section from the Credit section. Same goes for Debit/Credit and Credit/Debit case. If one of them is in third format, it should use the other section to extract the information.

    The basic rule will be - replace the SUBSTRING(000011978530101011, 10, 4) of the second format with the 4 digit-number 1197 in third format. This way, for the debit section we will have 000011978 1197 01011. Now it is actually a second format and we can substring it to gain the needed fields.

    My question is: how to do all this checking and rules on a set-based query, because as I said, the file is huge (2GB) and it will be input every day?

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply