Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Problem with an XML and complex logic?


Problem with an XML and complex logic?

Author
Message
ivo.st.ivanov
ivo.st.ivanov
Forum Newbie
Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)Forum Newbie (8 reputation)

Group: General Forum Members
Points: 8 Visits: 167
Hi!

I will be pleased if you can help me about my task.

I have an input XML, which I have to parse in the database. There will be such XML input file everyday and its size will be around 2GB.

Here it is and example (There are some general tags with data, which are not so important for now and I will skip them):

'<Analytic>
<SynthRefN>323023522058533599262474377795</SynthRefN>
<SynthCode>SRCIBA</SynthCode>
<TransferDescription>it is of no importance</TransferDescription>
<AnalyticRefN>267168</AnalyticRefN>
<Credit>
<AnalyticAccount>
<AccountNumber>UIO89991551400576783</AccountNumber>
</AnalyticAccount>
<SyntAccount>
<AccountNumber>UIO89991551400576783</AccountNumber>
</SyntAccount>
<Amount>120</Amount>
<LocalAmount>120</LocalAmount>
<Currency>975</Currency>
</Credit>
<Debit>
<AnalyticAccount>
<AccountNumber>000011978530101011</AccountNumber>
</AnalyticAccount>
<SyntAccount>
<AccountNumber>000011978530101011</AccountNumber>
</SyntAccount>
<Amount>120</Amount>
<LocalAmount>120</LocalAmount>
<Currency>975</Currency>
</Debit>
<PostingDate>2013-06-04</PostingDate>
</Analytic>
'

I will also have a predefined configuration xml file, which will be stored in the database. It will have some rules for transformation of the input xml data.

The main fields which we have to take attention from the Input XML are:

<SynthCode></SynthCode>

<Credit>
<AnalyticAccount>
<AccountNumber></AccountNumber>
</AnalyticAccount>

and

<Debit>
<AnalyticAccount>
<AccountNumber></AccountNumber>
</AnalyticAccount>

The value for the both the credit and debit <AccountNumber></AccountNumber> sections can come in three different formats:

1) Like this -> UIO89991551400576783. Starting always with 3 Symbols (not numbers) and with a fixed length. Always!

2) Like this -> 000011978530101011. Fixed length and only with numbers.

3) Like this -> 1197. This we can consider as a template (I will explain later).


The main idea is that we should extract some info in the form of fields from this <AccountNumber></AccountNumber> sections. I will explain how.

If we have the first format of the <AccountNumber></AccountNumber> -> UIO89991551400576783 (rules are the same for both the Credit and Debit sections), we have to get the needed information as fields from other database through openquery()...

If we have the second format of the <AccountNumber></AccountNumber> -> 000011978530101011, the needed fields are hidden inside this number and the rules are:

   select
      SUBSTRING(000011978530101011, 1, 6) as field1,
      SUBSTRING(000011978530101011, 7, 3) as field2,
      SUBSTRING(000011978530101011, 10, 4) as field3,
      SUBSTRING(000011978530101011, 14, 2) as field4,
      SUBSTRING(000011978530101011, 16, 3) as field5
      
If we have the third format <AccountNumber></AccountNumber> -> a short number like 1197, we have to extract the fields from the other section. To explain: In the first two formats of the data, we do the same for the Credit and for the Debit sections.

In the third format, if (for example) the Credit section <AccountNumber></AccountNumber> value is of the second format 000011978530101011, and the Debit section <AccountNumber></AccountNumber> value is of the third format like 1197, we have to get the information for the Debit section from the Credit section. Same goes for Debit/Credit and Credit/Debit case. If one of them is in third format, it should use the other section to extract the information.

The basic rule will be - replace the SUBSTRING(000011978530101011, 10, 4) of the second format with the 4 digit-number 1197 in third format. This way, for the debit section we will have 000011978 1197 01011. Now it is actually a second format and we can substring it to gain the needed fields.

My question is: how to do all this checking and rules on a set-based query, because as I said, the file is huge (2GB) and it will be input every day?
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search