|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Saturday, May 11, 2013 8:17 AM
Points: 460,
Visits: 2,521
|
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Sunday, May 19, 2013 4:00 AM
Points: 533,
Visits: 2,285
|
|
I really enjoy these XML workshops.
Of course, the parsing of delimited data is an important issue for data feeds. If data can be converted into XML first, using AWK or GREP or whatever, it then becomes much easier to gulp it into SQL Server. The biggest questions I've had are: performance. Whenever I've used this XML technology for data feeds, or passing data between routines, it has been very fast, but others using a very similar system have reported it as being very slow. I'm not at all sure why the difference. Resilience Maybe I've been unlucky, but I've had many feeds to deal with that occasionally spit out data that crashes the simplest, and most obvious, data import systems. I reckon that the best systems can isolate corrupt or incorrect data before the bulk insert and allow inspection of it by the DBA
Any thoughts?
Best wishes,
Phil Factor Simple Talk
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Saturday, May 11, 2013 8:17 AM
Points: 460,
Visits: 2,521
|
|
I agree with you on the first point. Wherever I used the XML approach in our applications, it just worked well. However, I have read some posts where people complained about performance problems. I am not sure if it is because they are using it in an incorrect manner or something else. Or it could be the volume of data...i am not sure.
On the next point, do you think a schema could help?
jacob
.
|
|
|
|
|
Mr or Mrs. 500
      
Group: General Forum Members
Last Login: Sunday, May 19, 2013 4:00 AM
Points: 533,
Visits: 2,285
|
|
A typical problem I've had to face in the past might be that I've got a million or so rows of data from a switch that have to be imported. If they are not imported, then the business runs the risk of leaving a fraud or intrusion undetected. Right in the middle of the million rows is a record or two that is mangled. Rejecting the file isn't an option. The import routine needs to be able to import all the good stuff and leave the bad stuff in a 'limbo' file for manual intervention. Cleaning the data manually before import isn't a good idea either as such things usually are scheduled for the early hours of the morning when the server isn't so busy. Could a schema solve this sort of problem by filtering 'sheep-from-goats' on a record-by-record basis, rather than a document basis?
I'd love to know what causes slow XML processing but, like you, I'll have to wait until it happens to me! I ran some timings a while back with the various parameter-passing techniques and found XML to be as fast as the 'helper-table'/'number table' technique, which is far faster than any iterative technique.
Best wishes,
Phil Factor Simple Talk
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Saturday, May 11, 2013 8:17 AM
Points: 460,
Visits: 2,521
|
|
I understand the problem now. A schema will validate an entire document. I do not think that a schema can be used to filter "bad records" and process the "good" ones. I guess the only option available is to query the XML data and retrieve the "good" records (and retrieve the "bad" onese and dumb to a table or XML file for manual review).
regards Jacob
.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 9:57 PM
Points: 32,906,
Visits: 26,790
|
|
Right in the middle of the million rows is a record or two that is mangled. Rejecting the file isn't an option. The import routine needs to be able to import all the good stuff and leave the bad stuff in a 'limbo' file for manual intervention.
BCP will do just that very nicely... second in speed only to Bulk Insert which does not have such a capability.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Monday, January 30, 2012 5:12 AM
Points: 40,
Visits: 143
|
|
| This method is good. But it adding 6 additional characters to the real data value. It means, suddenly your string will not fit into allocated number of characters. You have to be careful.
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Saturday, May 11, 2013 8:17 AM
Points: 460,
Visits: 2,521
|
|
I dont think this approach is good for Large Chunks of data. It is handy when you have a small piece of delimited string and you want to break it into a relational table quickly.
.
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Friday, May 17, 2013 12:22 PM
Points: 10,571,
Visits: 11,871
|
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, November 11, 2009 2:15 PM
Points: 1,
Visits: 18
|
|
| I agree it's an interesting "concept", but in the real world; where I have to process MILLIONS of records in a routine, I don't see it working....
|
|
|
|