SQLServerCentral is supported by Redgate


Serialising tabular and relational data
Posted Tuesday, March 15, 2011 11:15 AM



Group: General Forum Members
Last Login: 2 days ago @ 2:52 PM
Points: 2,537, Visits: 2,963
While it would be nice to have a format that meets all of Phil's expectations, I don't know how practical it would be for vendors to create and support such a format for what you would gain from such a standard. The biggest problem I've had in my 17 years in IT is that the tools we use for ETL, in an attempt to become more powerful or flexible, have actually made it harder to work with the formats we do have available.

CSV, tab, pipe-delimited, or even fixed-width files may not have metadata and may not support Unicode, but they are still frequently used because they are the lowest common denominator and will do for the vast majority of situations. However, thinking back to my FoxPro days in the early-to-mid 90s, it seemed you could import and export these text files, or even spreadsheets and other simple formats, quite easily compared with trying to do the same thing in DTS with SQL Server; it is even more difficult in SSIS. The ETL tools are what is failing us, making our lives more difficult than they need to be.
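As an aside, part of what makes these delimited formats the lowest common denominator is how little tooling they need. A minimal sketch (assuming Python and a made-up pipe-delimited extract, purely for illustration) of reading such a file with nothing but the standard library:

```python
import csv
import io

# A hypothetical pipe-delimited extract (invented data for illustration).
raw = "1|Joe Brighton|Manchester\n2|Mary Bondi|Sydney\n"

# csv.reader accepts any single-character delimiter, not just commas.
reader = csv.reader(io.StringIO(raw), delimiter="|")
rows = [row for row in reader]
print(rows)  # [['1', 'Joe Brighton', 'Manchester'], ['2', 'Mary Bondi', 'Sydney']]
```

A fixed-width file is only slightly more work (string slicing); the point is that no ETL product is required at all.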
Post #1078532
Posted Tuesday, March 15, 2011 12:01 PM

Say Hey Kid


Group: General Forum Members
Last Login: Wednesday, December 7, 2016 3:10 PM
Points: 662, Visits: 2,926
Well, what I have in mind is a strategic attack on the problem, once we are agreed as to what the problem is, and we find enough enthusiastic people to help out.
1/ First phase. Work towards a standard that is simple and resilient, by discussion and experiment. Use the JSON philosophy of keeping the standard as bullet-proof as possible. I think one might even get backward compatibility with real CSV. Test it out with code that we can throw together quickly; get something that meets the core requirements well. Learn from history (e.g. the over-complexity of YAML, the ridiculous verbosity of XML). Try informal tests such as copying AdventureWorks to PostgreSQL and back again, then comparing the two.
2/ Define the first draft standard. Test it until it really hurts.
3/ Publish v1 of the draft standard.
4/ Create good parsers in a variety of languages, and ways of testing files for correctness.
5/ Develop ETL tools for SQL Server and other RDBMS, and conversion tools.
6/ Campaign for filters for the main office tools.
7/ When it becomes worthwhile for ordinary folks to use it, publicize it.

I have certain questions:
Should the standard support merged cells (by row and column) to support spreadsheet import/export and representation as HTML tables?
Should a file allow more than one 'result' or table?
What form should foreign references take?
How should one specify constraints?
...and so on...

Best wishes,

Phil Factor
Simple Talk
Post #1078568
Posted Tuesday, March 15, 2011 2:29 PM
Forum Newbie


Group: General Forum Members
Last Login: Sunday, March 27, 2011 7:53 AM
Points: 2, Visits: 16
Phil, I think I agree with your approach, and I would count myself as one of the enthusiastic people.
To have my say on your questions:
- I wouldn't support merged cells. I think it would cause too much confusion in situations that didn't involve spreadsheets.
- The format should allow more than one table, otherwise it's not adding enough value.
- Foreign key references and constraints should themselves take the form of tables in the file.

Here's another example:

1|Joe Brighton|2 Simple Terrace[nl]Manchester
2|Mary Bondi|18 Ocean Road[nl]Sydney
3|Sam Shore|21 Breaker Way[nl]Adelaide

1|The Road to Brighton|1
2|Brighton and beyond|1
3|Back to Bondi|2
4|By the shore|3

This example illustrates a few more points about the RDX format:
- The foreign key constraints are defined by the reserved table RdxForeignKeys. You can see that the AuthorId in the Books table refers to the AuthorId in the Authors table.
- Other constraints (datatypes, primary keys, value ranges) could similarly be defined by other reserved tables defined in the standard. This has the benefit of not mixing up the formatting of data within the file. Everything just remains a table.
- Blank lines are supported by the format by being completely ignored. They are useful for separating tables though.
- The [nl] escape sequence represents my current thinking on how to embed a newline character within a record. I prefer it as it seems easier to read, which I think is a key requirement for the standard. Other escape sequences would be needed to represent the other special characters in the format.
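The rules sketched above (pipe-separated fields, blank lines separating tables and otherwise ignored, [nl] for an embedded newline) are simple enough that a toy parser fits in a few lines. A sketch in Python; note that only [nl] comes from the post, and the [pipe] escape and the list-of-tables return shape are my assumptions:

```python
def parse_rdx(text):
    """Toy parser: tables split on blank lines, rows on newlines, fields on '|'.

    [nl] is the escape from the post; [pipe] is an assumed escape for a
    literal '|'. Fields are split first, then unescaped, so an escaped
    pipe can never be mistaken for a delimiter.
    """
    tables = []
    for block in text.split("\n\n"):
        block = block.strip("\n")
        if not block:  # blank runs are ignored entirely
            continue
        rows = []
        for line in block.split("\n"):
            fields = [f.replace("[nl]", "\n").replace("[pipe]", "|")
                      for f in line.split("|")]
            rows.append(fields)
        tables.append(rows)
    return tables

sample = ("1|Joe Brighton|2 Simple Terrace[nl]Manchester\n"
          "2|Mary Bondi|18 Ocean Road[nl]Sydney\n"
          "\n"
          "1|The Road to Brighton|1\n")
tables = parse_rdx(sample)
```

With the sample above, `tables` holds two tables, and the third field of the first author row comes back as a two-line address, which is exactly the behaviour the [nl] escape is meant to buy.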
Post #1078657
Posted Thursday, May 12, 2011 9:40 PM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, April 26, 2016 7:43 PM
Points: 4, Visits: 158
I think that's what EDI was/is supposed to be. Including the metadata (i.e. an XSD) along with the data seems to conflict with the file being compact and easy to read with commodity tools. For many of my needs, where hundreds of files with the same signature are being processed, it's less about communicating the metadata with each file and more about asserting that the file meets some standard of metadata.

With this standard that is being proposed, how about the ability to either 1) include the metadata inline in the file, or 2) reference a URI for the metadata?
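The two options could share one header convention. A sketch of how a reader might distinguish them, under a purely hypothetical "#meta " header line that no existing standard defines: if the payload looks like a URL it is a reference to externally hosted metadata, otherwise the schema travels inline with the data.

```python
def read_metadata_directive(first_line):
    """Return ('inline', payload) or ('uri', payload) for a hypothetical
    '#meta ' header line, or None if the line is ordinary data."""
    if not first_line.startswith("#meta "):
        return None
    payload = first_line[len("#meta "):].strip()
    if payload.startswith("http://") or payload.startswith("https://"):
        return ("uri", payload)    # option 2: dereference later, cache by URI
    return ("inline", payload)     # option 1: schema embedded in the file

print(read_metadata_directive("#meta https://example.com/authors.schema"))
print(read_metadata_directive("#meta AuthorId:int|Name:str|Address:str"))
print(read_metadata_directive("1|Joe Brighton|Manchester"))
```

The URI variant keeps each file compact while still letting a consumer assert "this file claims to conform to schema X", which matches the signature-checking use case described above.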
Post #1108180

