Value to Column Relationships ?

  • I have parsed data which has several "segments" and a few hundred fields total. I have a table which list the segments. And I have another table (we'll call it FIELD_NAMES) which lists the field names and the IDs of the fields. The critical column from that table is "FIELD_ID". So there will be a few hundred values (field IDs) in the "FIELD_ID" column of the FIELD_NAMES table.

    Now let's think about where to store the parsed data. I could design the data storage around the data type. So if we know that we have 3 data types, then we'll have 3 tables (we'll call them HUGE_TABLE1, HUGE_TABLE2, HUGE_TABLE3) like this:

    HUGE_TABLE1:

    AUTO_NUMBER (bigint) (primary key)

    TRANSACTION_NO (uniqueidentifier)

    FIELD_ID (char (3))

    D_VALUE (float)

    HUGE_TABLE2:

    AUTO_NUMBER (bigint) (primary key)

    TRANSACTION_NO (uniqueidentifier)

    FIELD_ID (char (3))

    C_VALUE (varchar (200))

    HUGE_TABLE3:

    AUTO_NUMBER (bigint) (primary key)

    TRANSACTION_NO (uniqueidentifier)

    FIELD_ID (char (3))

    N_VALUE (int)

    Then we'd put the float data in the D_VALUE field of HUGE_TABLE1, the character data in C_VALUE field of HUGE_TABLE2, and the numeric data in N_VALUE column of HUGE_TABLE3.

    The value for the TRANSACTION_NO would recur as many times as there are fields that carried data in that transaction. The TRANSACTION_NO plus FIELD_ID columns together would be unique.

    If I designed it like that, then the relationship is simple. I just create the relationship between the FIELD_ID column in HUGE_TABLE, and the FIELD_ID column in the FIELD_NAMES table.

    But a problem is the amount of data coming in. From one company we could easily see 100,000 of these transactions per day. If they have 200 fields per transaction, then you're up to 20,000,000 rows of data per day from that one company. so that's over 7 billion records per year from that one company. OK, so that is manageable for one company. But with multiple companies' transaction coming thru, it gets to be a lot of rows. Also, the length of the data varies between fields. So the column holding the values would not have a good fit with the size of the data. (But the length is no more than 200 for any field.)

    So I'd like to set up the tables differently. When the data is parsed I'm hoping to put the value of each field into a column which holds only data for that field. We could make one table with a column for each of the few hundred fields. But that would be unwieldy. and not all of the fields are sent with each transaction. So there would be a lot of empty columns on any given row.

    So a better way might be to make one table for each segment of the transaction. In a given segment's table there is a column for each of that segment's fields in the parsed data. So a given table might look like this. Let's just say for example that 3 of the fields are in a segment called "Insurance". So we might give the table the name SEG_INSURANCE. And the 3 fields in that segment have IDs of DOR, FAD, and COJ. (The real field IDs are assigned by a standards-making entity.) We know the exact size and data types of each of these fields. So

    SEG_INSURANCE:

    AUTO_NUMBER (bigint) (primary key)

    TRANSACTION_NO (uniqueidentifier)

    DOR (varchar (17))

    FAD (int)

    COJ (char (2))

    The value for TRANSACTION_NO would occur only once in this table. So we could make it the primary key if we wished. With this design we go from 20,000,000 rows of data per day to 100,000 for the hypothetical company. And also we have the data divided by field and segment, with data types and sizes fitting nicely.

    Now what I don't know how to handle is the relationships. my column names of DOR, FAD and COJ are not column names in another table. they are values from the FIELD_ID column in the FIELD_NAMES table. I only have an easy relationship for the TRANSACTION_NO column. but it seems like I should have some relationship between DOR and another table, between FAD and another table, and between COJ and another table. What am I not understanding here?

Viewing 0 posts

You must be logged in to reply to this topic. Login to reply