Content metadata

  • I'm the DBA for a small but soon-to-get-much-larger SQL Server database. There is a growing need to implement some content metadata, preferably using a set of elements from a recognised metadata standard (e.g. Dublic Core).

    We need the metadata stored in the database, because we have a reporting tool that works on the database and needs to be able to extract the metadata in order to put caveats, dates and other metadata in our reports.

    However, I'm not sure what the best way to implement this is. Most metadata-related articles and research on the net relate to metadata for webpages, rather than in databases. And most metadata, like Dublin Core, tends to be implemented in XML.

    I was planning on building a special set of tables using a STAR-schema-like design, to allow for flexible use of metadata elements. If I do implement content metadata this way, should I store the metadata element values as text, or as XML data types?

    What is the standard practice (if any) for implementing content metadata in SQL tables? Are there any articles or guides on implementing Dublin Core (or other metadata standards) in SQL Server tables?

  • I don't think you could generalize a solution for metadata, as it depends an awful lot on what you intend to actually do with that information. Just saying you want to report on it is kind of like saying I have no idea what I need, but I know I need something in my reports that's not actually data in my database, but for some reason has to be referred to as metadata instead. Thus I think the 1st question to ask, is, why metadata? Also, what kind of data is in this database, and why, exactly, is it going to soon grow significantly in size, and how much of that growth will be the "metadata"? I've seen many situations where someone just "decided" that certain information was "metadata", when in reality, that information should have been properly classified as data that belonged in the "database proper", if you will. If you don't know why you refer to it as metadata, that might suggest that maybe it really isn't. Unfortunately, you've provided zero details on what you're objectives are and similarly, no details on where you are today, so I'm not even sure what you expect to achieve. Some suggestions on what to provide in order to get a better answer:

    1.) What does the database do, how big is it today, how big is it supposed to get, and why?

    2.) Are there any existing performance issues that would be exacerbated by data or metadata growth?

    3.) What is the relationship between this "metadata" and the existing database information, and how much more data does this metadata represent?

    4.) What are your detailed objectives for the metadata (e.g., what existing or new business processes will it support)?

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Starting with SQL 2000 Microsoft Introduction "Extended Properties", a brief description of what they can be utilized for is:

    Using Extended Properties on Database Objects

    Microsoft® SQL Server™ 2000 introduces extended properties that users can define on various objects in a database. These extended properties can be used to store application-specific or site-specific information about the database objects. Because the property is stored in the database, all applications reading the property can evaluate the object in the same way. This helps enforce consistency in the way data is treated by all of the programs in the system.

    Each extended property has a user-defined name and value. The value of an extended property is a sql_variant value that can contain up to 7,500 bytes of data. Individual database objects can have multiple extended properties.

    Possible uses of extended properties include:

    Specifying a caption for a table, view, or column. All applications can then use the same caption in a user interface that displays information from that table, view, or column.

    Specifying an input mask for a column so that all applications can validate data before executing a Transact-SQL statement.

    Specifying formatting rules for displaying the data in a column.

    Recording a description of specific database objects that applications can display to users.

    Specifying the size and window location at which a column should be displayed

    You could start reading here:

    http://technet.microsoft.com/en-us/library/aa214766(SQL.80).aspx

    to determine if this functionality meets your needs.

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • >1.) What does the database do, how big is it today, how big is it supposed to get, and why?

    The database stores statistics relating to social data, e.g. "% of population that fulfills Criteria X". It is accessed by a few custom-built, in-house tools that display the data in several different ways. It is currently around 5 gigabytes in size. We expect to get lots of new statistical data over the next year; at this time, we estimate the database may grow as large as 100 gigabytes over the next year.

    >2.) Are there any existing performance issues that would be exacerbated by data or metadata growth?

    Not at this time, as the data is generally accessed by our in-house tools using small, controlled queries.

    >3.) What is the relationship between this "metadata" and the existing database information, and how much more data does this metadata represent?

    The "content metadata" I refer to is information that describes the statistical data we store in our database. For example, for a particular statistical value - say, "% of population that fulfills criteria X" - there would be content metadata attached to that information, like: The source of that statistic, the time period that statistic is valid for, and a list of caveats regarding use of that statistic.

    So, I refer to it as "metadata" because it is information that describes the statistical data that populates our database, rather than being statistical information itself. The purpose of this metadata is to provide context for anyone extracting information from the database - part of the metadata will be provided to outside users (e.g. the valid time period), while other parts of the metadata are primarily for us, the individuals who maintain the database (e.g. the source of the data).

    For every statistical unit of information in our database (potentially 100 gigs worth over the next year) there will be some associated metadata. It is not exactly 1-to-1, however, as most of the metadata will relate to large sets of the statistical data.

    >4.) What are your detailed objectives for the metadata (e.g., what existing or new business processes will it support)?

    Two main objectives: First, to provide an audit trail for us, so that we always know enough information about our statistical data sets for us to be able to rely on them as a legitimate pieces of data. e.g. in six months time, if a user asks us how we derived a particular statistic, we have the metadata stored to be able to provide them with all the information necessary.

    Second, part of the metadata will be provided to users with our in-house reporting software. For example, users may be provided with the caveats of the data, and the dates it is valid, as part of a query to the database. This will produce better, more qualified reports for our users.

    I hope this provides some better context for my question.

  • Sounds like you have some metadata that qualifies to go right into the database next to the information itself, and some that might be more at the table level. For example, if you have statistical data in the same table where record #123456 is valid for a different set of dates than record #654321, then you'll need to do more than just add to the Description information to the table's metadata. I would suggest adding a field that would simply be a foreign key into a metadata table. The metadata table would then have an IDENTITY field as it's primary key, to which the foreign key in your statistics tables refers. It would also contain the valid date range (a start date and end date, and any other metadata deemed necessary, and again, if the number of unique pieces of metadata isn't terribly large, you can likely add fields to this table as needed. Given a 100GB primary set of data, any extra "wasted space" from unused fields is likely to be a nit in comparison.

    What this concept does is provide the ability to join to your metadata and as a result, be easily able to segment it by it's metadata, or to choose from among the various sets of data based on the metadata. This might work especially well if the number of records in any given table for which identical metadata applies is large.

    Ideally, you'll only need one metadata table, but given your extremely well detailed description of your scenario, I suspect you'll know if it makes sense to get to more than one. There's always the danger that more than one could start meaning that the number will always grow, but I'd avoid that if at all possible.

    Given your anticipated growth, adding just one int field to the existing tables is probably a nit, and you'll likely have far more to worry about from the growh to a 100GB database than from the metadata.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • In terms of metadata management, there are several approaches that do work.

    Additionally there are a range of tools available, some very expensive and some available in a more turn-key fashion

    especially for the Microsoft shop. Search for Metadata Management Software on Google.

    Here are some good tips when approaching your metadata integration effort.

    Very difficult to develop in house – most efforts fail as it gets unwieldy quick. Lots of coding involved.

    Try to identify the the key sources of your metadata and find a tool to capture that automatically within a repository.

    Some repositories support Dublin and other standards inherently.

    Understand the problem you are trying to solve. Make a list.

    Get help from the experts, your chances for success are excellent. Training , Blogs, Webinars, Consulting.

    Use a phased approach.

    Don’t try to boil the ocean, you’ll get burned.

    Break activities down by technical vs business.

    Always directly address business requirements solution.

    Manually maintaining metadata is not desirable.

    Some example goals could be as follows:

    Automatically capture all metadata from SQL objects, SSIS, SSAS and SSRS.

    Deliver Impact analysis and search capabilities.

    Collaboration features via a web portal can provide an easy way to fill in the blanks over time.

    Initial roll-out should target 100% automated solution.

    Good body of knowledge for metadata management can be found here.

    http://www.infolibcorp.com/metadata-management/

    http://www.infolibcorp.com/blog/

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply