Summarizing Imported Data

  • Comments posted to this topic are about the item Summarizing Imported Data

  • You've made a good start. I tend to query each column separately as I'm doing a conversion using templates, especially since value distributions are extremely helpful. Each data type has its quirks and custom aspects to look at. For example, for datetime data, I like to know if it is strictly dates or dates with times. And I like to know if the data fits in a smalldatetime. Knowing empty strings vs. nulls is important too.

    Below is a template I use for imported datetime data interrogation

    -- interrogate DateTime column

    -- to see if it is a Time column, nullable, or an expected range

    select

    COUNT(*) as CountRows,

    sum(case when <colname,,> is null then 1 else 0 end) as NullsCount,

    sum(case when CONVERT(varchar(12), <colname,,>, 114) = '00:00:00:000' then 1 else 0 end) as DateNoTimeCount,

    sum(case when CONVERT(varchar(12), <colname,,>, 114) > '00:00:00:000' then 1 else 0 end) as DateTimeCount,

    sum(case when <colname,,> < '1/1/1900' then 1 else 0 end) as Before1900Count,

    sum(case when <colname,,> between '1/1/1900' and '1/1/2000' then 1 else 0 end) as BetweenCount,

    sum(case when <colname,,> >= '1/1/2000' then 1 else 0 end) as After2000Count

    from

    <tablename,,>

    -- most frequent values

    select top 100

    <colname,,>,

    COUNT(*) as frequencyCount

    from

    <tablename,,>

    group by

    <colname,,>

    having

    COUNT(*) > 1

    order by

    COUNT(*) desc

  • Very well written.

    Regards,

    Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

    http://basitaalishan.com
  • Thanks, Bill. That's a great method for interrogating the datetime column. I'll bet you've amassed quite a collection of such in your work.

    Do you primarily run these as needed and keep notes elsewhere on your conversion process, or run the interrogations once (each time a new import is done) and store results to tables? I used to always do the former, but lately have been finding benefit from keeping all in one place. Plus, it forces me to document my queries more rigorously, which is always a good thing. 🙂

    Thanks again for the read, and for the comment!

  • Storing your results in tables is a good idea; I just get once-and-done conversion tasks from new customers. The templates are sufficient for that. I really like using templates and have about 200 that I use regularly for database change requests from developers. It would be nice to find a shared library of templates somewhere.

    Even better than templates though...

    Custom procedures are big wasters of time for maintenance. Over 90% of procedures in my database projects are generated and maintained by 20 simple template procs which work together as a system. They use a template language for formatting and replaceable parameters. Just say "no" to codesmith, simplify and save a bundle.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply