SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Summarizing Imported Data


Summarizing Imported Data

Author
Message
paddythegeek
paddythegeek
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 56
Comments posted to this topic are about the item Summarizing Imported Data
Bill Talada
Bill Talada
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2800 Visits: 2125
You've made a good start. I tend to query each column separately as I'm doing a conversion using templates, especially since value distributions are extremely helpful. Each data type has its quirks and custom aspects to look at. For example, for datetime data, I like to know if it is strictly dates or dates with times. And I like to know if the data fits in a smalldatetime. Knowing empty strings vs. nulls is important too.

Below is a template I use for imported datetime data interrogation

-- interrogate DateTime column
-- to see if it is a Time column, nullable, or an expected range
select
COUNT(*) as CountRows,
sum(case when <colname,,> is null then 1 else 0 end) as NullsCount,
sum(case when CONVERT(varchar(12), <colname,,>, 114) = '00:00:00:000' then 1 else 0 end) as DateNoTimeCount,
sum(case when CONVERT(varchar(12), <colname,,>, 114) > '00:00:00:000' then 1 else 0 end) as DateTimeCount,
sum(case when <colname,,> < '1/1/1900' then 1 else 0 end) as Before1900Count,
sum(case when <colname,,> between '1/1/1900' and '1/1/2000' then 1 else 0 end) as BetweenCount,
sum(case when <colname,,> >= '1/1/2000' then 1 else 0 end) as After2000Count
from
<tablename,,>


-- most frequent values
select top 100
<colname,,>,
COUNT(*) as frequencyCount
from
<tablename,,>
group by
<colname,,>
having
COUNT(*) > 1
order by
COUNT(*) desc



Basit Farooq
Basit Farooq
SSC Veteran
SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)SSC Veteran (275 reputation)

Group: General Forum Members
Points: 275 Visits: 818
Very well written.

Regards,

Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)

http://basitaalishan.com
paddythegeek
paddythegeek
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 56
Thanks, Bill. That's a great method for interrogating the datetime column. I'll bet you've amassed quite a collection of such in your work.

Do you primarily run these as needed and keep notes elsewhere on your conversion process, or run the interrogations once (each time a new import is done) and store results to tables? I used to always do the former, but lately have been finding benefit from keeping all in one place. Plus, it forces me to document my queries more rigorously, which is always a good thing. Smile

Thanks again for the read, and for the comment!
Bill Talada
Bill Talada
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2800 Visits: 2125
Storing your results in tables is a good idea; I just get once-and-done conversion tasks from new customers. The templates are sufficient for that. I really like using templates and have about 200 that I use regularly for database change requests from developers. It would be nice to find a shared library of templates somewhere.

Even better than templates though...

Custom procedures are big wasters of time for maintenance. Over 90% of procedures in my database projects are generated and maintained by 20 simple template procs which work together as a system. They use a template language for formatting and replaceable parameters. Just say "no" to codesmith, simplify and save a bundle.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search