Statistics on varchar(max) columns

  • Is there any point in keeping auto-generated statistics on either varchar(max) or large varchar columns (e.g. varchar(3000))? The data in these columns is either free-form notes entered by users or XML data.

    I'm finding some of our auto-created statistics take a very long time to update with UPDATE STATISTICS. We use FULLSCAN, but I'd rather not get into why we do that -- the decision followed extensive testing.

    The updates to these stats also coincide with some of the 15-second I/O messages in the SQL Server error log (tempdb). Some of these stats are on "note"-type fields with lots of free-form text, and one other includes about 15 columns (that one was a Database Engine Tuning Advisor creation).

    I noticed that the histogram steps in some of these stats are either empty or a pretty messy collection of free-form text data. (The sort of query I've been using to round these stats up is below.)
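    For context, here's a sketch of that query -- the 3000-byte cutoff is just my assumption for what counts as "large", and you'd adjust it for your own schema:

    -- Sketch: list auto-created (_WA_Sys) statistics whose leading column is
    -- varchar(max) or a wide varchar; the 3000-byte cutoff is arbitrary.
    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            s.name                   AS stats_name,
            c.name                   AS column_name,
            c.max_length             AS max_length_bytes    -- -1 means (max)
    FROM    sys.stats AS s
    JOIN    sys.stats_columns AS sc
            ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
    JOIN    sys.columns AS c
            ON c.object_id = sc.object_id AND c.column_id = sc.column_id
    WHERE   s.auto_created = 1
      AND   sc.stats_column_id = 1                           -- leading column only
      AND   TYPE_NAME(c.system_type_id) = 'varchar'
      AND   (c.max_length = -1 OR c.max_length >= 3000);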

  • I've decided to exclude 3 specific statistics from our stats maintenance jobs -- these almost always cause the 15-second I/O messages and take a long time to update at FULLSCAN. They may be auto-updated from time to time.

    Removing them practically requires an act of Congress, so I may hold off on that effort for a while.

    I confirmed that when running UPDATE STATISTICS on one of these with no sample percentage specified, the sampling rate was 0.2%:

    Records sampled: 604,275

    Record count: 231,741,201
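    For anyone checking the same thing, those numbers come from comparing rows_sampled to rows; a query along these lines (dbo.BigTable is a placeholder name) shows it per statistic without digging through DBCC SHOW_STATISTICS output:

    -- Sketch: sample rate per statistic on one table (placeholder table name).
    -- Requires sys.dm_db_stats_properties (SQL Server 2008 R2 SP2 / 2012 SP1 or later).
    SELECT  s.name AS stats_name,
            sp.last_updated,
            sp.rows,
            sp.rows_sampled,
            CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS decimal(5, 2)) AS sample_pct
    FROM    sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE   s.object_id = OBJECT_ID('dbo.BigTable')
    ORDER BY sample_pct;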

  • Do you use those varchar(max) (or other heavy types) columns in WHERE conditions?

    Do you have Full-Text Search implemented?

    Igor Micev, My blog: www.igormicev.com

  • I doubt the varchar(max) and other long varchar columns are used in WHERE clauses, except rarely on an ad hoc basis when somebody was doing research (which is probably when the stat was auto-created). We do have two columns under full-text catalogs -- and I was wondering about the use of statistics on those columns.

  • If you are using them in the WHERE clauses then the stats will be auto-created; you might know that already, and you don't really have control over it.

    From what I see, you either don't want to update those statistics, or you want to change the sampling rate.

    1. For changing the sampling rate: check trace flag 2371

    2. If you decide not to update those stats, then add a step in the job to drop those stats and they won't update during that window; however, they will get recreated if you run those queries again (a guarded version of the drop is sketched below).

    DROP STATISTICS table.statistics_name
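    Something like this as the job step keeps it from failing when the statistic has already been dropped -- the table name and the _WA_Sys statistic name here are placeholders:

    -- Sketch: guarded drop so the job step doesn't fail if the statistic
    -- is missing; dbo.BigTable and the _WA_Sys name are placeholders.
    IF EXISTS (SELECT 1
               FROM   sys.stats
               WHERE  object_id = OBJECT_ID('dbo.BigTable')
                 AND  name = '_WA_Sys_00000005_0425A276')
        DROP STATISTICS dbo.BigTable._WA_Sys_00000005_0425A276;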

    --SQLFRNDZ

  • Since SQL Server re-creates those _WA_Sys stats using hexadecimal for the table's object id and the column's column id, it seems to always use the same name, so I just excluded those named statistics from my custom job.

    We're now using UPDATE STATISTICS WITH FULLSCAN because we found that it runs multi-threaded, unlike lesser sample percentages. Plus, for our database, the only way to beat the speed of running FULLSCAN was to drop below a 30% sample rate, which produces poor stats for our application.
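    For what it's worth, the two forms we compared look like this (the table and statistic names are placeholders); the comments reflect what we observed in our own testing, not a general guarantee:

    -- Sketch: full scan vs. sampled update on a named statistic (placeholder names).
    UPDATE STATISTICS dbo.BigTable ST_NoteText WITH FULLSCAN;           -- ran multi-threaded for us
    UPDATE STATISTICS dbo.BigTable ST_NoteText WITH SAMPLE 30 PERCENT;  -- ran single-threaded in our tests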
