Changing data type to optimize performance

  • Hi All,

    I have read many articles recommending the use of varchar instead of nvarchar, as the former only reserves the amount of data actually saved in the field instead of reserving the whole field size, as in the case of nvarchar.

    I assume this will lead to queries consuming less IO and performing better over the network, and that table sizes will be smaller.

    I created an alter script to do that change and attached it in this post.

    Please let me know if there are any precautions I should take. Also, should I recreate the views that use those fields, or is there no need to do that?

    Your replies are highly appreciated.

    Thanks in advance.

    Nader

  • Hi Nader,

    I would be very careful before replacing all nvarchar columns with varchar columns. What if the database actually has to store data that requires a Unicode character set?
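
    Just to illustrate the risk (the result shown assumes a common Latin1 code-page collation):

    -- Anything outside the column collation's code page is silently
    -- replaced with '?' when converted to varchar
    SELECT CAST(N'Ελληνικά' AS varchar(20));  -- returns '????????'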

    Another important consideration is that the script you generate will consume lots of resources, cause major blocking, and will leave the database extremely fragmented. Is that really worth the (probably small) performance gain?

    What I would do instead of running this script is to go over the tables, find the columns that *both* take up a lot of space *and* are guaranteed never to need characters outside of standard ASCII, and then manually create a script to change them. If there are multiple such columns in a single table, I would create a new table, copy the data, then swap the tables (drop the old one and rename the new one) instead of running multiple ALTER TABLE statements. I might even do that for tables with just a single column to change.
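
    A rough sketch of that approach, with made-up table and column names:

    -- Create the new table with varchar instead of nvarchar
    CREATE TABLE dbo.Orders_New
        (OrderID  int          NOT NULL PRIMARY KEY,
         RefCode  varchar(20)  NOT NULL,   -- was nvarchar(20)
         Comments varchar(500) NULL);      -- was nvarchar(500)

    -- Copy the data (during planned downtime)
    INSERT INTO dbo.Orders_New (OrderID, RefCode, Comments)
    SELECT OrderID, RefCode, Comments
    FROM dbo.Orders;

    -- Swap the tables; recreate any other indexes, constraints,
    -- and permissions afterwards
    DROP TABLE dbo.Orders;
    EXEC sp_rename 'dbo.Orders_New', 'Orders';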

    If you still want to use your script, then I suggest that you set up a good test first. Create a "play" database, put in some tables, and ensure that you cover all datatypes you want to test (include silly things such as nvarchar(1), special cases such as nvarchar(max), nullable and non-nullable columns, and data types that should be untouched, such as varchar). Also consider what you want to do with nchar(xx) columns.
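
    For example, a test table along these lines would cover those cases (made-up names):

    CREATE DATABASE Play;
    GO
    USE Play;
    GO
    CREATE TABLE dbo.TypeTest
        (ID       int           NOT NULL PRIMARY KEY,
         Tiny     nvarchar(1)   NULL,       -- silly edge case
         Big      nvarchar(max) NULL,       -- special case
         Required nvarchar(50)  NOT NULL,   -- non-nullable
         KeepMe   varchar(50)   NULL,       -- must be left untouched
         Fixed    nchar(10)     NULL);      -- decide what to do with these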

    Finally, I am 95% sure that ALTER TABLE ... ALTER COLUMN will fail if the column is included in or referenced by a foreign key, and it might also fail if the column is indexed or subject to a CHECK constraint. Not sure about that, but it's easy enough for you to create some test cases and check.
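
    A test case for that can be as simple as (made-up names; run it in a scratch database):

    CREATE TABLE dbo.AlterTest
        (Code nvarchar(10) NOT NULL,
         CONSTRAINT CK_AlterTest_Code CHECK (Code <> N''));
    CREATE INDEX IX_AlterTest_Code ON dbo.AlterTest (Code);

    -- Expected to fail: the index (and likely the CHECK constraint)
    -- is reported as dependent on the column
    ALTER TABLE dbo.AlterTest ALTER COLUMN Code varchar(10) NOT NULL;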


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hi Hugo,

    Thanks for your reply.

    You are right; if a column is included in an index, the index should be dropped first, the change made, and then the index recreated.
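
    Something like this, with made-up table and index names:

    DROP INDEX IX_Orders_RefCode ON dbo.Orders;
    ALTER TABLE dbo.Orders ALTER COLUMN RefCode varchar(20) NOT NULL;
    CREATE INDEX IX_Orders_RefCode ON dbo.Orders (RefCode);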

    Regarding the script, I could optimize it further, but you mentioned that the gain is not that significant.

    I thought it would make a good improvement, but if not, then there is no need to do it.

    As for fragmentation, I could rebuild the indexes on the table after the change.
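
    For example, using the same made-up table name as above:

    -- Rebuild all indexes on the altered table to remove the fragmentation
    ALTER INDEX ALL ON dbo.Orders REBUILD;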

    Now I need a recommendation, please: should I do that or not, if there is no need for the field to be nvarchar?

    Thanks for the reply.

    Nader

  • nadersam (3/13/2016)


    Now I need a recommendation, please: should I do that or not, if there is no need for the field to be nvarchar?

    Starting with the obvious: For any new work, always ensure that you only use nvarchar or nchar if you *need* Unicode. That prevents getting into this situation later.

    For existing columns that are nvarchar but do not need to be, there is no easy standard solution. Are you currently running into disk space issues because of this? Do you find that your buffer pool is under constant pressure and you suspect that this may be a solution? Are you experiencing any other issues that you suspect might be related? In all those cases, I think you have a case for taking the one-time hit to change the columns. Find the worst offenders (looking at average data length and number of rows in the table, and if it's buffer pool related also at frequency of use of the table) and fix them first, then check if your problems are now gone.
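
    As a starting point for finding those worst offenders, a sketch like this against the catalog views lists the candidates (it looks at declared size and row count only; you would still want to check the actual average data length):

    -- List nvarchar columns, largest tables and widest columns first
    SELECT t.name AS TableName,
           c.name AS ColumnName,
           c.max_length AS DeclaredBytes,   -- -1 means nvarchar(max)
           SUM(p.rows) AS TableRows
    FROM sys.columns AS c
    JOIN sys.tables AS t ON t.object_id = c.object_id
    JOIN sys.types AS ty ON ty.user_type_id = c.user_type_id
    JOIN sys.partitions AS p ON p.object_id = t.object_id
                            AND p.index_id IN (0, 1)
    WHERE ty.name = N'nvarchar'
    GROUP BY t.name, c.name, c.max_length
    ORDER BY SUM(p.rows) DESC, c.max_length DESC;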

    If you have no serious problems but suspect that your performance might suffer, then you could consider changing just the very worst columns, but in this case I would absolutely test the effect of the change first before ever starting to plan it for production, and of course run the change script during a planned downtime.

    If you have no issues at all related to nvarchar but just consider doing this "because you read it somewhere", then don't bother.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (3/13/2016)


    nadersam (3/13/2016)


    Now I need a recommendation, please: should I do that or not, if there is no need for the field to be nvarchar?

    Do you find that your buffer pool is under constant pressure and you suspect that this may be a solution? Are you experiencing any other issues that you suspect might be related?

    Yes Hugo, I noticed that the page life expectancy is getting very low. I checked for missing indexes and didn't find that much was needed, so that idea came to me, especially since I have read about it many times before.

    Is there any way to detect whether this is what's causing the low page life expectancy?

    Thanks again.

    Nader

  • nadersam (3/13/2016)


    Hi All,

    I have read many articles recommending the use of varchar instead of nvarchar, as the former only reserves the amount of data actually saved in the field instead of reserving the whole field size, as in the case of nvarchar.

    [font="Arial Black"]That's actually and horribly incorrect. [/font] VARCHAR stores 1 byte per character for the width of the data + 2 btyes to keep track of the length. NVARCHAR stores 2 bytes per character for the width of the data + 2 bytes for length tracking like VARCHAR does. [font="Arial Black"]NVARCHAR does NOT reserve "the whole field size". [/font] Only CHAR() and NCHAR() reserve "the whole field size".

    I assume this will lead to queries consuming less IO and performing better over the network, and that table sizes will be smaller.

    Yes, all of that is true simply because NVARCHAR does take roughly twice the number of bytes to get the same information across. I especially hate it when someone uses NVARCHAR on columns that will only ever contain digits and symbols like spaces or dashes.

    HOWEVER, such a blanket change could really put the screws to performance if there's a front end, because of a problem that everyone supposedly knows about but, it seems, everyone forgets about. ORMs like Entity Framework, Linq2SQL, and others frequently and mistakenly generate SQL code where the passed data-types and/or parameters are NVARCHAR for strings. *Due to data-type mismatches and "Data-Type Precedence", that will cause any and all front-end generated queries to suddenly do index scans or table scans instead of index seeks, which will obviously cause HUGE performance problems.* The reason this happens is that NVARCHAR parameters have a higher data-type precedence, which forces a complete scan of the column they're compared against, because each row of the column must first be converted to NVARCHAR (in this case) before the criteria in the query can be applied. Yes, this problem can be fixed at the ORM, but it usually requires an "annotation" of the correct datatypes because ORMs are actually pretty ignorant in what they do.
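
    A minimal repro, with a made-up table (compare the execution plans of the two queries):

    CREATE TABLE dbo.Accounts
        (AccountCode varchar(20) NOT NULL PRIMARY KEY);

    -- varchar literal: straightforward index seek
    SELECT AccountCode FROM dbo.Accounts WHERE AccountCode = 'ABC123';

    -- NVARCHAR literal (what many ORMs send): the varchar column gets
    -- implicitly converted; with a SQL collation this becomes a full scan
    -- (with a Windows collation the optimizer may still manage a range seek)
    SELECT AccountCode FROM dbo.Accounts WHERE AccountCode = N'ABC123';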

    To summarize, I wouldn't change a bloody thing until you have an extremely deep and all-encompassing understanding of the problems and fixes that are required in the front-end code, in stored procedures, and in the structure of the objects (including things like indexes, constraints, defaults, triggers, etc.).

    Please let me know if there are any precautions I should take. Also, should I recreate the views that use those fields, or is there no need to do that?

    How'd I do? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    Shifting gears a bit, if you really want to improve performance, bite the bullet and shake out the rug. Find the top 10 worst performing queries on the server and fix those. Wash, rinse, repeat, until no performance issues are perceived. It'll actually be much more effective, and easier to do, than changing data-types all over the place.

    Supposed "long running" aren't usually the biggest problem when it comes to the front end. People look for things like queries that take over 1 second to run and fix those and then wonder why the front end hasn't seen a performance improvement. What you have to look for is the total CPU and IO used by high frequency usage queries. Yes, the "long running" queries will still need to eventually be fixed but a query that consumes "only" 250ms and "only" consumes 20-40,000 reads per execution that runs 50 or 100 thousand times per day is going to be where you can really get performance improvement.

    There are plenty of scripts on the internet to help you find these problems but, until you find one that you like, you can just open SSMS, open the Object Explorer (press the {f8} key to get there if you don't know how), right-click on the instance name, select {Reports}, {Standard Reports}, and then follow your nose to the performance-related reports that report on TOP QUERIES BY TOTAL CPU TIME and TOP QUERIES BY TOTAL IO. They find stored procedures, ORM-generated code, and ad hoc queries that may be generated by "managed" code.
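
    Until you find a script you like, the same kind of information can be pulled from the DMVs with something along these lines (a sketch; it only sees plans that are still in the cache):

    -- Top 10 cached queries by total logical reads
    SELECT TOP (10)
           qs.execution_count,
           qs.total_worker_time AS TotalCpuMicroseconds,
           qs.total_logical_reads,
           SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
                     (CASE WHEN qs.statement_end_offset = -1
                           THEN DATALENGTH(st.text)
                           ELSE qs.statement_end_offset
                      END - qs.statement_start_offset) / 2 + 1) AS QueryText
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;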

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • nadersam (3/13/2016)


    Hugo Kornelis (3/13/2016)


    nadersam (3/13/2016)


    Now I need a recommendation, please: should I do that or not, if there is no need for the field to be nvarchar?

    Do you find that your buffer pool is under constant pressure and you suspect that this may be a solution? Are you experiencing any other issues that you suspect might be related?

    Yes Hugo, I noticed that the page life expectancy is getting very low. I checked for missing indexes and didn't find that much was needed, so that idea came to me, especially since I have read about it many times before.

    Is there any way to detect whether this is what's causing the low page life expectancy?

    Thanks again.

    Nader

    Careful now. PLE isn't necessarily some magical indication of any problem. Yes, it's something to be considered but, in the absence of other information such as buffer hit ratios, etc., it should be taken with a grain of salt. Check the reports that I mentioned in my previous post and start there.
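
    If you want to watch PLE itself over time, it's exposed in the DMVs (on NUMA hardware, also look at the per-node 'Buffer Node' rows):

    -- Current page life expectancy, in seconds
    SELECT object_name, counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy';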

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/13/2016)


    Supposed "long running" aren't usually the biggest problem when it comes to the front end. People look for things like queries that take over 1 second to run and fix those and then wonder why the front end hasn't seen a performance improvement. What you have to look for is the total CPU and IO used by high frequency usage queries. Yes, the "long running" queries will still need to eventually be fixed but a query that consumes "only" 250ms and "only" consumes 20-40,000 reads per execution that runs 50 or 100 thousand times per day is going to be where you can really get performance improvement.

    Thank you very much, Jeff, that's a bit of a shift there. During the last period I worked on the slowest SQL statements, especially the ones causing very high CPU and running for a long time, which caused very bad performance for the whole system while they executed.

    I also worked on indexes and merged some of the duplicate ones.

    PLE went up to a good number, but all of a sudden, without any updates to the system, it started falling aggressively from an average of 400-500 down to 50-100.

    As you mentioned, I will work on the queries that run many times but are not necessarily slow, and monitor.

    If you have any ideas why PLE dropped suddenly like that, please let me know.

    Note: the index rebuild plan has been failing more frequently recently, but I run the rebuilds manually.

    Thanks again.

  • Jeff Moden (3/13/2016)


    nadersam (3/13/2016)


    Hugo Kornelis (3/13/2016)


    nadersam (3/13/2016)


    Now I need a recommendation, please: should I do that or not, if there is no need for the field to be nvarchar?

    Do you find that your buffer pool is under constant pressure and you suspect that this may be a solution? Are you experiencing any other issues that you suspect might be related?

    Yes Hugo, I noticed that the page life expectancy is getting very low. I checked for missing indexes and didn't find that much was needed, so that idea came to me, especially since I have read about it many times before.

    Is there any way to detect whether this is what's causing the low page life expectancy?

    Thanks again.

    Nader

    Careful now. PLE isn't necessarily some magical indication of any problem. Yes, it's something to be considered but, in the absence of other information such as buffer hit ratios, etc., it should be taken with a grain of salt. Check the reports that I mentioned in my previous post and start there.

    The buffer hit ratio is always 1 or 0.99, and the buffer cache hit ratio base is highly variable.

    I use the SQL below to measure them, along with other metrics, in a scheduled job that runs every 10 minutes.

    -- ratio = 'Buffer cache hit ratio' / 'Buffer cache hit ratio base'
    DECLARE @Buffercachehitratio decimal(18,3),
            @Buffercachehitratiobase bigint;

    SELECT @Buffercachehitratio =
               CAST(CAST(A.cntr_value1 AS NUMERIC) / CAST(B.cntr_value2 AS NUMERIC) AS decimal(18,3)),
           @Buffercachehitratiobase = B.cntr_value2
    FROM (SELECT cntr_value AS cntr_value1
          FROM sys.dm_os_performance_counters
          WHERE object_name = 'SQLServer:Buffer Manager'
            AND counter_name = 'Buffer cache hit ratio') AS A
    CROSS JOIN
         (SELECT cntr_value AS cntr_value2
          FROM sys.dm_os_performance_counters
          WHERE object_name = 'SQLServer:Buffer Manager'
            AND counter_name = 'Buffer cache hit ratio base') AS B;

    Thanks again.

    Nader

  • nadersam (3/14/2016)


    The buffer hit ratio is always 1 or 0.99, and the buffer cache hit ratio base is highly variable.

    Buffer cache hit ratio is nearly useless as a counter.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
