Gain Space Using XML data type
Posted Thursday, July 23, 2009 12:14 AM
SSC-Enthusiastic

Comments posted to this topic are about the item Gain Space Using XML data type
Post #757986
Posted Tuesday, August 4, 2009 1:28 AM
SSChasing Mays

Hi,

I notice you're using very short tag names in your XML, presumably to reduce the space used.

If you use a typed xml column, i.e. associate it with an XSD, the length of the tags shouldn't matter, and you may get other associated benefits. When I have some time I'll do a comparison.
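
For anyone who wants to experiment in the meantime, a minimal sketch of typing the column against the article's <RT>/<RN> structure might look like this (the schema collection name, table name, and attribute types are illustrative assumptions, not from the article):

CREATE XML SCHEMA COLLECTION dbo.RainfallSchema AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="RT">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="RN" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="H" type="xsd:unsignedByte" />
            <xsd:attribute name="V" type="xsd:decimal" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO

-- Typed xml column: values are validated against the schema collection
CREATE TABLE dbo.Rainfall_XML_Typed
(
    [LocID]    INTEGER  NOT NULL,
    [Date]     DATETIME NOT NULL,
    [Rainfall] XML (dbo.RainfallSchema) NOT NULL
);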

Regards,

David McKinney.
Post #764616
Posted Tuesday, August 4, 2009 1:53 AM
Forum Newbie

Hi,

You can reduce the space in a similar way without XML. You can use a pair of tables instead of the XML column.

-- Header table: one row per location per day, keyed by a surrogate identity
CREATE TABLE Rainfall_LocDate
(LocDateId int IDENTITY(1, 1)
,LocID int
,Date datetime
,CONSTRAINT PK_Rainfall_LocDate PRIMARY KEY NONCLUSTERED (LocID, Date)
);

-- Detail table: one row per hour, referencing the header by LocDateId
CREATE TABLE Rainfall_Data
(LocDateId int
,Hour tinyint
,Rainfall numeric(10,8)
,CONSTRAINT PK_Rainfall_Data PRIMARY KEY NONCLUSTERED (LocDateId, Hour)
);
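
A quick sketch of how the hourly readings would be read back from this pair of tables (the date literal is just an example):

SELECT LD.LocID,
       LD.[Date],
       D.[Hour],
       D.Rainfall
FROM   Rainfall_LocDate AS LD
JOIN   Rainfall_Data AS D
  ON   D.LocDateId = LD.LocDateId
WHERE  LD.[Date] = '20080106';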

Regards,

Pavel Krticka
Post #764631
Posted Tuesday, August 4, 2009 2:53 AM
Forum Newbie

The space savings are impressive and the technique does lend itself to a number of real-world situations, but you also need to highlight the shortcomings of the xml data type as described in SQL Server Books Online:

http://msdn.microsoft.com/en-us/library/ms189887.aspx

Limitations of the xml Data Type
Note the following general limitations that apply to the xml data type:

The stored representation of xml data type instances cannot exceed 2 GB.

It cannot be used as a subtype of a sql_variant instance.

It does not support casting or converting to either text or ntext. Use varchar(max) or nvarchar(max) instead.

It cannot be compared or sorted. This means an xml data type cannot be used in a GROUP BY statement.

It cannot be used as a parameter to any scalar, built-in functions other than ISNULL, COALESCE, and DATALENGTH.

It cannot be used as a key column in an index. However, it can be included as data in a clustered index or explicitly added to a nonclustered index by using the INCLUDE keyword when the nonclustered index is created.
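
To illustrate that last point against the article's XML table, something along these lines should work (the index name is an assumption, not from the article):

-- An xml column cannot be an index key, but it can be carried
-- as an included (non-key) column in a nonclustered index
CREATE NONCLUSTERED INDEX IX_Rainfall_XML_Date
ON dbo.Rainfall_XML ([Date])
INCLUDE ([Rainfall]);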
Post #764660
Posted Tuesday, August 4, 2009 4:21 AM
Forum Newbie

A very interesting article indeed!

However, I think you shouldn't compare query performance without any index on the conventional table.
If you add an index (code below) and rerun your query, query performance will be roughly 1,000 times better. (However, the space requirement will also increase considerably.)

Rgds,
Pexi

CREATE NONCLUSTERED INDEX [IX_Rainfall_Con_Date_Hour]
ON [Rainfall_Con]
(
    [Date] ASC,
    [Hour] ASC,
    [LocID] ASC,
    [Rainfall] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
      ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY];

Post #764698
Posted Tuesday, August 4, 2009 4:53 AM


SSCrazy Eights

This script is closely based on that in the article.

I would strongly encourage the author of this article to try it out.

Nice idea for an article, but size and speed are not XML strengths!

Paul

-- *****************************************************************************************
-- WARNING: This script contains commands to clear the server-wide data and procedure caches
-- Do not run on anything other than a TEST instance!
--
-- Please ensure Actual Execution Plan is OFF for representative results
-- Total test time is 10 to 60 seconds (including test data creation), depending on hardware
-- *****************************************************************************************
USE tempdb;
GO
--
-- Drop the test tables if they exist
--
IF OBJECT_ID(N'dbo.Rainfall_Con', N'U') IS NOT NULL DROP TABLE dbo.Rainfall_Con
IF OBJECT_ID(N'dbo.Rainfall_XML', N'U') IS NOT NULL DROP TABLE Rainfall_XML;
IF OBJECT_ID(N'tempdb..#BitBucket', N'U') IS NOT NULL DROP TABLE #BitBucket;
GO
--
-- Create the conventional table
--
CREATE TABLE dbo.Rainfall_Con
(
[LocID] INTEGER NOT NULL,
[Date] DATETIME NOT NULL,
[Hour] TINYINT NOT NULL,
[Rainfall] DECIMAL(10,8) NOT NULL,
);
GO
--
-- Populate the conventional table with one year's data for 100 locations
-- This should complete in less than 30 seconds
--
DECLARE @Rows BIGINT;
SET @Rows = 100 * 365 * 24; -- 24 hours' data for 365 days for 100 locations

RAISERROR('Creating test data...', 0, 1) WITH NOWAIT;

;WITH Numbers (N)
AS (
-- Dynamically create rows numbered 0 to 875999
SELECT TOP (@Rows)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM master.sys.all_columns C1,
master.sys.all_columns C2,
master.sys.all_columns C3
)
INSERT dbo.Rainfall_Con WITH (TABLOCKX)
(
[LocID],
[Date],
[Hour],
[Rainfall]
)
SELECT LocID = (N / 365 / 24) + 1,
[Date] = DATEADD(DAY, (N / 24) % 366 , '2008-01-01 00:00:00.000'),
[Hour] = N % 24,
[Rainfall] = RAND(CHECKSUM(NEWID())) * 10 -- A new random value for each row
FROM Numbers;
GO
--
-- Now create the primary key
-- Doing this *after* inserting the data results in a compact and contiguous index
-- If the index were to exist during the insert, index pages may split resulting in
-- a lower data density, and significant fragmentation
--
RAISERROR('Creating index...', 0, 1) WITH NOWAIT;

ALTER TABLE dbo.Rainfall_Con
ADD CONSTRAINT PK_Rainfall_Con
PRIMARY KEY CLUSTERED([Date], [Hour], [LocID])
WITH (FILLFACTOR = 100);
GO
--
-- Create the XML table
--
CREATE TABLE dbo.Rainfall_XML
(
[LocID] INTEGER NOT NULL,
[Date] DATETIME NOT NULL,
[Rainfall] XML NOT NULL,
);
GO
--
-- Populate the XML table from the conventional table
--
RAISERROR('Creating XML data...', 0, 1) WITH NOWAIT;

INSERT Rainfall_XML WITH (TABLOCKX)
SELECT [LocID],
[Date],
(
-- The hourly data as XML
SELECT H = [Hour],
V = [Rainfall]
FROM dbo.Rainfall_Con A WITH (READUNCOMMITTED)
WHERE A.[LocID] = B.[LocID]
AND A.[Date] = B.[Date]
FOR XML RAW('RN'), ROOT('RT')
)
FROM Rainfall_Con B WITH (READUNCOMMITTED)
GROUP BY
[LocID],
[Date];
GO
--
-- Add the primary key now, for the same reasons as before
--
RAISERROR('Creating index...', 0, 1) WITH NOWAIT;

ALTER TABLE dbo.Rainfall_XML
ADD CONSTRAINT PK_Rainfall_XML
PRIMARY KEY CLUSTERED([Date], [LocID])
WITH (FILLFACTOR = 100);
GO
--
-- Show the space used by each table
--
-- Results on my system:
--
-- Conventional : 876,000 rows, 27016KB reserved, 26856KB data, 96KB index, 64KB unused
-- XML : 36,500 rows, 36504KB reserved, 36504KB data, 112KB index, 64KB unused
--
-- ### XML representation is 35% *larger* ###
--
EXECUTE sp_spaceused Rainfall_Con;
EXECUTE sp_spaceused Rainfall_XML;
GO
--
-- This table is used to capture the results from the test
-- SELECT statements. This avoids any delays in sending the
-- rows to the client from skewing the results
--
CREATE TABLE #BitBucket
(
[LocID] INTEGER NOT NULL,
[Date] DATETIME NOT NULL,
[Hour] TINYINT NOT NULL,
[Rainfall] DECIMAL(10,8) NOT NULL,
);
GO
--
-- TEST START
--
RAISERROR('Starting test', 0, 1) WITH NOWAIT;
--
-- Remove all ad-hoc SQL query plans from cache
--
DBCC FREESYSTEMCACHE (N'SQL Plans');
GO
--
-- Start with a cold data cache
-- Each implementation is run 10 times
-- One time with a cold cache, then nine times with a warm cache
-- This seems more representative to me, feel free to change it
--
DBCC DROPCLEANBUFFERS;
GO
/* SQL */
INSERT #BitBucket WITH (TABLOCKX)
SELECT TP.[LocID],
TP.[Date],
TP.[Hour],
TP.[Rainfall]
FROM dbo.Rainfall_Con TP WITH (READUNCOMMITTED)
WHERE TP.[Date] = '20080106'
AND TP.[Hour] = 15
ORDER BY
LocID ASC;
GO 10
--
-- For fairness, recreate the dump table for the second run
-- TRUNCATE TABLE might give the second run an unfair advantage
-- since mixed and uniform extent allocations would not need
-- to be re-done
--
DROP TABLE #BitBucket;
GO
CREATE TABLE #BitBucket
(
[LocID] INTEGER NOT NULL,
[Date] DATETIME NOT NULL,
[Hour] TINYINT NOT NULL,
[Rainfall] DECIMAL(10,8) NOT NULL,
);
GO
--
-- Second run - XML
--
DBCC DROPCLEANBUFFERS;
GO
/* XML */
INSERT #BitBucket WITH (TABLOCKX)
SELECT TP.LocID,
TP.[Date],
[Hour] = Rainfall.value('(/RT/RN[@H=15]/@H)[1]','tinyint'),
Rainfall_Con = Rainfall.value('(/RT/RN[@H=15]/@V)[1]','numeric(10,8)')
FROM dbo.Rainfall_XML TP WITH (READUNCOMMITTED)
WHERE TP.[Date] = '20080106'
ORDER BY
LocID;
GO 10
--
-- Show the total execution resource usage for all ten runs
--
SELECT query = LEFT(QT.[text], 9),
execution_plan = QP.query_plan,
run_count = QS.execution_count,
total_cpu_time_μs = QS.total_worker_time,
total_logical_reads = QS.total_logical_reads,
total_elapsed_time_μs = QS.total_elapsed_time,
avg_cpu_time_μs = QS.total_worker_time / QS.execution_count,
avg_logical_reads = QS.total_logical_reads / QS.execution_count,
avg_elapsed_time_μs = QS.total_elapsed_time / QS.execution_count
FROM sys.dm_exec_query_stats QS
CROSS
APPLY sys.dm_exec_sql_text (QS.[sql_handle]) QT
CROSS
APPLY sys.dm_exec_query_plan (QS.[plan_handle]) QP
WHERE QT.[text] LIKE '%Rainfall%'
AND QT.[text] NOT LIKE '%dm_exec_query_stats%'
ORDER BY
QS.last_execution_time ASC;
GO
--
-- Sample results:
--
-- SQL Averages: CPU 683μs; Elapsed time: 683μs; Logical reads: 114
-- XML Averages: CPU 80078μs; Elapsed time: 83691μs; Logical reads: 126

--
-- Tidy up
--
GO
IF OBJECT_ID(N'dbo.Rainfall_Con', N'U') IS NOT NULL DROP TABLE dbo.Rainfall_Con
IF OBJECT_ID(N'dbo.Rainfall_XML', N'U') IS NOT NULL DROP TABLE Rainfall_XML;
IF OBJECT_ID(N'tempdb..#BitBucket', N'U') IS NOT NULL DROP TABLE #BitBucket;
GO
RAISERROR('Test run complete', 0, 1) WITH NOWAIT;
--
-- END SCRIPT
--





Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #764706
Posted Tuesday, August 4, 2009 5:53 AM


SSCarpal Tunnel

Nice article, I learned something about XML in SQL Server today.
As Paul pointed out, it's just not possible for XML storage to need less space than conventional storage, and with appropriate indexing there's no way XML will outperform a relational table.

I suggest you add some more comments to your article to avoid misleading anybody on this topic.

Nice idea, anyway.

Gianluca


--
Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
Post #764727
Posted Tuesday, August 4, 2009 6:45 AM


Ten Centuries

If you make the primary key in both tables CLUSTERED (which would be a reasonable choice in this case, given that some of the queries are ORDERing/GROUPing by LocID), the storage requirements are somewhat different... XML now uses more storage.

name          rows     reserved   data       index_size   unused
------------  -------  ---------  ---------  -----------  ------
Rainfall_XML  36600    36744 KB   36600 KB   112 KB       32 KB
Rainfall_Con  878400   27080 KB   26928 KB   96 KB        56 KB




Post #764751
Posted Tuesday, August 4, 2009 6:53 AM


SSChampion

Where I've found the XML data type useful is in situations where you can't control schema changes to a relation (table) and want to record all of its data even if columns are added, modified, or removed.

Logging triggers or archiving processes can then take advantage of "select * from table for xml" and insert the result into an XML column in a log/archive/audit table (a rough sketch follows below).

For semi-structured data that has variable schema elements, XML is pretty ideal. Beyond that, relational data structures are better.
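
A minimal sketch of that kind of logging trigger, assuming a hypothetical dbo.SomeTable (all object names here are illustrative):

-- Audit table: one xml snapshot per captured change, immune to schema changes
CREATE TABLE dbo.SomeTable_Audit
(
    AuditId   int IDENTITY(1, 1) PRIMARY KEY,
    AuditDate datetime NOT NULL DEFAULT GETDATE(),
    AuditData xml NULL
);
GO

CREATE TRIGGER dbo.trg_SomeTable_Audit
ON dbo.SomeTable
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- SELECT * picks up whatever columns exist when the trigger fires,
    -- so later ALTER TABLE changes don't break the audit
    INSERT dbo.SomeTable_Audit (AuditData)
    SELECT (SELECT * FROM deleted FOR XML RAW('row'), ROOT('deleted'), TYPE)
    WHERE  EXISTS (SELECT 1 FROM deleted);
END;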


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #764757
Posted Tuesday, August 4, 2009 7:54 AM


SSCrazy

Another thing to point out about XML's use of space is that, barring compression, the data is being stored as characters wrapped in tags, making storage of integer, numeric, or floating point data LESS efficient.

Datatype   SQL Length   XML                    XML Length
---------  -----------  ---------------------  ----------
TinyInt    1 byte       <V>255</V>             10
SmallInt   2 bytes      <V>32767</V>           12
Int        4 bytes      <V>2147483647</V>      17
Float      8 bytes      <V>-2.23.40E-308</V>   20


In the author's example, he chose numeric(10,8) to store the rainfall amount, which requires only 9 bytes for SQL Server to store, but 18 characters for XML to store (e.g. <V>90.12345678</V>).

The author also chose to impose an [hour] column on the conventional table, which was unnecessary, because it could have been included as part of the [date] column.

Finally, he could have chosen to have a separate column for each hour in his schema for the conventional table, rather than making it one row per hour. (That wouldn't have been my first choice, because I feel it lacks flexibility in manipulating the data. But it is essentially the approach that was taken with the table that used XML.)
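
A rough sketch of that one-column-per-hour layout, purely for comparison (names are illustrative, and most of the hour columns are omitted here):

CREATE TABLE dbo.Rainfall_Wide
(
    [LocID]  INTEGER  NOT NULL,
    [Date]   DATETIME NOT NULL,
    [Hour00] DECIMAL(10,8) NOT NULL,
    [Hour01] DECIMAL(10,8) NOT NULL,
    -- ... [Hour02] through [Hour22] omitted for brevity ...
    [Hour23] DECIMAL(10,8) NOT NULL,
    CONSTRAINT PK_Rainfall_Wide PRIMARY KEY CLUSTERED ([Date], [LocID])
);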



__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #764807