Check 'exist', case insensitively in Xquery

Question

Post reply

Check 'exist', case insensitively in Xquery

IT researcher

SSCertifiable

Points: 7616

May 11, 2021 at 12:41 pm

#3881863

I am using SQL Server 2008 R2. Below is the query used to get the count and the time taken for that query.

--time taken for table without any xml indexing
SELECT COUNT(*) from [largexml-schema-nodoc].dbo.Table_noindex WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1
--CPU time = 10797 ms,  elapsed time = 10994 ms.
SELECT COUNT(*) from [largexml-schema-nodoc].dbo.Table_noindex WHERE bookdata.exist('(/bookstore/book/author)[.=''Erik T. Ray'']') = 1
--CPU time = 3312 ms,  elapsed time = 3298 ms.


--time taken for table with all types secondary xml indexing
SELECT COUNT(*) from [largexml-schema-nodoc].dbo.tblallindex WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1
--CPU time = 7282 ms,  elapsed time = 7465 ms.
SELECT COUNT(*) from [largexml-schema-nodoc].dbo.tblallindex WHERE bookdata.exist('(/bookstore/book/author)[.=''Erik T. Ray'']') = 1
--CPU time = 453 ms,  elapsed time = 461 ms.

*****100000 rows with XML col value ****
 "<bookstore>
  <book category=""COOKING"">
    <title lang=""en"">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30</price>
  </book>
  <book category=""CHILDREN"">
    <title lang=""en"">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category=""WEB"">
    <title lang=""en"">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category=""WEB"">
    <title lang=""en"">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"

Is there any option to check 'exist', case insensitively ?

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

ykhabins Grasshopper Points: 14 More actions · Answer 1

XML is case sensitive by definition. There is no escape from it.

Two points to mention.

(1) You can try the following approach by using a different XPath predicate like below.

(2) Additionally, SQL Server 2012 introduced a new 3rd type of XML index: Selective XML Index

You can see a good example of it here:

https://www.red-gate.com/simple-talk/sql/learn-sql-server/precision-indexing-basics-of-selective-xml-indexes-in-sql-server-2012/

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, bookdata XML);
INSERT INTO @tbl (bookdata) VALUES
(N'<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>');
-- DDL and sample data population, end

DECLARE @author VARCHAR(30) = 'Erik T. Ray';
SET @author = UPPER(@author);

-- slow
SELECT COUNT(*) 
FROM @tbl
WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1;

-- much faster
SELECT COUNT(*) 
FROM @tbl
WHERE bookdata.exist('/bookstore/book/author/text()[upper-case(.) = sql:variable("@author")]') = 1;

This reply was modified 4 years, 10 months ago by ykhabins.

IT researcher SSCertifiable Points: 7616 More actions · Answer 2

Below query tested and time updated

-- with ucase
SELECT COUNT(*) from tblallindex 
WHERE bookdata.exist('/bookstore/book/author/text()[upper-case(.) = ''ERIK T. RAY'']') = 1;
-- CPU time = 5781 ms,  elapsed time = 5903 ms.

-- without ucase
SELECT COUNT(*) from tblallindex 
WHERE bookdata.exist('/bookstore/book/author/text()[. = ''Erik T. Ray'']') = 1;
-- CPU time = 94 ms,  elapsed time = 100 ms.

text() doesn't solve (without schema case),

when schema is enabled, text() is not allowed, but same speed as text().

Mr. Brian Gale One Orange Chip Points: 25014 More actions · Answer 3

Possibly a dumb question but would it be faster to do this on the application side? Not sure how much data you are pulling (in MB/GB) if you just did a full data dump of the XML to the application layer and let the application handle searching and sorting the data.

Or, alternatively, is it possible to take your XML and put it into relational and easy to use tables and just have a view that pulls the data back as XML for use by the application?

I am not saying don't do it in SQL if it makes sense, but XML is rarely efficient in SQL Server and can be challenging to debug and get coded up JUST right (based on my experience). I am thinking that if you can pull the XML directly, pass it to the application and let the application handle the searching and sorting, it may be easier as I imagine that the application is parsing the XML into a format that it can handle (table, array(s), dictionary, list, etc) and .NET is generally good at working with its objects in memory. Having the data in-memory on the application side means that processing doesn't need to re-query the database with each new search that is required on the XML.

Now on the other hand, if the application requires XML input and it gets that from the database and changing the application is not an option, storing your data in the database in a table format with columns/tables like "Author" and "Title" rather than storing all of the data in a big XML blob will make searches easier as well as inserts and updates. With your big blobs of XML, finding duplicate or near duplicate data becomes challenging.

And if you had it all in the database in a nice relational manner with author names and book titles, your SQL queries for searching will be MUCH easier to write and maintain without needing to parse through XML. And if the application needs an XML result, SQL can convert the table results to XML and the application would have no idea that the database design changed!

Mind you, I know changing a "working" system into something maintainable like that can be challenging and time consuming and since it adds no "benefit" from an end user perspective, getting approvals to work on changes like that may not be approved by your boss.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

ykhabins Grasshopper Points: 14 More actions · Answer 4

Dear Brian Gale,

The OP just needs to migrate off the current SQL Server 2008R2 to a newer version.

Starting from SQL Server 2012, it has a very powerful new index: Selective XML Index.

It allows to create an index based on the XPath expessions. Exactly what is needed for the scenario.