There are billions of VARCHAR-array-splitting routines out there to compensate for the lack of true Arrays and Array Parameters in SQL Server. In yesterday's blog I talked about passing in and splitting an array with more than 8,000 characters [the VARCHAR max]. Other times you might know that your input array will always be smaller than the VARCHAR upper size limit. In those cases, we can try to squeeze a little extra efficiency out of the user-defined function by eliminating some of the extra logic required to process a TEXT parameter. A lot of the routines out there use a lot more IF ... ELSE logic and string manipulations than they need to, so I've tried to minimize that in my code. With that in mind, here's the YASSAS small array splitter that accepts a VARCHAR(8000):
CREATE FUNCTION dbo.fn_SmallSplit (@sText VARCHAR(8000), @sDelim VARCHAR(20))RETURNS @ret TABLE( Idx INT IDENTITY(0, 1) NOT NULL PRIMARY KEY, Value VARCHAR(8000))ASBEGIN DECLARE @i INT SELECT @i = 1 DECLARE @j INT WHILE @i <= LEN(@sText) + 1 BEGIN SELECT @j = CASE WHEN CHARINDEX(@sDelim, @sText, @i) <= 0 THEN DATALENGTH(@sText) + 1 ELSE CHARINDEX(@sDelim, @sText, @i) END INSERT INTO @ret (Value) VALUES (LTRIM(RTRIM(SUBSTRING(@sText, @i, @j - @i)))) SELECT @i = @j + DATALENGTH(@sDelim) END RETURNEND
GO
I recently sat down to try to figure the best way to pass a comma-delimited array to a SQL Server 2000 stored procedure. The catch? It had to handle arrays larger than 8,000 bytes - the upper limit for VARCHAR data. SQL Guru and newsgroup regular Erland Sommarskog posted a great piece on parsing arrays, demonstrating and comparing several methods at http://www.sommarskog.se/arrays-in-sql.html. Unfortunately all his methods run up on the same VARCHAR limitations. So I turned my attention to SQL Server's TEXT datatype. I hit a bunch of brick walls there: You can't declare TEXT variables. You can't manipulate TEXT as easily as VARCHAR data. You can't use TEXTPTR() in a user-defined function. Etc.
Despite the problems, I knew I was on the right track. I realized early on that you can use SUBSTRING() on TEXT data. You can index into the TEXT data as far as you want, but SUBSTRING() only returns a VARCHAR once again maxed out at 8,000 bytes.
After hacking out my own half-working version using PATINDEX() and SUBSTRING(), I found a great example called "Breaking the 8000 Byte Limit" at http://weblogs.sqlteam.com/davidm/archive/2003/12/12/655.aspx. DavidM's version uses a "Numbers" table of sequential numbers to index into the TEXT parameter, and has a fixed delimiter (","). I converted his version to one that uses the simple looping structure I initially started with to break apart the TEXT parameter. This way I didn't have to create the Numbers table. The primary difference between his version and the one I came up with is his use of the nested CASE statement, and the clever use of the ABS() and SIGN() functions which are more efficient than the IF statements I originally implemented. DavidM's version is also an inline function, which is probably the reason for the Numbers table. Because I wanted to avoid the Numbers table, I had to create a Table-Valued Function.
With this function, you can pass in a delimited array of up to 2+ billion bytes; although I wouldn't recommend it. I ran tests of arrays with 8,000, 32,000 and 64,000 bytes with delimiters of varying lengths (one, two and three characters). The items in the list should each be less than 8,000 characters in length; since the function extracts the data in the form of VARCHARs. This limit is imposed by SQL Server. Regardless of the total length of the list in characters, the fewer items in the list, the faster the function will run (less INSERTs). In my tests, a 64,000 byte list with 1,600 items was parsed 10 times faster than a 64,000 byte list with 9,400 items.
Here's the function. Enjoy:
CREATE FUNCTION dbo.BigSplit( @List TEXT, @Delimiter VARCHAR(20))RETURNS @ret TABLE ( Number INT IDENTITY(0, 1) NOT NULL PRIMARY KEY, Value VARCHAR(8000))ASBEGIN
DECLARE @Index INT SELECT @Index = 1 DECLARE @Item VARCHAR(8000)
WHILE @Index < (DATALENGTH(@List) + DATALENGTH(@Delimiter)) BEGIN IF ((SUBSTRING(@List, @Index - DATALENGTH(@Delimiter), DATALENGTH(@Delimiter)) = @Delimiter AND @Index > 1) OR @Index = 1) BEGIN SELECT @Item = SUBSTRING(@List, @Index, CASE SIGN(CHARINDEX(@Delimiter, @List, @Index) - @Index) WHEN -1 THEN CASE PATINDEX('%' + @Delimiter + '%', SUBSTRING(@List, @Index, ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index))) WHEN 0 THEN DATALENGTH(@List) - @Index + 1 ELSE PATINDEX('%' + @Delimiter + '%', SUBSTRING(@List, @Index, ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index))) - 1 END ELSE ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index) END) INSERT INTO @ret (Value) VALUES (@Item) END SELECT @Index = @Index + DATALENGTH(@Item) + DATALENGTH(@Delimiter) END RETURNENDGO
A while back a SQL newsgroup poster asked a question about sorting chapter numbers for a book in SQL. As anyone who's picked up a programming book or technical paper online knows, it's common practice to number chapters in the format '1.1', '2.1.3', '3.4.5.6', etc. The problem occurs when you have chapter numbers that contain two digits or more per segment, as demonstrated here:
CREATE TABLE #Chapters ( ID INT NOT NULL IDENTITY(1, 1), Chapter VARCHAR(20)) INSERT INTO #Chapters (Chapter) SELECT '1.1.0' UNION SELECT '1.1.10' UNION SELECT '2.1.0' UNION SELECT '2.2.0' UNION SELECT '3.1.1' UNION SELECT '3.10.1' UNION SELECT '3.2.1' UNION SELECT '1.10.1' UNION SELECT '11.1.0' SELECT Chapter FROM #Chapters ORDER BY Chapter DROP TABLE #Chapters
The result is:
Chapter ------- 1.1.0 1.1.10 1.10.1 11.1.0 2.1.0 2.2.0 3.1.1 3.10.1 3.2.1
We know that chapter 11.1.0 should not come before Chapter 2.1.0, and that Chapter 3.10.1 should not come before Chapter 3.2.1. But SQL is sorting based on string character codes; therefore our list comes out sorted incorrectly. How can we fix this? My solution is to left-pad each segment of the string with zeroes. So '3.10.1' becomes '003.010.001'. By left-padding all of our Chapter numbers in this manner, we get the correct sorting action. The following UDF shows how to left-pad strings in this format:
CREATE FUNCTION dbo.udf_SortableNumber (@chapter_number VARCHAR(500)) RETURNS VARCHAR(500) AS BEGIN DECLARE @temp VARCHAR(500) SET @temp = '' DECLARE @i INTEGER DECLARE @j INTEGER SET @i = 0 WHILE (@i < LEN(@chapter_number)) BEGIN SET @j = CHARINDEX('.', @chapter_number, @i + 1) IF @j = 0 BEGIN SET @j = LEN(@chapter_number) + 1 END SET @temp = @temp + RIGHT('000' + SUBSTRING(@chapter_number, @i + 1, @j - @i - 1), 3) + '.' SET @i = @j END RETURN @temp END
To incorporate this UDF into our previous example:
CREATE TABLE #Chapters ( ID INT NOT NULL IDENTITY(1, 1), Chapter VARCHAR(20), SortableChapter VARCHAR(20)) INSERT INTO #Chapters (Chapter) SELECT '1.1.0' UNION SELECT '1.1.10' UNION SELECT '2.1.0' UNION SELECT '2.2.0' UNION SELECT '3.1.1' UNION SELECT '3.10.1' UNION SELECT '3.2.1' UNION SELECT '1.10.1' UNION SELECT '11.1.0' UPDATE #Chapters SET SortableChapter = dbo.udf_SortableNumber(Chapter) SELECT Chapter FROM #Chapters ORDER BY SortableChapter DROP TABLE #Chapters
Chapter ------- 1.1.0 1.1.10 1.10.1 2.1.0 2.2.0 3.1.1 3.2.1 3.10.1 11.1.0
As we can see, the list is properly sorted this time around. This UDF also works to sort IP Addresses properly, so that '127.0.0.1' is listed after '96.10.1.10'.
This is a follow-up to the calculating Easter entry I posted yesterday. In this one we calculate "floating holidays", such as Thanksgiving (U.S.) and Memorial Day. There was an issue with the previous version, pointed out to me by David Markham. In this new version I've simplified the code a bit. The first function is udf_DateSerial. This function accepts an integer year, month, and day, and returns a corresponding DATETIME value. I borrowed the name of this function from the old VB function, which performs a similar task.
CREATE FUNCTION dbo.udf_DateSerial ( @Year INT, @Month INT, @Day INT)RETURNS DATETIMEASBEGIN DECLARE @Temp VARCHAR(20); SET @Temp = RIGHT('0000' + CAST(@Year AS VARCHAR(4)), 4) + RIGHT('00' + CAST(@Month AS VARCHAR(2)), 2) + RIGHT('00' + CAST(@Day AS VARCHAR(2)), 2); RETURN CAST(@Temp AS DATETIME);ENDGO
CREATE
(
@Year
@Month
@Day
)
RETURNS
AS
BEGIN
END
The next function, udf_GetDayOfWeek assigns an integer value to every day of week (Sunday = 1, Monday = 2, etc.) Note that the SQL DATEPART function performs a similar task, but it can start with Sunday = 0 instead of Sunday = 1 depending on your server settings.
CREATE FUNCTION dbo.udf_GetDayOfWeek ( @DayName VARCHAR(30))RETURNS INTASBEGIN RETURN CASE @DayName WHEN 'Sunday' THEN 1 WHEN 'SUN' THEN 1 WHEN 'Monday' THEN 2 WHEN 'MON' THEN 2 WHEN 'Tuesday' THEN 3 WHEN 'TUE' THEN 3 WHEN 'Wednesday' THEN 4 WHEN 'WED' THEN 4 WHEN 'Thursday' THEN 5 WHEN 'THU' THEN 5 WHEN 'Friday' THEN 6 WHEN 'FRI' THEN 6 WHEN 'Saturday' THEN 7 WHEN 'SAT' THEN 7 END;ENDGO
@DayName
The udf_DowOccurrence function has been simplified considerably. It accepts a year, month, the name of the day of week, and the occurrence number. The date returned is the nth occurrence of the day of the week for the given month and year.
CREATE FUNCTION dbo.udf_DowOccurrence
@DayOfWeek
@Occurrence
WHEN @Diff >= 0
THEN 1
ELSE 0
We can use the udf_DowOccurrence function to calculate floating U.S. holidays as follows:
-- Calculate American Thanksgiving Holiday: The 4th Thursday in November SELECT dbo.udf_DowOccurrence(2006, 11, 'THU', 4); -- Calculate Martin Luther King, Jr. Holiday: The 3rd Monday in January SELECT dbo.udf_DowOccurrence(2055, 1, 'MON', 3); -- Calculate President's Day: The 3rd Monday in February SELECT dbo.udf_DowOccurrence(2048, 2, 'MON', 3); -- Calculate Memorial Day: The last Monday in May. Note that since May -- can have 4 or 5 Mondays, we calculate the first Monday in June then -- subtract 7 days from it SELECT DATEADD(dd, -7, dbo.udf_DowOccurrence(2003, 6, 'MON', 1));SELECT DATEADD(dd, -7, dbo.udf_DowOccurrence(2009, 6, 'MON', 1)); GO
Updated Code Below.
I recently had to create a SQL table representing a Calendar with various holidays noted on it for work scheduling purposes. One of the first challenges was figuring out on which day Easter falls each year. I came up with two options:
I found a few sites that list all dates for Easter by year. One of the better ones is at the Astronomical Society of South Australia (http://www.assa.org.au/edm.html).
I decided to implement the formula method and calculate the dates on the fly. The formula I implemented is a simplified version of Carter's Method, which is available here: http://www.smart.net/~mmontes/carter.html. This method calculates Easter dates between 1900 C.E. and 2099 C.E. One thing to note is that when most of these formulas use the MOD() function, they are not talking about the results of the T-SQL MOD Operator (%), which is not a "true modulus", but rather the "remainder of division". "True modulus" and "remainder of division" give us completely different results when one of the operands is negative. So our first step is to define a "true modulus" function:
CREATE FUNCTION dbo.udf_modulo(@x INT, @y INT) RETURNS INT AS BEGIN DECLARE @mod INT SET @mod = @x - @y * FLOOR(CAST(@x AS FLOAT) / @y) RETURN @mod END GO
Next we need to implement the actual formula. This function takes the Year (between 1900 and 2099) as a parameter and returns the Date of Easter in that year:
CREATE FUNCTION dbo.udf_CalculateEaster (@Year INT) RETURNS DATETIME AS BEGIN DECLARE @Date DATETIME DECLARE @GoldenNumber INT DECLARE @EpactCalc INT DECLARE @PaschalDaysCalc INT DECLARE @PaschalWeekday INT DECLARE @NumOfDaysToSunday INT DECLARE @EasterMonth INT DECLARE @EasterDay INT IF @year >= 1900 AND @year <= 2099 BEGIN -- Golden Number of Year. The Phases of the moon repeat -- once every 19 years, so each year is associated with -- a Golden Number between 1 and 19 representing this -- relationship SET @GoldenNumber = dbo.udf_Modulo(@Year + 1, 19) -- Calculation based on Epact. Epact is the measure of -- the "age" of the moon (i.e., the number of days since the -- "official" new moon at the start of the year) SET @EpactCalc = dbo.udf_Modulo(5 + 19 * (@GoldenNumber), 30) -- Number of days from March 21 to Paschal Full Moon. The -- Paschal Full Moon is the full moon that is immediately -- before Easter. SET @PaschalDaysCalc = @EpactCalc - (@EpactCalc / 28) -- Calculate Weekday of Paschal Full Moon (0 = Sunday, -- 1 = Monday) SET @PaschalWeekday = dbo.udf_Modulo(@Year + @Year / 4 + @PaschalDaysCalc - 13, 7) -- Calculate Number of Days from March 21 to Sunday -- on or before Paschal Full Moon SET @NumOfDaysToSunday = @PaschalDaysCalc - @PaschalWeekday -- Calculate Easter Month SET @EasterMonth = 3 + (@NumOfDaysToSunday + 40) / 44 -- Calculate Easter Day SET @EasterDay = @NumOfDaysToSunday + 28 - 31 * (@EasterMonth / 4) -- Return Date SET @Date = CAST(@Year AS VARCHAR) + '-' + CAST(@EasterMonth AS VARCHAR) + '-' + CAST(@EasterDay AS VARCHAR) END RETURN @Date END GO
As a sample of this function, try the following:
SELECT 1900 AS [Year], dbo.udf_CalculateEaster(1900) AS [EasterDate] SELECT 1959 AS [Year], dbo.udf_CalculateEaster(1959) AS [EasterDate] SELECT 2048 AS [Year], dbo.udf_CalculateEaster(2048) AS [EasterDate] SELECT 2099 AS [Year], dbo.udf_CalculateEaster(2099) AS [EasterDate]
This returns the following results:
Year EasterDate 1900 1900-04-15 00:00:00.000 1959 1959-03-29 00:00:00.000 2048 2048-04-05 00:00:00.000 2099 2099-04-12 00:00:00.000
Aaron Bertrand over at ASPFAQ has added my original Easter calculation formula to his SQL Auxiliary Calendar Table. If you use SQL to manage calendars or for calendar-based scheduling be sure to check out his article: http://www.aspfaq.com/show.asp?id=2519
UPDATE:
There was an error in the formula given, which causes issues with some years. Since then I've located a better formula which does not have the same limitations as the other one. This one calculates Easter for all years, and doesn't require a modulo formula:
CREATE FUNCTION dbo.udf_CalculateEaster (@Year INT) RETURNS DATETIMEASBEGIN DECLARE @Date DATETIME DECLARE @c INT DECLARE @n INT DECLARE @i INT DECLARE @k INT DECLARE @j INT DECLARE @l INT DECLARE @m INT DECLARE @d INT SET @n = @Year - 19 * (@Year / 19) SET @c = @Year / 100 SET @k = (@c - 17) / 25 SET @i = @c - @c / 4 - (@c - @k) / 3 + 19 * @n + 15 SET @i = @i - 30 * (@i / 30) SET @i = @i - (@i / 28) * (1 - (@i / 28) * (29 / (@i + 1)) * ((21 - @n) / 11)) SET @j = @Year + @Year / 4 + @i + 2 - @c + @c / 4 SET @j = @j - 7 * (@j / 7) SET @l = @i - @j SET @m = 3 + (@l + 40) / 44 SET @d = @l + 28 - 31 * (@m / 4) SET @Date = CAST(@Year AS VARCHAR) + '-' + CAST(@m AS VARCHAR) + '-' + CAST(@d AS VARCHAR) RETURN @DateENDGO
I tested this one against a larger set of Easter dates at the US Naval Observatory, and the results are correct. Thanks.