SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Pseudo-Random

Add to Technorati Favorites Add to Google
Browse by Tag : functions (RSS)

YASSAS ("Yet Another SQL Server Array Splitter")

By Michael Coles in Pseudo-Random | 03-10-2006 4:59 PM | Categories: Filed under: , ,
Rating: (not yet rated) Rate this |  Discuss | 4,844 Reads | 90 Reads in Last 30 Days |no comments

There are billions of VARCHAR-array-splitting routines out there to compensate for the lack of true Arrays and Array Parameters in SQL Server.  In yesterday's blog I talked about passing in and splitting an array with more than 8,000 characters [the VARCHAR max].  Other times you might know that your input array will always be smaller than the VARCHAR upper size limit.  In those cases, we can try to squeeze a little extra efficiency out of the user-defined function by eliminating some of the extra logic required to process a TEXT parameter.  A lot of the routines out there use a lot more IF ... ELSE logic and string manipulations than they need to, so I've tried to minimize that in my code.  With that in mind, here's the YASSAS small array splitter that accepts a VARCHAR(8000):

CREATE FUNCTION dbo.fn_SmallSplit (@sText VARCHAR(8000),
 @sDelim VARCHAR(20))
RETURNS @ret TABLE(
 Idx INT IDENTITY(0, 1) NOT NULL PRIMARY KEY,
 Value VARCHAR(8000)
)
AS
BEGIN
 DECLARE @i INT
 SELECT @i = 1
 DECLARE @j INT
 WHILE @i <= LEN(@sText) + 1
 BEGIN
  SELECT
@j = CASE
   WHEN CHARINDEX(@sDelim, @sText, @i) <= 0 THEN
    DATALENGTH(@sText) + 1
   ELSE
    CHARINDEX(@sDelim, @sText, @i)
   END
  INSERT INTO
@ret (Value)
  VALUES (LTRIM(RTRIM(SUBSTRING(@sText, @i, @j - @i))))
  SELECT @i = @j + DATALENGTH(@sDelim)
 END
 RETURN
END

GO


8,001 Bytes And Beyond (SQL Arrays)

By Michael Coles in Pseudo-Random | 03-09-2006 11:29 PM | Categories: Filed under: , ,
Rating: (not yet rated) Rate this |  Discuss | 4,902 Reads | 86 Reads in Last 30 Days |2 comment(s)

I recently sat down to try to figure the best way to pass a comma-delimited array to a SQL Server 2000 stored procedure.  The catch?  It had to handle arrays larger than 8,000 bytes - the upper limit for VARCHAR data.  SQL Guru and newsgroup regular Erland Sommarskog posted a great piece on parsing arrays, demonstrating and comparing several methods at http://www.sommarskog.se/arrays-in-sql.html.  Unfortunately all his methods run up on the same VARCHAR limitations.  So I turned my attention to SQL Server's TEXT datatype.  I hit a bunch of brick walls there:  You can't declare TEXT variables.  You can't manipulate TEXT as easily as VARCHAR data.  You can't use TEXTPTR() in a user-defined function.  Etc.

Despite the problems, I knew I was on the right track.  I realized early on that you can use SUBSTRING() on TEXT data.  You can index into the TEXT data as far as you want, but SUBSTRING() only returns a VARCHAR once again maxed out at 8,000 bytes.

After hacking out my own half-working version using PATINDEX() and SUBSTRING(), I found a great example called "Breaking the 8000 Byte Limit" at http://weblogs.sqlteam.com/davidm/archive/2003/12/12/655.aspx.  DavidM's version uses a "Numbers" table of sequential numbers to index into the TEXT parameter, and has a fixed delimiter (",").  I converted his version to one that uses the simple looping structure I initially started with to break apart the TEXT parameter.  This way I didn't have to create the Numbers table.  The primary difference between his version and the one I came up with is his use of the nested CASE statement, and the clever use of the ABS() and SIGN() functions which are more efficient than the IF statements I originally implemented.  DavidM's version is also an inline function, which is probably the reason for the Numbers table.  Because I wanted to avoid the Numbers table, I had to create a Table-Valued Function.

With this function, you can pass in a delimited array of up to 2+ billion bytes; although I wouldn't recommend it.  I ran tests of arrays with 8,000, 32,000 and 64,000 bytes with delimiters of varying lengths (one, two and three characters).  The items in the list should each be less than 8,000 characters in length; since the function extracts the data in the form of VARCHARs.  This limit is imposed by SQL Server.  Regardless of the total length of the list in characters, the fewer items in the list, the faster the function will run (less INSERTs).  In my tests, a 64,000 byte list with 1,600 items was parsed 10 times faster than a 64,000 byte list with 9,400 items.

Here's the function.  Enjoy:

CREATE FUNCTION dbo.BigSplit
(
 @List  TEXT,
 @Delimiter VARCHAR(20)
)
RETURNS @ret TABLE (
 Number INT IDENTITY(0, 1) NOT NULL PRIMARY KEY,
 Value VARCHAR(8000)
)
AS
BEGIN

 DECLARE @Index INT
 SELECT @Index = 1
 DECLARE @Item VARCHAR(8000)

 WHILE @Index < (DATALENGTH(@List) + DATALENGTH(@Delimiter))
 BEGIN
  IF ((SUBSTRING(@List, @Index - DATALENGTH(@Delimiter), DATALENGTH(@Delimiter)) = @Delimiter AND @Index > 1) OR @Index = 1)
  BEGIN
   SELECT @Item = SUBSTRING(@List, @Index,
    CASE SIGN(CHARINDEX(@Delimiter, @List, @Index) - @Index)
     WHEN -1 THEN
      CASE PATINDEX('%' + @Delimiter + '%', SUBSTRING(@List, @Index, ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index)))
     WHEN 0 THEN
      DATALENGTH(@List) - @Index + 1
     ELSE
      PATINDEX('%' + @Delimiter + '%', SUBSTRING(@List, @Index, ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index))) - 1
     END
    ELSE
     ABS(CHARINDEX(@Delimiter, @List, @Index) - @Index)
    END)
   INSERT INTO @ret (Value)
   VALUES (@Item)
  END
  SELECT @Index = @Index + DATALENGTH(@Item) + DATALENGTH(@Delimiter)
 END
 RETURN
END
GO


Sorting Book Chapters

By Michael Coles in Pseudo-Random | 07-24-2005 10:46 PM | Categories: Filed under: , , ,
Rating: (not yet rated) Rate this |  Discuss | 4,579 Reads | 99 Reads in Last 30 Days |no comments

A while back a SQL newsgroup poster asked a question about sorting chapter numbers for a book in SQL.  As anyone who's picked up a programming book or technical paper online knows, it's common practice to number chapters in the format '1.1', '2.1.3', '3.4.5.6', etc.  The problem occurs when you have chapter numbers that contain two digits or more per segment, as demonstrated here:

CREATE TABLE #Chapters
( ID INT NOT NULL IDENTITY(1, 1),
  Chapter VARCHAR(20))

INSERT INTO #Chapters (Chapter)
SELECT '1.1.0'
UNION SELECT '1.1.10'
UNION SELECT '2.1.0'
UNION SELECT '2.2.0'
UNION SELECT '3.1.1'
UNION SELECT '3.10.1'
UNION SELECT '3.2.1'
UNION SELECT '1.10.1'
UNION SELECT '11.1.0'

SELECT Chapter
FROM #Chapters
ORDER BY Chapter

DROP TABLE #Chapters

The result is:

Chapter
-------
1.1.0
1.1.10
1.10.1
11.1.0
2.1.0
2.2.0
3.1.1
3.10.1
3.2.1

We know that chapter 11.1.0 should not come before Chapter 2.1.0, and that Chapter 3.10.1 should not come before Chapter 3.2.1. But SQL is sorting based on string character codes; therefore our list comes out sorted incorrectly. How can we fix this? My solution is to left-pad each segment of the string with zeroes. So '3.10.1' becomes '003.010.001'. By left-padding all of our Chapter numbers in this manner, we get the correct sorting action. The following UDF shows how to left-pad strings in this format:

CREATE FUNCTION dbo.udf_SortableNumber (@chapter_number VARCHAR(500))
RETURNS VARCHAR(500)
AS
BEGIN
  DECLARE @temp VARCHAR(500)
  SET @temp = ''
  DECLARE @i INTEGER
  DECLARE @j INTEGER
  SET @i = 0
  WHILE (@i < LEN(@chapter_number))
  BEGIN
    SET @j = CHARINDEX('.', @chapter_number, @i + 1)
    IF @j = 0
    BEGIN
      SET @j = LEN(@chapter_number) + 1
    END
    SET @temp = @temp + RIGHT('000' + SUBSTRING(@chapter_number, @i + 1, @j - @i - 1), 3) + '.'
    SET @i = @j
  END
  RETURN @temp
END

To incorporate this UDF into our previous example:

CREATE TABLE #Chapters
( ID INT NOT NULL IDENTITY(1, 1),
  Chapter VARCHAR(20),
  SortableChapter VARCHAR(20))

INSERT INTO #Chapters (Chapter)
SELECT '1.1.0'
UNION SELECT '1.1.10'
UNION SELECT '2.1.0'
UNION SELECT '2.2.0'
UNION SELECT '3.1.1'
UNION SELECT '3.10.1'
UNION SELECT '3.2.1'
UNION SELECT '1.10.1'
UNION SELECT '11.1.0'

UPDATE #Chapters
SET SortableChapter = dbo.udf_SortableNumber(Chapter)

SELECT Chapter
FROM #Chapters
ORDER BY SortableChapter

DROP TABLE #Chapters

The result is:

Chapter
-------
1.1.0
1.1.10
1.10.1
2.1.0
2.2.0
3.1.1
3.2.1
3.10.1
11.1.0

As we can see, the list is properly sorted this time around. This UDF also works to sort IP Addresses properly, so that '127.0.0.1' is listed after '96.10.1.10'.

 


Calculating Floating Holidays

By Michael Coles in Pseudo-Random | 07-17-2005 3:18 PM | Categories: Filed under: , , ,
Rating: |  Discuss | 5,328 Reads | 122 Reads in Last 30 Days |1 comment(s)

This is a follow-up to the calculating Easter entry I posted yesterday.  In this one we calculate "floating holidays", such as Thanksgiving (U.S.) and Memorial Day.  There was an issue with the previous version, pointed out to me by David Markham.  In this new version I've simplified the code a bit.  The first function is udf_DateSerial.  This function accepts an integer year, month, and day, and returns a corresponding DATETIME value.  I borrowed the name of this function from the old VB function, which performs a similar task.

CREATE FUNCTION dbo.udf_DateSerial

(

@Year INT,

@Month INT,

@Day INT

)

RETURNS DATETIME

AS

BEGIN

DECLARE @Temp VARCHAR(20);

SET @Temp = RIGHT('0000' + CAST(@Year AS VARCHAR(4)), 4) +

RIGHT('00' + CAST(@Month AS VARCHAR(2)), 2) +

RIGHT('00' + CAST(@Day AS VARCHAR(2)), 2);

RETURN CAST(@Temp AS DATETIME);

END

GO

The next function, udf_GetDayOfWeek assigns an integer value to every day of week (Sunday = 1, Monday = 2, etc.)  Note that the SQL DATEPART function performs a similar task, but it can start with Sunday = 0 instead of Sunday = 1 depending on your server settings.

CREATE FUNCTION dbo.udf_GetDayOfWeek

(

@DayName VARCHAR(30)

)

RETURNS INT

AS

BEGIN

RETURN CASE @DayName

WHEN 'Sunday' THEN 1

WHEN 'SUN' THEN 1

WHEN 'Monday' THEN 2

WHEN 'MON' THEN 2

WHEN 'Tuesday' THEN 3

WHEN 'TUE' THEN 3

WHEN 'Wednesday' THEN 4

WHEN 'WED' THEN 4

WHEN 'Thursday' THEN 5

WHEN 'THU' THEN 5

WHEN 'Friday' THEN 6

WHEN 'FRI' THEN 6

WHEN 'Saturday' THEN 7

WHEN 'SAT' THEN 7

END;

END

GO

The udf_DowOccurrence function has been simplified considerably.  It accepts a year, month, the name of the day of week, and the occurrence number.  The date returned is the nth occurrence of the day of the week for the given month and year.

CREATE FUNCTION dbo.udf_DowOccurrence

(

@Year INT,

@Month INT,

@DayOfWeek varchar(20),

@Occurrence INT

)

RETURNS DATETIME

AS

BEGIN

DECLARE @FirstOfMonth DATETIME;

SET @FirstOfMonth = dbo.udf_DateSerial(@Year, @Month, 1);

DECLARE @Diff INT;

SET @Diff = dbo.udf_GetDayOfWeek(@DayOfWeek) - dbo.udf_GetDayOfWeek(DATENAME(dw, @FirstOfMonth));

RETURN DATEADD(day, @Diff + (@Occurrence - CASE

WHEN @Diff >= 0

THEN 1

ELSE 0

END

) * 7, @FirstOfMonth);

END

GO

We can use the udf_DowOccurrence function to calculate floating U.S. holidays as follows:

-- Calculate American Thanksgiving Holiday:  The 4th Thursday in November
SELECT dbo.udf_DowOccurrence(2006, 11, 'THU', 4);
-- Calculate Martin Luther King, Jr. Holiday:  The 3rd Monday in January
SELECT dbo.udf_DowOccurrence(2055, 1, 'MON', 3);
-- Calculate President's Day:  The 3rd Monday in February
SELECT dbo.udf_DowOccurrence(2048, 2, 'MON', 3);
-- Calculate Memorial Day:  The last Monday in May.  Note that since May
-- can have 4 or 5 Mondays, we calculate the first Monday in June then
-- subtract 7 days from it
SELECT DATEADD(dd, -7, dbo.udf_DowOccurrence(2003, 6, 'MON', 1));
SELECT DATEADD(dd, -7, dbo.udf_DowOccurrence(2009, 6, 'MON', 1)); GO

Calculating Easter

By Michael Coles in Pseudo-Random | 07-16-2005 8:32 PM | Categories: Filed under: , , ,
Rating: |  Discuss | 5,984 Reads | 130 Reads in Last 30 Days |2 comment(s)

Updated Code Below.

I recently had to create a SQL table representing a Calendar with various holidays noted on it for work scheduling purposes.  One of the first challenges was figuring out on which day Easter falls each year.  I came up with two options: 

  • 1) Manually create a table with all pre-calculated Easter dates in it
  • 2) Implement a formula to calculate Easter dates

I found a few sites that list all dates for Easter by year.  One of the better ones is at the Astronomical Society of South Australia (http://www.assa.org.au/edm.html).

I decided to implement the formula method and calculate the dates on the fly.  The formula I implemented is a simplified version of Carter's Method, which is available here:  http://www.smart.net/~mmontes/carter.html.  This method calculates Easter dates between 1900 C.E. and 2099 C.E.  One thing to note is that when most of these formulas use the MOD() function, they are not talking about the results of the T-SQL MOD Operator (%), which is not a "true modulus", but rather the "remainder of division".  "True modulus" and "remainder of division" give us completely different results when one of the operands is negative.  So our first step is to define a "true modulus" function:

CREATE FUNCTION dbo.udf_modulo(@x INT, @y INT)
RETURNS INT
AS
BEGIN
        DECLARE @mod INT
        SET @mod = @x - @y * FLOOR(CAST(@x AS FLOAT) / @y)
        RETURN @mod
END
GO

Next we need to implement the actual formula.  This function takes the Year (between 1900 and 2099) as a parameter and returns the Date of Easter in that year:

CREATE FUNCTION dbo.udf_CalculateEaster (@Year INT) 
RETURNS DATETIME
AS
BEGIN
    DECLARE @Date DATETIME
    DECLARE @GoldenNumber INT
    DECLARE @EpactCalc INT
    DECLARE @PaschalDaysCalc INT
    DECLARE @PaschalWeekday INT
    DECLARE @NumOfDaysToSunday INT
    DECLARE @EasterMonth INT
    DECLARE @EasterDay INT
    IF @year >= 1900 AND @year <= 2099
    BEGIN
        -- Golden Number of Year.  The Phases of the moon repeat
	-- once every 19 years, so each year is associated with
	-- a Golden Number between 1 and 19 representing this
	-- relationship
        SET @GoldenNumber = dbo.udf_Modulo(@Year + 1, 19)

        -- Calculation based on Epact.  Epact is the measure of
	-- the "age" of the moon (i.e., the number of days since the
        -- "official" new moon at the start of the year)
        SET @EpactCalc = dbo.udf_Modulo(5 + 19 * (@GoldenNumber), 30)
 
        -- Number of days from March 21 to Paschal Full Moon.  The
	-- Paschal Full Moon is the full moon that is immediately
	-- before Easter.
        SET @PaschalDaysCalc = @EpactCalc - (@EpactCalc / 28)

        -- Calculate Weekday of Paschal Full Moon (0 = Sunday, 
	-- 1 = Monday)
        SET @PaschalWeekday = dbo.udf_Modulo(@Year + @Year / 4 + @PaschalDaysCalc - 13, 7)
 
        -- Calculate Number of Days from March 21 to Sunday 
	-- on or before Paschal Full Moon
        SET @NumOfDaysToSunday = @PaschalDaysCalc - @PaschalWeekday

        -- Calculate Easter Month
        SET @EasterMonth = 3 + (@NumOfDaysToSunday + 40) / 44
 
        -- Calculate Easter Day
        SET @EasterDay = @NumOfDaysToSunday + 28 - 31 * (@EasterMonth / 4)

        -- Return Date
        SET @Date = CAST(@Year AS VARCHAR) + '-' + CAST(@EasterMonth AS VARCHAR) + '-' + CAST(@EasterDay AS VARCHAR)
    END
    RETURN @Date
END
GO

As a sample of this function, try the following:

SELECT 1900 AS [Year], dbo.udf_CalculateEaster(1900) AS [EasterDate]
SELECT 1959 AS [Year], dbo.udf_CalculateEaster(1959) AS [EasterDate]
SELECT 2048 AS [Year], dbo.udf_CalculateEaster(2048) AS [EasterDate]
SELECT 2099 AS [Year], dbo.udf_CalculateEaster(2099) AS [EasterDate]

This returns the following results:

  Year       EasterDate
  1900       1900-04-15 00:00:00.000
  1959       1959-03-29 00:00:00.000
  2048       2048-04-05 00:00:00.000
  2099       2099-04-12 00:00:00.000

Aaron Bertrand over at ASPFAQ has added my original Easter calculation formula to his SQL Auxiliary Calendar Table. If you use SQL to manage calendars or for calendar-based scheduling be sure to check out his article: http://www.aspfaq.com/show.asp?id=2519

UPDATE:

There was an error in the formula given, which causes issues with some years.  Since then I've located a better formula which does not have the same limitations as the other one.  This one calculates Easter for all years, and doesn't require a modulo formula:

CREATE FUNCTION dbo.udf_CalculateEaster (@Year INT)
RETURNS DATETIME
AS
BEGIN

    DECLARE @Date DATETIME
    DECLARE @c INT
    DECLARE @n INT
    DECLARE @i INT
    DECLARE @k INT
    DECLARE @j INT
    DECLARE @l INT
    DECLARE @m INT
    DECLARE @d INT
    SET @n = @Year - 19 * (@Year / 19)
    SET @c = @Year / 100
    SET @k = (@c - 17) / 25
    SET @i = @c - @c / 4 - (@c - @k) / 3 + 19 * @n + 15
    SET @i = @i - 30 * (@i / 30)
    SET @i = @i - (@i / 28) * (1 - (@i / 28) * (29 / (@i + 1)) * ((21 - @n) / 11))
    SET @j = @Year + @Year / 4 + @i + 2 - @c + @c / 4
    SET @j = @j - 7 * (@j / 7)
    SET @l = @i - @j
    SET @m = 3 + (@l + 40) / 44
    SET @d = @l + 28 - 31 * (@m / 4)
    SET @Date = CAST(@Year AS VARCHAR) + '-' + CAST(@m AS VARCHAR) + '-' + CAST(@d AS VARCHAR)
    RETURN @Date
END
GO

I tested this one against a larger set of Easter dates at the US Naval Observatory, and the results are correct.  Thanks.