August 22, 2005 at 2:19 pm
Insert into dbo.NormalizedModel (Animal)
Select ElementText
from dbo.Split('YourStringHere', 'delimiter')
Select Name as AnimalName, count(*) as Total
from dbo.NormalizedModel
group by Name
I count 2 lines of code here.
August 22, 2005 at 2:20 pm
I worked for a financial agency, (you would know the name) and we were getting lots of data in from many sources. Every time, I would be told this was a "one time thing". IT NEVER WAS. I can assure you, if you force yourself to get into the practice of normalizing data whenever and from whereever you get it - you will be much happier with yourself!
What I wrote was from a general parsing script I had from years ago. If you keep these kinds of scripts around, they become very useful...
I wasn't born stupid - I had to study.
August 22, 2005 at 2:24 pm
Glad I don't need those here...
August 22, 2005 at 2:49 pm
Funny.   You handed me one of your most common one's just hours ago...
  You handed me one of your most common one's just hours ago...  
I wasn't born stupid - I had to study.
August 22, 2005 at 2:54 pm
That's not for splitting, that's for concatenating and I actually never used it in production. I do use the split thought for list of ids to process.
August 22, 2005 at 3:00 pm
With a small add of ',' at the end of each string you can do
declare @v-2 table(val char(50))
insert into @v-2
SELECT 'dog, mouse,cat,' UNION ALL
SELECT 'mouse, dog, fox, hen ,'
declare @t table(nmb int)
declare @i int
set @i=50
while @i>0
begin
insert into @t select @i
set @i=@i-1
end
select pet,count(*)
from
(select ltrim(reverse(ltrim(left(reverse(left(val,nmb-1)),
case charindex(',',reverse(left(val,nmb-1)))-1
when -1 then len(reverse(left(val,nmb-1)))
else charindex(',',reverse(left(val,nmb-1)))-1
end
  ))) pet
))) pet
from @t,@v
where substring(val,nmb,1)=',') a
group by pet
Vasc
August 22, 2005 at 3:27 pm
You are talking about application requirements. How the application works has nothing, absolutely nothing at all, with how the data should be modeled. The database is a collection of facts about some real-world information model. The relational model is the proven (over and over again) best way to manage those facts, especially to make sure that data integrity is not violated.
Whether or not the procedure will be executed once a week or twice every second is not interesting to me. The fact is that with this design it will take you much longer to create this procedure than it would to redesign the schema and then create a procedure that produces the same resulting output. And what about next time you need a similar requirement? You will again need to do a difficult implementation, again losing time.
August 22, 2005 at 3:29 pm
Yes, whenever you are using a relation DBMS you should design correctly. It saves you time, gives you the best performance in most cases and most importantly avoids problems with lost data integrity. And it's easier!
August 22, 2005 at 8:29 pm
Hang on there, Paully21, I'll get to you in a minute...
I absolutely agree with everyone that having CSV columns in a table is a basic form of "death by SQL".  Despite the best of intentions or reasons for the use of CSV columns, we all know it will punish you throughout the life of the database much like creating a new archive database for every month will (a recent post a couple of us worked on :sick .
.
Still, it's an interesting problem and we can introduce Paully21 to the wonders of a Tally/Numbers table...
Ok, Paully21... here we go...
There's lot's of routines to split CSV columns and most of them aren't worth the trons they're stored in. There's a couple of really smart folks on this forum that have come up with the notion of a thing called a "numbers" table... being old school, I call it a "Tally" table because it's used to tally things during a process. Basically, it allows you to do the same things a loop would do but without the loop. Splitting CSV columns is a really great example of Tally table use.
Before we can do anything with a Tally table, we first have to create it... this is a one time deal.  Here's the script... hope you don't have a BSOFH for a DBA...
--===== First, create the tally table with
CREATE TABLE dbo.Tally (n SMALLINT IDENTITY(1,1) PRIMARY KEY, DoDah CHAR(1))
--===== Populate it with a paltry 8000 numbers using a cross-join to avoid
-- doing it with a loop (which wouldn't hurt here, just cool to do without).
-- SYSCOLUMNS is guaranteed to have at least 256 entries so a single
-- cross-join is good up to 65,536 numbers. We only need 8k.
INSERT INTO dbo.Tally (DoDah)
SELECT TOP 8000 NULL
FROM dbo.SYSCOLUMNS sc1,
dbo.SYSCOLUMNS sc2
--===== Drop the DoDah column... it's not needed anymore.
ALTER TABLE dbo.Tally
DROP COLUMN DoDah
The Tally table has 8000 numbers in it starting at 1, incrementing by 1, and ending at 8000. I gotta warn ya... for some reason, many DBA's just can't stand to part with the lousy 16 or 20 kbytes this table takes up even though it can save hundreds of hours of processing time in a single year.
Next, we need a User Defined Function that will split any CSV string we throw at it. While we're at it, let's add a wee bit o' logic to cleanup things like extra spaces and the like...
CREATE FUNCTION dbo.fnSplit
/**********************************************************
Purpose:
This function splits names and strings of words up to 8000
characters long based on a delimiter. Multiple adjacent
spaces are reduced to single spaces throughout the input
string. Only one word is returned at a time. Leading and
trailing spaces for each word/word combo are dropped.
Usage:
dbo.fnSplit(stringtosplit,wordtoreturn,delimiter)
Notes: Don't change the datatype nor the length of
@Delimiter in the code. The code hasn't been set
up to use multicharacter delimiters (yet).
Revisions:
Rev 00 - 08/17/2005 - Jeff Moden - Initial Creation/Test
**********************************************************/
--=========================================================
-- Define the input/output parameters
--=========================================================
(
@String VARCHAR(8000), --The string to split
@WordNum INT = 1, --Which word to return
@Delimiter CHAR(1) = ',' --What the delimiter is
)
RETURNS VARCHAR(8000)
AS
BEGIN
--=========================================================
-- Setup and Presets
--=========================================================
--===== Determine the max number of characters to process
-- by counting characters in the string to split
DECLARE @MaxPosit INT
SET @MaxPosit = LEN(@String)
--===== Create a place to store the NameParts
DECLARE @Words TABLE
(
WordNum INT IDENTITY PRIMARY KEY,
Word VARCHAR(8000)
  
--===== This converts multiple spaces into single spaces
-- no matter how many occur or where in the string
SELECT @String = REPLACE(@String,' ',' ')
FROM dbo.Tally
WHERE [n] <= @MaxPosit/2+1
--===== Add a delimiter to the beginning and end to allow
-- words to be found at each end
SET @String = @Delimiter+@String+@Delimiter
--===== Recalc the maximum number of characters for speed
SET @MaxPosit = LEN(@String)
--=========================================================
-- Find the desired word
--=========================================================
--===== Parse the NameParts and store them
INSERT INTO @Words (Word)
SELECT SUBSTRING(@String,[n]+1,
CHARINDEX(@Delimiter,@String,[n]+1)-[n]-1
   AS Word
 AS Word
FROM dbo.Tally
WHERE [n] >= 1
AND [n] < @MaxPosit - 1
AND SUBSTRING(@String,[n],1) = @Delimiter
ORDER BY [n]
--===== Return the desired word without any leading or
-- trailing spaces if they occur
RETURN (
SELECT LTRIM(RTRIM(Word))
FROM @Words
WHERE WordNum = @WordNum
  
--=========================================================
-- End of the Function
--=========================================================
END
Most of the above code is setup and cleanup of the inputs... the meat of the function is in the section called "Parse the NameParts and store them". It get's a little inefficient if you have hundreds or thousands of CSV values in the same record because it solves them all even though it only returns one at a time (I'm working on that). But, it's still fast. Think of the function as "PARSENAME on steroids" because, like PARSENAME, you have to tell it which word (left to right, in this case) to return. You can even tell it what single character delimiter you want to use.
Now that we have that done, we can actually (finally) get to your original problem... I'll let the embedded comments do all the talking...
--===== If temp table to hold animal lists exists, drop it and rebuild
IF OBJECT_ID('TempDB..#Animals') IS NOT NULL
DROP TABLE #Animals
CREATE TABLE dbo.#Animals
(ID INT IDENTITY(1,1),
CSVAnimals VARCHAR(200))
--===== Populate the Animals table with sample CSV data
-- Notice the abhorant fashion of the inputs
INSERT INTO dbo.#Animals (CSVAnimals)
SELECT 'cat, dog, mouse' UNION ALL
SELECT ' mouse,dog, fox , hen ' UNION ALL
SELECT 'hen' UNION ALL
SELECT ' hen ' UNION ALL
SELECT 'fox,hen' UNION ALL
SELECT 'mongoose, goldfish,cat' UNION ALL
SELECT 'dog , cat' UNION ALL
SELECT 'pussy cat, big dog'
--===== All set... let's process... ==================================
--===== Create a temp table to hold each occurance of each animal
IF OBJECT_ID('TempDB..#MyTemp') IS NOT NULL
DROP TABLE #MyTemp
CREATE TABLE #MyTemp
(ID INT IDENTITY(1,1),
Animal VARCHAR(30))
--===== Can't do 100% set based but we can do a column at a time
-- where a "column" is the occurance of an animal in the
-- original table. Example, the 'cat, dog, mouse' has
-- 3 "columns" of data. So, declare some variables to help
-- us through that mess.
DECLARE @ColCounter SMALLINT
DECLARE @MaxCols SMALLINT
--===== Find the max number of columns to process by counting commas
-- and adding 1
SELECT @MaxCols = MAX(LEN(CSVAnimals)-LEN(REPLACE(CSVAnimals,',',''))+1)
FROM dbo.Animals
--===== Now, split each animal out one column at a time but all rows
-- in the source at the same time. The counter keeps track of
-- which column we're working on. The example data at the
-- beginning of this script will cause only 4 passes even
-- though there's 8 rows of data because there's only 4 "columns".
SELECT @ColCounter = 1
WHILE @ColCounter <= @MaxCols
BEGIN
INSERT INTO #MyTemp (Animal)
SELECT dbo.fnSplit(CSVAnimals,@ColCounter,',')
FROM dbo.Animals
WHERE dbo.fnSplit(CSVAnimals,@ColCounter,',') IS NOT NULL
SELECT @ColCounter = @ColCounter+1
END
--===== Now, calculate and display the counts for each animal
-- from the single records we accumulated in the temp table.
SELECT UPPER(Animal),COUNT(Animal) AS TheCount
FROM #MyTemp
GROUP BY Animal
ORDER BY Animal
And, now, hopefully, you see what everyone was talking about in the previous 30 posts or so... CSV is a pain in the socks and should be avoided at all costs.  As so many posts suggested, a normalized child table with one value per row would be much more effective in the long run.  Yes, the steps above solve the problem but look at all of the code you have to deal with... and that's just for one bloody scenario.  Yes, the function helps in all scenarios but it runs HUNDREDS of times slower than true, normalized, set-based processing.  And, it uses one of the fastest methods available for solving this type of problem (Kudos to Adam Mechanic and others). 
Lemme know how the code works out for you, eh?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2005 at 11:21 pm
Nice work... too bad I won't ever need that  .
.
August 23, 2005 at 7:51 am
My post got lost between some sol NOT SET based so I ll posted again (the while was just to generate the numbers table)
SET BASED solution
declare @v-2 table(val varchar(50))
insert into @v-2
SELECT 'dog, mouse,cat' UNION ALL
SELECT 'mouse, dog, fox, hen'
declare @t table(nmb int identity(1,1),[dummy] char(1))
insert into @t ([dummy])
select top 51 NULL from master.dbo.syscolumns
select pet,count(*)
from
(select ltrim(reverse(ltrim(left(reverse(left(val+',',nmb-1)),
case charindex(',',reverse(left(val+',',nmb-1)))-1
when -1 then len(reverse(left(val+',',nmb-1)))
else charindex(',',reverse(left(val+',',nmb-1)))-1
end)))) pet
from @t,@v
where substring(val+',',nmb,1)=',') a
group by pet
Vasc
August 23, 2005 at 7:47 pm
Yeah, I wish I didn't need it! Wonderful 3rd party solution at work. Bunch o' DB rookies... these are the same wonderful folks that built an Oracle-Like sequence table (NextID) into their wonderful frigging system because they didn't know how to spell "IDENTITY". To give you a hint of how bad this 3rd party software is, the NextID table was the source of an average of 620 deadlocks PER DAY when I started work there about 20 months ago. Now THAT was fun to fix!
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2005 at 9:25 pm
I can't even imagine the nightmare... I'm just hopping that it was not a 24/7 shop and that you had time to apply the updates... not to mention a kick ass testing/QA system.
August 23, 2005 at 9:36 pm
Wasn't that lucky... it's 24/7. We get to bounce the server once a week for "impacting changes" for about 30 minutes... then, QA get's to verify what we've done. Of course, we test the heck out of the changes on another server before any code even come close to the production box.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 2, 2006 at 8:29 am
I didn't read this whole post, but when I got to the part "why is this bad design?", I stopped to post a reply.
You are violating a rule called "1st Normal Form"
To disallow multivalued attributes, composite attributes, and thier combination.
"Only atomic (simple, indivisible) values"
..
ISBN: 0321122267

Viewing 15 posts - 16 through 30 (of 34 total)
You must be logged in to reply to this topic. Login to reply