﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2008 / T-SQL (SS2K8)  / Word analysis with SQL / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sun, 19 May 2013 22:28:11 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>[quote][b]sqlbi.vvamsi (1/23/2013)[/b][hr]@sam, can you test this one too, just curious, not sure if this helps performance wise..[/quote]It is a good suggestion to do a performance test.  I like general tools like PatternSplitCM for the ease with which they allow you to solve a problem quickly.  Most of the time a focused, working solution will outperform a general tool.  But the general tool may be a good, bootstrap approach to get your code working.</description><pubDate>Wed, 23 Jan 2013 17:29:13 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>@sam, can you test this one too, just curious, not sure if this helps performance wise..;with cte1(part,leftover,textid) as(select left([text],CHARINDEX(' ',[text])),RIGHT([text],len([text])-CHARINDEX(' ',[text])), id from #Textunion allselect left(leftover,case when CHARINDEX(' ',leftover)=0 then LEN(leftover) else CHARINDEX(' ',leftover) end ),RIGHT(leftover,len(leftover)-case when CHARINDEX(' ',leftover)=0 then len(leftover) else CHARINDEX(' ',leftover) end ),textid from cte1 where leftover!=''),cte2(trimpart,id) as(select cast(part as varchar(140)),textid from cte1 union allselect  cast(replace(trimpart,substring(trimpart,patindex('%[^a-z]%',rtrim(ltrim(trimpart))),1),'') as varchar(140)),id from cte2 where  patindex('%[^a-z]%',rtrim(ltrim(trimpart)))!=0)select t.id,t.[text],isnull(recurrence,0) from(select id,COUNT(id) as recurrence from(select id,trimpart from cte2 where  patindex('%[^a-z]%',rtrim(ltrim(trimpart)))=0 and trimpart in (select [word] from #Words))tempgroup by id)temp1 right join#Text ton t.id=temp1.id</description><pubDate>Wed, 23 Jan 2013 10:32:12 GMT</pubDate><dc:creator>sqlbi.vvamsi</dc:creator></item><item><title>RE: Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>Dwain - Thank you very much. It works like a charm!</description><pubDate>Mon, 21 Jan 2013 11:52:32 GMT</pubDate><dc:creator>Sam Vanga</dc:creator></item><item><title>RE: Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>If Mr. Magoo's link doesn't give you any ideas, you might want to take a look at the 4th article in my signature links to retrieve the FUNCTION PatternSplitCM and then try it this way.[code="sql"]SELECT a.ID, a.[Text], TotalScore=ISNULL(SUM(c.Score), 0)FROM #Text aCROSS APPLY PatternSplitCM([Text], '[A-Za-z]') bLEFT JOIN #Words c ON b.Item = c.WordGROUP BY a.ID, a.[Text][/code]If you're working in a case sensitive database, you'll want to UPPER the Item and Word in the ON clause of the LEFT JOIN.</description><pubDate>Sun, 20 Jan 2013 17:43:16 GMT</pubDate><dc:creator>dwain.c</dc:creator></item><item><title>RE: Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>you might want to take a look at the proc dm_fts_parserIt might just provide the word analysis you are looking for...Link : [url]http://msdn.microsoft.com/en-us/library/cc280463(v=sql.100).aspx[/url]</description><pubDate>Fri, 18 Jan 2013 11:18:23 GMT</pubDate><dc:creator>mister.magoo</dc:creator></item><item><title>Word analysis with SQL</title><link>http://www.sqlservercentral.com/Forums/Topic1409016-392-1.aspx</link><description>I did a blog post earlier called [url=http://svangasql.wordpress.com/2012/11/12/pass-summit-2012-twitter-dashboard/]PASS Summit Twitter Dashboard[/url]. It shows all tweets by people at the PASS summit.I now want to expand that by measuring the sentiment score. Logic to measure the score is below:there are a list of pre defined good and bad words with a score of 1 and -1 respectively.for each tweet, remove punctuation marks from text.compare words from each tweet with the predefined words list.get score based on matched words.sentiment score is the sum of scores from the above match.The logic may be inaccurate, but from what i learnt this model is currently being researched and is acceptable.Here is the sample DDL.Predefined words and score:[code="sql"]CREATE TABLE #Words(Id int identity(1,1) primary key, Word char(10), Score int) ;INSERT #Words(Word, Score)VALUES ('Good',1)		, ('Awesome', 1)		, ('Super', 1)		, ('Bad', -1)		, ('Fail', -1)		, ('Dirty', -1) ;[/code]		Tweets:[code="sql"]CREATE TABLE #Text(Id int identity(1,1) primary key, [Text] varchar(140))INSERT #Text([Text])VALUES ('New Bond movie is #awesome!'), ('I hear dirty reviews. Product X is a fail. #fail'), ('I am neutral!!!')[/code]Result:[code="sql"]CREATE TABLE #Result([Text] varchar(140), Score int)INSERT #ResultVALUES('New Bond movie is #awesome!',1), ('I hear dirty reviews. Product X is a fail. #fail',3), ('I am neutral!!!',0)SELECT  *FROM    #Result[/code]For example, score for 'New Bond movie is #awesome!' is 1 because after removing punctuation mark (!) word awesome matches with a word in the Words table and score is 1.Score for 'I hear dirty reviews. Product X is a fail. #fail' = 3 because of the words dirty, fail, and fail (after removing #).Query should be able to perform with a huge data set, approximately 100K rows.Many thanks for your time and input.Sam.</description><pubDate>Fri, 18 Jan 2013 10:54:10 GMT</pubDate><dc:creator>Sam Vanga</dc:creator></item></channel></rss>