I wrote an earlier blog post called PASS Summit Twitter Dashboard. It shows all tweets posted by people at the PASS Summit.
I now want to extend it by measuring a sentiment score for each tweet. The logic for computing the score is:
There is a list of predefined good and bad words, scored 1 and -1 respectively.
For each tweet, remove the punctuation marks from the text.
Compare the words in each tweet against the predefined word list.
Assign a score to each matched word.
The sentiment score of the tweet is the sum of the scores of its matched words.
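To make the steps above concrete, here is a minimal sketch that scores a single tweet. It assumes SQL Server 2017 or later (for TRANSLATE and STRING_SPLIT); the variable names and the punctuation list in @Punct are hypothetical, and the word list is inlined so the block runs on its own.

```sql
-- Sketch only: assumes SQL Server 2017+ (TRANSLATE, STRING_SPLIT).
DECLARE @Tweet varchar(140) = 'New Bond movie is #awesome!';
DECLARE @Punct varchar(10)  = '#!.,?';  -- hypothetical punctuation list; extend as needed

WITH Words (Word, Score) AS (
    -- Predefined words, inlined here so the example is self-contained
    SELECT v.Word, v.Score
    FROM (VALUES ('Awesome', 1), ('Super', 1), ('Bad', -1),
                 ('Fail', -1), ('Dirty', -1)) AS v (Word, Score)
)
SELECT COALESCE(SUM(w.Score), 0) AS SentimentScore   -- 0 when nothing matches
FROM STRING_SPLIT(
         -- Replace each punctuation character with a space, then split on spaces
         TRANSLATE(@Tweet, @Punct, REPLICATE(' ', LEN(@Punct))), ' ') AS s
JOIN Words AS w
  ON LOWER(w.Word) = LOWER(s.value);                 -- case-insensitive match
```

For this tweet the only matching token is awesome, so the query returns 1. Empty tokens produced by consecutive spaces match nothing and do not affect the sum.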
The logic may be inaccurate, but from what I have learned, this kind of model is still being researched and is considered acceptable.
Here is the sample DDL.
Predefined words and score:
CREATE TABLE #Words
(Id int identity(1,1) primary key
, Word char(10)
, Score int) ;
INSERT INTO #Words (Word, Score)
VALUES ('Awesome', 1)
, ('Super', 1)
, ('Bad', -1)
, ('Fail', -1)
, ('Dirty', -1) ;
Sample tweets:
CREATE TABLE #Text
(Id int identity(1,1) primary key
, [Text] varchar(140)) ;
INSERT INTO #Text ([Text])
VALUES ('New Bond movie is #awesome!')
, ('I hear dirty reviews. Product X is a fail. #fail')
, ('I am neutral!!!') ;
Expected result:
CREATE TABLE #Result
([Text] varchar(140), Score int) ;
INSERT INTO #Result ([Text], Score)
VALUES ('New Bond movie is #awesome!', 1)
, ('I hear dirty reviews. Product X is a fail. #fail', -3)
, ('I am neutral!!!', 0) ;
For example, the score for 'New Bond movie is #awesome!' is 1: after removing the punctuation marks (# and !), the word awesome matches a word in the #Words table whose score is 1.
The score for 'I hear dirty reviews. Product X is a fail. #fail' is -3, because the words dirty, fail, and fail (after removing #) each match a word with a score of -1.
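One possible set-based formulation of this rule over the sample tables is sketched below. It assumes #Words and #Text exist and are populated as in the DDL above, SQL Server 2017 or later, and a case-insensitive collation so that 'awesome' matches 'Awesome'. The five-character punctuation list is a hypothetical choice, and the query has not been tuned or tested against 100K rows.

```sql
-- Sketch only: assumes #Words and #Text from the sample DDL,
-- SQL Server 2017+ and a case-insensitive collation.
SELECT t.Id
     , t.[Text]
     , COALESCE(SUM(w.Score), 0) AS Score   -- tweets with no matched word score 0
FROM #Text AS t
CROSS APPLY STRING_SPLIT(
         -- Blank out punctuation, then split the tweet into words
         TRANSLATE(t.[Text], '#!.,?', REPLICATE(' ', 5)), ' ') AS s
LEFT JOIN #Words AS w
       ON w.Word = s.value                  -- trailing char(10) padding is ignored by =
GROUP BY t.Id, t.[Text]
ORDER BY t.Id;
```

The LEFT JOIN keeps unmatched words, so every tweet stays in the output. SUM skips the resulting NULL scores, and COALESCE turns an all-NULL sum into 0 for neutral tweets.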
The query should perform well on a large data set of approximately 100K rows.
Many thanks for your time and input.