As a business intelligence developer, my skills include taking the raw data, shaking it to remove all the junk, and presenting it to business users in the most sexiest way possible – smart people often use the words ETL, and BI to define this process, but I’m not smart – I neither have glasses nor a French beard. ^_^
Since I missed the Pass Summit 2012, I closely watched twitter feed for #sqlpass. Oh boy! That was amazing! There are believers (#PASSPrayers), first timers (#sqlfirsttimers), karaoke’rs (#sqlkaraoke), repeating offenders (Alumni), runners (#sqlrun) to name a few, making This a perfect family (#sqlfamily).
|OMG! So close.|
Then, I put my skills to work. Through out the summit I collected tweets that were coming in with #sqlpass and other hashtags mentioned above, imported them to a database, and created the following dashboard.
See PASS Summit 2012 as it happened on twitter. Click the image below to interact (opens in a new window).
Charts on the top show the top tweeters and the busiest hours. Times are shown in Pacific Standard Time. Clearly, people tweeted more during the keynote sessions and there are cool people in the top tweeters list.
When you select a tweeter, all views in the dashboard are filtered for that particular user. For example, in the below image I select BrentO (Brent Ozar). You can see everything Brent tweeted and tweets over time.
Similarly, you can select a time to see everything for that hour.
Last functionality I show is searching tweets. Enter a term in the search tweets filter area in the bottom left. For example, I enter ‘Boring’ to get this:
Good news: there are only a few tweets that reflect a boring event. Bad news: there are tweets suggesting that the event (at some point) is boring. You can see when attendees felt boring. Don’t get me wrong – It’s not my intention to convey that Summit was boring. A good BI solution tells business users when and where there are problems.
Download the database backup from here. It’s a SQL Server 2012 RTM database. There are three tables:
dbo.Hashtag contains all the hashtags that were collected. HashtagID is an identity column.
dbo.Tweep is a table to store tweeters. UserID is the unique Twitter ID for a user, it’s not Identity in the database.
dbo.Tweet is used to store tweets. TweetID in this table is again an unique ID generated by Twitter. Note that TweetID alone isn’t unique in this table. Primary key is TweetID + HashtagID.
This query removes duplicates and gives you unique tweets. Change the filter conditions as you like.
USE HashtagArchive ;
, PST = DATEADD(HH,-8,CreateDate)
, RowNum = Row_Number() OVER( Partition BY tw.TweetID ORDER BY (SELECT NULL))
FROM dbo.Tweet tw
INNER JOIN dbo.Tweep tp
ON tw.UserID = tp.UserID
INNER JOIN dbo.Hashtag ht
ON ht.HashtagID = tw.HashtagID
WHERE ht.Hashtag IN ('#Summit12', '#SqlPass','#Sqlrun','#sqlkaraoke','#sqlfamily','#ssisprecon','#PASSPrayers')
AND CreateDate > '2012-11-07 08:00:00.000'
, [Date] = CAST(PST AS DATE)
, [Time] = CONVERT(VARCHAR(8),PST,108)
, Hr = DATEPART(HH,PST)
FROM cte WHERE RowNum = 1
ORDER BY CreateDate ;
There is some noise with the #FirstTimers hashtag. I didn’t think young US citizens who voted for the first time would be on Twitter, strange! Make sure to avoid that and you’ve an archive of Twitter database for PASS Summit 2012.
Too much to ask, I know! If you enjoyed this post, follow me on twitter (@SamuelVanga) – I’d love to get to know you. And leave a comment with what you discover from this dashboard.
SQL Saturday US Map – Posted 08/03/2012
Chris Webb (@Technitrain on Twitter), MVP and BI Expert posted an example using NodeXL to analyze tweets. Check it out – It’s amazing!