SQLServerCentral Article

Queries for SSC Community Participation


I recently viewed a neat TED talk entitled "Clay Shirky on Institutions vs. Collaboration." Clay Shirky, a "Social Media Theorist" has lots of interesting things to say about the effects of social media which would include our community here at SQLServerCentral. So, I decided I'd use T-SQL to glean some insights on the collaboration going on here.

Shirky suggests that institutions like corporations are inherently self-limiting. That's because they tend to only engage people to full-time positions and focus on retaining top performers in terms of output. That tends to exclude the contributions of people who would contribute less than full-time output to the cause. These people are either removed or excluded from the get-go.

Now compare that with unconstrained social systems like our on-line community where people are free to contribute as little or as much as they want. Now all of a sudden the community gains access to the "long tail" of output - that is, if you look on a graph on how many people contribute one or two articles, the list is long - and it's this productivity that tends to elude more traditioanl institutions. The "long tail" is part of the magic of a community like ours.

Now fortunately, SQLServerCental shows a page where one can see the list of authors. From this page, one can look at the number of articles each author has written. The list of authors with one or two articles written is pretty long. So, from here we can see the evidence of collaborative magic happening. But I wanted to take a closer look - so I pulled in the article contributors information from this page into SQL Server so I could run some queries.

First, we need a table to hold our author contribution rows:

/* Table CommunityArticleCounts to hold author contribution rows */IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[CommunityArticleCounts]') AND type in (N'U'))
CREATE TABLE [dbo].[CommunityArticleCounts](
[Member] [varchar](40) NOT NULL
,[ArticleCount] [smallint] NOT NULL
,[MemberType] CHAR(1) CONSTRAINT [CK_MemberTypeValues] CHECK (([MemberType]='E' OR [MemberType]='C')) NOT NULL
-- Note there are two member types: 'E' for Employee and 'C' for Community Member.

I pulled the page contents into a text editor, and with a little conditioning and a with a few regular expressions, I turned lines such as "Bill Nicolich (Last post:2010/05/25 / Number of articles: 4)" into INSERT statements.

/* Partial list of INSERT statements. See the attached script for full list */INSERT INTO CommunityArticleCounts (Member, ArticleCount,MemberType) VALUES ('Steve Jones',1158,'E')
INSERT INTO CommunityArticleCounts (Member, ArticleCount,MemberType) VALUES ('Andy Warren',260,'E') ... INSERT INTO CommunityArticleCounts (Member, ArticleCount,MemberType) VALUES ('Bill Nicolich',4,'C')
INSERT INTO CommunityArticleCounts (Member, ArticleCount,MemberType) VALUES ('James Greaves',4,'C') ...

Now we're ready to run some T-SQL queries and look at some basic stats.

/* Total Number of Articles */SELECT SUM(ArticleCount)
FROM dbo.CommunityArticleCounts
-- 3813 on 11/10/2010

Let's look at the percentage of articles done by "employees" - or people listed as columnists:

/* Percentage of articles by "employees" - those that are listed as columnists and also the contributor named "Press Release". */SELECT CAST(
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts WHERE MemberType='E') AS DECIMAL) /
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts) AS DECIMAL) AS DECIMAL(4,2))
-- Employees contributed 55% of the articles.

How many organizations can say that 45% of their output comes from volunteers? That's pretty good. Now, Steve Jones, who accounts for 30% of the total, may be too agrieved of his workload to cheer the volunteers much - but since occasionally a guest columnist gives him a break, that consolation will have to do for now.

/* Steve's output */SELECT CAST(
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts WHERE Member='Steve Jones') AS DECIMAL) /
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts) AS DECIMAL) AS DECIMAL(4,2));

Now we have the Pareto principle, also known as the "80-20 rule" or "the law of the vital few" or the "principle of factor sparsity" that comes from observing many events where roughly 80% of the effects come from 20% of the causes. In this case, we'd predict that around 80% of the articles come from the top 20% of contributors. Let's see if that holds true.

/* Percentage of articles from top 20% contributors */SELECT CAST(
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts WHERE Member IN (SELECT TOP 20 PERCENT Member FROM dbo.CommunityArticleCounts ORDER BY ArticleCount DESC ))AS DECIMAL) /
CAST((SELECT SUM(ArticleCount) FROM dbo.CommunityArticleCounts) AS DECIMAL) AS DECIMAL(4,2));
-- 82%

Yep. We're right on track with the Pareto principle. Tweak that query to look at the top 10% contributors account for 73% of the articles. Now an institution would take that 73% output as the best it can do because it needs bang for the buck on all the costs that are involved in keeping tabs on contributors and maintaining the institution - and so for those who contribute less, the costs don't justify the inclusion.

But for a community like ours, we can reach that extra 27% and by so doing, we're allowing people to engage some of their "spare brainpower" or "cognitive surplus" for a worthwhile cause.

One of the takeaways is that a community shouldn't look at the large number of people who contribute only one or so articles and be sad about it - because that's part of the magic of an unconstrained social system. A typical institution wouldn't get those contributions at all. The question should be, how can we empower and engage more people to contribute an article or two - who might thereby satisfy their sense of purpose and community feeling?

For more from Clay Shirkey, read Cognitive Surplus: Creativity and Generosity in a Connected Age



4.86 (7)




4.86 (7)