Spam Busting in Community Server 2007 - Part 1

While spammers are eating up our mailboxes, blogs and sites, we (developers) are working to fight back and beat them.  The most annoying kind of spam is blog spam, because it shows up in the content where everyone can see it.  As a result, the author of the blog has to spend (waste) time every day deleting published spam.

There have been many community reactions against comment and trackback spam in both the .NET and PHP worlds, and they have produced results such as the Akismet service, the Subkismet .NET library and several CAPTCHA controls.  None of these methods (not even all of them together) can stop spammers completely, but choosing a good strategy can help you keep your blog clean and spend your time playing Halo 3 rather than deleting stupid blog spam!

Depending on the blogging tool that you use, there are several options available for fighting spammers.  If you're looking for general guidance on spam busting, Phil has an excellent set of blog posts about different ways to fight blog and email spam!

But here (and in an upcoming post) I just want to share my personal experience with spam busting in Community Server, as a detailed guide to help other users of this great platform keep their blogs clean.  My main focus is on Community Server 2007, even though you can apply these techniques to prior versions as well.  I know there are some resources here and there about spam blocking in Community Server, but I think a general guide based on real experience would be very helpful for all users.  By applying these techniques, I have recently had an almost clean blog.

Before talking about these techniques, you need to download two third-party components that help a lot:

The last point that I'd like to mention is about ordinary CAPTCHA controls.  You may say that you use a CAPTCHA to stop spammers, but these controls have some drawbacks.  One important point is that they're not user friendly, and in some cases they may not display correctly and so stop users from leaving their thoughts.  The other point is that although they can block a lot of spam, there are ways to get past some CAPTCHA controls.

Background

Before stepping into a discussion of spam busting techniques, let me give an introduction to the spam blocking features in Community Server 2007 for those who don't know about them.

Community Server offers a spam blocking mechanism as part of the site administration features, under the name Spam Blocker.  Site-wide configuration for your Community Server instance can be done via the http://Site.Com/controlpanel/tools/ManageSpamRules.aspx URL.

[Screenshot: Spam Blocker Page]

In Community Server you can write spam rules in order to block spam.  A spam rule is a class derived from an abstract base class that implements your own logic to check and validate an incoming comment or trackback.  I had a CS Dev Guide post about writing a spam rule.

A spam rule assigns an integer value (points) to an incoming feedback item.  You can then configure your site via the Control Panel to mark a feedback item as Possible Spam or Spam based on two integer factors (thresholds).
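
To make the idea concrete, here is a minimal Python sketch of how points and the two factors could fit together.  This is not Community Server's own code (which is .NET): the names `Verdict` and `classify`, the summing of points across rules, and the "greater than or equal" comparison are all my assumptions for illustration.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    POSSIBLE_SPAM = "possible spam"
    SPAM = "spam"


def classify(rule_points, possible_spam_factor, spam_factor):
    """Classify a feedback item from the points its spam rules assigned.

    Assumption: points from all enabled rules are added together and the
    total is compared against the two site-wide factors.
    """
    total = sum(rule_points)
    if total >= spam_factor:
        return Verdict.SPAM
    if total >= possible_spam_factor:
        return Verdict.POSSIBLE_SPAM
    return Verdict.OK


# Three rules fired with 2, 3 and 1 points; the factors are 5 and 10.
print(classify([2, 3, 1], possible_spam_factor=5, spam_factor=10))
```

With these example numbers the total is 6, which is above the first factor and below the second, so the item would wait for moderation as Possible Spam.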

[Screenshot: Spam Blocker Settings]

Neither Possible Spam nor Spam items are published to your blog directly; all of them need your moderation to get published.  Setting these two factors relies heavily on your own experiments and your blog statistics, so I can't give you fixed values: they need to change depending on the points that you set for each spam rule and on what you want from the spam blocker.

However, I can recommend that you set two values that you can be sure reflect these three general statements:

You may need to test different factors for a while to get a good result, but once you have found your factors you will only occasionally need to change them.

On the other hand, Community Server has a task (CSJob) that runs on a regular basis and deletes all Spam items.  This task is called CommunityServer.Blogs.Components.DeleteStaleSpamCommentsJob.  You can configure this task to run every day or on any other regular schedule.  You should also check your jobs report page to make sure that this job (along with the other jobs) is running successfully, to avoid further problems.  This task ensures that spam items don't bloat your database over time.

After installing Telligent Spam Rules for Community Server 2007, you should see them on the Spam Blocker page.  To enable a spam rule, check its checkbox and then click the Save button to apply your changes.  Each spam rule has a Configure button to set it up.  Moreover, you can override spam rules in your blogs.  Also, some spam rules apply only to a specific application such as blogs or forums.

Here I want to add a sidebar note: sometimes we hear people say that Community Server is a great platform and an enterprise application, but not something suitable for single-user blogging!  Although I can agree with them on some points, as you can see, using such a great platform has an advantage: you get the benefit of these powerful spam blocking features.  I haven't seen such a great mechanism in other blogging tools yet!

Now I'll go through each spam rule to show you how to configure it.

Forbidden Word

A forbidden word is a word that shouldn't occur in a post.  If it occurs, the rule assigns an integer factor to the post.  For example, "TramadolDog" is a word that occurs only in spam posts.  So you can set the integer to assign for each forbidden word found in a post, and add a semicolon-delimited list of forbidden words to the rule.
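
The logic of this rule can be sketched in a few lines of Python.  This is only my illustration, not the Telligent implementation: the function name `forbidden_word_points`, the case-insensitive whole-word matching, and counting each forbidden word once per post are assumptions.

```python
import re


def forbidden_word_points(body, forbidden_words, points_per_word):
    """Return points for a post given a semicolon-delimited word list.

    Assumptions: matching is case-insensitive, whole-word, and each
    distinct forbidden word contributes its points once.
    """
    text = body.lower()
    total = 0
    for raw in forbidden_words.split(";"):
        word = raw.strip().lower()
        if not word:
            continue  # ignore empty entries such as a trailing semicolon
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, text):
            total += points_per_word
    return total


print(forbidden_word_points("Visit TramadolDog today!", "TramadolDog; cheapmeds", 5))
```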

[Screenshot: Forbidden Word]

Bad Word Count

A bad word is a word that may be harmful, and a large number of occurrences of it is a sign of spam.  For example, "S-h-e-m-a-l-e" (without dashes!) is a word that, if it occurs frequently in a post, lets us consider the post spam.  The Bad Word Count spam rule takes an integer value as the maximum number of times that a word can appear in a post, and an integer factor to assign to each occurrence of the word beyond this maximum.  You can enter bad words as a semicolon-delimited list.
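
As a rough Python sketch of the counting described here (the real rule is a .NET class, so `bad_word_points`, the tokenizing regular expression and the per-word counting are my own assumptions):

```python
import re


def bad_word_points(body, bad_words, max_occurrences, points_per_extra):
    """Assign points to each occurrence of a bad word beyond the maximum.

    Assumption: the maximum applies to each bad word separately.
    """
    tokens = re.findall(r"[a-z0-9']+", body.lower())
    total = 0
    for raw in bad_words.split(";"):
        word = raw.strip().lower()
        if not word:
            continue
        extra = tokens.count(word) - max_occurrences
        if extra > 0:
            total += extra * points_per_extra
    return total


# "casino" appears three times, two more than the allowed maximum of one.
print(bad_word_points("Casino casino CASINO poker", "casino;poker", 1, 2))
```

A single mention of a bad word costs nothing here, which is the difference from a forbidden word: only repetition past the maximum adds points.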

[Screenshot: Bad Word Count]

Link Count

One of the main goals of spammers is to promote their own sites and bring traffic to them, so they try to insert their links in the post bodies (although new-age spammers use a single link for the author to get past some blockers).  You can configure this rule to assign an integer factor to any link that occurs in the post once the number of links goes beyond a threshold.  I strongly recommend that you set the factors for this rule so that it only marks posts as Possible Spam, because some real commenters may share links with you and a stricter setting could kick out their feedback.
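
Here is a small Python sketch of a link counting rule, just to show the shape of the check.  It is not the actual rule: the name `link_count_points`, the simple `http(s)://` pattern, and the choice to charge points only for links beyond the threshold are assumptions on my side.

```python
import re

# Deliberately simple pattern; a real rule may also look at HTML anchors.
LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def link_count_points(body, max_links, points_per_link):
    """Return points for each link beyond the allowed number of links."""
    extra = len(LINK_PATTERN.findall(body)) - max_links
    return extra * points_per_link if extra > 0 else 0


comment = "See http://a.example, http://b.example and https://c.example"
print(link_count_points(comment, max_links=1, points_per_link=3))
```

Keeping `points_per_link` small relative to your Spam factor is one way to follow the advice above, so that a link-heavy but genuine comment lands in moderation instead of being treated as Spam.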

[Screenshot: Link Count]

IP Count

Another way to block spammers is to block multiple comments from the same IP address within an interval.  For example, a human can't send more than 3-4 real comments to a site in a minute.  This rule takes an integer factor to assign to additional posts from an IP, as well as the maximum number of posts that can be sent in a specified interval (in seconds).  You can also exclude some IPs, such as common user IPs and localhost IPs.
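
The sliding-window idea behind this rule could look like the following in Python.  Treat it as a sketch under my own assumptions: the class name `IpCountRule`, keeping timestamps in memory, and passing the time in as a number of seconds are illustrative choices, not how Community Server stores this.

```python
from collections import defaultdict, deque


class IpCountRule:
    """Assign points to posts from an IP that exceed a per-interval limit."""

    def __init__(self, max_posts, interval_seconds, points, excluded_ips=()):
        self.max_posts = max_posts
        self.interval_seconds = interval_seconds
        self.points = points
        self.excluded_ips = set(excluded_ips)
        self._recent = defaultdict(deque)  # ip -> timestamps inside the window

    def score(self, ip, timestamp):
        """Record a post at `timestamp` (seconds) and return its points."""
        if ip in self.excluded_ips:
            return 0
        recent = self._recent[ip]
        # Forget posts that have fallen out of the interval.
        while recent and timestamp - recent[0] >= self.interval_seconds:
            recent.popleft()
        recent.append(timestamp)
        return self.points if len(recent) > self.max_posts else 0


rule = IpCountRule(max_posts=3, interval_seconds=60, points=5,
                   excluded_ips={"127.0.0.1"})
print([rule.score("203.0.113.7", t) for t in (0, 10, 20, 30)])
```

In this example the fourth post within a minute is the first one to receive points, and posts from the excluded local address are never scored.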

[Screenshot: IP Count]

Akismet

Most likely you know the Akismet service as a way to check incoming comments/trackbacks and distinguish spam from ham.  The Akismet spam rule (written by Thomas) is part of the Telligent spam rules for Community Server 2007, and you can set your WordPress API key and an integer factor for this rule.  This is one of the best ways to block blog spammers.

[Screenshot: Akismet]

Anonymous User

Obviously you can trust your registered users and their feedback, so you can assign a negative integer factor to registered users or blog administrators/owners to subtract from their points, or add some points to anonymous users instead.
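
In Python terms the idea is no more than this.  The function name `user_trust_points` and the default point values are hypothetical; pick your own numbers based on your factors.

```python
def user_trust_points(is_registered, is_owner=False,
                      anonymous_points=2, registered_points=-2,
                      owner_points=-5):
    """Return points based on who posted the feedback.

    Negative values lower the total, so trusted users are less likely
    to be caught by the other rules.
    """
    if is_owner:
        return owner_points
    if is_registered:
        return registered_points
    return anonymous_points


print(user_trust_points(is_registered=False))  # anonymous commenter
print(user_trust_points(is_registered=True))   # registered user
```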

[Screenshot: Anonymous User]

Comment Length

The length of a comment can be a factor in determining whether it's spam.  Nowadays, many spammers fill their comment body with a few short sentences or single words like "Thanks", "Cool", "Good", "Good design" and so on.  Therefore you can assign some points to comments that are shorter than a threshold.
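
A tiny Python sketch of the check follows.  The name `comment_length_points` and measuring length in characters after trimming whitespace are my assumptions; the real rule may measure length differently.

```python
def comment_length_points(body, min_length, points):
    """Return points when the trimmed comment is shorter than min_length."""
    return points if len(body.strip()) < min_length else 0


print(comment_length_points("Good design", min_length=20, points=3))
```

Since plenty of real readers also write short thank-you comments, it seems safer to keep these points low and let this rule work together with the others.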

[Screenshot: Comment Length]

Email

The last spam rule that I want to cover in this post is the Email spam rule, which checks for email addresses in a post body and assigns points to each occurrence of an email address.  Sometimes spammers insert their email addresses in the post body to let people contact them about what they're advertising.
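
For illustration, this is roughly what such a check could look like in Python.  The name `email_points` and the regular expression are assumptions of mine; the pattern is a pragmatic approximation and not a full email address validator.

```python
import re

# Approximate pattern: local part, "@", domain, and a top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def email_points(body, points_per_address):
    """Return points for each email address found in the post body."""
    return len(EMAIL_PATTERN.findall(body)) * points_per_address


text = "Contact seller@example.com or sales@example.org."
print(email_points(text, points_per_address=2))
```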

[Screenshot: Email]

In the second part I'll cover other Telligent spam rules as well as some ways to use Subkismet to block spammers in Community Server 2007.


3 Comments : 10.03.07

Feedbacks

#1
Dave Burke
10.03.2007 @ 4:53 PM
blog bits Jose tells us two developers are apparently hired by Telligent to work on CS Core. That's
#2
Dave Burke's Community Server Bits
10.06.2007 @ 5:52 PM
Excellent coverage of spam prevention approaches in Community Server from Keyvan. Lots of screenshots
#3
Keyvan Nayyeri
10.07.2007 @ 12:36 PM
In the first post I gave an introduction and outlined eights spam rules to fight against spammers in
