Installing DSPAM filtering on Dreamhost

Introduction

I'm writing this guide to document how I went about installing DSPAM on the servers of the fine folks over at Dreamhost, but the same procedure should work on most other hosts with minimal adjustment. Please be sure to take at least a quick glance over the entire document before proceeding with your installation. Do not attempt this process unless you are fairly comfortable with the procedures outlined; while you shouldn't have any trouble if you follow the instructions, there is a chance something may go wrong, in which case some tinkering is in order.

Important note: Setting this software up REQUIRES that you have a full-blown shell account on a Dreamhost server, which you can access via ssh or telnet. IT WILL NOT WORK with FTP-only or m1234567 (mail-only) accounts.

Just a quick note on formatting:

Italics

These are paths to directories.

$ echo 'DSPAM is great'

Text that indicates a command or code listing. Do not type the "$", because that is the shell prompt.

--password=password

Replace the text with information relevant to your installation.

Installation

After getting a shell prompt on your server, using ssh or telnet, we can begin. With these commands, we will create a folder to put the source in and then fetch the latest version (3.0.0 as of 7/5/04).

$ mkdir ~/src
$ cd ~/src
$ wget http://www.nuclearelephant.com/projects/dspam/sources/dspam-3.0.0.tar.gz
$ tar -xzf dspam-3.0.0.tar.gz
$ cd dspam-3.0.0

Now we have to run configure in preparation for compiling the source. This may take a bit, and you will see a lot of output, but as long as it doesn't exit with an error don't worry about any of it.

$ ./configure --with-dspam-home=$HOME/.dspam \
--with-userdir-owner=none --with-userdir-group=none \
--with-dspam-mode=none --with-dspam-owner=none \
--prefix=$HOME/usr --enable-delivery-to-stdout \
--with-mysql-includes=/usr/include/mysql \
--with-mysql-libraries=/usr/lib --with-storage-driver=mysql_drv \
--disable-user-logging --disable-system-logging

These are just a few of the flags you can use, but they should be all you need to get a working install. Remove --disable-user-logging and --disable-system-logging if you want to keep logs. I disabled them because the logs have the potential to balloon in size. You can see the rest of the options by running ./configure --help or reading the relevant parts of the README. In addition, pay attention to the important note under Option 1 later in the guide.

Time to compile and install the binaries. Again, you don't have to worry about any of the output.

$ make && make install

If you don't already have ~/usr/bin in your path, you need to add it. Assuming you are using the default bash shell, run these commands.

$ echo 'export PATH=$HOME/usr/bin:$PATH' >> ~/.profile
$ source ~/.profile
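With the export line above, every additional source of ~/.profile prepends ~/usr/bin to PATH again. Here is a minimal sketch of one way to keep that idempotent. Note that path_prepend is a hypothetical helper name of my own, not something bash or DSPAM provides.

```shell
# path_prepend DIR PATHSTRING
# Hypothetical helper: print PATHSTRING with DIR prepended, unless DIR is
# already one of its colon-separated entries.
path_prepend() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;                  # already present
        *)          printf '%s\n' "$dir${current:+:$current}" ;; # add to the front
    esac
}

# Possible use in ~/.profile:
#   PATH=$(path_prepend "$HOME/usr/bin" "$PATH"); export PATH
```

If you go this route, the function and the commented line would replace the plain export line in ~/.profile.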

DSPAM is now installed. Let's get to work configuring and passing mail through it.

Setup

Create ~/.dspam and a few needed files.

$ mkdir ~/.dspam
$ touch ~/.dspam/untrusted.mailer_args
$ echo $USER > ~/.dspam/trusted.users
$ cp ~/src/dspam-3.0.0/tools.mysql_drv/purge.sql ~/.dspam

When we compiled DSPAM, we instructed it to use MySQL to store mail tokens and the other information it uses to classify spam. We need to create a database for that information now. If you'd rather use a preexisting DB, user and table, skip the next paragraph.

Go to the MySQL tab in the Dreamhost panel, and click "Add New Database". Enter a name for the database and continue. Choose your plan. When you are back at the main MySQL tab, click on "Add User". Give the DSPAM user an appropriate name and a password. When you finish setting the user up, set up a hostname for the database.

Once you have a database, hostname and user set up, you have to tell DSPAM how to connect to it. Using your favorite editor, edit ~/.dspam/mysql.data. I will use nano for demonstration's sake.

$ nano ~/.dspam/mysql.data

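Put the following text in the file, replacing each placeholder with the information about YOUR MySQL host, user and database. Despite its name, the table_name placeholder stands for the name of the database, here and in the commands that follow.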

mysql.host.tld
3306
mysql_username
mysql_user_password
table_name

Hit Ctrl-X, type "Y" when nano asks "Save modified buffer (ANSWERING "No" WILL DESTROY CHANGES) ?" and hit Enter to confirm the use of ~/.dspam/mysql.data as the filename. Now run the following to load the DB schema DSPAM needs. Here it is broken with a backslash for display purposes.

$ mysql --host=mysql.host.tld --user=mysql_username --password=mysql_user_password table_name \
< ~/src/dspam-3.0.0/tools.mysql_drv/mysql_objects.sql.space.optimized
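Since ~/.dspam/mysql.data holds the database password in plain text, it may be worth making the file readable by you alone. The sketch below assumes the five-line layout shown earlier (host, port, user, password, database). The write_mysql_data function is a hypothetical helper of my own, not part of DSPAM.

```shell
# write_mysql_data FILE HOST PORT USER PASSWORD DATABASE
# Hypothetical helper: write the five connection values, one per line, and
# leave the file readable and writable by its owner only.
write_mysql_data() {
    if [ "$#" -ne 6 ]; then
        echo "usage: write_mysql_data FILE HOST PORT USER PASSWORD DATABASE" >&2
        return 2
    fi
    file=$1
    shift
    # Create the file under a restrictive umask, then set the mode explicitly
    # in case the file already existed with looser permissions.
    ( umask 077 && printf '%s\n' "$@" > "$file" ) || return 1
    chmod 600 "$file"
}
```

If you already created the file in nano, running chmod 600 ~/.dspam/mysql.data on its own gets you the same permissions.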

We also need to set up two little cron jobs to do routine DB and token maintenance.

$ export EDITOR='nano -w'
$ crontab -e

The file should look like this when you are done with it. Keep each job on a single line: cron does not treat a trailing backslash as a line continuation.

0       0       *       *       *       $HOME/usr/bin/dspam_clean -p
5       0       *       *       *       mysql --host=mysql.host.tld --user=mysql_username --password=mysql_user_password table_name < $HOME/.dspam/purge.sql

Save and quit nano. DSPAM is now ready to be trained and have mail piped through it.
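The purge job in the crontab repeats the connection details that already live in ~/.dspam/mysql.data. If you would rather keep them in one place, a small wrapper can read that file instead. This is only a sketch, assuming the five-line layout used earlier. The names load_mysql_data, run_purge and the DB_* variables are hypothetical and mine, not DSPAM's.

```shell
#!/bin/sh
# Hypothetical wrapper (for example saved as ~/.dspam/purge.sh).

# load_mysql_data FILE: set DB_HOST, DB_PORT, DB_USER, DB_PASS and DB_NAME
# from the five lines of FILE, in that order.
load_mysql_data() {
    {
        IFS= read -r DB_HOST &&
        IFS= read -r DB_PORT &&
        IFS= read -r DB_USER &&
        IFS= read -r DB_PASS &&
        # Accept a last line that has no trailing newline.
        { IFS= read -r DB_NAME || [ -n "$DB_NAME" ]; }
    } < "$1"
}

# run_purge: feed purge.sql to the mysql client using those settings.
run_purge() {
    load_mysql_data "$HOME/.dspam/mysql.data" || return 1
    mysql --host="$DB_HOST" --port="$DB_PORT" --user="$DB_USER" \
        --password="$DB_PASS" "$DB_NAME" < "$HOME/.dspam/purge.sql"
}

# When saved as a standalone script, finish with a call to the function:
#   run_purge
```

The cron entry would then call the script in place of the long mysql command.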

Training

To be able to differentiate between ham (legitimate mail) and spam (the nasty stuff), you must feed a sample (corpus) of each to the filter to allow it to learn what your real mail looks like and what kind of spam you receive. The more of each you feed DSPAM, the more accurate it becomes.

The following scripts will aid in the feeding of the beast. Save them as ~/.dspam/spamfeed.sh and ~/.dspam/innocentfeed.sh respectively, and don't forget to make them executable by running chmod +x filename on each.

#!/bin/sh
#~/.dspam/spamfeed.sh
# Feed one message file (path given as $1) to DSPAM as known spam.

cat "${1}" | dspam --mode=teft --source=corpus --class=spam --feature=chained,noise --user "${USER}"

#!/bin/sh
#~/.dspam/innocentfeed.sh
# Feed one message file (path given as $1) to DSPAM as known ham.

cat "${1}" | dspam --mode=teft --source=corpus --class=innocent --feature=chained,noise --user "${USER}"

Running the next commands will allow you to train DSPAM more easily. Because DSPAM needs to train on full emails, headers included, you cannot simply forward it messages.

$ mkdir -p ~/Maildir/.spam-corpus/tmp ~/Maildir/.spam-corpus/cur ~/Maildir/.spam-corpus/new

A folder called spam-corpus should now appear in your IMAP mail client. Drag any spammy emails into this folder.

Now go back to your shell prompt and enter the following command, which passes each email you just dropped into the folder through the filter.

$ find ~/Maildir/.spam-corpus/cur/ -type f -exec ~/.dspam/spamfeed.sh {} \;

You can safely remove the .spam-corpus folder with either rm -rf or your mail client.

Now let's go through your Inbox and old-messages to learn what your ham looks like. Skip the old-messages part if you do not have your account set up to rotate old mail into that folder.

$ find ~/Maildir/cur/ -type f -exec ~/.dspam/innocentfeed.sh {} \;
$ find ~/Maildir/.old-messages/cur/ -type f -exec ~/.dspam/innocentfeed.sh {} \;
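The find commands give no feedback on how many messages were actually fed, which is handy to know when you are trying to balance ham against spam. Here is a rough sketch of a counting loop. It assumes the feed scripts exit with status 0 on success, which I have not verified for every DSPAM version. The feed_dir function is a hypothetical helper name of my own.

```shell
# feed_dir DIR FEEDER
# Hypothetical helper: run FEEDER once per regular file in DIR and print how
# many messages it accepted. Subdirectories are skipped.
feed_dir() {
    dir=$1
    feeder=$2
    count=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue          # skip directories and empty globs
        if "$feeder" "$msg" > /dev/null; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Possible use:
#   feed_dir ~/Maildir/.spam-corpus/cur ~/.dspam/spamfeed.sh
```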

That should be enough initial training to get decent accuracy to start off with. I started with 700 ham and 700 spam and off the bat had 90% accuracy. With a few weeks of use the accuracy should get much better (up to 99.991%).

Starting filtering

Almost there! We need to tell Postfix, Dreamhost's MTA, to use a program called procmail as our LDA (local delivery agent), the program that processes our mail before it hits the mailbox. Running the following command will instruct Postfix to do just that. Note that this is one of the few steps that is Dreamhost specific; normally a filename of ~/.forward is used instead of ~/.forward.postfix.

$ echo '"|/usr/bin/procmail -t"' > ~/.forward.postfix
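Be aware that the redirect above overwrites any ~/.forward.postfix you already have. If you think you might have one, something like the following keeps a copy first. It is only a sketch, and install_forward is a hypothetical helper name of my own.

```shell
# install_forward FILE LINE
# Hypothetical helper: keep a copy of an existing FILE as FILE.bak, then
# replace its contents with LINE.
install_forward() {
    file=$1
    line=$2
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    printf '%s\n' "$line" > "$file"
}

# Possible use:
#   install_forward ~/.forward.postfix '"|/usr/bin/procmail -t"'
```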

Here you are presented with a choice as to how you want to proceed with your installation. Option 1 is very similar to the way Dreamhost currently implements Vipul's Razor filtering, whereby you drag missed spam and false positives into different IMAP folders to train the filter in the event of a miss. In option 2, I will walk you through setting up email aliases to handle the error correction. To correct an error made on a message, you would simply forward the email to a predetermined address, where DSPAM would pick it up and adjust its algorithms to avoid the error. Pick one or the other, based on personal preference really.

Option 1: drag-and-drop error correction

Important note: If you decide to use this processing scheme, consider adding --enable-signature-headers to the flags we used in the configure step during installation. This will prevent DSPAM from writing a little bit of signature data to the end of the body of each message it processes, which is not needed when using this method (however, you DO need this information to use the alternate error and false positive training method, discussed after this section). You do not have to use the flag; it's a purely optional, cosmetic step.

First, create a ~/.procmailrc file that looks like the following. Uncomment LOGFILE=$PMDIR/procmail.log to have procmail log its actions.

MAILDIR=$HOME/Maildir
PMDIR=$HOME/.procmail
#LOGFILE=$PMDIR/procmail.log
SHELL=/bin/sh

# Begin spam treatment.
:0fw
| $HOME/usr/bin/dspam  --stdout --deliver=innocent,spam --mode=teft \
--feature=chained,noise,whitelist --user $USER

:0
*^X-DSPAM-Result: Spam
.Spam/
# End spam treatment.

# inbox catch-all
:0
$HOME/Maildir/
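The second recipe sorts on the X-DSPAM-Result header that the filter step adds. If you want to check by hand how a saved message would be sorted, the following sketch approximates that condition in plain shell. It is not procmail itself, just a header-only, case-insensitive match like the one the recipe performs by default. The is_flagged_spam function is a hypothetical helper name of my own.

```shell
# is_flagged_spam FILE
# Hypothetical helper: succeed if the header block of FILE (everything up to
# the first empty line) has a line starting with "X-DSPAM-Result: Spam".
is_flagged_spam() {
    sed '/^$/q' "$1" | grep -qi '^X-DSPAM-Result: Spam'
}

# Possible use:
#   is_flagged_spam some-message && echo "would go to .Spam/"
```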

Run the following to set up the potty-training mail folders.

$ mkdir -p ~/Maildir/.blocked-nonspam/cur ~/Maildir/.blocked-nonspam/new ~/Maildir/.blocked-nonspam/tmp
$ mkdir -p ~/Maildir/.unblocked-spam/cur ~/Maildir/.unblocked-spam/new ~/Maildir/.unblocked-spam/tmp
$ mkdir -p ~/Maildir/.Spam/cur ~/Maildir/.Spam/new ~/Maildir/.Spam/tmp
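Every Maildir folder needs the same three subdirectories, so if you expect to add more folders later, a loop saves some typing. This is a small sketch. The make_maildir_folder function is a hypothetical helper name of my own, and it assumes the dot-prefixed folder naming used above.

```shell
# make_maildir_folder BASE NAME...
# Hypothetical helper: create the cur, new and tmp subdirectories for each
# folder NAME under the Maildir at BASE (folder names get a leading dot).
make_maildir_folder() {
    base=$1
    shift
    for name in "$@"; do
        for sub in cur new tmp; do
            mkdir -p "$base/.$name/$sub" || return 1
        done
    done
}

# Possible use, equivalent to the three mkdir commands:
#   make_maildir_folder ~/Maildir blocked-nonspam unblocked-spam Spam
```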

Here are two more scripts to aid in the processing of the mail going into these folders. Save them as ~/.dspam/errorfeed.sh and ~/.dspam/falsefeed.sh respectively, remembering to chmod +x each.

#!/bin/sh
#~/.dspam/errorfeed.sh
# Retrain on one message file (path given as $1) that was missed spam.

cat "${1}" | dspam --mode=teft --source=error --class=spam --user "${USER}"

#!/bin/sh
#~/.dspam/falsefeed.sh
# Retrain on one message file (path given as $1) that was a false positive.

cat "${1}" | dspam --mode=teft --source=error --class=innocent --user "${USER}"

Now we need to set up more cron jobs to check in our recently created folders periodically to see if there are any messages to process in them.

$ crontab -e

After editing, your crontab should look like this, taking our previous additions into account. Each job must stay on a single line, because cron does not join lines that end in a backslash.

0	0	*	*	*	$HOME/usr/bin/dspam_clean -p
5	0	*	*	*	mysql --host=mysql.host.tld --user=mysql_username --password=mysql_user_password table_name < $HOME/.dspam/purge.sql
28,58	*	*	*	*	find $HOME/Maildir/.unblocked-spam/cur -type f -exec $HOME/.dspam/errorfeed.sh {} \; ; mv $HOME/Maildir/.unblocked-spam/cur/* $HOME/Maildir/.Spam/cur/
0,30	*	*	*	*	find $HOME/Maildir/.blocked-nonspam/cur -type f -exec $HOME/.dspam/falsefeed.sh {} \; ; mv $HOME/Maildir/.blocked-nonspam/cur/* $HOME/Maildir/cur/

Save and exit. You now have a fully functional DSPAM installation! If spam does slip through the filter, drop it in unblocked-spam. If you find a false positive (legitimate mail the filter thinks is spam) in the Spam folder, move it to the blocked-nonspam folder. Hope you enjoy life spam-free!
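One thing to keep in mind with the retraining cron jobs: they feed the whole folder first and move messages afterwards, so a message dropped in between those two steps could be moved without ever being fed. If that bothers you, a wrapper that handles one message at a time closes the gap. The following is only a sketch, assuming the feed scripts exit with status 0 on success. The retrain_folder function is a hypothetical helper name of my own.

```shell
# retrain_folder SRC FEEDER DEST
# Hypothetical helper: for each message file in SRC, run FEEDER on it and,
# only if that succeeds, move the file into DEST.
retrain_folder() {
    src=$1
    feeder=$2
    dest=$3
    for msg in "$src"/*; do
        [ -f "$msg" ] || continue          # skip directories and empty globs
        "$feeder" "$msg" > /dev/null && mv "$msg" "$dest"/
    done
    return 0
}

# Possible use from a script called by cron:
#   retrain_folder "$HOME/Maildir/.unblocked-spam/cur" "$HOME/.dspam/errorfeed.sh" "$HOME/Maildir/.Spam/cur"
```

A message whose feed step fails stays where it is and gets another try on the next run.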

Option 2: mail forwarding error correction

In this training scheme, we will set up two email aliases with which to train DSPAM when it makes a mistake. Go to the Dreamhost panel and set up two aliases, both pointing to the user that you have just set DSPAM up for. Call one "dspam-false" and the other "dspam-missed". Now, use the following as your ~/.procmailrc.

MAILDIR=$HOME/Maildir
PMDIR=$HOME/.procmail
#LOGFILE=$PMDIR/procmail.log
SHELL=/bin/sh

:0
*^TO_dspam-false
{
	:0
	| $HOME/usr/bin/dspam  --source=error --class=innocent --mode=teft --user $USER
	
	:0
	/dev/null
}

:0
*^TO_dspam-missed
{
	:0
	| $HOME/usr/bin/dspam  --source=error --class=spam --mode=teft --user $USER
	
	:0
	/dev/null
}

# Begin spam treatment.
:0fw
| $HOME/usr/bin/dspam  --stdout --deliver=innocent,spam --mode=teft \
--feature=chained,noise,whitelist --user $USER

:0
*^X-DSPAM-Result: Spam
.Spam/
# End spam treatment.

# inbox catch-all
:0
$HOME/Maildir/

Now, to train DSPAM in the event of an error, forward any emails that did not go where they were supposed to go to those email addresses (dspam-false@domain.tld for false positives and dspam-missed@domain.tld for spam that got past the filter).

You are done!

DSPAM, while not particularly easy to set up and get running, walks over its competition in spam filtering, including programs like the famed SpamAssassin. You will immediately begin to see the effects of your handiwork after taking the time to get this package up.

While this will get you up and filtering mail, I have not walked you through some other things that may be of interest, including the web-interface (only really useful if you like looking at pretty graphs on your filtering statistics) and having the same install filter for multiple users on the same server. The latter is fairly easily accomplished by someone with an understanding of Unix file system permissions and slight tweaking of the procmail recipe lines.

If you have any questions on anything detailed on this page, drop me an email at rusty _%_ lalkaka.com. Replace _%_ with the obvious.

