mail-scripts (dspam learning wrapper): bash script to ease procmail/IMAP/dspam integration

README

SPECIAL THANKS TO John Seifarth FOR HIS SUPPORT.

What is mail-scripts ?
----------------------

This package holds the 'dspam-learn' bash script, which makes dspam learning much easier by looking at mailbox directories rather than relying on a forwarding method. This is much like what you could set up with SpamAssassin and sa-learn.

I had a terrible experience installing dspam, because I wanted to set it up in a way it does not seem to have been designed for: I wanted to use it only through procmail recipes. This appears to me much more logical and less messy than the default installation.

Actually, dspam can mark mail through procmail recipes pretty much as SpamAssassin does: by adding a special header line to the mail, where it records its verdict (is it spam or not?). But the learning part of dspam involves (according to the official dspam documentation) forwarding mails to other mailboxes, which creates lots of mailboxes that I did not want, as this felt quite messy to me.

What I had understood of spam filtering and learning was really clear with SpamAssassin, but dspam managed to break all the simple concepts and to introduce new, complicated ones (such as the quarantine box, or the spam/false-positive distinction, which is trickier than the spam/non-spam distinction) where there seems to be no need for them.

Do I need mail-scripts ?
------------------------

Only in very special cases:

Required:

- You want to use dspam for spam detection.
- You are using procmail as MDA (mail delivery agent).
- You have mail "directories" where mail is supposedly classified.
- You have/use (or you are willing to create) mail "dir(s)" as:
  - spam boxes: whose purpose is to contain SPAM
  - ham boxes: which contain non-SPAM mail
- You want to trigger dspam learning depending on where mails are classified in your mail directory structure, letting the user teach dspam by moving mail between directories.

NOTE: THIS MEANS THE USER MUST HAVE IMAP ACCESS TO HIS MAIL ACCOUNT.

- It worked with a Courier IMAP / procmail / SpamAssassin / dspam combination.
- It is compatible with mbox files (all mails in one file) or standard maildir (with the new/ cur/ tmp/ internal representation).

Not required but possible:

- You prefer invoking dspam from procmail recipes.
- You have SpamAssassin, or another automatic spam detection method that moves mail around in your IMAP dirs, and you want dspam to learn from this other method.
- You would like the learning phase not to alter (delete or move) mail...
- Or, on the contrary, you would like mail to be deleted after the learning phase.
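
To illustrate the difference between the two supported mailbox layouts, here is a minimal sketch of how a script could tell them apart. The function name `box_format` is hypothetical and this is not taken from the dspam-learn source; it only assumes that a maildir is a directory holding new/, cur/ and tmp/, and that an mbox is a regular file.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of dspam-learn.
# box_format PATH: print "maildir", "mbox" or "unknown".
box_format() {
    if [ -d "$1/new" ] && [ -d "$1/cur" ] && [ -d "$1/tmp" ]; then
        # A maildir is a directory with the three standard subdirectories.
        echo maildir
    elif [ -f "$1" ]; then
        # An mbox is a single regular file holding all the mails.
        echo mbox
    else
        echo unknown
    fi
}
```

For example, `box_format "$MAILDIR/.SPAM.dspam"` would print "maildir" for a Courier-style folder.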

What does mail-scripts do ?
---------------------------

mail-scripts holds the dspam-learn script, and that's all for now. :)

The dspam-learn script wraps the dspam binary to automagically learn what is spam and what isn't from where you actually classified your mail in your mail dirs.

It ensures that you won't feed dspam twice with the same SPAM by storing the MD5 of each spam already fed to dspam. It calls dspam with the correct arguments, depending on whether or not the mail was previously marked correctly by dspam.

This means that if dspam does not catch all your SPAM, you only have to move the missed spam into a "SPAM" directory in your IMAP structure...

Conversely, if dspam wrongly marked an email as spam, you have to move the mail into the proper directory (one that contains no spam) to feed the mail as a "false-positive".

You could also "copy" the mail you want to feed to dspam into 2 IMAP dirs: one for SPAM and one for false-positives. But this seems more confusing to me.

How does it work ?
------------------

It simply parses all the mail in the directory specified in the configuration file. When it finds a mail, it checks that it wasn't already fed to dspam by looking for its MD5 in its list. It also checks whether dspam has already marked the mail correctly: as spam if it must be taught as spam, or as "Innocent" if it must be taught as Innocent.

If there's no mark, it sends the mail as a "corpus" mail. If it is marked incorrectly, it sends the mail as a classification error with the "--spam" or "--false-positive" argument.

You can safely launch dspam-learn several times. The MD5 list ensures that you won't teach the same mail twice.
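
The decision described above can be sketched as follows. This is only an illustration under assumptions, not the actual dspam-learn code: the function names and the action labels (skip, missed-spam, false-positive, corpus) are hypothetical, and the sketch only prints which action would be taken instead of calling dspam.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual dspam-learn source.

# dspam_mark FILE: print the value of the X-DSPAM-Result header, if any.
# Only the header block is read (it ends at the first empty line).
dspam_mark() {
    awk '
        /^$/ { exit }
        /^X-DSPAM-Result:/ {
            sub(/^X-DSPAM-Result:[ \t]*/, "")
            print
            exit
        }
    ' "$1"
}

# decide_action KIND FILE: KIND is "spam" or "ham", depending on the box
# the mail was found in. Prints the action to take for this mail.
decide_action() {
    local mark
    mark=$(dspam_mark "$2")
    case "$1:$mark" in
        spam:Spam|ham:Innocent) echo skip ;;            # already marked correctly
        spam:Innocent)          echo missed-spam ;;     # classification error
        ham:Spam)               echo false-positive ;;  # classification error
        spam:|ham:)             echo corpus ;;          # never marked by dspam
        *)                      echo unknown ;;
    esac
}
```

A mail found in a ham box with "X-DSPAM-Result: Spam" in its header would thus be reported as a false positive, while an unmarked mail in either box would be treated as corpus mail.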

How do I use it ?
-----------------

You must install it correctly (this involves setting up a correct config file); see the installation section that follows this one.

Then you'll have to launch it:

    # dspam-learn

That's all. It'll use the configuration file to fetch its information. This helps if you want to use it as a cron job.

Calling dspam-learn feeds dspam with all messages that weren't already fed, and thus improves dspam's experience with the mail found where the configuration file told it to look for ham (non-spam) and spam.

You may notice that it makes heavy use of pretty ASCII colors, which are actually not pretty at all in mail output (such as cron could send you).

You can deactivate ASCII colors by setting the environment variable 'ascii_color' to "no", for example:

    # export ascii_color=no
    # dspam-learn

Or shorter:

    # ascii_color=no dspam-learn

That can fit neatly in your cron job.

How do I install it ?
---------------------

This is a GNU package, so a simple:

    # ./configure && make && make install

should do the trick. It'll install a single dspam-learn script.

Next, you should take a look in the source package at

    src/sample/dspam-learn.rc

which is a well-commented template for creating a correct configuration file.

The configuration file is expected to be found in "~/.dspam/dspam-learn.rc".

Note: this can be tweaked depending on your configuration. You might even be able to use a single general "dspam-learn.rc" somewhere else. Just look at the corresponding section.

So you could:

    $ cp src/sample/dspam-learn.rc ~/.dspam/dspam-learn.rc

Note: this command assumes that your current working directory is the package source directory, that you are logged in as the destination user who will use dspam, and that the ~/.dspam directory already exists.

and edit ~/.dspam/dspam-learn.rc

When finished, you can run dspam-learn for the first time with

    $ dspam-learn

If you have a lot of mail to teach to dspam, this could take time.

What procmail rules should I use ?
----------------------------------

With this config, you should make all your mail pass through dspam without interfering with the delivery: dspam should then be called at the top of your procmailrc:

    :0fw: dspam.lock
    * < 256000
    | dspam --user username --stdout

You should read carefully the dspam configuration documentation and the --deliver-spam and --deliver-fp runtime options. They might be of use, as you want both innocent mail AND spam to be delivered.

When you feel that dspam has enough experience (by looking at headers and checking that it marks Innocent and Spam correctly, or by launching dspam-learn and looking at the output), you can add a rule to delete spam or, as in this example, to move spam detected by dspam to a special dir:

    :0 H:
    * $ ^(X-DSPAM-Result: Spam)
    ${MAILDIR}/.SPAM.dspam/

This is for maildir format (mails are in separate files in a folder). Or

    :0 H:
    * $ ^(X-DSPAM-Result: Spam)
    ${MAILDIR}/spam

This is for mbox format (mails are concatenated in the same file).

Note: the MAILDIR variable must have been defined before these rules are used.

Can I use SpamAssassin AND dspam ?
----------------------------------

Yes, of course. In fact, this seems a good way to teach dspam in the beginning, and SpamAssassin uses a totally different approach to spam detection (with the exception of its Bayesian system).

I use SpamAssassin to automatically move mail rated with more than 8.0 points to my spam dir. When my cron job launches dspam-learn, these mails are checked and learned if dspam didn't spot them as spam.

I think this is a great combination. I actually have less than one spam a day managing to get through the two filters, out of 100-200 spam a day. The spam that gets through is very special: usually viruses (labeled "your file") or empty mails.

I HAVEN'T LOOKED INTO WHETHER SPAMASSASSIN AND DSPAM COULD MISLEAD EACH OTHER WITH THEIR MARKS. WHAT IS CERTAIN IS THAT THE TWO WORK REALLY WELL TOGETHER ON MY CURRENT SYSTEM. THIS COULD POSSIBLY BE EXPLAINED BY THE "LEARNING" ALGORITHMS OF DSPAM, AND THE COMBINATION COULD EVEN PRODUCE BETTER RESULTS BY JOINING THE QUALITIES OF EACH FILTER.

TO DRAW A CONCLUSION, AN EXHAUSTIVE TEST REMAINS NECESSARY. If you have come across tests on this topic, please drop me a mail about it.

How must I set up dspam / procmail for a proper installation ?
--------------------------------------------------------------

Go to:

    http://splodge.fluff.org/docs/dspam-for-sa-users

which covers dspam/procmail integration for non-IMAP setups, but a large part of the information found there also applies to an IMAP config.

The ascii-colors display annoys me ! Can I remove it ?
------------------------------------------------------

Yep, this is new in version 0.0.2. You can just set the shell variable 'ascii_color' to 'no'. So this would be a correct call:

    ascii_color=no dspam-learn

This is highly recommended if the output must be mailed, as can happen when it is called by a cron job, or if you want a clean log when redirecting the output to a logfile.
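
For the logfile case, a small wrapper such as the following could be called from cron. This is only a sketch: the function name `run_learn_logged` and the log path in the usage line are hypothetical. The only thing it relies on from this README is the 'ascii_color' variable.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not shipped with mail-scripts.
# run_learn_logged LOGFILE [COMMAND]
# Run COMMAND (default: dspam-learn) with colors disabled and append its
# output, preceded by a timestamp line, to LOGFILE.
run_learn_logged() {
    local logfile cmd
    logfile=$1
    cmd=${2:-dspam-learn}
    {
        date '+=== %Y-%m-%d %H:%M:%S ==='
        ascii_color=no "$cmd"
    } >> "$logfile" 2>&1
}
```

A cron job could then run something like `run_learn_logged "$HOME/dspam-learn.log"`. The optional second argument is there only so that the wrapper can be tried with another command.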

Why do you make sure that a mail isn't taught two times to dspam ?
------------------------------------------------------------------

I've received mail stating that it wasn't useful to check that a mail isn't taught twice to dspam. Here's my answer:

It is clearly stated in the README of dspam that teaching dspam until the mail is correctly filtered could lead to strange behavior. This was one reason why I did the MD5 check. And it was really helpful when I first launched dspam-learn on my thousands of spam to learn: I had to cancel it several times, so dspam-learn had to look at each mail again, and it didn't reteach them to dspam. This in fact provides a measure of stability: you can Ctrl-C or kill dspam-learn whenever you want, and it will resume its job without any drawbacks.

There's a dspam configure-time option to force mail into the Bayesian engine until it is correctly filtered. If you want this behavior, this might be a solution.
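
The resumable behavior can be illustrated with a minimal sketch. It is not the actual dspam-learn code: the function names, the list file and the teaching command passed as an argument are all hypothetical, and it assumes the md5sum tool from GNU coreutils is available.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual dspam-learn source.

# md5_of FILE: print the MD5 checksum of FILE.
md5_of() {
    md5sum < "$1" | cut -d' ' -f1
}

# teach_once LIST TEACH_CMD FILE...
# Call TEACH_CMD on every FILE whose MD5 is not yet recorded in LIST,
# and record the MD5 once the mail has been taught successfully.
teach_once() {
    local list teach msg sum
    list=$1
    teach=$2
    shift 2
    for msg in "$@"; do
        sum=$(md5_of "$msg")
        # Skip mails that were already taught in a previous (or killed) run.
        if grep -qxF "$sum" "$list" 2>/dev/null; then
            continue
        fi
        "$teach" "$msg" && printf '%s\n' "$sum" >> "$list"
    done
    return 0
}
```

Because each checksum is appended right after its mail has been taught, a run that is interrupted leaves a usable list behind, and the next run only handles what is left. In this sketch, a kill landing exactly between the teaching and the recording would cause that single mail to be taught once more.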

I have a question, can I mail you ?
-----------------------------------

Of course, I'll try to reply quickly. Here's my email:

    <vaab@free.fr>

I found a bug, or I want to modify the script... what should I do ?
--------------------------------------------------------------------

Contact me at <vaab@free.fr>.

I have installed vlfs-shlib, is dspam-learn using it ?
------------------------------------------------------

Yes. The libraries are included statically by default, but if you have installed vlfs-shlib, you could do:

    # shlib d dspam-learn

This will greatly reduce the size of the script and improve its readability.

What is this vlfs-shlib all about ?
-----------------------------------

These are shell libraries I use quite often. Look for the package "vlfs-shlib"; there is some info there.

YOU DO NOT NEED vlfs-shlib TO USE/INSTALL mail-scripts. The libraries used by dspam-learn are included by default in the shell script.

In some respects, you could see this as if "vlfs-shlib" were linked statically into mail-scripts...

How do I modify the default location of the config script ?
-----------------------------------------------------------

You can easily change the default location of the configuration file at run time by specifying its path in the environment variable DSPAMLEARN_RC.

You could set for example:

    # export DSPAMLEARN_RC="/etc/mail/dspam-learn.rc"
    # dspam-learn

This makes it possible to use one general config file. Or shorter:

    # DSPAMLEARN_RC="/etc/mail/dspam-learn.rc" dspam-learn

You can also modify the defaults in the bash script. (I'll think about a configure-time option for the next releases.)

Finally, you can specify several locations separated by spaces in DSPAMLEARN_RC; the first file found takes precedence (see the note below).

So you could:

    # export DSPAMLEARN_RC="~/.dspam/dspam-learn.rc /etc/mail/dspam-learn.rc"
    # dspam-learn

or shorter:

    # DSPAMLEARN_RC="~/.dspam/dspam-learn.rc /etc/mail/dspam-learn.rc" dspam-learn

Note: all the configuration files are read if present, in the order they are listed in DSPAMLEARN_RC. When an option is defined several times, only the first definition takes effect. (This rule does not apply to the 'hambox' and 'spambox' options, for which all the values found are concatenated.)

Hint: you could set DSPAMLEARN_RC with a global file and a local file. The global one holds the defaults, and the local one leaves the user free to redefine some variables locally. In this case you have to list the local file first (e.g. ~/.dspam-learn.rc) and the server-wide config file last (e.g. /etc/mail/dspam-learn.rc). This would do:

    # DSPAMLEARN_RC="~/.dspam-learn.rc /etc/mail/dspam-learn.rc" dspam-learn
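
As an illustration of how a space-separated list such as DSPAMLEARN_RC can be walked in order, here is a minimal sketch. The function name `list_rc_files` is hypothetical, and how dspam-learn itself resolves the list is not shown in this README. Since the shell does not expand "~" inside double quotes, the sketch replaces a leading "~/" with $HOME itself; that step is an assumption of the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual dspam-learn source.
# list_rc_files "LIST": print, one per line and in the order given, every
# path of the space-separated LIST that is a readable file.
list_rc_files() {
    local rc
    for rc in $1; do
        case $rc in
            # "~" is not expanded by the shell inside double quotes.
            "~/"*) rc=$HOME/${rc#"~/"} ;;
        esac
        if [ -r "$rc" ]; then
            printf '%s\n' "$rc"
        fi
    done
    return 0
}
```

With the local file listed first, `list_rc_files "$DSPAMLEARN_RC"` prints the local file before the server-wide one, which matches the "first definition wins" rule described above.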