Antispam Level TWO
At this protection level we have practically all the information needed to run checks on the message.
Caronte Antispam analyzes the content of the message with the help of a reference database, populated by you.
As can be seen in the following picture, there are four containers where a message can be found after the Bayes filter has analyzed it.
These containers are populated only as the software is used; they cannot be populated manually.
The four containers are:
- The soul of the message marked to LEARN
- The soul of the message considered SPAM
- The soul of the message considered HAM
- The soul of the message considered NEUTRAL
The "soul" of the message is a symbolic term: it indicates that what you see is only an extract of the message that passed through the filter.
Using the four buttons you can tell the filter to correct any errors, or have it learn a message that was placed in "NEUTRAL" or "LEARN".
This mechanism is governed by the percentage of “occurrences” found by the analysis. The higher the percentage of occurrences required to consider a message “SPAM” or “HAM”, the more accurate the filter will be; conversely, the fewer occurrences we ask of the filter, the greater the problem of false positives.
Remember that the database cannot be populated manually; it is populated only through use of the software and with your help.
To start populating a database correctly, set the occurrences value between “20%” and “30%” and raise it one point at a time at the first false positive or false negative.
Admittedly the filter will do little work at the start, but this is the price to be paid for a good Bayesian database.
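The effect of the occurrences setting can be illustrated with a minimal sketch. It is not Caronte Antispam's actual algorithm: the function name, the word sets and the decision rule are assumptions made only to show how a higher threshold turns a doubtful verdict into NEUTRAL.

```python
def classify(words, spam_words, ham_words, threshold_pct):
    """Return 'SPAM', 'HAM' or 'NEUTRAL' for a list of message words.

    Hypothetical rule: a class wins only if its share of matching words
    reaches threshold_pct and exceeds the share of the other class.
    """
    if not words:
        return "NEUTRAL"
    spam_pct = 100.0 * sum(w in spam_words for w in words) / len(words)
    ham_pct = 100.0 * sum(w in ham_words for w in words) / len(words)
    if spam_pct >= threshold_pct and spam_pct > ham_pct:
        return "SPAM"
    if ham_pct >= threshold_pct and ham_pct > spam_pct:
        return "HAM"
    return "NEUTRAL"


# Made-up databases and message, for illustration only.
spam_db = {"cheap", "pills", "offer"}
ham_db = {"meeting", "invoice", "report"}
message = ["cheap", "pills", "for", "the", "meeting"]

print(classify(message, spam_db, ham_db, 25))  # 40% spam words, 20% ham words
print(classify(message, spam_db, ham_db, 60))  # same message, stricter setting
```

With the lower setting the sample message is classified, while with the stricter one it stays NEUTRAL and waits for your decision.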
Another rule to keep in mind: do not send a message to SPAM or HAM if it is already classified that way. (Use the delete function if you don't want to see such messages.)
The HAM and SPAM containers are there to correct false positives and false negatives, not to AGGRAVATE the standing of a message that your database already considers SPAM or HAM.
The messages to be learned will be found in the “LEARN” container or in the “NEUTRAL” container.
The “look” button counts in real time the WORDs in the database, both HAM and SPAM. This lets you assess the imbalance both before and after the “LEARN” of a message.
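One way to read the two counters is as a single imbalance figure. The sketch below is only an assumption about how such a figure could be computed; the function name and the sample counts are hypothetical and do not come from the product.

```python
def imbalance_pct(ham_count, spam_count):
    """Signed imbalance in percent: positive means more SPAM words than HAM."""
    total = ham_count + spam_count
    if total == 0:
        return 0.0
    return 100.0 * (spam_count - ham_count) / total


# Made-up counts before and after learning a HAM message of 500 new words.
before = imbalance_pct(4000, 6000)
after = imbalance_pct(4500, 6000)
print(before, round(after, 2))
```

A value moving toward zero after a LEARN means the database has become more balanced.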
Once the filter's analysis of messages has reached a satisfactory outcome, the "Keep the soul of quarantined messages ..." setting, which keeps souls in the containers only for a certain time, saves Caronte Antispam from doing too much work and saves you from erasing messages manually, with benefits for disk space, RAM usage and time management. We therefore recommend setting the “auto delete” value for messages LEFT UNRECORDED by the system administrator to less than three days. That way the queue of messages to sift through in the four containers will not grow excessive and, in case of a false positive or negative, you will still have a window in which to correct the error.
Some considerations on the BAYES filter.
The algorithm used by Caronte Antispam is “Montecarlo Bayes”. Besides the words of the message, some small “signatures” of the email's attachments also go into the database. On a populated database, this approach lets us identify PDF spam, image spam, images encapsulated in PDFs, and spam carried in Microsoft WORD documents, all without the help of OCR software. The method may look like an antivirus check that analyzes files BYTE by BYTE, but it applies a heuristic method combined with the occurrences of the message's signatures, which gives much larger and better defined blocks.
Considerations and useful suggestions for the BAYES filter.
We have tested this feature for a long time, and have also populated some databases with more than 20 million words.
In Caronte Antispam processing speed does not suffer even at such large sizes, because the database is loaded and allocated in RAM with a binary index.
The analysis of a message with the formula “W x WDB” (Word * Word in DataBase) takes less than one second, even when this formula generates 200 million checks per word.
The only drawback is RAM: with a database of 20 million words, this filter can require 350 MB of RAM allocated on your server.
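To give an idea of why an in-memory index keeps lookups fast, here is a minimal sketch that assumes "binary index" means a binary search over a sorted word list. That reading is an assumption, and the function names are hypothetical; the real data structure used by Caronte Antispam is not documented here.

```python
from bisect import bisect_left


def build_index(words):
    """Return a sorted list of unique words, kept entirely in memory."""
    return sorted(set(words))


def contains(index, word):
    """Binary search: about log2(len(index)) comparisons per lookup."""
    i = bisect_left(index, word)
    return i < len(index) and index[i] == word


# Made-up words, for illustration only.
index = build_index(["offer", "cheap", "pills", "cheap"])
print(contains(index, "pills"), contains(index, "meeting"))
```

With this kind of search, a lookup in 20 million words needs roughly 25 comparisons, so the cost per word grows very slowly as the database grows.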
As there is a single database for all your users, it will also contain signatures of messages that some users want and others don't.
This type of problem can be solved with an occurrences value above 60%.
To speed up the preparation of a good database, you can temporarily disable the GREYLIST and every other REJECT of the message.
Thanks to a third-party library for handling “Regular Expressions”, Caronte Antispam makes available these macros and scores that can be applied to certain blocks of the message.
For example, if we want to block, or rather "rate" with a score, an incoming message whose “Subject” contains the word “Viagra” or one of its spam variants, we can create a rule like this:
rule name : VIAGRA
score : 5.000
where look for : Subject
IE. RE : (viagra)|(\\\/iagra)
IE. RE. cmd : is
The R.E. syntax is the one used in PERL. Writing a good antispam R.E. takes a little knowledge, but with a little practice and the help of the TEST, the goal can be reached.
The score can also be negative, so give free rein to your imagination for “REVERSE HAM” too, which could be keyed to your signature, a telephone number or a password.
The “where look for” field of this R.E. can also be BODY or RAWBODY.
The difference between BODY and RAWBODY is that the latter considers the whole message, including the HEADER and any trailing attachments, like FULLTEXT, while BODY considers only the body of the message.
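Before saving a rule it can help to try the expression outside the product. The sketch below uses Python's re module, whose syntax agrees with PERL for this simple pattern. The rule table, the function name, the case-insensitive match, the reading of the score as 5.0 and the telephone number are all assumptions made for illustration.

```python
import re

# Hypothetical rule table; field names loosely mirror the rule form above.
RULES = [
    {"name": "VIAGRA", "score": 5.0, "where": "Subject",
     "re": r"(viagra)|(\\\/iagra)"},
    # A "REVERSE HAM" rule with a negative score; the number is made up.
    {"name": "MY_PHONE", "score": -5.0, "where": "Subject",
     "re": r"555-0100"},
]


def score_message(fields, rules=RULES):
    """Sum the scores of every rule whose R.E. matches its target field."""
    total = 0.0
    for rule in rules:
        text = fields.get(rule["where"], "")
        if re.search(rule["re"], text, re.IGNORECASE):
            total += rule["score"]
    return total


print(score_message({"Subject": "Cheap VIAGRA here"}))
print(score_message({"Subject": r"best \/iagra, call 555-0100"}))
```

In the second subject the positive and negative rules both match and cancel each other out, which is the intended effect of a REVERSE HAM rule.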
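The difference between the search areas can be pictured with Python's standard email parser. This is only a sketch of the idea, not of how Caronte Antispam splits a message; the function name and the sample message are hypothetical.

```python
from email import message_from_string

# Made-up message: two header lines, a blank line, then the body.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Call 555-0100 today.\n"
)


def search_scopes(raw):
    """Return the text a rule would see for each 'where look for' value."""
    msg = message_from_string(raw)
    return {
        "Subject": msg.get("Subject", ""),
        "BODY": msg.get_payload(),   # body text only
        "RAWBODY": raw,              # headers, body and anything after it
    }


scopes = search_scopes(RAW_MESSAGE)
print("Subject: Hello" in scopes["RAWBODY"])
print("Subject: Hello" in scopes["BODY"])
```

An R.E. aimed at header text therefore matches only when the rule looks in RAWBODY (or in the specific header), never in BODY.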
We have all heard of SpamTraps or Honeypots many times; let's try to understand what they are and how they can be used against spammers.
Fundamentally, a “trap” email address or “honeypot” is a mailbox in your local domain created exclusively to act as a trap. Normally this email address is placed inside your web pages, hidden from users but visible to crawling software that looks for email addresses to add to lists that will later be used for spam.
In this phase Caronte Antispam simply announces to the “Dante Community” (described later) that a SPAM TRAP has been triggered for a given IP address and MAILFROM, so as to share the experience with the whole Community.
The only goal of this filter is to “announce” that experience to the community.
Caronte Antispam will answer with a permanent error, “550 5.7.1 and a short passage from the Divina Commedia” (try it).
This feature only makes sense if the “Dante Community” has also been activated in Caronte Antispam.
Suggestions on the spam trap:
A spam trap address could also be one of those addresses left in newsgroups, blogs and the like, where replying to the message requires removing certain words.
Example:
Reply to: dante_REMOVEME_@caronteantispam.com
A program cannot know that it has to remove the word “_REMOVEME_” to send you a message,
so this particular address can serve as a spam trap.
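The reasoning can be sketched as follows: a naive harvester copies the address exactly as written, while a person edits it before replying. The regular expression, the trap set and the function names are assumptions made for illustration and do not describe any real harvesting tool.

```python
import re

# Hypothetical trap list: the unedited address is the trap.
TRAP_ADDRESSES = {"dante_REMOVEME_@caronteantispam.com"}

# A deliberately simple address pattern, as a crude harvester might use.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w+")


def harvest(text):
    """Return every address-like string found in the text, unmodified."""
    return ADDRESS_RE.findall(text)


def is_trap(recipient):
    """True if mail for this recipient should be treated as a trap hit."""
    return recipient.lower() in {a.lower() for a in TRAP_ADDRESSES}


page = "Reply to: dante_REMOVEME_@caronteantispam.com"
harvested = harvest(page)                               # what a program sees
human = harvested[0].replace("_REMOVEME_", "")          # what a person types

print(harvested[0], is_trap(harvested[0]))
print(human, is_trap(human))
```

Only mail sent to the unedited address hits the trap, so genuine correspondents are not affected.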
Another suggestion is to place a spam trap address alongside other genuine addresses, for example on the contacts page of your website.
Clearly this must be hidden from the user with a simple HTML trick: <a href="mailto:dante[itwityou]@caronteantispam.com"></a>
For an example, examine the HTML of our contacts page.