Header Scan Options



 

The header scan options are applied after the message has been received, but before it is saved to disk and before the remote server is told that the message was received successfully. This gives us an opportunity to analyze the message structure and, if necessary, return a 550 (refused) response to let the sender know that the message was not accepted.
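
The decision at the end of the scan can be pictured roughly as follows. This is a minimal sketch, not Alligate's code: it assumes each header test contributes a point value and that the message is refused once the total reaches a configured threshold. The function name `smtp_reply`, the test names, and the threshold value are all hypothetical.

```python
# Hypothetical refusal threshold; in practice this is a configured value.
REFUSAL_THRESHOLD = 10


def smtp_reply(penalties: dict[str, int], threshold: int = REFUSAL_THRESHOLD) -> str:
    """Sum the points from each header test and pick the reply that is sent
    after the message data has been received but before it is accepted."""
    total = sum(penalties.values())
    if total >= threshold:
        # Permanent failure: the sender is told the message was not accepted.
        return f"550 Message refused (score {total})"
    # Acknowledge successful receipt; only now would the message be saved.
    return "250 OK"
```

For example, `smtp_reply({"no_to": 3, "no_date": 2})` stays under the hypothetical threshold and returns `250 OK`, while a total of 11 points produces a 550 reply.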

 

If the header has unbalanced CR and LF pairs add xx points: When a message is received, the message headers and the message body arrive together in one block of data, with a single blank line separating the two sections. Because headers cannot contain blank lines, email programs search for the first blank line in the raw message and split the headers from the body at that location. Every complete line in the message MUST end with a carriage return character (CR) followed by a line feed character (LF). This character pair is called a CRLF. Because these characters are always used in pairs, if the number of CR and LF characters is not the same, it is likely that the message was created by poorly designed software, usually spammer software. Unbalanced line endings are sometimes also used as a mechanism to trick email programs into executing a macro or executable code on a user's computer.
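
A minimal sketch of this check, assuming the raw message is available as bytes: it locates the first blank line, takes everything up to and including it as the header section, and compares the CR and LF counts. The function names are hypothetical, and accepting bare-LF line endings when looking for the blank line is our assumption about how malformed input should be split.

```python
import re

# A blank line is a line terminator immediately followed by another one.
# Bare-LF terminators are accepted as well, so malformed input still splits.
_BLANK_LINE = re.compile(rb"\r?\n\r?\n")


def header_block(raw: bytes) -> bytes:
    """Return the header section up to and including the first blank line."""
    m = _BLANK_LINE.search(raw)
    return raw if m is None else raw[: m.end()]


def crlf_unbalanced(raw: bytes) -> bool:
    """True if the header section has unequal numbers of CR and LF bytes."""
    head = header_block(raw)
    return head.count(b"\r") != head.count(b"\n")
```

Only the header section is counted, so odd line endings in the body do not affect the result.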

 

If the header has no TO address add xx points: The TO header field is not strictly required; however, many poorly programmed spam products omit it.
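
Presence checks like this one (and the DATE, MESSAGE-ID and SUBJECT checks below) can be sketched with Python's standard `email` package. This is an illustration under our own assumptions, not Alligate's implementation; the function name `missing_headers` and the default list of header names are hypothetical. Header-name lookups on the parsed message are case-insensitive.

```python
from email.parser import BytesHeaderParser


def missing_headers(raw: bytes,
                    required=("To", "Date", "Message-ID", "Subject")) -> list[str]:
    """Return the names from `required` that do not appear in the headers."""
    headers = BytesHeaderParser().parsebytes(raw)  # parses headers only
    return [name for name in required if name not in headers]
```

A message carrying only From and Subject lines would be reported as missing To, Date and Message-ID.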

 

If the MAIL FROM and header FROM similarity is less than xx percent add xx points: Ideally, the MAIL FROM (envelope sender) and the header FROM field would be the same. Unfortunately this does not always happen, but in many cases the two are similar. This test measures how similar the MAIL FROM and header FROM values are. We have found through testing that a similarity of less than 40% is a good level at which to add penalty points: values above 40% should probably not be penalized, while values below 40% are more suspicious. This is not a particularly strong indicator that the message is spam, but it can help accumulate points towards refusal if the message also fails other tests.
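
The exact similarity measure Alligate uses is not described here. As a stand-in, the sketch below uses `difflib.SequenceMatcher` from the Python standard library to produce a percentage between the two addresses; the function name `from_similarity` is hypothetical, and a different metric could rank some address pairs differently.

```python
from difflib import SequenceMatcher
from email.utils import parseaddr


def from_similarity(mail_from: str, header_from: str) -> float:
    """Percentage similarity (0-100) between envelope and header senders."""
    # parseaddr strips display names and angle brackets, leaving the address.
    envelope = parseaddr(mail_from)[1].lower()
    header = parseaddr(header_from)[1].lower()
    return SequenceMatcher(None, envelope, header).ratio() * 100


PENALTY_BELOW = 40  # the level suggested in the text above
```

With this metric, `<alice@example.com>` and `Alice <alice@example.com>` score 100, while unrelated addresses such as `<x9q@bulk.biz>` and `Bank <support@bank.com>` fall well below 40.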

 

If the header has no DATE or invalid DATE add xx points: You can add a penalty if the header DATE field is missing. Alligate also checks the date to ensure it is in an acceptable format and that the actual date is within a realistic time frame.
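
A minimal sketch of a DATE check using `email.utils.parsedate_to_datetime`. The two-day tolerance stands in for "a realistic time frame" and is purely an assumption, as is the function name `date_suspicious`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical tolerance for a "realistic time frame".
MAX_SKEW = timedelta(days=2)


def date_suspicious(date_header: Optional[str],
                    now: Optional[datetime] = None,
                    max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the DATE value is missing, unparseable, or too far from now."""
    if not date_header:
        return True
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad input.
        return True
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return abs(now - sent) > max_skew
```

Passing `now` explicitly makes the check repeatable, which is useful when re-scanning stored messages.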

 

If the header has no MESSAGE-ID add xx points: The MESSAGE-ID header field is not strictly required; however, it is almost always present in legitimate email.

 

If the header has no SUBJECT field add xx points: It is highly unusual for a message to have no SUBJECT header field, and any message without one is worth treating with suspicion. You can also add penalty points if the message does have a SUBJECT field but its contents are empty or blank. This is less unusual, but in some cases it still warrants some degree of suspicion.
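
The two cases described above (no SUBJECT field at all versus a SUBJECT field that is empty or only whitespace) can be told apart as in this sketch. The function name and the returned labels are hypothetical.

```python
from email.parser import BytesHeaderParser


def subject_status(raw: bytes) -> str:
    """Classify the SUBJECT header as 'missing', 'blank', or 'present'."""
    headers = BytesHeaderParser().parsebytes(raw)
    subject = headers.get("Subject")
    if subject is None:
        return "missing"   # no SUBJECT field at all: the stronger signal
    if not str(subject).strip():
        return "blank"     # field present but empty or whitespace only
    return "present"
```

Separate point values can then be assigned to the "missing" and "blank" results.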

 

If the header indicates the message is BASE64 encoded add xx points: It is highly unusual for normal email messages to specify in the headers that the message is BASE64 encoded. Spammers often send messages with nothing but an image in the message body, and in many cases these are BASE64 encoded. Messages that trigger this test can be treated with suspicion.
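
A sketch of this check, assuming that "the headers" means the message's top-level Content-Transfer-Encoding field (BASE64 inside individual MIME parts of a multipart message is common and is not checked here). The function name is hypothetical.

```python
from email.parser import BytesHeaderParser


def top_level_base64(raw: bytes) -> bool:
    """True if the message's own headers declare BASE64 transfer encoding."""
    headers = BytesHeaderParser().parsebytes(raw)
    cte = headers.get("Content-Transfer-Encoding", "")
    # Encoding names are case-insensitive, so "BASE64" and "base64" both match.
    return str(cte).strip().lower() == "base64"
```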

 

If the message passes through x or more countries, add xx points per country. Ignore x hops: This test reads the Received lines of the header and looks up the countries that the message passed through. You can add a penalty for every different country the message passes through. If necessary, you can tell Alligate to ignore the most recent hop or hops.
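
The sketch below shows one way to collect countries from Received lines. Several parts are assumptions: the `GEO` dictionary is a hypothetical stand-in for a real IP-to-country database, only the first bracketed IPv4 address in each Received line is used, and the penalty rule (points per country once the count reaches the minimum) is our reading of the option text. Received lines are added at the top as a message travels, so the first ones are the most recent hops.

```python
import re
from email.parser import BytesHeaderParser

# Hypothetical stand-in for a GeoIP lookup (documentation-range addresses).
GEO = {"203.0.113.5": "AU", "198.51.100.7": "US", "192.0.2.9": "DE"}

# First bracketed IPv4 address, e.g. "from host (host [203.0.113.5])".
_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def hop_countries(raw: bytes, ignore_hops: int = 1, lookup=GEO.get) -> list[str]:
    """Distinct countries seen in Received lines, skipping the newest hops."""
    headers = BytesHeaderParser().parsebytes(raw)
    received = headers.get_all("Received", [])  # newest hop first
    countries = []
    for hop in received[ignore_hops:]:
        m = _IPV4.search(str(hop))
        country = lookup(m.group(1)) if m else None
        if country and country not in countries:
            countries.append(country)
    return countries


def country_penalty(countries, min_countries=2, points_per_country=1):
    """Points per country, applied only once the minimum count is reached."""
    if len(countries) >= min_countries:
        return len(countries) * points_per_country
    return 0
```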

 

Add xx points if the message hits on the content encoding mask (Regular expression): The content encoding (character set) is specified in the header and indicates that the message was composed in, and is intended to be viewed in, a specific character set. A character set is a set of characters such as A-Z, 0-9, and punctuation characters; other languages use different character sets. If a message arrives encoded in a character set you consider suspicious, points can be added towards possible refusal. The regular expression for this is fairly simple: list the character set names separated by a | (pipe) character, which in regex parlance means "or".

 

Example:

 

koi8|Windows-1251|ISO-2022|big5|iso-8859-2|iso-8859-6|iso-8859-7|iso-8859-8|iso-8859-9

 

See also: Common Character Sets
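
As an illustration, the sketch below applies the example expression above to the charset parameter of the message's top-level Content-Type header. Matching case-insensitively is our assumption (MIME charset names are case-insensitive); how Alligate itself applies the mask is not stated here, and the function name is hypothetical.

```python
import re
from email.parser import BytesHeaderParser

CONTENT_ENCODING_MASK = re.compile(
    r"koi8|Windows-1251|ISO-2022|big5|iso-8859-2|iso-8859-6|iso-8859-7|iso-8859-8|iso-8859-9",
    re.IGNORECASE,  # assumption: charset names compared case-insensitively
)


def hits_content_mask(raw: bytes, mask=CONTENT_ENCODING_MASK) -> bool:
    """True if the top-level charset parameter matches the mask."""
    headers = BytesHeaderParser().parsebytes(raw)
    charset = headers.get_content_charset()  # lower-cased, or None if absent
    return charset is not None and mask.search(charset) is not None
```

Because the pattern is not anchored, `koi8` also matches variants such as `koi8-r` and `koi8-u`.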

 

Add xx points if the message hits on the subject encoding mask (Regular expression): Subject encoding works the same way as content encoding above, except that it applies only to the message subject.
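
A non-ASCII subject is normally sent as one or more RFC 2047 "encoded words" of the form `=?charset?B?...?=` or `=?charset?Q?...?=`, so the character set can be read from the raw SUBJECT value. The sketch below extracts those charset names and tests them against a mask; the short mask and the function names are hypothetical examples.

```python
import re
from email.parser import BytesHeaderParser

# RFC 2047 encoded word: =?charset?B-or-Q?encoded-text?=
_ENCODED_WORD = re.compile(r"=\?([^?]+)\?[BbQq]\?[^?]*\?=")

# Example mask only; use the same style of list as the content encoding mask.
SUBJECT_ENCODING_MASK = re.compile(r"koi8|Windows-1251|big5", re.IGNORECASE)


def subject_charsets(raw: bytes) -> list[str]:
    """Charset names declared by encoded words in the raw SUBJECT header."""
    headers = BytesHeaderParser().parsebytes(raw)
    subject = str(headers.get("Subject", ""))
    return [m.group(1) for m in _ENCODED_WORD.finditer(subject)]


def hits_subject_mask(raw: bytes, mask=SUBJECT_ENCODING_MASK) -> bool:
    return any(mask.search(cs) for cs in subject_charsets(raw))
```

A plain ASCII subject contains no encoded words and therefore never hits the mask.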

 

Add xx points if the header has a "SPAM-FLAG: YES" field and hits on the FROM domain mask (Regular expression): Believe it or not, at least one major Internet service company lets its users send spam but tries to let you know that it is spam: if America Online (aol.com) thinks an outgoing message is spam, it adds "SPAM-FLAG: YES" to the headers. We do not recommend blocking or penalizing this header on its own, because it may be carried along in forwarded or redirected messages. To avoid accidental penalties, we added a regular expression qualifier: Alligate checks for the "SPAM-FLAG: YES" header and then checks whether the sender's domain hits on the regular expression. The penalty is assessed only if both conditions are met.
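
The two-condition logic can be sketched as follows. We assume here that "the sender's domain" is taken from the header FROM address; the function name and the example mask `(^|\.)aol\.com$` are hypothetical.

```python
import re
from email.parser import BytesHeaderParser
from email.utils import parseaddr

# Example FROM domain mask: aol.com or any subdomain of it.
FROM_DOMAIN_MASK = re.compile(r"(^|\.)aol\.com$", re.IGNORECASE)


def spam_flag_penalty_applies(raw: bytes, mask=FROM_DOMAIN_MASK) -> bool:
    """True only if SPAM-FLAG is YES *and* the FROM domain hits the mask."""
    headers = BytesHeaderParser().parsebytes(raw)
    # Condition 1: the flag header exists and says YES.
    if str(headers.get("Spam-Flag", "")).strip().upper() != "YES":
        return False
    # Condition 2: the sender's domain matches the qualifier expression.
    address = parseaddr(str(headers.get("From", "")))[1]
    _, at, domain = address.rpartition("@")
    if not at:
        return False  # no usable domain in the FROM address
    return mask.search(domain) is not None
```

A forwarded message that still carries the flag but comes from an unrelated domain does not match the mask, so no points are added.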