Stopping Language Character Set Spam

Neo · February 15, 2003, 2:03pm

I have noticed a significant decrease in spam since I started using procmail to filter on the language character sets that I don't read.

Nothing against our fine friends in Japan, China, Korea and all over the world who use different languages, but I now find that over 50% of my daily mountain of spam arrives in character sets I cannot read.

Here is my current set of procmail recipes for this problem. They are working well and are based on the character sets I cannot and do not read. If you do read any of these languages, it is easy to remove the corresponding charsets from the recipes:

# PARTIAL LIST OF CHARSET SPAM
#big5   - chinese
#gb2312 - chinese
#koi8-r     - cyrillic
#iso-8859-2 - Latin-2 for Eastern Europe
#iso-ir-111 - cyrillic (ECMA)
#iso-8859-5 - cyrillic
#euc-kr         - korean
#ks_c_5601-1987 - korean
#iso-2022-kr    - korean
#euc-jp         - japanese
#iso-2022-jp    - japanese

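# Parentheses keep every alternative tied to "charset"; without them,
# a pattern like "charset.*big5|gb2312" matches a bare "gb2312" anywhere.
# The ".*" also covers quoted-printable forms such as charset=3Deuc-kr.
:0:
* charset.*(ks_c_5601|euc-kr|big5|gb2312|utf-8|koi8|iso-ir-111)
charset_spam

:0:
* charset.*(iso-8859-[2-8]|euc-jp|iso-2022|windows-125)
charset_spam

:0:
* charset.*(shift_jis|x-johab|x-unified-hangul)
charset_spam

:0:
* charset.*(cn-gb|cn-big5|utf-8|x-euc-tw|iso_2022_cn)
charset_spam

# "Subject:.*" rather than "Subject:*", so the charset name can appear
# anywhere in an encoded subject line.
:0:
* ^Subject:.*(ks_c_5601-1987|euc-kr|big5|gb2312|utf-8)
charset_spam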
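One detail worth checking in conditions like these is how `|` binds. Here is a minimal sketch in Python; its `re` module only approximates procmail's egrep-style matching, but alternation has the same low precedence in both. The two header lines are made up purely for illustration.

```python
import re

# Hypothetical header lines, used only for illustration.
headers = [
    'Content-Type: text/plain; charset="gb2312"',
    "Subject: notes on gb2312 support",
]

# Without parentheses the pattern reads as (charset.*big5) OR (gb2312),
# so a bare "gb2312" anywhere in a header is enough to match.
ungrouped = re.compile(r"charset.*big5|gb2312", re.IGNORECASE)

# With parentheses every alternative must follow the word "charset".
grouped = re.compile(r"charset.*(big5|gb2312)", re.IGNORECASE)

ungrouped_hits = [bool(ungrouped.search(h)) for h in headers]
grouped_hits = [bool(grouped.search(h)) for h in headers]

print(ungrouped_hits)  # [True, True]
print(grouped_hits)    # [True, False]
```

The ungrouped form also catches the plain-English subject line, which is probably not what a charset filter is meant to do.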

I put together the simple filters above by looking at lots of spam and also visiting sites that list charsets:

http://www.terena.nl/library/multiling/ml-docs/wincharsets.html

Does anyone have ideas or suggestions for improving these recipes? They are working OK and I'm refining them daily. Neo

Neo · February 16, 2003, 9:07pm

Latest set of these, working great:

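# The trailing colon on ":0:" requests a local lockfile for mbox delivery.
# Parentheses keep every alternative tied to "charset"; the ".*" also
# covers quoted-printable forms such as charset=3Dgb2312.
:0:
* charset.*(ks_c_5601|euc-kr|big5|gb2312|koi8|iso-ir-111)
charset_spam

:0:
* charset.*(iso-8859-[2-8]|euc-jp|iso-2022|windows-125)
charset_spam

:0:
* charset.*(shift_jis|x-johab|x-unified-hangul)
charset_spam

:0:
* charset.*(cn-gb|cn-big5|utf-8|x-euc-tw|iso_2022_cn)
chinese_charset_spam

# "Subject:.*" rather than "Subject:*", so the charset name can appear
# anywhere in an encoded subject line.
:0:
* ^Subject:.*(ks_c_5601-1987|euc-kr|big5|gb2312|utf-8)
charset_spam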
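
For anyone who wants to try the same idea on saved messages before touching a live .procmailrc, here is a rough Python sketch using only the standard `email` package. It is not part of my procmail setup. The names `BLOCKED_PREFIXES`, `declared_charsets` and `is_charset_spam` are made up for this example, and the prefix list is just a sample of the charsets named above.

```python
import email
from email.header import decode_header

# Hypothetical blocklist: prefixes of charsets mentioned in this thread.
BLOCKED_PREFIXES = (
    "big5", "gb2312", "koi8", "iso-ir-111", "euc-kr", "ks_c_5601",
    "iso-2022", "euc-jp", "shift_jis",
)


def declared_charsets(raw_message):
    """Collect charsets declared in Content-Type headers and in an encoded Subject."""
    msg = email.message_from_string(raw_message)
    found = set()
    # Every MIME part (including the top-level message) may declare a charset.
    for part in msg.walk():
        charset = part.get_content_charset()
        if charset:
            found.add(charset.lower())
    # Encoded subjects look like =?charset?B?...?= and carry their own charset.
    for _text, charset in decode_header(msg.get("Subject", "")):
        if charset:
            found.add(charset.lower())
    return found


def is_charset_spam(raw_message):
    """True if any declared charset starts with one of the blocked prefixes."""
    return any(cs.startswith(BLOCKED_PREFIXES) for cs in declared_charsets(raw_message))
```

Unlike a header regex, this only looks at charsets the message actually declares, so a subject line that merely mentions "big5" in plain text would not be flagged.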