I have noticed a significant decrease in spam since I started using procmail to filter on the character sets of languages I don't read.
Nothing against our fine friends in Japan, China, Korea and all over the world who use different languages, but I now find that over 50% of my daily mountain of spam arrives in character sets I cannot read.
Here is my current set of procmail recipes for this problem. It is working well, and it is based on the character sets I cannot and do not read. If you do read any of these languages and need the mail, it is easy to remove those charsets from the recipes:
# PARTIAL LIST OF CHARSET SPAM
#big5 - chinese
#gb2312 - chinese
#koi8-r - cyrillic
#iso-8859-2 - Latin-2 for Eastern Europe
#iso-ir-111 - cyrillic (ECMA)
#iso-8859-5 - cyrillic
#euc-kr - korean
#ks_c_5601-1987 - korean
#iso-2022-kr - korean
#euc-jp - japanese
#iso-2022-jp - japanese
:0:
* charset.*(ks_c_5601|euc-kr|big5|gb2312|utf-8|koi8|iso-ir-111)
charset_spam
:0:
* charset.*(iso-8859-[2-8]|euc-jp|iso-2022|windows-125)
charset_spam
:0:
* charset.*(shift_jis|x-johab|x-unified-hangul)
charset_spam
:0:
* charset.*(cn-gb|cn-big5|utf-8|x-euc-tw|iso_2022_cn)
charset_spam
:0:
* ^Subject:.*(ks_c_5601-1987|euc-kr|big5|gb2312|utf-8)
charset_spam
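One thing worth trying is to fold the separate recipes into a single one. The version below is only a rough sketch, not something I have run against real mail. It assumes that the HB flags (search both header and body) are wanted so that charset declarations inside MIME parts are caught too, that an optional 3D and an optional quote after "charset=" cover the quoted-printable and quoted forms, and that utf-8 and windows-125x are left out because legitimate mail often uses them. The list of charsets is a sample, not the full set from the recipes above.

```
# Sketch only: one recipe with the alternatives grouped in parentheses,
# so every alternative has to follow "charset=".
# HB = test the condition against both the header and the body.
# Assumption: utf-8 and windows-125x are omitted on purpose.
:0 HB:
* charset=(3D)?"?(ks_c_5601|euc-kr|euc-jp|iso-2022-(kr|jp)|big5|gb2312|koi8|iso-ir-111|iso-8859-[2-8]|shift_jis)
charset_spam
```

The parentheses matter because without them the pattern reads as "charset.*ks_c_5601" OR "euc-kr" OR "big5" and so on, which means the later alternatives match anywhere in the searched text and not only after "charset".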
I put together the simple filters above by looking at lots of spam and by visiting sites that list charsets:
http://www.terena.nl/library/multiling/ml-docs/wincharsets.html
Does anyone have any ideas or suggestions for improving these recipes? They are working OK and I'm refining them daily. Neo