Spam Filtering: Understanding SEP and CEP

Greg Reemler
Mon, 14 Apr 2008 04:56:52 +0000

To help folks further understand the differences between CEP and SEP, prompted by Marc's reply in the blogosphere, More Cloudy Thoughts, here is the scoop.
In the early days of spam filtering, let's go back around 10 years, detecting spam was performed with rule-based systems.
In fact, one of the first papers that documented rule-based approaches in spam filtering was E-Mail Bombs and Countermeasures: Cyber Attacks on Availability and Brand Integrity, published in IEEE Network Magazine, Volume 12, Issue 2, p. 10-17 (1998). At the time, rule-based approaches were common (the state of the art) in antispam filtering.
Over time, however, the spammers got more clever and found many ways to poke holes in rule-based detection approaches.
They learned to write with spaces between the letters in the words, they changed the subject and message text frequently, they randomized their originating IP addresses, they used the IP addresses of your best friends, they changed the timing and frequency of the spam, etc. ad infinitum.
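To make the evasion concrete, here is a minimal sketch of a keyword-rule filter and the letter-spacing trick that defeats it. The patterns, function name, and sample messages are hypothetical, chosen only for illustration; real rule bases of the time were far larger.

```python
import re

# Hypothetical keyword rules, for illustration only.
BLOCKED_PATTERNS = [
    re.compile(r"\bviagra\b", re.IGNORECASE),
    re.compile(r"\bfree money\b", re.IGNORECASE),
]


def rule_based_is_spam(message: str) -> bool:
    """Flag a message if any fixed pattern matches it."""
    return any(pattern.search(message) for pattern in BLOCKED_PATTERNS)


plain = "Buy viagra now"
spaced = "Buy v i a g r a now"  # same word, letters separated by spaces

print(rule_based_is_spam(plain))   # the rule fires
print(rule_based_is_spam(spaced))  # the rule is silently evaded
```

Every such trick can be patched with yet another rule, which is exactly the treadmill described above.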
Not to sound like an elitist for speaking the truth, but the more operational experience you have with detection-oriented solutions, the more you will understand that rule-based approaches (alone) are neither scalable nor efficient. If you followed a rule-based approach (only) against heavy, complex spam (the type of spam we see in cyberspace today), you would spend much of your time writing rules and still not stop very much of the spam!
The same is true for the security situation-detection example in Marc's reply.
As with Google's Gmail spam filter and Microsoft's old Mr Clippy (the goofy help algorithm of the past), you need detection techniques that use advanced statistical methods to detect complex situations as they emerge. With rules, you can only detect simple situations unless you have a tremendous amount of resources to build and maintain very complex rule bases (and even then rules have limitations for real-time analytics).
We did not make this up at Techrotech, BTW. Neither did our favorite search engine and leading free email provider, Google!
This is precisely why Gmail has a great spam filter. Google detects spam with a Bayesian classifier, not a rule-based system. If they used (only) a rule-based approach, your Gmail inbox would be full of spam!
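For readers who have not seen one, here is a toy naive Bayes classifier, the simplest member of the Bayesian family. It is a sketch under stated assumptions, not Gmail's actual implementation, which is not public: the class name, the whitespace tokenizer, the add-one smoothing, and the four training messages are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy naive Bayes filter; train at least one message per label first."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: the fraction of training messages carrying this label.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


filt = NaiveBayesSpamFilter()
filt.train("win free money now", "spam")
filt.train("free prize claim now", "spam")
filt.train("meeting notes for monday", "ham")
filt.train("lunch on monday with the team", "ham")

print(filt.is_spam("claim your free money"))         # True
print(filt.is_spam("notes for the monday meeting"))  # False
```

Nobody wrote a rule for "claim your free money"; the decision comes from word statistics learned from labeled examples, so the filter adapts as new examples are added rather than as new rules are hand-written.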
The same is true for search and retrieval algorithms, but that is a topic for another day. However, you can bet your annual paycheck that Google uses a Bayesian type of classifier in their highly confidential search and retrieval (and, hint, classification) algorithms.
In closing, don't let the folks selling software and analysts promoting three-letter acronyms (TLAs) cloud your thinking.
What we are seeing in the marketplace, the so-called CEP marketplace, are simple event processing (SEP) engines. CEP is already happening in the operations of Google, a company that needs real-time CEP for spam filtering and also for search and retrieval. We also see real-time CEP in top-quality security products that use advanced neural networks and Bayesian networks to detect problems (fraud, abuse, denial-of-service attacks, phishing, identity theft) in cyberspace.
