Spamassassin Beats CRM-114 In Anti-Spam Shootout


Posted by timothy on Tuesday June 22, 2004 @11:24PM from the hawaii-alaska-and-utah dept.

Simon Lyall writes "A new study of antispam software shows that Spamassassin performed well in various configurations. Spamprobe, Bogofilter and Spambayes also came out well, while CRM-114 failed to live up to its previous claims. The study shows: 'The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.'"

This discussion has been archived. No new comments can be posted.


Correct link to CRM-114 (Score:5, Informative)

by athakur999 ( 44340 ) writes: on Tuesday June 22, 2004 @11:27PM (#9502934) Journal

CRM-114 [sourceforge.net]

The link in the article points to SpamBayes again.

- Isn't Human Accuracy always 100% (Score:4, Insightful)
  
  by PetoskeyGuy ( 648788 ) writes: on Tuesday June 22, 2004 @11:43PM (#9503021)
  
  From the CRM-114 site...
  
  News Flash: As of Feb 1 through March 1, 2004, 8738 messages (4240 spam, 4498 nonspam), and my total error rate was ONE. That translates to better than 99.984% accuracy, which is over ten times more accurate than human accuracy
  
  Maybe I'm missing something, but isn't human accuracy always going to be 100%? I tell the computer what is spam, and it learns. I may decide that, regardless of what it thinks, this last message is OK. So aside from clicking too fast or changing your mind (which is a common thing to do), how can a filter ever claim to be better than people at deciding what people want to see?
  
  - Re:Isn't Human Accuracy always 100% (Score:5, Insightful)
    
    by sholden ( 12227 ) writes: on Tuesday June 22, 2004 @11:50PM (#9503073) Homepage
    
    People make mistakes.
    
    Yes, given one message to classify as spam or ham you are going to get it right 100% of the time.
    
    Given 8000 messages to classify, the wonders of boredom mean you are going to make a mistake every so often (not an "oops I clicked the wrong button" mistake, but an "oops I put it in the wrong folder because the subject looked spammy and I couldn't be bothered checking the body" mistake).
    
    In practice though, those stats on human accuracy are produced by having one person classify email that has already been classified by others - which of course means some of the "mistakes" are in fact disagreements...
    
    - Re:Isn't Human Accuracy always 100% (Score:5, Funny)
      
      by fireman sam ( 662213 ) writes: on Wednesday June 23, 2004 @12:28AM (#9503269) Homepage Journal
      
      Remember, whether an email is classified as spam is subjective. For example, you might consider a message from a Nigerian bank manager spam, but I may consider it a way to pay off the house :)
      
      Or, personally, I consider all email I get with a From address at hotmail.com to be spam. But that is my opinion.
      
      OT: btw, a friend at work actually got a Nigerian scam letter in the post. Because it was not email, he thought it was real.
      
      - Re:Isn't Human Accuracy always 100% (Score:4, Funny)
        
        by Anonymous Coward writes: on Wednesday June 23, 2004 @12:35AM (#9503315)
        
        OT: you need smarter friends.
        
The Mozilla ThunderBird SPAM filter (Score:5, Interesting)

by k.ellsworth ( 692902 ) writes: on Tuesday June 22, 2004 @11:30PM (#9502948)

The Mozilla spam filter does a very good job too; once it learns enough it becomes over 95% accurate. I dropped Evolution for it and never looked back.

- Re:The Mozilla ThunderBird SPAM filter (Score:2, Interesting)
  
  by Cyb3rBull3ts ( 779853 ) writes:
  
  If you use the Mozilla TB spam filter together with your ISP's filter, it's near 99% accurate.
  
  I have gone from a whopping 200 spam messages a day (a very old e-mail address) to the occasional spam message once a week.
  
  Lemme do the math. 200*7 = 1400. 1399/1400 = 0.9992857 accuracy. Not TOO bad :D
- Re:The Mozilla ThunderBird SPAM filter (Score:3, Informative)
  
  by ImpTech ( 549794 ) writes:
  
  Of course it's pretty easy to hook spamassassin, bogofilter, or what have you into Evolution. Tutorials abound if you search Google. Thunderbird's nice, but IMO Evolution's still a bit nicer, so it was worth my time to plug in a spam filter manually.
  - - Re:The Mozilla ThunderBird SPAM filter (Score:5, Insightful)
      
      by norton_I ( 64015 ) writes: <hobbes@utrek.dhs.org> on Wednesday June 23, 2004 @04:01AM (#9504420)
      
      Better to do spam filtering with your MTA/MDA anyway, if possible. That way, the same filter is used no matter which email client you use from which computer. Plus, it means you don't have to download spams to your MUA when on a slow connection.
      
      Now if only I could get the rest of my mail configuration to be shared between evolution, mutt, and squirrelmail.
      
- Re:The Mozilla ThunderBird SPAM filter (Score:4, Interesting)
  
  by Mark_MF-WN ( 678030 ) writes: on Tuesday June 22, 2004 @11:40PM (#9503008)
  
  It works with IMAP too -- which is something most other spam filters aren't capable of.
  
Invasion (Score:2, Insightful)

by artlu ( 265391 ) writes:

I must admit that I am not up to date on these new anti-spam software packages, which operate on the server side. However, what is the probability of real mail getting rejected by these things? It seems almost like an invasion of privacy to block my own email, even if it is from a "benevolent big brother" perspective.
I guess that is why there are privacy policies though.

aj

GroupShares Inc. [groupshares.com] - A Free and Interactive Stock Market community!
- Re:Invasion (Score:2)
  
  by p2sam ( 139950 ) writes:
  
  The point of automated mail sorting isn't about having 0 false negatives. It's about having a lower false negative than if YOU were to sit down and sort the hundreds of spam yourself.
- I'm running SpamAssassin at work. (Score:5, Insightful)
  
  by khasim ( 1285 ) writes: <brandioch.conner@gmail.com> on Wednesday June 23, 2004 @12:21AM (#9503237)
  
  People LOVE it.
  
  There are some false positives and some false negatives.
  
  But I have it set to delete anything 12+. That gets rid of the worst of the worst spam. So far, not a single complaint of any email being deleted.
  
  Everything else has the subject re-written so people can run their own rule set against it.
  
  In the past 8 hours
  1867 messages received
  375 messages deleted
  1266 messages flagged as spam
  
  So, only a few hundred actual, good emails.
  
  Of course, that's only 4 hours during the regular work day (and 4 hours after work). But you can see the proportions. It saves people a TON of time.
  
  And it makes them happier when they don't have to constantly dig through crap to see if any real messages have arrived.
  
  Now, those spam messages are NOT distributed evenly. Our HR manager had her email address posted on the website. So she gets about 20-25% of the spam.
  
  It's not exactly Big Brother 'cause no human sees the deleted spam.
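  
  The two-tier policy described above (delete at 12 and up, tag the rest of the suspected spam) can be sketched roughly as follows. This is a minimal illustration, not the poster's actual configuration: the function names are made up, and the tagging threshold of 5.0 is an assumption taken from SpamAssassin's stock default, since the comment does not state it.
  
  ```python
  from collections import Counter
  
  DELETE_AT = 12.0  # from the comment: "delete anything 12+"
  TAG_AT = 5.0      # assumption: SpamAssassin's default required score
  
  def action_for(score: float) -> str:
      """Map a SpamAssassin-style score to a delivery action."""
      if score >= DELETE_AT:
          return "delete"
      if score >= TAG_AT:
          return "tag"  # rewrite the subject; the user's own rules decide
      return "deliver"
  
  def tally(scores):
      """Count how many messages fall into each action."""
      return Counter(action_for(s) for s in scores)
  
  # Figures quoted in the comment: 1867 received, 375 deleted, 1266 flagged.
  received, deleted, flagged = 1867, 375, 1266
  clean = received - deleted - flagged
  print(clean)  # 226, the "few hundred actual, good emails"
  ```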
  
  - Re:I'm running SpamAssassin at work. (Score:2)
    
    by YetAnotherDave ( 159442 ) writes:
    
    I have a similar spamassassin setup on the server for my family's email - 5.5 and up gets redirected to a spam box (and I sort thru it - we're family, so BB issues are less, besides which I haven't had a false positive in months) and 10 or greater gets tossed.
    
    The two thresholds have been creeping down as the bayes system gets more trained. I started with 7 or greater getting redirected, and 15 or greater getting tossed...
    
    If only I could convince work to use this great, free system. They're using a reall
  - Re:I'm running SpamAssassin at work. (Score:3, Insightful)
    
    by Robmonster ( 158873 ) writes:
    
    So far, not a single complaint of any email being deleted
    
    How do they know they are missing any emails to complain about it?
    
    I had a recent argument with my email provider. They introduced blacklist filtering to eliminate the worst of their spam. In the process it also blacklisted some legitimate email. (The mails in question were Topic Reply notifications from a message board)
    
    I don't have a problem with filtering, as long as there is a way to review undelivered mails.
    
    In my case I only realised something
  - Re:I'm running SpamAssassin at work. (Score:3, Insightful)
    
    by sTeF ( 8952 ) writes:
    
    I'm also running SpamAssassin, but I am absolutely not satisfied with its performance. How long does it take for your SA to scan one message? My mailserver is only an Athlon 600, but still, this does not justify a hit of a few seconds per message.
    
    Other than the performance, I'm really happy with SA.
Quit acting like goddamn babies... (Score:5, Funny)

by Anonymous Coward writes: on Tuesday June 22, 2004 @11:32PM (#9502961)

Baysian, gaysian. Real men hit delete.

- No, REAL MEN... (Score:3, Insightful)
  
  by Dimensio ( 311070 ) writes:
  
  ...hammer the spammer's ISP with complaints until the advertised website is DEAD, DEAD, DEAD.
- Re:Quit acting like goddamn babies... (Score:5, Funny)
  
  by fireman sam ( 662213 ) writes: on Wednesday June 23, 2004 @12:30AM (#9503284) Homepage Journal
  
  Pfft, Real men have this as their ~/.bashrc
  
  #!/bin/sh
  rm -f /var/spool/mail/$USER
  
  Who needs email.
  
  - Re:Quit acting like goddamn babies... (Score:3, Funny)
    
    by Too Much Noise ( 755847 ) writes:
    
    Silly rabbit! all you need is
    
    ln -s /dev/null /var/spool/mail/$USER
    
    and you will have email peace forever. ^_^
  - - Re:Quit acting like goddamn babies... (Score:2)
      
      by idiotnot ( 302133 ) writes:
      
      killall -TERM sendmail
      echo 'SENDMAIL="NONE"' >> rc.conf
      
      *real men* don't do sysv, or so I've heard.
      
      I do sysv and I don't run sendmail.
      
      I also am typing this on a Macintosh. /me seriously questioning masculinity at the moment.....
- Re:Quit acting like goddamn babies... (Score:2)
  
  by Technician ( 215283 ) writes:
  
  Baysian, gaysian. Real men hit delete.
  
  Real men have a life instead of spending the day poking a small button over and over.
  - - - Re:Quit acting like goddamn babies... (Score:3, Insightful)
        
        by Technician ( 215283 ) writes:
        
        just a different button...
        
        I assume you are not referring to the delete key. ;-) There is more to life than hitting the delete key.
I didn't RTFPDF... (Score:4, Interesting)

by john_smith_45678 ( 607592 ) writes: on Tuesday June 22, 2004 @11:32PM (#9502964) Journal

The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day.

How many false positives though?

- Re:I didn't RTFPDF... (Score:2)
  
  by Malc ( 1751 ) writes:
  
  Why's this moderated "troll"? It's a very good question. I'd rather receive some spam than have just one valid message blocked. I use Yahoo and they piss me off sometimes with their false positives.
  - Re:I didn't RTFPDF... (Score:2)
    
    by timeOday ( 582209 ) writes:
    
    Yup, I can easily reduce spams to fewer than 2 per day. Just redirect all mail to /dev/null.
I use two... (Score:2, Interesting)

by hkfczrqj ( 671146 ) writes:

I use Spamassassin. Surviving mail then goes through CRM-114. At least in my case, it works better than each of the filters on its own.
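
Running one filter over whatever survives another, as described above, amounts to flagging a message if any stage flags it. A rough sketch of that arrangement follows; the two stage functions are toy stand-ins invented for illustration, not the real SpamAssassin or CRM-114 interfaces.

```python
from typing import Callable, Iterable

Classifier = Callable[[str], bool]  # True means "this message is spam"

def chain(filters: Iterable[Classifier]) -> Classifier:
    """Build a classifier that flags a message if any stage flags it."""
    stages = list(filters)

    def is_spam(message: str) -> bool:
        # any() short-circuits, so later stages only see the survivors.
        return any(stage(message) for stage in stages)

    return is_spam

# Hypothetical stand-ins for the two real filters.
def rule_stage(message: str) -> bool:
    return "viagra" in message.lower()

def stat_stage(message: str) -> bool:
    return message.count("!") >= 3

combined = chain([rule_stage, stat_stage])
```

The trade-off is that the stages' false positives add up too: a message wrongly flagged by either stage is lost to the combination.
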
No HTML, Just ps or pdf, conclusions inside (Score:5, Informative)

by randyest ( 589159 ) writes: on Tuesday June 22, 2004 @11:34PM (#9502971) Homepage

And a long document it is (funny placeholder images though). Here are the conclusions, for the impatient who want a little more than the summary:

Supervised spam filters are effective tools for attenuating spam. The best-performing filters reduced the volume of incoming spam from about 150 messages per day to about 2 messages per day. The corresponding risk of mail loss, while minimal, is difficult to quantify. The best-performing filters misclassified a handful of spam messages early in the test suite; none within the second half (25,000 messages). A larger study will be necessary to distinguish the asymptotic probability of ham misclassification from zero.

Most misclassified ham messages are advertising, news digests, mailing list messages, or the results of electronic transactions. From this observation, and the fact that such messages represent a small fraction of incoming mail, we may conclude that the filters find them more difficult to classify. On the other hand, the small number of misclassifications suggests that the filter rapidly learns the characteristics of each advertiser, news service, mailing list, or on-line service from which the recipient wishes to receive messages. We might also conjecture that these misclassifications are more likely to occur soon after subscribing to the particular service (or soon after starting to use the filter), a time at which the user would be more likely to notice, should the message go astray, and retrieve it from the spam file. In contrast, the best filters misclassified no personal messages, and no delivery error messages, which comprise the largest and most critical fraction of ham.

A supervised filter contributes significantly to the effectiveness of Spamassassin's static component, as measured by both ham and spam misclassification probabilities. Two unsupervised configurations also improved the static component, but by a smaller margin. The supervised filter alone performed better than the static rules alone, but not as well as the combination of the two.

The choice of threshold parameters dominates the observed differences in performance among the four filters implementing methods derived from Graham's and Robinson's proposals. Each shows a different tradeoff between ham accuracy and spam accuracy. ROC analysis shows that the differences not accountable to threshold setting, if any, are small and observable only when the ham misclassification probability is low (i.e. hm

CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems. Both these systems were designed to be used in a train-on-error configuration, and do not self-train. This configuration could account for a slow learning rate as each system avails itself of the information in only about 1,000 of the 50,000 test messages. In an effort to ensure that we had not misinterpreted the installation instructions, we ran CRM-114 in a train-on-everything configuration and, as predicted by the author, the result was substantially worse.

Spam filter designers should incorporate interfaces making them amenable for testing and deployment in the supervised configuration (figure 4). We propose the three interface functions used in algorithm 1 - filterinit, filtereval, and filtertrain - as a standardized interface. Systems that self-train should provide an option to self-train on everything (subject to correction via filtertrain) as in algorithm 2.
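
To make the three-function interface named in the paragraph above concrete, here is one possible reading of it as a Python sketch. Only the names filterinit, filtereval and filtertrain come from the quoted text; the signatures, the evaluation loop, and the toy TokenVoteFilter are assumptions for illustration and are not the paper's algorithm 1. The loop reports ham and spam misclassification separately.

```python
from typing import Iterable, Protocol, Tuple

class Filter(Protocol):
    def filterinit(self) -> None: ...
    def filtereval(self, message: str) -> bool: ...  # True = judged spam
    def filtertrain(self, message: str, is_spam: bool) -> None: ...

def run_supervised(f: Filter, stream: Iterable[Tuple[str, bool]],
                   train_on_error_only: bool = True) -> Tuple[float, float]:
    """Feed labelled messages in order; return (ham_miss, spam_miss) rates."""
    f.filterinit()
    ham = spam = ham_wrong = spam_wrong = 0
    for message, is_spam in stream:
        guess = f.filtereval(message)
        if is_spam:
            spam += 1
            spam_wrong += (not guess)
        else:
            ham += 1
            ham_wrong += guess
        # Train-on-error: only mistakes are fed back to the filter.
        if guess != is_spam or not train_on_error_only:
            f.filtertrain(message, is_spam)
    return ham_wrong / max(ham, 1), spam_wrong / max(spam, 1)

class TokenVoteFilter:
    """Toy filter: a token votes spam if seen more often in spam than ham."""
    def filterinit(self) -> None:
        self.spam_counts, self.ham_counts = {}, {}

    def filtereval(self, message: str) -> bool:
        votes = sum(self.spam_counts.get(t, 0) - self.ham_counts.get(t, 0)
                    for t in message.lower().split())
        return votes > 0

    def filtertrain(self, message: str, is_spam: bool) -> None:
        counts = self.spam_counts if is_spam else self.ham_counts
        for t in message.lower().split():
            counts[t] = counts.get(t, 0) + 1
```

Passing train_on_error_only=False trains on every message with its true label, which is a simplification of the self-training-with-correction setup the paragraph mentions.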

Ham and spam misclassification proportions should be reported separately. Accuracy, weighted accuracy, and precision should be avoided as primary evaluation measures as th

Mozilla Messenger / Thunderbird Performance? (Score:5, Interesting)

by Mark_MF-WN ( 678030 ) writes: on Tuesday June 22, 2004 @11:34PM (#9502974)

I wonder how Mozilla Messenger/Thunderbird's spam filtering stacks up against these filters? I've heard some negative comments about the Mozilla filtering system, but it's worked wonders for me.

- Re:Mozilla Messenger / Thunderbird Performance? (Score:2, Informative)
  
  by k.ellsworth ( 692902 ) writes:
  
  100% agreed. I use the Mozilla Thunderbird spam filter (after some human teaching) and it works marvelously. On a spam-me email account (used on Usenet, on some forums, and for anything that I know will become a spam source but where I need to give a valid email address anyway) I receive ~38K spams a month, and Thunderbird only misses 3 or 4 per day... Sometimes I look through its Junk folder and I haven't seen any false positives so far. ThunderBird is THE email client, works on Linux and W
- Re:Mozilla Messenger / Thunderbird Performance? (Score:2)
  
  by mbourgon ( 186257 ) writes:
  
  Mozilla 1.8 has (had?) a real problem with its Junk Mail controls... namely, they don't (didn't?) work nearly as well as 1.7's. Someone feel free to karma whore the details, but I think the problem is that they're using a bunch of different spam filters, and it's not as powerful as whatever was used in 1.7.
- Re:Mozilla Messenger / Thunderbird Performance? (Score:2)
  
  by darkmeridian ( 119044 ) writes:
  
  I used Thunderbird and the SpamBayes proxy concurrently for a while. SB kicks the crap out of the Thunderbird.
  
  Just one example. I get spam from VIPClubber. I don't know why and I'm afraid to click the "Cancel Me" link because I didn't sign up for anything. Anyway, they don't spoof their headers. Everything from VIPClubber.com is spam. Thunderbird, after ~30 from VIPClubber, still lets some through. SB does not.
  
  Perhaps the TB should integrate SB. This demonstrates the power of open-source software. Just im
  - SpamBayes + Thunderbird (Score:3, Informative)
    
    by Anthracks ( 532185 ) writes:
    
    Thunderbird already has integrated significant improvements based on SpamBayes, I believe. See http://bugzilla.mozilla.org/show_bug.cgi?id=230093, which was closed about a month ago. The test data from that patch is encouraging, although obviously results will be different for everyone since not everyone gets the same type of spam. If you want to keep tabs on upcoming refinements to junk mail filtering, take a look at the dependencies of this meta bug: http://bugzilla.mozilla.org/show_bug.cgi?id=228674.
  - Re:Mozilla Messenger / Thunderbird Performance? (Score:3, Informative)
    
    by WuphonsReach ( 684551 ) writes:
    
    I used Thunderbird and the SpamBayes proxy concurrently for a while. SB kicks the crap out of the Thunderbird.
    
    Definitely agree.
    
    I use the SpamBayes MSOutlook plugin for my work e-mail and it is extremely good at discriminating spam from ham. I use Thunderbird for my non-corporate e-mail. SpamBayes has two additional (and rather important features) that Thunderbird/Mozilla just don't have:
    
    1. SpamBayes (at least the Outlook plug-in) actually has (3) levels of classification... definite ham, maybe, an
- Re:Mozilla Messenger / Thunderbird Performance? (Score:3, Interesting)
  
  by dasmegabyte ( 267018 ) writes:
  
  From personal experience, it works pretty well (I think Mail.App is good too, but the management of the junk once marked needs to be customized). But since it's not really a server-side program, you can't run a server-side test on it. Hence why it wasn't included in this test.
  
  Some anecdotal "evidence" for you: some of the users at my office run their own spam engines on their desktops because they're control freaks. I let them pass by SpamAssassin entirely. In my observation, SpamAssassin works WAY bette
SpamAssassin is great! (Score:2, Informative)

by JohnFromCanada ( 789692 ) writes:

I have been using SpamAssassin in conjunction with Evolution and it has cut my spam to virtually nothing. I wish it was built right into Evolution so that it was a little faster; however, it is worth the wait, as I barely ever get any spam in my Inbox anymore. I set it up with Evolution very similarly to how it is shown here [atlantawebhost.com]. I really like using it with Evolution, but I am curious if anyone knows of anything that would work faster and as efficiently in conjunction with Evolution?
Real way to block spam (Score:2, Interesting)

by DRWHOISME ( 696739 ) writes:

Is to do away with current email protocols and go with new ones with verification.

That should take care of the problems. The gov is now concentrating on this.
- Re:Real way to block spam (Score:2, Insightful)
  
  by PornMaster ( 749461 ) writes:
  
  Is to do away with current email protocols and go with new ones with verification. That should take care of the problems. The gov is now concentrating on this.
  
  Except for making a new standard that's a requirement for doing business with federal agencies, just what do you think government's capable of doing regarding replacing protocols?
  
  -PM
- REAL REAL way to block spam (Score:2)
  
  by Mad Bad Rabbit ( 539142 ) writes:
  
  [Ripley] "I say we take off and nuke the entire planet
  from orbit. That's the only way to be sure."
  
  [Hudson] "F--kin' A..."
  
  [Burke] "Ho-ho-hold on a second! The Earth has a
  very substantial dollar value attached to it!"
  
  [Ripley] "They can BILL me."
- Re:Real way to block spam (Score:2)
  
  by Technician ( 215283 ) writes:
  
  Already done that. I have a geocaching account. It doesn't permit bulk mail of any kind. To mail me, get an account, choose send mail to another user, and fill in the online form. This type of mail so far has been spam free and works. I know for those on bulk lists, it doesn't work for you. But it's a place my family can reach me without having to weed out a stuffed inbox and possibly lose the important stuff.
  
  Mailboxes and bulk mail just don't mix. Newsgroup notifications and such should use anoth
A little advice (Score:5, Funny)

by Anonymous Coward writes: on Tuesday June 22, 2004 @11:37PM (#9502992)

You don't want to face an assassin in a shootout. Maybe a pie eating contest, or a spelling bee... but not a shootout.

I've had CRM114 running for a few months . . . (Score:5, Informative)

by klevin ( 11545 ) writes: on Tuesday June 22, 2004 @11:38PM (#9502994) Homepage Journal

CRM114's best was about 80%, which lasted for a few weeks (weeks 3-5). Before and after that, it's been doing well to catch 25% of the spam. I'm not sure why, but for the last month it's only been catching about 10%. When one gets through, I run it through mailfilter.crm with the learnspam switch. It'll say it's learned it, but if I have it check the spam again, it still lets it past.

- Re:I've had CRM114 running for a few months . . . (Score:3, Informative)
  
  by CoolGopher ( 142933 ) writes:
  
  I've been running CRM114 for about a year now, and it's performing extremely well. Far better than my Mozilla filter. In fact, just the other week I scrapped Mozilla's junk filter completely and am now relying on CRM alone. It's very rare that I get any misses in either direction.
  
  If I was to make an estimate, I'd say that the error rate is something like .1%, quite possibly less (say 1 miss/5 days, with 200 mails per day). This is having started with clean corpus files and train-on-error only.
  - Re:I've had CRM114 running for a few months . . . (Score:3, Interesting)
    
    by fferreres ( 525414 ) writes:
    
    Me too. I couldn't check email for about a week and grew 4200 or so spam messages and 300 ham ones. 1 spam misclassified...(but some false positives also).
    
    I try to teach the program as little as possible (if a message doesn't look like spam to me, even if it is, I do not teach it).
    
    I also delete the ADV: prefix in the subject and the crm114 spam metadata (TAG), and fix it up in general so it doesn't get confused when learning spam.
    
    Bad teaching at the beginning leads to lower quality filtering (I did this
Good results with spamprobe (Score:3, Informative)

by bigberk ( 547360 ) writes: <bigberk@users.pc9.org> on Tuesday June 22, 2004 @11:38PM (#9502995)

I have been using spamprobe [sourceforge.net] for some time, with the webfilt [pc-tools.net] front-end, and I'm very pleased with the speedy spamprobe program (written in C++).

I receive approximately 10 legit emails/day and about 300 spam/day. I have only had 2 false positives overall (that's 2 out of about 100,000 total emails received) and on average only 2 spams/day slip past the filter. Now I'm testing Spambayes on one of my most spammed accounts, but it's definitely much slower than spamprobe and not more accurate as far as I can tell.
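
As a rough worked example, the figures above can be turned into separate ham and spam misclassification rates. The ham total is an estimate that assumes the 10:300 daily mix held across all 100,000 messages, which the comment does not actually state.

```python
ham_per_day, spam_per_day = 10, 300      # daily volumes from the comment
total_mail, false_positives = 100_000, 2  # "2 out of about 100,000"
missed_spam_per_day = 2                   # spams that get past the filter

# Assumption: the daily ham/spam mix is representative of the whole archive.
ham_share = ham_per_day / (ham_per_day + spam_per_day)
est_ham_total = total_mail * ham_share

ham_misclassification = false_positives / est_ham_total
spam_misclassification = missed_spam_per_day / spam_per_day

print(f"{ham_misclassification:.3%}")   # about 0.062% of legitimate mail
print(f"{spam_misclassification:.3%}")  # about 0.667% of spam
```

Measured against legitimate mail only, the false-positive rate is roughly thirty times higher than "2 in 100,000" suggests, though still small.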

compute farms for anti-spam AI? (Score:5, Informative)

by potus98 ( 741836 ) writes: on Tuesday June 22, 2004 @11:39PM (#9503000) Journal

From page 24: Hidalgo suggests the use of ROC curves, originally from signal detection theory and used extensively in medical testing, as better capturing the important aspects of spam filter performance.
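
For anyone unfamiliar with ROC curves, the idea is to sweep the spam threshold and record, at each setting, how much ham is misclassified against how much spam is caught. A minimal sketch, assuming each message carries a numeric score where higher means more spam-like (the function name and sample data are invented for illustration):

```python
def roc_points(scored):
    """scored: list of (score, is_spam) pairs; needs at least one of each class.

    Returns one (ham_misclassified_fraction, spam_caught_fraction) pair
    per distinct threshold, from the most permissive threshold upward.
    """
    n_spam = sum(1 for _, is_spam in scored if is_spam)
    n_ham = len(scored) - n_spam
    points = []
    for threshold in sorted({score for score, _ in scored}):
        # Everything scoring at or above the threshold is treated as spam.
        fp = sum(1 for s, is_spam in scored if s >= threshold and not is_spam)
        tp = sum(1 for s, is_spam in scored if s >= threshold and is_spam)
        points.append((fp / n_ham, tp / n_spam))
    return points

sample = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.1, False)]
points = roc_points(sample)
```

Two filters that differ only in threshold setting land on the same curve, which is why the curve is a fairer comparison than a single accuracy figure.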

Perhaps a distributed analysis system (similar to SETI@home [berkeley.edu]) could be used to combat spam. Not only could the idle time of bazillions of CPUs be leveraged to improve "signal" analysis, but perhaps the clients could analyze local incoming mail to correlate new trends in spam originators and then share that information with all of the other clients. Then you could combine that with the genetic evolution improvements of the F1 sim-cars recently mentioned [slashdot.org] on /.

So there's the high-level idea, now you smart people go make it work. :-)

- Re:compute farms for anti-spam AI? (Score:5, Informative)
  
  by damiangerous ( 218679 ) writes: <1ndt7174ekq80001@sneakemail.com> on Wednesday June 23, 2004 @12:38AM (#9503329)
  
  There are already spam packages that do this, at least the collaborative part. Vipul's Razor [sourceforge.net] (which is under the Artistic license) at the personal level and Brightmail [brightmail.com] (which is closed and not free) at the enterprise/ISP level, off the top of my head.
  
Spamassassin uses collaborative spam-tracking (Score:3, Informative)

by vivek7006 ( 585218 ) writes: on Tuesday June 22, 2004 @11:43PM (#9503030) Homepage

Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it.

This is really cool.
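
The report-once, block-everywhere idea can be sketched in a few lines. This is only an illustration of the concept: the real service's signature scheme and network protocol are not described in this thread, so the exact-match SHA-1 digest and the in-memory SharedSpamDB class below are stand-ins.

```python
import hashlib

def signature(body: str) -> str:
    """Digest of the message body with case and whitespace normalised."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()

class SharedSpamDB:
    """In-memory stand-in for the collaborative signature database."""

    def __init__(self) -> None:
        self._known = set()

    def report(self, body: str) -> None:
        # The first recipient of a spam run submits its signature.
        self._known.add(signature(body))

    def is_known_spam(self, body: str) -> bool:
        # Everyone else checks incoming mail against reported signatures.
        return signature(body) in self._known
```

An exact digest like this is defeated by changing a single character per copy, so a practical system needs signatures that tolerate small variations.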

- Re:Spamassassin uses collaborative spam-tracking (Score:5, Informative)
  
  by bigberk ( 547360 ) writes: <bigberk@users.pc9.org> on Tuesday June 22, 2004 @11:53PM (#9503095)
  
  It gets better. Vernon Schryver, networking genius, is responsible for the Distributed Checksum Clearinghouse [rhyolite.com] which does something similar, but as I understand it, is much more efficient for large servers. When our university turned on DCC filtering combined with greylisting, the daily spam to inboxes dropped from hundreds daily to ZERO (I kid you not). I am not aware of any false positives, at least on my account. DCC blew my mind.
  
So I'm not the only one... (Score:5, Informative)

by sholden ( 12227 ) writes: on Tuesday June 22, 2004 @11:44PM (#9503032) Homepage

I did a *much* smaller test of spam filters earlier this year (which was published in hakin9 [haking.pl] but not in English).

I also found that crm114 gave poor results in comparison to other filters - but figured I must have set something up incorrectly...

Why don't people use catch-all accounts? (Score:5, Interesting)

by mattkinabrewmindspri ( 538862 ) writes: on Tuesday June 22, 2004 @11:44PM (#9503033)
When you register with a hosting company, very frequently, they set up what's called a catch-all account, and any email to your domain that's not addressed to a real address goes there. This is how I use it:
- I only use my main email address with friends and family, and never post it online.
- Whenever I post an email address or register for anything online, I put thatsite@mydomain.com as my email address.
- All email is received by one account, but each message can have a different "to:" header. I set my filters to filter mail to different boxes. Email sent to amazon@mydomain.com goes to the amazon folder. Same with ebay, slashdot, whatever.
- Any time I start receiving spam, I just set my mail server to disregard email sent to whatever email address is getting the spam, and I can stop doing business with the company that sold my email address.
I receive on average 0 spams per day.
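
The routing scheme in the list above could look roughly like this. The domain is the placeholder used in the comment; the BURNED set and the route function are hypothetical names for illustration, not any particular mail server's configuration.

```python
from typing import Optional

DOMAIN = "mydomain.com"   # placeholder domain from the comment
BURNED = {"shadyshop"}    # hypothetical: addresses that started receiving spam

def route(to_header: str) -> Optional[str]:
    """Return the folder for a message, or None to discard it."""
    local, _, domain = to_header.strip().lower().partition("@")
    if domain != DOMAIN:
        return None       # not addressed to this domain at all
    if local in BURNED:
        return None       # that site leaked or sold the address
    return local          # e.g. amazon@ goes to the "amazon" folder
```

Because every site gets its own address, the folder name doubles as a record of who leaked it.
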
- Re:Why don't people use catch-all accounts? (Score:2)
  
  by YrWrstNtmr ( 564987 ) writes:
  
  Because not everyone has a mydomain.com
- Re:Why don't people use catch-all accounts? (Score:5, Informative)
  
  by sr180 ( 700526 ) writes: on Wednesday June 23, 2004 @12:13AM (#9503195) Journal
  
  Wait till the spammers decide to spam your whole domain. They can start with aaaaaaaa@yourdomain.com and keep going till they get to zzzzzzzz@yourdomain.com, and your mailserver will accept and pass on every single one of these emails.
  I would recommend not using a catch all account, but if you have the domain, create, delete and rename email accounts as you need to...
  
  - Re:Why don't people use catch-all accounts? (Score:2)
    
    by videodriverguy ( 602232 ) writes:
    
    Very true. This happened to me recently and my spam count went from around 30 to over 400!
    
    Thankfully, my host has a 'blackhole' option for the default account. Turned that on and the spam volume dropped back to the previous level.
  - Re:Why don't people use catch-all accounts? (Score:3, Informative)
    
    by dasmegabyte ( 267018 ) writes:
    
    Why would I wait until spammers did that?
    
    Already if a server tries to send the same email to more than three fake addresses at my company, I blacklist the IP for two days. Not just for email, but for any IP traffic. I did this to prevent trojans, but it's a somewhat effective spam deterrent as well.
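    
    A rough sketch of that kind of rule is below. It simplifies the description by counting unknown recipients per sending IP rather than per identical message, and the mailbox names and class are hypothetical.
    
    ```python
    import time
    
    VALID = {"alice", "bob"}        # hypothetical real mailboxes
    MAX_UNKNOWN = 3                 # "more than three fake addresses"
    BAN_SECONDS = 2 * 24 * 60 * 60  # "two days"
    
    class RecipientGuard:
        def __init__(self):
            self.unknown = {}       # ip -> set of unknown local parts seen
            self.banned_until = {}  # ip -> time at which the ban expires
    
        def allow(self, ip, local_part, now=None):
            """Return True if mail from ip to local_part should be accepted."""
            now = time.time() if now is None else now
            if self.banned_until.get(ip, 0) > now:
                return False
            if local_part in VALID:
                return True
            seen = self.unknown.setdefault(ip, set())
            seen.add(local_part)
            if len(seen) > MAX_UNKNOWN:
                # Too many guesses: ban the sender and reset its counter.
                self.banned_until[ip] = now + BAN_SECONDS
                seen.clear()
            return False
    ```
    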
    - Re:Why don't people use catch-all accounts? (Score:2)
      
      by sr180 ( 700526 ) writes:
      
      Now that is a kick arse idea....
  - Re:Why don't people use catch-all accounts? (Score:3, Insightful)
    
    by sfe_software ( 220870 ) * writes:
    
    Wait till the spammers decide to spam your whole domain.
    
    That's exactly when I decided to disable the "catch-all" and allow only specific addresses. Some spammer sent several hundred identical messages, in a few hours, to made-up names at my domain.
    
    Catch-all is no longer a good idea in my opinion...
- Re:Why don't people use catch-all accounts? (Score:2)
  
  by burns210 ( 572621 ) writes:
  
  What if it isn't ebay that sold the account, but rather a random-generation spammer who sent to ebay@DOMAIN.TLD? Or what if the company (or you, by accident) posted the email address to the web, and a spider grabbed it and added it to a spammer's list?
  
  How many CORP_X accounts do you go through? ebay1@DOMAIN.TLD, ebay2@, ebay3@... ditching each once it starts to receive spam.
  
  A most interesting approach, though.
- Re:Why don't people use catch-all accounts? (Score:4, Insightful)
  
  by FrenZon ( 65408 ) * writes: on Wednesday June 23, 2004 @12:26AM (#9503258) Homepage
  
  Why don't people use catch-all accounts?
  
  Because you will always have one main 'obvious' address - be it something that goes on your business card, or something you tell to people you meet. For example, I use glen at glenmurphy.com.
  
  Now all it takes is one slip - someone you know getting a virus, whatever - and your address is 'out there' for the taking. Your only possible recourse then is to stop using that address, but for some people that's just not an option, and it's just a bit defeatist to sit there surrendering email address after email address.
  
- Re:Why don't people use catch-all accounts? (Score:2)
  
  by mrpuffypants ( 444598 ) * writes:
  
  alas, that also equates to you receiving 0 emails total per day :(
  
  Some of us don't use spam filters to give us a feeling of life...
- Re:Why don't people use catch-all accounts? (Score:2)
  
  by someguy456 ( 607900 ) writes:
  
  I do the exact same thing, except I don't have my own domain. Instead, I have a free subdomain at cjb.net, which goes something like: somesite.cjb.net. I can get every e-mail sent to *@somesite.cjb.net from one login, and can sort and filter it accordingly.
- Re:Why don't people use catch-all accounts? (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  I do that too. Works great (0/day). The problem is, unlike you, for my job, I have to have a public e-mail address.
  I even got spam from the president of the university I work for. (Why spam? Because it was a political response to a newspaper article that had nothing to do with my job.) When I asked to be removed, I was told I couldn't opt out, since I worked for the university. So I removed my e-mail address from the official database. I was lucky. It got worse. I know five other people who did the
- Re:Why don't people use catch-all accounts? (Score:5, Informative)
  
  by lewko ( 195646 ) writes: on Wednesday June 23, 2004 @01:00AM (#9503444) Homepage
  
  I used to do the same. Now I'm paying for it.
  Several viruses were sent to jane@mydomain, pete@mydomain, sedlskjl@mydomain etc.
  
  Inevitably these same addresses are now being used for Spam and viruses as the source OR destination address (meaning I get bounce messages as well).
  
  I HATE it when moron anti-Virus gateway administrators set them up to return confirmed viruses to sender with a polite note - except I am NOT the sender, my address was spoofed.
  
  Unfortunately I have been using the catch-all trick for so long (e.g. ebay.com@mydomain etc.) that it's not as simple as turning it off or setting up filters - I don't even know what all the 'legit' addresses are as I used to create them on the fly and may only get email to some of them once a year or so.
  
  I only ever busted one person for passing on the account details which was satisfying, but I am getting PLENTY of Spam/viruses now instead.
  
  I use the excellent Spam Gourmet [spamgourmet.com] now for instantly creating disposable addresses with the added advantage that they can actually die when I want/need them to.
  
Another data point. (Score:5, Interesting)

by juuri ( 7678 ) writes: on Tuesday June 22, 2004 @11:45PM (#9503039) Homepage

OSX's built in mail seems to be pretty close to the accuracy numbers listed in the above summary. I tend to have one to three pieces of spam slip through which are almost always entirely image based with some poetry or equivalent attached.

I must say I've been pleasantly surprised with the spam filtering it provides and it has been a lot easier than the hoops I used to utilize to clean out my inbox.

DSPAM (Score:5, Insightful)

by More Trouble ( 211162 ) writes: on Tuesday June 22, 2004 @11:48PM (#9503063)

In real world deploys of statistical filters, something like DSPAM's "global user" feature is necessary. The ability to begin with a relatively mature dictionary is critical to the user experience. Personally, DSPAM is filtering around 200 SPAMs per day for me, allowing one through every few days. It's 99.985% effective for me.

:w

- Re:DSPAM (Score:4, Informative)
  
  by Daniel Quinlan ( 153105 ) writes: on Wednesday June 23, 2004 @01:30AM (#9503572) Homepage
  
  Quoting the (unfinished) paper:
  CRM-114 and DSPAM exhibit substantially inferior performance to the other filters, regardless of threshold setting. Both exhibit substantial learning throughout the email stream, leading us to conjecture that their performance might asymptotically approach that of the other filters. From a practical standpoint, this learning rate would be too slow for personal email filtering as it would take several years at the observed rate to achieve the same misclassification rates as the other systems.
  
  This is interesting considering the harsh words the DSPAM author directs towards SpamAssassin in the DSPAM FAQ [nuclearelephant.com]. In contrast, I think, the SpamAssassin developers say they are interested in testing the "dobly" noise reduction technique that DSPAM employs, see SpamAssassin bug 3078 [spamassassin.org].
  
- Re:DSPAM (Score:3, Informative)
  
  by More Trouble ( 211162 ) writes:
  
  Here's a response from the DSPAM [nuclearelephant.com] author.
  
  :w
No DSPAM (Score:2, Interesting)

by XMichael ( 563651 ) writes:

It's unfortunate that DSPAM was left out of this very good quality report. I have personally used SpamAssassin, SpamProbe and DSPAM [nuclearelephant.com].

After using each for a couple months at a time, I found DSPAM to be by far the most effective (after it was properly trained)

DSPAM's claim: "DSPAM (as in De-Spam) is an extremely scalable, open-source statistical hybrid anti-spam filter. While most commercial solutions only provide a mere 95% accuracy (1 error in 20), a majority of DSPAM users frequently see between 99.95%
Problems with Bayesian filtering (Score:5, Informative)

by dlevitan ( 132062 ) writes: on Tuesday June 22, 2004 @11:54PM (#9503101)

Up to this past weekend I was using only bogofilter (which is a pure Bayesian filter). I seem to get about 200 spam a day on my main account. Until about a month or two ago bogofilter was amazing - I'd get maybe 1 or 2 spam a day, if that many. Then recently I suddenly started getting hit with 20 spam messages a day, and I noticed most of those were using lots of common words to bypass bogofilter. Most spam was still being removed by bogofilter, but enough got through to annoy me. This past weekend I also enabled spamassassin (without its Bayes filter though), and it's cut down the number of spam to maybe 5 a day, but it's still too much for me. I'm hoping we have the next breakthrough in spam filtering technology soon (akin to Bayesian filtering) because it seems that every new technique we use to filter the spam is eventually targeted by the spammers and bypassed.
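
To make the "common words" discussion concrete, here is a minimal sketch of how a Bayesian filter might combine per-word evidence. It uses the plain naive-Bayes product rule, which is an assumption for illustration and not bogofilter's actual scoring, and the per-token probabilities are made-up values:

```python
import math


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive-Bayes product rule.

    Each value must lie strictly between 0 and 1.
    """
    # Sum logs instead of multiplying so long messages do not underflow.
    log_spam = sum(math.log(p) for p in token_probs)
    log_ham = sum(math.log(1.0 - p) for p in token_probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Hypothetical token scores for a short spam message.
spammy = [0.99, 0.95, 0.90]
# The same message padded with 50 words the filter considers neutral.
padded = spammy + [0.5] * 50
# The same message padded with 5 words typical of the user's real mail.
diluted = spammy + [0.2] * 5

score_plain = combined_spam_probability(spammy)
score_padded = combined_spam_probability(padded)
score_diluted = combined_spam_probability(diluted)
```

Under this rule, words scored as neutral (0.5) leave the result unchanged, so padding only lowers the score when the added words resemble the recipient's legitimate mail.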

- Re:Problems with Bayesian filtering (Score:3, Informative)
  
  by swillden ( 191260 ) * writes:
  
  Then recently I suddenly started getting hit with 20 spam messages a day, and I noticed most of those were using lots of common words to bypass bogofilter.
  This is very surprising to me, and it's not my experience at all (also using bogofilter). My bogofilter doesn't seem to be fooled one bit by those common words, at least not in a way that causes it to misclassify spam. That makes sense, actually, since most common words end up being viewed by the filter as neutral, and if the spammers want to sell
the true cause of the majority of spam... (Score:3, Interesting)

by Etaipo ( 787613 ) writes: on Tuesday June 22, 2004 @11:58PM (#9503127) Homepage

users. those silly, silly users.

i was in charge of spam for my company for the greater part of a year. using an outdated KEYWORD based system, i was forced to read every.caught.message to look for false positives. ... did you catch that? yeah...i had to go through EVERY 'spam' tagged e-mail that went through the company. needless to say, after the first week i was ready to gouge my eyes out. but hey, at least i earned that 'i read your e-mail' sticker!

anyways, the point that i'm failing to make here is the cause of the spam... the damn users. whether it be responding to spam, putting their e-mail address in every single webform they encounter while surfing instead of working, signing up for spam voluntarily, or whatever the cause may be.. i ran some numbers on the logs, and came to an astounding find. a few people were getting literally a thousand messages blocked, per month. i, on the other hand, had maybe one or two a month. and i'm not a nazi with my e-mail address....but i do take some care in what places i type it in. an ounce of prevention goes a long way folks.
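
Running "some numbers on the logs" of this kind could look roughly like the sketch below. The log format (date, action, recipient, separated by spaces) is purely hypothetical, as are the addresses; a real filter's logs would need their own parsing:

```python
from collections import Counter


def blocked_per_recipient(log_lines):
    """Count blocked messages per recipient in a hypothetical filter log.

    Assumed line format: "<date> <ACTION> <recipient>".
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        # Skip malformed lines and anything that was not blocked.
        if len(parts) >= 3 and parts[1] == "BLOCKED":
            counts[parts[2].lower()] += 1
    return counts


sample_log = [
    "2004-06-01 BLOCKED alice@example.com",
    "2004-06-01 BLOCKED bob@example.com",
    "2004-06-02 BLOCKED Alice@example.com",
    "2004-06-02 DELIVERED bob@example.com",
]
per_user = blocked_per_recipient(sample_log)
worst_offenders = per_user.most_common(1)
```

Sorting with `most_common` puts the handful of heavily spammed addresses at the top, which is the pattern the comment describes.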

- Re:the true cause of the majority of spam... (Score:2)
  
  by stevesliva ( 648202 ) writes:
  
  Sure man, blame the victim. She was asking for it.
  All sarcasm aside, I DO ask for it with my hotmail account (see above) and that just makes me so glad that I keep my other addresses quiet!
SpamAssassin used to work but recently... (Score:3, Interesting)

by squisher ( 212661 ) writes: on Tuesday June 22, 2004 @11:58PM (#9503129)

SpamAssassin used to be super-good for me, but recently it has become a nightmare... even with Bayes filters on and after training it with almost 2000 spam messages that had escaped it before, I STILL get an enormous amount of spam every day... maybe I'm doing something wrong with the config, I admit that I haven't spent that much time on that, but it seems like it should be working better :-((.

Spam sucks. Everyone stop buying the products advertised and it'll be over. But then again, people will always be too dumb for an easy solution like that (reminds me of the gooback southpark...)

Issues with testing corpus (Score:5, Interesting)

by w_mute ( 40724 ) writes: on Wednesday June 23, 2004 @12:00AM (#9503143)

I haven't read everything in detail yet, but one of the things that stands out is that their 'gold standard' representing the best result consists of 9,038 ham messages (18.4%) and 40,048 spams (81.6%). While large, the dataset is unbalanced. One of the things that is recommended by many of the filters is training on equal proportions of ham/spam in order to prevent biasing (overfitting).

Their train-on-errors approach may simulate what goes on with some filters, but it doesn't reflect the scenario where there is an initial dataset to be trained on _before_ new messages are processed. Instead, each message is in essence 'new'. So in their tests the machine learning filters start out knowing nothing, but SpamAssassin starts out with its inbuilt ruleset. Not exactly fair.

-Greg
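
The train-on-errors protocol discussed in this comment can be sketched roughly as follows. The `ToyFilter` class and its count-difference score are hypothetical stand-ins, not any filter from the study; only the loop structure (classify first, train only on mistakes, start with no knowledge) is the point:

```python
from collections import Counter


class ToyFilter:
    """A deliberately crude learning filter that starts out knowing nothing."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def classify(self, tokens):
        # A positive score leans spam; a tie or negative score counts as ham.
        score = sum(self.spam[t] - self.ham[t] for t in tokens)
        return "spam" if score > 0 else "ham"

    def train(self, tokens, label):
        (self.spam if label == "spam" else self.ham).update(tokens)


def train_on_errors(filt, stream):
    """Process messages in arrival order, training only on misclassifications."""
    errors = 0
    for tokens, truth in stream:
        if filt.classify(tokens) != truth:
            errors += 1
            filt.train(tokens, truth)
    return errors


stream = [
    (["viagra", "cheap"], "spam"),
    (["meeting", "monday"], "ham"),
    (["cheap", "viagra", "now"], "spam"),
    (["monday", "lunch"], "ham"),
]
filt = ToyFilter()
errors = train_on_errors(filt, stream)
# errors == 1: only the first spam is misclassified, so it is the only
# message the filter is ever trained on.
```

A filter with a built-in ruleset would not pay that initial cost, which is the asymmetry the comment objects to.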

- Re:Issues with testing corpus (Score:2, Insightful)
  
  by PlusFiveTroll ( 754249 ) writes:
  
  Not exactly fair.
  
  Huh, since when did spammers start playing fair? This is about winning, not software political correctness.
  Also, on the unbalanced dataset: I train my filter with spam corpuses that reflect what I receive in my email. Many accounts receive 10 spams for every ham. The biggest thing that I've had to retrain on is receipts for airplane tickets; spamassassin seems to think they are spam the first time I receive them, and from the article, they had the same issues too.
why I don't use spam filters (Score:2, Interesting)

by Begemot ( 38841 ) writes:

just my humble opinion...

i use email for business and receive many letters from clients. i'm just afraid to lose any of these because of a spam filter. therefore even when i used one, i checked all the emails anyway.
SpamAssassin is a dud (Score:2)

by Animats ( 122034 ) writes:

My hosting service, EZ Publishing [ezpublishing.com], uses SpamAssassin. Their hosting service is fine, but incoming mail filtering is terrible. SpamAssassin is only filtering out about 25% of the incoming spam. I'm getting about 2000 spams per day after SpamAssassin filtering.
I use Netscape's Bayesian filter as a second tier, and that removes about 60% of the remaining spam.
SpamCop was better, until IronPort bought them and they went black-hat, with Bonded Spammer [bondedsender.com] and the Spam Engine [ironport.com].
- Re:SpamAssassin is a dud (Score:2)
  
  by sloanster ( 213766 ) * writes:
  
  No offense, but that's a pretty ignorant statement, unless you know that "spam assassin" is indeed running, and what version, with what added rule packs, and what the scoring threshold is set at.
  
  There's a wide range of things that could be called "spam assassin", but without competent administrators who keep the program and the rulesets up to date, the effectiveness can degrade significantly, especially in a vanilla install of an older version, that's never been trained.
  - Re:SpamAssassin is a dud (Score:2)
    
    by Animats ( 122034 ) writes:
    The most recent e-mail SpamAssassin botched has this header:
    
    X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on main6.ezpublishing.com
    X-Spam-Status: No, hits=1.1 required=4.0 tests=HTML_40_50,HTML_MESSAGE,
        HTML_TAG_EXISTS_TBODY,MIME_HTML_ONLY,NO_REAL_NAME,RCVD_IN_SORBS
        autolearn=no version=2.60
    The mail content is "Major income on eBay", sent via a free account on Netster. If it can't recognize that as spam, it's not doing much.
    I turned the threshold down from 5 to 4; at 5, it
- Bayes SHOULD be better than vanilla SpamAssassin (Score:3, Interesting)
  
  by khasim ( 1285 ) writes:
  
  For an INDIVIDUAL, Bayesian filter works far better than just the regular SpamAssassin rulesets.
  
  That's because the Bayesian system will LEARN from you what you consider to be spam and ham.
  
  I use SpamAssassin with Bayesian filtering turned on and it catches over 90% of the spam. But then I've fed it a decent sized corpus.
Active Spam Killer (Score:2)

by Admiral Llama ( 2826 ) writes:

No false positives, disgusting amounts of spams killed. 'Tis a glorious thing.
I've been using SpamAssassin about 6 months (Score:2, Interesting)

by cool_st_elizabeth ( 730631 ) writes:

And it has just now learned to filter out almost all the spam. IIRC, SpamAssassin said it would learn what to mark as spam after a couple hundred obvious spams and the same number of obvious non-spams. I still get the occasional false positive.
Spamgourmet (antichef) and SpamSieve (Score:5, Informative)

by dougman ( 908 ) writes: on Wednesday June 23, 2004 @12:38AM (#9503328)

Why people don't use disposable accounts is beyond me. Once you start using Spamgourmet [spamgourmet.com] you'll never go back. I've been active with them over two years and here's my current stats:

Your message stats: 339 forwarded, 43,796 eaten. You have 155 disposable address(es).

yeah, that's right, thanks to disposable addresses I *haven't* read 43,457 spam emails! When I do need (want) to use my real address, I use SpamSieve (with Entourage X) - a very good Bayesian filter (not sure if it's Mac only or not).

- Re: SpamSieve (Score:4, Interesting)
  
  by hondo77 ( 324058 ) writes: on Wednesday June 23, 2004 @01:57AM (#9503684) Homepage
  I'd like to second SpamSieve [c-command.com]. If more than one piece of spam gets through in a day (where each day I receive > 500 pieces of email), I am truly surprised. My stats for June are:
  
  1007 Good Messages
  
  13729 Spam Messages (93%)
  
  1 False Positives
  
  24 False Negatives (96%)
  
  99.8% Correct
  
  Works for me. Oh, the false positive was a list that I just signed up for. They sent a confirmation mail, I checked to see if it was caught (it was), and marked it as "good". Piece of cake.
POPFile? (Score:3, Interesting)

by gmuslera ( 3436 ) writes: on Wednesday June 23, 2004 @01:19AM (#9503528) Homepage Journal

I've been using POPFile [sourceforge.net] for months and it has an accuracy of 99.75% over 17k messages. It's not very dependent on the client; it just sits as a POP3 proxy, and it classifies mail into buckets that you can define (so there's no need to split mail only into spam/ham; for some time I even had categories for viruses, Nigerian-like scams, automated reports, etc.).
Would be interesting to see how that message sample reacts against more spam filtering technologies, or even webmails with spam protection integration.

- Re:POPFile? (Score:5, Interesting)
  
  by puppetman ( 131489 ) writes: on Wednesday June 23, 2004 @01:53AM (#9503670) Homepage
  
  Yah, I ran this for about a year before I switched ISPs (and got a new, spam-free email account).
  
  It was amazingly accurate, with about one mistake per thousand emails once I had it trained. I'll go back to it if I start to get a bunch of crap in my in-box. I remember reading that spammers would test their emails against the most popular anti-spam filters, but they still almost never got through Popfile.
  
  I tried SpamAssassin as well, after I had some issues with PopFile (it would stop responding after a large volume of email), and it was more difficult to set up, and didn't have the nice configuration options of Popfile.
  
I keep hearing about how great spamassassin is... (Score:2)

by rsilvergun ( 571051 ) writes:

maybe I'm doing something wrong (wouldn't be the first time). I run the spamd command as root (tried it with the -d option too), pointed sa-learn at 3000+ spams and about 200 hams and set up kmail filters to pipe everything less than 250k through spamc and move anything with X-Spam-Flag=Yes to junk. It's slow as heck and only filters about 60% of my spam. Bogofilter was doing about 80% (it's more trouble to set up though). But I keep reading posts of people with 98% filter rates.
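
For anyone checking a setup like this outside the mail client, the same pipeline can be sketched in Python. This assumes `spamd` is already running and `spamc` is on the PATH; `run_through_spamc` and `is_flagged_spam` are illustrative names, and the check relies on SpamAssassin adding an `X-Spam-Flag: YES` header to the messages it tags:

```python
import subprocess
from email import message_from_string


def run_through_spamc(raw_message: str) -> str:
    """Pipe one message through spamc and return the rewritten message.

    Assumption: spamc reads the message on stdin and writes the processed
    copy (with X-Spam-* headers added) to stdout.
    """
    result = subprocess.run(
        ["spamc"],
        input=raw_message,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_flagged_spam(filtered_message: str) -> bool:
    """Return True if the processed message carries X-Spam-Flag: YES."""
    msg = message_from_string(filtered_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


# Example with hand-written messages, so no running spamd is needed here.
tagged = "X-Spam-Flag: YES\nSubject: Major income\n\nBuy now\n"
clean = "Subject: Lunch on Monday\n\nSee you there\n"
```

Running a few known spams through `run_through_spamc` by hand and reading the resulting `X-Spam-Status` line is one way to see which tests fired and whether the Bayes rules are contributing at all.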
Counterintuitive Advertising (Score:5, Interesting)

by KalvinB ( 205500 ) writes: on Wednesday June 23, 2004 @03:36AM (#9504295) Homepage

Some guy a few stories back mentioned he was getting 3000 ad impressions and 15 clicks a day or so with AdSense. Which is terrible. At first I assumed he was just oversaturating his visitors with ads. But his ad placement is also terrible. It's at the very bottom of the page where few are going to see it. But he is also over saturating. His pages are very busy with information and the ads are on every single page.

What happens when you constantly shove something in someone's face is that they learn to ignore it. Either consciously or subconsciously. In the case of advertising if someone is shown an ad and they aren't interested and another ad is shown there's a very good chance they won't even notice it. Even if they would have been interested in what it was offering. This is because they were annoyed by the first ad so they just mentally block any additional ads.

This is why the response rate to spam is so terrible. People for the most part just subconsciously ignore it. It's just noise.

Advertisers like radio stations because it tends to be a captive audience. People are very unlikely to turn the station when ads come on. However there is one local station that I've learned to turn the channel on when the ads start because I know I'm going to get to my destination before another song comes on. There are other stations that I don't change the channel on because I know it's just a short break.

Just like the guy pumping out 2985 ads that no one clicks on, spammers would benefit immensely by pulling a large chunk of the ads. People are more likely to notice when they aren't bombarded by ads and the response percentage goes up.

It seems counterintuitive that less advertising means a greater response but that's actually the case.

I normally notice the ad banners on Slashdot because that's pretty much all the advertising there is. I rarely ever notice the text ads. Even though they're placed on the left side in the best position as anyone who scrolls the page is probably going to see them. Slashdot's problem is that the ads blend in with the web-site's color scheme too well so they're pretty much invisible to anyone with a scroll wheel.

On GameDev the site is so littered with advertising that I never notice it anymore. By the time I close the stupid popup ads that circumvent Google's pop up blocker using evil little tricks I'm too annoyed to even look at the other ads.

Web-sites get desperate and think more ads == more money. And the actual result is less valuable ad space because the click thru rate is so low and fewer clicks because users tune the ads out which results in less money than if they had focused on the click thru percentage rather than the number of impressions. If you have a web-site with a high click thru rate advertisers are more likely to pay more because they know that if they show an ad there's a very good chance they'll get a click thru.

But then I'm guessing spammers have never taken a course in marketing or bothered to think about things from their potential customers' perspective.

Keeping ineffective ads visible hurts the effectiveness of the better ads. Spammers are in effect destroying themselves in that area. As are ad happy web-sites.

Ben

DSPAM. (Score:5, Interesting)

by asackett ( 161377 ) writes: on Wednesday June 23, 2004 @04:20AM (#9504514) Homepage

I've been using DSPAM for nearly a year now, and it's just kept on getting better. I can't imagine life without it now.

I have 17 DNS-based blacklists in front of it, because I would rather block the messages at the network interface than filter them with my own resources, but those that slip through don't stand much of a chance of reaching my inbox. I have had my current email address out there on the web and in Usenet for six years, so I see a lot of junk -- DSPAM stops all but one or two per month. SpamAssassin can't even come close to that.
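
For readers who have not set one up, a DNS-based blacklist check amounts to a DNS lookup on a specially built name. A minimal sketch, assuming IPv4 and a made-up zone `dnsbl.example.org` (the function names are illustrative, and treating any answer as "listed" is a simplification, since some lists encode reasons in the returned address):

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup name resolves, False if it does not."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


# 192.0.2.99 is a documentation address; the zone is hypothetical.
example_name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Because the check needs only the connecting IP, a mail server can do it before accepting the message body, which is why it costs so much less than content filtering.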

CRM114 Author Response (Score:3, Informative)

by Anonymous Coward writes: on Wednesday June 23, 2004 @07:56AM (#9505363)

I am the author of CRM114 and I corresponded with Professor Cormack for setup assistance during this study; he did have some problems with CRM114 that he brought to my attention and which were possibly never quite resolved.

I can also state that I *do* run CRM114 myself; I also run SpamAssassin (regularly maintained by the systems staff) on a parallel account. I find that SA gets about 90+ percent of what makes it past the firewall's immediate RBL lists (which matches Prof. Cormack's Figure 8 pretty closely); CRM114 nails 99.9% or more (this week, ending June 21, 2004, my CRM114 stats are 2528 nonspam and 1114 spam messages, with just 1 error (a false reject), which is 99.972% accuracy).

I have gotten reports from some very happy users who are seeing similar accuracies; I've also gotten sad reports similar to Prof. Cormack's that show very weak accuracy.

I can conclude from this (and other reports) that filter performance varies _greatly_ with spam mix - that is to say, Your Mileage Will Vary.

Further, consider Fig 15, which compares CRM114's accuracy with respect to nonspam v. spam. Note that the two curves are displaced considerably, by a factor of accuracy between 3 and 5 times!

This is odd, because CRM114 is _entirely_ symmetrical; it does NOT have any predisposition toward (or against) erring on the side of caution; the only difference between nonspam and spam is the names of their files, which could be changed to "foo.css" and "bar.css" (or even interchanged) without affecting anything else.

Therefore, the two accuracy curves _should_ lie on top of each other; there is no difference in the processing. The fact that the nonspam v. spam curves seem to differ by a factor of 3 to 5 in magnitude gives me some reason to believe that the setup issues Prof. Cormack encountered never really were completely addressed.

-Bill Yerazunis
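
The weekly figure quoted in this comment can be reproduced with a few lines of arithmetic. The helper name `accuracy_percent` is just for illustration; the message counts are the ones given in the comment:

```python
def accuracy_percent(total_messages: int, errors: int) -> float:
    """Percentage of messages classified correctly."""
    return 100.0 * (total_messages - errors) / total_messages


nonspam, spam, errors = 2528, 1114, 1
total = nonspam + spam  # 3642 messages in the week
weekly_accuracy = accuracy_percent(total, errors)  # just above 99.972

# The same week with one more mistake, for comparison.
with_two_errors = accuracy_percent(total, errors + 1)
```

With a sample this small, a single extra error would move the result to roughly 99.945%, so week-to-week figures at this level are sensitive to one or two messages.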

And SpamAssassin is just getting better (Score:4, Informative)

by KjetilK ( 186133 ) writes: <kjetil AT kjernsmo DOT net> on Wednesday June 23, 2004 @08:22AM (#9505558) Homepage Journal

I've been using SA 2.63 for some time now. At first, my statistics were about 90% rejected at SMTP-time, 0.1% false negatives and 0.01% false positives. Spammers have learned to adapt, so now I have about 2% false negatives.
But SpamAssassin is just getting better and better. Version 3.0 is coming up, and 3.0-pre1 [gmane.org] was recently released. I do not have a test system available for it, but those who have may want to take it for a spin.
Especially for large sites, this is extremely interesting. It adds relational database support for the Bayes database, so it should be a lot easier to set up on a large site.
I find individual training the main reason why SA works so well for me, and the lack of it the reason it did not work very well at my old university.

- Re:in related news (Score:5, Insightful)
  
  by bigberk ( 547360 ) writes: <bigberk@users.pc9.org> on Tuesday June 22, 2004 @11:42PM (#9503016)
  
  Content-based spam filtering is a waste of time. . . RBLs WORK
  
  But content-based filters can very accurately determine what is spam and what's not, and so they can feed RBLs/DNSBLs. Let real spam to real user accounts form the blocklist! One such project is WPBL.
  
- Re:in related news (Score:2, Insightful)
  
  by plasm4 ( 533422 ) writes:
  
  filtering tools work fairly well, but more importantly they work right now. Waiting for the authorities to "wake from their slumber" might take years, if it ever even happens.
- Re:in related news (Score:2)
  
  by djmurdoch ( 306849 ) writes:
  
  RBLs only work against honest admins, getting them to clean up the holes in their security. Spammers aren't honest, and as you say, will just use worms to invade machines to create proxies.
  
  RBLs have been around for years, but the amount of spam Spamassassin catches on its way in to me is ever-increasing. If RBLs worked, the spam problem would have been solved years ago.
  
  On the other hand, the amount of spam getting past Spamassassin to me is pretty steady. I guess that indicates it's getting better. Mo
- Re:in related news (Score:4, Interesting)
  
  by Crudely_Indecent ( 739699 ) writes: on Wednesday June 23, 2004 @12:24AM (#9503251) Journal
  
  I can certainly see how waiting on our government will decrease the number of messages transmitted through my mail servers daily.
  
  It's reassuring to know that the "authorities" have effectively reduced the number of messages through my server by 10-14k per day......What great guys, those 'authorities', aren't they thoughtful and quick to respond. We've only been waiting for a spam-relief law for....10 years and they finally gave one to us. Oh wait....SpamAssassin is what reduced those messages.
  
  The reason we don't wait for the gov to step in and take care of business is that THEY'VE DONE NOTHING SO FAR. You expect me to believe the government will solve my spam problems? I'm not holding my breath.
  
  A combination of RBLs, DNSBLs, F-Prot, and SpamAssassin is what reduced the number of messages sent through my servers. I'm interested in results NOW, not legislation tomorrow.
  
- Re:Holy Shit.... (Score:3, Interesting)
  
  by fdiskne1 ( 219834 ) writes:
  
  It's getting just plain ridiculous. When I started keeping track about a year ago, the email filtering system I set up was blocking about 10,000 spams per week for just under 1500 users. Last week, it blocked over 170,000. That is an average of over 100 spams per user, yet the vast majority of my users don't get any at all. There are a couple dozen that get the vast majority of it. Of course, these are addresses that would be a major pain in the ass to change because of all the people that would have to be n
- Re:Okay, but what about... (Score:4, Interesting)
  
  by dasmegabyte ( 267018 ) writes: <das@OHNOWHATSTHISdasmegabyte.org> on Wednesday June 23, 2004 @12:48AM (#9503378) Homepage Journal
  
  Here's how you assuage false positives:
  
  You keep one account for people who don't know you. You spam check that one. You put that on business cards, use it to sign up for porn sites, and post it on slashdot.
  
  You keep another account for responding to email. You set that as your reply-to. You do not spam check it.
  
  This way, there is a way to reach you for customers, clients and friends that will ALWAYS work. Call it the direct line. And, there's a way for people to introduce themselves to you. Call it the "front desk." Anyhow, with SpamAssassin (which includes a bayesian filter, btw, which can be autotrained to learn spam-like language from other mail it sets up), most of the bullshit calls will be correctly tagged and most of the incoming calls will get to you. I haven't had a false positive in months. But I train the thing like Rocky Balboa.
  
- Re:Why am I so Blessed? (Score:4, Funny)
  
  by lewko ( 195646 ) writes: on Wednesday June 23, 2004 @01:05AM (#9503461) Homepage
  
  How come I have an @hotmail.com email for 4+ years (pre-MSN) and I only get 15 junk mails a week?
  Because the 15 junk mails put you over quota?
  
- Re:Why am I so Blessed? (Score:4, Insightful)
  
  by dasmegabyte ( 267018 ) writes: <das@OHNOWHATSTHISdasmegabyte.org> on Wednesday June 23, 2004 @01:18AM (#9503521) Homepage Journal
  
  Because you don't put it into weird text boxes, you don't use newsgroups, you don't have any enemies, you don't have any domains, and you don't have it in plaintext on your website.
  
  I do all 4. I get my share of spam. It's not a HUGE deal, but it made it worth my while to get a spam filter.
  
- Re:What d'you think spamassissin would make of thi (Score:2)
  
  by dasmegabyte ( 267018 ) writes:
  
  No time to read it, son, just email it to me.


Re:Problems with Bayesian filtering (Score:3, Informative)

the true cause of the majority of spam... (Score:3, Interesting)

Re:the true cause of the majority of spam... (Score:2)

SpamAssassin used to work but recently... (Score:3, Interesting)

Issues with testing corpus (Score:5, Interesting)

Re:Issues with testing corpus (Score:2, Insightful)

why I don't use spam filters (Score:2, Interesting)

SpamAssassin is a dud (Score:2)

Re:SpamAssassin is a dud (Score:2)

Re:SpamAssassin is a dud (Score:2)

Bayes SHOULD be better than vanilla SpamAssassin (Score:3, Interesting)