More SMTP Issues

Posted by Toby Thu, 01 Nov 2007 12:53:00 GMT

This is getting really old, really fast. Now the Charlotte hotel I am staying in also proxies port 25 and fails to proxy SMTP AUTH. WTF? And they are using Symantec Mail Security, a product from the division where I used to work! Sheesh… Anyway, it’s really Mailstreet’s fault for not listening on 587 in the first place.
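For anyone wanting to check whether a proxy is eating SMTP AUTH, here is a minimal sketch of the test I end up doing by hand: look at the server's EHLO reply and see whether an AUTH line is in it. The function name and the sample replies are made up for illustration; only the "250-KEYWORD" reply layout comes from the SMTP spec.

```python
def advertises_auth(ehlo_response: str) -> bool:
    """Return True if an EHLO reply contains an AUTH extension line."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if not line.startswith("250") or len(line) < 5:
            continue
        # Some older servers write "AUTH=LOGIN PLAIN", so treat "=" as a space.
        fields = line[4:].replace("=", " ").split()
        if fields and fields[0].upper() == "AUTH":
            return True
    return False


# Hypothetical replies: one straight from the server, one mangled by a proxy.
direct = "250-mail.example.com\r\n250-SIZE 10240000\r\n250-AUTH LOGIN PLAIN\r\n250 HELP"
proxied = "250-mail.example.com\r\n250-SIZE 10240000\r\n250 HELP"
print(advertises_auth(direct), advertises_auth(proxied))
```

Against a live server, Python's smtplib can do the same check after `ehlo()` via `has_extn("auth")`, assuming you can reach the server at all.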

Testimony of U.S. Elections Being Rigged

Posted by Toby Sat, 14 Oct 2006 16:24:00 GMT

Clinton Curtis has got balls. I guess we shouldn’t be surprised. Without a paper trail, you can’t be sure, and it never made any sense to me that these machines wouldn’t have them. Personally, I’m appalled. I left a job at a previous employer because I thought there was impropriety going on amongst the higher-ups, but this is on another level.

MIT Spam Conference 2006

Posted by Toby Wed, 29 Mar 2006 18:57:00 GMT

UPDATE: The papers and slides are now available for download.

Overview

Just got done with the MIT Spam Conference 2006 and let me tell you, it was much better than last year. This might have been the best one yet. Most of the talks were pretty good and there was a definite blunting of the Bayes daisho they had been wielding in previous years. This year, the conference was held in March, so it wasn’t anywhere near as cold as it was in previous years (when it was held in early January). And the next one will be around the same time, so that’s even better!

CipherTrust sponsored the meetup the night before and it was a fun time. I talked to a bunch of people that night, including Jon Zdziarski and Matt Sergeant. It was held at the Cambridge Brewing Company, which cabbies apparently have a tough time finding, but I made it there alright. We ended up going there again for dinner the second night and they had some good food.

The conference itself was very punctual this year. In past years, there had been problems with that, but Bill ran a pretty tight ship this year. There weren’t a lot of people there at all (it was packed last year) and that’s a shame. They missed out. I had some more pictures, but the RAZR doesn’t take very good pictures in low light, so I’m only putting up the ones that came out.

Presentations

First, Tobias Eggendorfer gave a high-level overview of HTTP and SMTP tarpits. As a former TurnTide employee, I was a little disappointed that he seemed to ignore that approach, but overall it was a fairly good overview. One important point that he brought up was that SMTP tarpit effectiveness is local to the protected network whereas HTTP tarpits can be used to slow down and trap spammers as they attempt to harvest email addresses. The difference here is that one protects you (SMTP tarpits) and the other protects all in a strictly weaker capacity by hindering spammers at the start of the journey (HTTP tarpits).

Phil Raymond from Vanquish then got up and gave what amounted to a sales pitch for his new email reputation system. He tried pretty hard to coin the term “personal interrupt value” in describing how he wanted to create a “fluid market for your moment-by-moment attention”. It’s not just that I’ve heard this before, but I’ve heard it from every email reputation service vendor. Goodmail, Bonded Sender, smtpRM... they all say the same exact things. How are we to differentiate? In case you don’t know, the basic idea is that senders put up something of value in bond, and then that bond is debited should there be a conflict. The twist with Vanquish is that if you decide that you know a person or that you like the email, the sender doesn’t pay. That does neatly handle the case of friends and family not having to pay to play, but there are a number of other problems. It hit a couple of my FUSSP sensors: it requires (or at least substantially benefits from) a “flag day”, they didn’t have a good answer for zombies (“end users will have so little at risk, it won’t matter”), etc. Basically, I wasn’t impressed. If all these guys got together and standardized something, that might move me.

There’s another important point about email reputation systems that a lot of people seem to be missing. This trial that Goodmail has going on with AOL and Yahoo!? That’s the entire industry’s big break. If Goodmail fails to deliver, then BS, smtpRM, Vanquish and any others are going to be set back years, as well. I understand their willingness to get in the game before it’s truly established, but they’re a niche within a niche. A big setback at either AOL or Yahoo! will not incite other ISPs or enterprises to knock down any of their doors.

As an aside, I have a feeling that no one from AOL or Yahoo! showed up this year in large part because they didn’t want the shitstorm from the audience about their decision to go with Goodmail.

The next talk was very cool: the guys from BitDefender talked about using adaptive neural networks for email classification. Specifically, filtering spam, but its uses are more general than that. Basically, they applied Adaptive Resonance Theory to a hierarchy of these neural networks and got some pretty promising results. The big thing here is that you don’t need to retrain it with the entire corpus; it can learn new heuristics without forgetting the old ones. The heuristics still have to be created by some other means, of course, but this was a pretty interesting technique, nonetheless.

Next, Giovanni Donelli from the University of Bologna talked about a technique he called “Email Interferometry”. The idea boils down to monitoring a set of related accounts (called an “e-pool”) looking for the same spam messages to come into multiple accounts in the pool. He posited that it might not work well on large scales and did not itself indicate a filtering/classification technology. Didn’t seem too promising.
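To make the e-pool idea concrete, here is a rough sketch of how I picture it, not Donelli's actual method: fingerprint each incoming message and flag any fingerprint that turns up in more than one account in the pool. Exact matching on a whitespace- and case-normalized SHA-256 hash is my own simplification, and the function name and sample addresses are invented.

```python
from collections import defaultdict
import hashlib


def pool_hits(deliveries, min_accounts=2):
    """Map message fingerprint -> accounts, for messages seen in several accounts.

    deliveries is an iterable of (account, body) pairs.
    """
    seen = defaultdict(set)
    for account, body in deliveries:
        # Collapse case and whitespace so trivial variations still match.
        normalized = " ".join(body.lower().split())
        fingerprint = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        seen[fingerprint].add(account)
    return {fp: accts for fp, accts in seen.items() if len(accts) >= min_accounts}


# Hypothetical pool: the same pitch lands in two related accounts.
deliveries = [
    ("a@example.com", "Buy now"),
    ("b@example.com", "buy   NOW"),
    ("a@example.com", "lunch on friday?"),
    ("a@example.com", "lunch on friday?"),
]
hits = pool_hits(deliveries)
```

Note that a message repeated within a single account does not count; only arrival across distinct accounts in the pool does.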

Bill Yerazunis talked about sorting spam with k-nearest neighbor and his new Hyperspace classifier in CRM114. I missed the first half of this talk, but I think he used kNN to attempt to match or exceed the quality of Markov chaining. Since I missed the meat of this one, I’m not going to comment further, but you can try it today by telling crm114 to use “hyperspace”.

Now, as far as the paper with the biggest potential for impact, I’m going to say it was Kang Li et al’s Towards a Ham Archive. Anyone who works on anti-spam software knows that we can get spam any time we want, in any quantity. The problem is getting a source of quality ham. There is the SA ham corpus and the TREC corpus, but not much else. This is a problem because without a large, quality source of ham all of our effectiveness statistics are eternally suspect. Li has thought of a method that might work for creating a large public corpus of ham without exposing the actual message data. Simply hash bigrams of the message in a sliding window and insert the digest values into a vector. The vector is then the quantity which is published in a public archive. The cool thing is, statistical filters already work on “tokens”, which are currently some number of words from the message. The digests in the digest vector could easily be used in the same capacity. But, since the messages are being digested in bigrams by a sliding window, the original message cannot be reproduced, so users can have confidence in releasing their ham to a public corpus. It wasn’t clear how large of a tradeoff there would be between protecting the privacy of the messages’ authors and effectiveness of the filters trained using digest vectors, but I think it’s definitely a much-needed advance for a tough problem.
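Here is a minimal sketch of the digest-vector idea as I understood it from the talk. The specifics are my assumptions, not the paper's: I split on whitespace, lowercase everything, hash with SHA-256 and keep only the first few bytes of each digest. The function name and parameters are hypothetical.

```python
import hashlib


def digest_vector(message, window=2, digest_bytes=4):
    """Hash every run of `window` consecutive words; return truncated hex digests."""
    words = message.lower().split()
    digests = []
    # Slide the window one word at a time, so adjacent bigrams overlap.
    for i in range(len(words) - window + 1):
        gram = " ".join(words[i:i + window])
        full = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        # Two hex characters per byte; truncation is an assumed design choice.
        digests.append(full[:digest_bytes * 2])
    return digests


vector = digest_vector("see you at the brewery tonight")
```

A statistical filter could then train on the entries of `vector` the same way it trains on word tokens today. How well truncated digests resist guessing of common bigrams is exactly the privacy question I couldn't get an answer to.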

Reflexion then talked about their Supplemental Address Management System, which as far as I can tell, differs little from TitanKey other than the fact that it doesn’t employ challenge-response. They have some theory about how their system is better than disposable email addresses, but frankly, I couldn’t see a qualitative difference.

Mr. Palla then talked about how to detect phishes in email. I had some high hopes for this talk, because I talked to him and his wife for at least an hour the night before, but he blazed through those slides way too fast. The slides themselves were also far too dense for the time allotted. From what I gather, he was analysing the headers for rDNS information and also checking the recipient’s sent folder for matching addresses. Andrew from MessageLabs commented that they were getting better effectiveness from rDNS inspection alone than what Palla reported in his talk. Oh well.

Here’s the biggest travesty of the day: we came back just in time for Jon Zdziarski’s last slide for his talk about probabilistic digital fingerprinting techniques as applied to phishing detection. That sucked. I really wanted to see this one because his talk from last year was the best by far. Still, I talked to him the night before about it and he told me that it was basically building fingerprints of the pages that are linked to in email messages. The fingerprints were then correlated to find pages that have a large number of fingerprints in common, so that they can distinguish which of those uncommon fingerprints would be replicated across multiple emails. This would then indicate a set of fingerprints for an author of a phish. I may be messing up the details somewhat, so I will redo this section after I read the paper.

Fidelis Assis then phoned in a talk about “Exponential Differential Document Count” from Brazil. It was somewhat hard to understand over the phone over the loudspeaker, but the EDDC technique attempts to replicate what humans do when reading mail by picking out strong features and lessening the importance of ones that occur about equally in both ham and spam. I wrote down that I should get the paper, but it appears to increase effectiveness in CRM114. In the meantime, you can check out the code here or here.

Aaron Kornblum then talked about how Microsoft’s team let a PC get infected with a zombie and checked to see what it did. They didn’t let any email out from it, but it got a huge number of connection attempts and tried to send a ton of email. They then used that info to file suits against the zombie controllers. Cool stuff.

Jon Praed kept up his streak by talking at this one, this time about CAN-SPAM and some problems it has. Spammers are doing what he calls “microbranding”, which is keeping a low enough profile to appear small while still getting enough volume to be profitable. This entails starting a bunch of shell companies and not spamming the biggest ISPs. Spammers are also fleeing offshore, but the fact that they are US citizens poses both problems and solutions for the authorities. Jon then indicated that the costs of CAN-SPAM are not known, but that it was basically really good for ensuring the legit mailers comply but not having much of an effect on spammers. He posed an interesting alternative to CAN-SPAM modeled after 18 USC 2257, which is the regulation that says all adult performers need to have their age and info recorded by someone and available upon request (to prevent another Traci Lords). Good talk, as usual.

Keynotes

Eric Allman, creator of Sendmail, gave the first keynote. He was advocating using Sender Domain Authentication (i.e. DKIM). Mostly it was an overview, but he indicated that there were definitely some rich research topics to explore in this area and that a lot of work was left to do to work out the IETF standards. Benefits listed were making whitelists more reliable, displaying auth results to user, etc. He concluded that it was a valuable tool for the anti-spam toolkit and that authentication was required to achieve a full ID suite for Internet communications.

Barry Shein, President and CEO of The World, gave the second keynote. It was pretty funny, if a little disconnected. He talked about all of this stuff.

Overall, a very good conference and I was glad I decided to go again this year, especially after last year’s debacle. I plan on attending again next year.

Thoughts on GDrive

Posted by Toby Sat, 04 Mar 2006 15:51:00 GMT

This article really shows where Google’s head is at lately. They are making the classic information lock-up play, the same one IBM made with the 360 mainframes and the same one Microsoft made with Exchange, except on a much grander scale.

Imagine the power they would wield with a large percentage of even just US customers’ data on their servers; to control, to query, to analyse, to “adjust”, to manipulate in any way they see fit, because there’s no visibility into the maintenance on the customer side. We’re all better for the power of Google search, but the data used to power that search is (ostensibly) public. When will they decide that it’s worth it to start charging you a fee not to index your personal data along with websites’ data for public search consumption? What happens when Sergey and Larry leave (which they will) and a guy like Steve Ballmer or even worse, John Poindexter, takes over? Anybody who thinks that kind of TIA-esque regime is a good idea should start thinking hard about this and also go back and give 1984 another go.

I recently interviewed with Google, and the guy wasn’t even allowed to tell me what the replication count was for chunks of mail data in the Gmail GFS cluster(s). Imagine what they could do with all of your data behind your back. Frankly, I find this move both scary and empowering at the same time. I’d love to be able to get at all my data from anywhere with just a browser, but I’d really like to be sure that Google’s not serving it up to the US government or scanning through it for IP to steal, too.

Call me paranoid, but there are some real questions here to answer before GDrive can go live, in my opinion. A lot of people complained about privacy concerns when Gmail came out but GDrive is on another level. I just don’t think I can trust any one entity with this kind of trump card over me. Think of it this way: if Microsoft made this play, would you trust them with all your data? Now think about the fact that Google will eventually be viewed as Microsoft is today.

They only need a finger...

Posted by Toby Wed, 20 Apr 2005 20:47:00 GMT

I just saw this post about voice print activation on credit cards linked from Slashdot. There are issues with biometric security of which most people seem to be unaware. This kind of thing may decrease crime overall, but I believe it will actually result in an increase in violent crimes related to the subversion of biometric security techniques.

The would-be thief now needs some biometric in order to complete the transaction and this will increase the likelihood of thieves using violence to acquire the victim’s cooperation. This is already being seen in the case where a man’s finger was cut off in order to drive his new Mercedes without him around. The car had been factory-fitted with fingerprint authentication, thus making it necessary to obtain his physical cooperation to steal the vehicle. Personally, I’d rather just give a carjacker my keys and call the cops after they’re gone.

I’m pretty sure this is impractical at the moment, but something interesting to consider is this: perhaps they should not attempt to make the voice-print analysis very good (or perhaps, rather, extremely good). This would have the effect of making the voice-print match fail if the voice of the cardholder is too emotionally strained to fall within acceptable parameters (or the software could attempt to make the determination itself, if trained properly). This could serve to deter the use of violence to coerce cooperation in defeating biometrics because if the attacker got the victim too upset, the card would be inaccessible even with physical control of the victim. Even if possible, this would probably create too much customer confusion and too many false negatives to be implementable, but something along these lines would go a long way toward deterring this burgeoning avenue for violence.

Two-factor authentication, if done correctly, is a good thing, and I also love the fact that some companies are willing to try new things in the area of security, typically considered an expense rather than an opportunity to gain consumer trust. I would just like to see some more thought on how to keep the consumer safe, as well as the consumer’s money.

Not more secure

Posted by Toby Thu, 24 Mar 2005 11:11:00 GMT

This morning, Microsoft sockpuppet Dave Massy posted a reply to an article with Mitchell Baker regarding Mozilla’s security relative to IE. I will reply to Dave’s egregious misuse of communication inline, with the quoted material in italics.

That’s an argument we can spend a great deal of time on and still not prove one way or the other.

This is proven over and over again all the time. Count up the number of security holes that were/are in IE versus the number in Mozilla, weight them by severity and IE is a clear loser by any metric.

Now I’m pretty confident that Mitchell doesn’t actually know the details of how IE is developed so I don’t fully understand the basis of the statement. As we develop IE we go through very thorough and stringent security reviews to ensure that every change is secure and does not expose the user to attack.

Who is this supposed to impress? The fact that Mozilla’s process is open to world review is a tremendous advantage over the closed-source development model of IE. As well, since it can be reviewed by anyone, security holes can be fixed much faster than IE’s, some of which are years old and will never be fixed, by Microsoft’s own admission. This would never stand in the OSS community. Also, on that same note, how can we be sure that the process is “thorough and stringent”? It might just be a guy in a room going to Yahoo! and if that works, testing is done… we don’t know. Finally, there cannot possibly be as many people reviewing IE’s code as there are Mozilla’s, and therefore security bugs in Mozilla will be found and fixed more quickly than in IE. Unless, of course, you open the source to IE…

The security of any browser is irrelevant to if it is part of the operating system.

What? In fact that’s extremely relevant. If Mozilla has a buffer overflow, it can’t affect the video driver or USB subsystem, for example. This is because it’s a normal process, confined to its own address space and nothing more. If IE, on the other hand, has some kind of overflow or corruption issue, the potential exists for a massive stability or security problem because some IE code runs in an IA32 CPU ring other than 3. Exactly how is that not pertinent to a discussion about IE’s relative security?

If we are to debate security of browsers then let’s bring in relevant arguments…

Yes, let’s. That includes you, as well.

...and accurate details about different possible attacks rather than rely on the irrational fear that because IE is part of the operating system it must be exposing OS functionality to the web. This is not the case as any software has access to the same set of OS APIs and can therefore expose the same set of OS functionality as IE.

So, let me get this straight: your argument is, since you expose kernel functionality for browsers for everyone, but you’re the only ones stupid enough to use them, somehow that’s better? Exactly how does the above statement make any sense??

Dave, please, before you post next time, make sure you’ve read what you’ve written. Other people will if you don’t and you’ll get more of this type of response.

They get paid for this?

Posted by Toby Sun, 27 Feb 2005 17:32:00 GMT

Recently, Microsoft Research published a “paper” regarding the detection of hidden files on a Windows system, ostensibly for the purpose of detecting rootkits, Trojans, keyloggers and some forms of spyware. If you’re interested, the document is here (Office required, obviously). I can sum it up for you completely in one sentence: do a full directory listing, reboot off of a clean OS CD and do another listing and then compare the two listings, looking for files that didn’t show up in the first one. This completely obvious idea takes up 5 full double-spaced pages and is replete with popular movie references; the idea that one would need an analogy to understand such a blindingly obvious idea is actually insulting. It looks like the papers I used to write the night before they were due back in college. At any rate, I have a few thoughts about this paper.

The first is that I am glad that Microsoft is spending some effort in the anti-spyware realm. This is a much bigger problem than the average computer user realizes and even bigger than some IT personnel care to admit (to their customers, at least). I expect to see much more research to come out of Microsoft in the coming years with regard to spyware detection, prevention and mitigation, especially in light of their newly acquired Microsoft AntiSpyware product.

Having said this, I will reiterate that this “paper” describes an idea that, while currently effective, is so blatantly obvious that I can’t believe that they took the time to write it down, let alone take 14 screenshots of WinDiff in what was clearly an effort to legitimize the work. As well, they didn’t even take the time to seriously investigate how a piece of malware could circumvent this technique and what could be done about that (I have already thought of a way to circumvent this that I’m pretty sure they haven’t thought of).
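To show just how little there is to it, here is the whole technique as a sketch: take a file listing from the running (possibly infected) OS, take another after booting from a clean CD, and subtract. The function names are mine, and this says nothing about how the paper's own tooling works beyond the comparison itself.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def hidden_files(listing_from_running_os, listing_from_clean_boot):
    """Files visible from the clean boot but missing from the running OS's listing."""
    return sorted(set(listing_from_clean_boot) - set(listing_from_running_os))


# Hypothetical listings; in practice each would come from list_files() on the
# same volume, once per boot.
infected_view = {"boot.ini", "windows/notepad.exe"}
clean_view = {"boot.ini", "windows/notepad.exe", "windows/system32/hxdef.sys"}
suspects = hidden_files(infected_view, clean_view)
```

Anything in `suspects` was being hidden from the first listing. Files that legitimately change between the two boots would show up too, so the result still needs a human to look at it.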

The real issue with this paper is not that they wrote it, but that I believe that these Microsoft researchers will now attempt to patent it. It is no secret that the state of patents in the United States with respect to software has been seriously damaged for some time and this serves to bolster the frequency of frivolous patent application submissions. Personally, I would like to see Microsoft attempt to come up with something original in the anti-spyware space rather than waste my tax dollars in the pursuit of undeservedly extending their monopoly position.