Friday, October 23, 2009

Blogger Spam

If you remember, the other day I had a bit of a meltdown in terms of all the spam I saw piling up over here.

I only have ~30 articles here, yet I had over 300 comments which were spam, and it is quite an annoying task to go delete them one by one. Especially when a week later, I'll have to go delete them one by one yet again.

Instead of just throwing my hands up in the air, I found it was time to get insane - I went to check out Blogger's API. So looking it over, I found it's really easy to log in, and about everything else after that gets annoying.

Blogger provides a way to get a list of articles, create new articles, delete articles, and also managing their comments. But the support is kind of limited if you want to specify what kind of data you want to retrieve.

At first, I thought about analyzing each comment for spam, but I didn't want to run the risk of false positives, and figured my best bet for now is just to identify spammers. I identified 25 different spam accounts.

However, Blogger only offers deleting comments by the comment ID, and then, only one by one. The only way to retrieve the comment ID is to retrieve the comments for a particular article, which includes the comments themselves and a bunch of other data. All this data is in a rather large XML file.

It would be rather easy to delete comments if Blogger provided a function like deleteCommentsOf(userId, blogId), or getCommentIdsOf(userId, blogId), or something similar. But no, one needs 4 steps just to get an XML file which contains the comments IDs along with a lot of other unnecessary data. This has to be repeated for each article.

It seems Blogger's API is really only geared towards providing various types of news feeds of a blog, and minimal remote management to allow others to create an interface for one to interact with blogger on a basic level. Nothing Blogger provides is geared towards en masse management.

Blogger also has the nice undocumented caveat that when retrieving a list of articles for a site, it includes all draft articles not published yet, if the requester is currently logged in.

But no matter, I create APIs wrapped around network requests and parsing data for a living. So using the libraries I created and use at work for this kind of thing, and 200 lines later which includes plenty of comments and whitespace, I got an API which allows me to delete all comments from a particular user from a Blogger site. So I arm an application using my new API with the 25 users I identified, and a few minutes later, presto, they're all gone.

As of the time of this posting, there should be no spam in any of the articles here. I will have to rerun my application periodically, as well as update it with the user IDs of new spam accounts, but it shouldn't be a big deal any more.

Remember the old programming dictum: Annoyance+Laziness = Great Software. It surely beats deleting things by hand every couple of days.


Peterson EspaƧoporto said...

Hmm that's one of the reasons why I use wordpress + akismet =)

Dan said...

Spam-B-Gone! It's great. (Though like i said before, you shouldn't have to do this yourself, blogger should have the spam filter tools we've come to expect from google.)

Žygimantas said...

I also vote for akismet!

Jeff Cagle said...

Hey, if you'd care to share your code I would be grateful! I had a Chinese spammer dump the same long comment at the end of every one of my posts. I found my way here by Googling for "Blogger delete all comments from one user."


insane coder said...

I'll probably release it after I make a couple of tweaks to make it easier to use.

But I can't release the source code because it uses libraries from work.

Eli From Brooklyn said...

It would be funny if there was spam comment left on here.

I'm just saying.