Friday, October 23, 2009

Blogger Spam

If you remember, the other day I had a bit of a meltdown in terms of all the spam I saw piling up over here.

I only have ~30 articles here, yet I had over 300 comments which were spam, and it is quite an annoying task to go delete them one by one. Especially when a week later, I'll have to go delete them one by one yet again.

Instead of just throwing my hands up in the air, I found it was time to get insane - I went to check out Blogger's API. So looking it over, I found it's really easy to log in, and about everything else after that gets annoying.

Blogger provides a way to get a list of articles, create new articles, delete articles, and also managing their comments. But the support is kind of limited if you want to specify what kind of data you want to retrieve.

At first, I thought about analyzing each comment for spam, but I didn't want to run the risk of false positives, and figured my best bet for now is just to identify spammers. I identified 25 different spam accounts.

However, Blogger only offers deleting comments by the comment ID, and then, only one by one. The only way to retrieve the comment ID is to retrieve the comments for a particular article, which includes the comments themselves and a bunch of other data. All this data is in a rather large XML file.

It would be rather easy to delete comments if Blogger provided a function like deleteCommentsOf(userId, blogId), or getCommentIdsOf(userId, blogId), or something similar. But no, one needs 4 steps just to get an XML file which contains the comments IDs along with a lot of other unnecessary data. This has to be repeated for each article.

It seems Blogger's API is really only geared towards providing various types of news feeds of a blog, and minimal remote management to allow others to create an interface for one to interact with blogger on a basic level. Nothing Blogger provides is geared towards en masse management.

Blogger also has the nice undocumented caveat that when retrieving a list of articles for a site, it includes all draft articles not published yet, if the requester is currently logged in.

But no matter, I create APIs wrapped around network requests and parsing data for a living. So using the libraries I created and use at work for this kind of thing, and 200 lines later which includes plenty of comments and whitespace, I got an API which allows me to delete all comments from a particular user from a Blogger site. So I arm an application using my new API with the 25 users I identified, and a few minutes later, presto, they're all gone.

As of the time of this posting, there should be no spam in any of the articles here. I will have to rerun my application periodically, as well as update it with the user IDs of new spam accounts, but it shouldn't be a big deal any more.

Remember the old programming dictum: Annoyance+Laziness = Great Software. It surely beats deleting things by hand every couple of days.


Peterson Silva said...

Hmm that's one of the reasons why I use wordpress + akismet =)

DeFender1031 said...

Spam-B-Gone! It's great. (Though like i said before, you shouldn't have to do this yourself, blogger should have the spam filter tools we've come to expect from google.)

┼Żygimantas said...

I also vote for akismet!

Jeff Cagle said...

Hey, if you'd care to share your code I would be grateful! I had a Chinese spammer dump the same long comment at the end of every one of my posts. I found my way here by Googling for "Blogger delete all comments from one user."


insane coder said...

I'll probably release it after I make a couple of tweaks to make it easier to use.

But I can't release the source code because it uses libraries from work.

Eli From Brooklyn said...

It would be funny if there was spam comment left on here.

I'm just saying.

MBBS in Philippines said...

Wisdom Overseasis authorized India's Exclusive Partner of Southwestern University PHINMA, the Philippines established its strong trust in the minds of all the Indian medical aspirants and their parents. Under the excellent leadership of the founder Director Mr. Thummala Ravikanth, Wisdom meritoriously won the hearts of thousands of future doctors and was praised as the “Top Medical Career Growth Specialists" among Overseas Medical Education Consultants in India.

Southwestern University PHINMAglobally recognized university in Cebu City, the Philippines facilitating educational service from 1946. With the sole aim of serving the world by providing an accessible, affordable, and high-quality education to all the local and foreign students. SWU PHINMA is undergoing continuous changes and shaping itself as the best leader with major improvements in academics, technology, and infrastructure also in improving the quality of student life.

Easy Loan Mart said...

Spam in blogs (also called simply blog spam, comment spam, or social spam) is a form of spamdexing. (Note that blog spam also has another meaning, namely the post of a blogger who creates posts that have no added value to them in order to submit them to other sites.)
You are also read more Business Loan Calculator

QuickBooks Support said...

Your blog is great. I read a lot of interesting things from it. Thank you very much for sharing. Hope you will update more news in the future. If you want to Fix QuickBooks Error Code 163 please contact QuickBooks team for instant help.