Showing posts with label Blogger. Show all posts
Showing posts with label Blogger. Show all posts

Friday, October 23, 2009


Blogger Spam



If you remember, the other day I had a bit of a meltdown in terms of all the spam I saw piling up over here.

I only have ~30 articles here, yet I had over 300 comments which were spam, and it is quite an annoying task to go delete them one by one. Especially when a week later, I'll have to go delete them one by one yet again.

Instead of just throwing my hands up in the air, I found it was time to get insane - I went to check out Blogger's API. So looking it over, I found it's really easy to log in, and about everything else after that gets annoying.

Blogger provides a way to get a list of articles, create new articles, delete articles, and also managing their comments. But the support is kind of limited if you want to specify what kind of data you want to retrieve.

At first, I thought about analyzing each comment for spam, but I didn't want to run the risk of false positives, and figured my best bet for now is just to identify spammers. I identified 25 different spam accounts.

However, Blogger only offers deleting comments by the comment ID, and then, only one by one. The only way to retrieve the comment ID is to retrieve the comments for a particular article, which includes the comments themselves and a bunch of other data. All this data is in a rather large XML file.

It would be rather easy to delete comments if Blogger provided a function like deleteCommentsOf(userId, blogId), or getCommentIdsOf(userId, blogId), or something similar. But no, one needs 4 steps just to get an XML file which contains the comments IDs along with a lot of other unnecessary data. This has to be repeated for each article.

It seems Blogger's API is really only geared towards providing various types of news feeds of a blog, and minimal remote management to allow others to create an interface for one to interact with blogger on a basic level. Nothing Blogger provides is geared towards en masse management.

Blogger also has the nice undocumented caveat that when retrieving a list of articles for a site, it includes all draft articles not published yet, if the requester is currently logged in.

But no matter, I create APIs wrapped around network requests and parsing data for a living. So using the libraries I created and use at work for this kind of thing, and 200 lines later which includes plenty of comments and whitespace, I got an API which allows me to delete all comments from a particular user from a Blogger site. So I arm an application using my new API with the 25 users I identified, and a few minutes later, presto, they're all gone.

As of the time of this posting, there should be no spam in any of the articles here. I will have to rerun my application periodically, as well as update it with the user IDs of new spam accounts, but it shouldn't be a big deal any more.

Remember the old programming dictum: Annoyance+Laziness = Great Software. It surely beats deleting things by hand every couple of days.

Monday, October 19, 2009


Why online services suck



Does anyone other than me think online services suck?

The thing that annoys me the most is language settings. Online service designers one day had this great idea to check the geographical IP address the user visited their site from, and use it to automatically set the language to the native one for the country they visited from. While this sounds nice in theory, most people only know their mother tongue, and also go on vacation now and then, or visit some other country for business purposes.

So here I am, on business in a foreign country, and I connect my laptop into the Ethernet jack in my hotel room which comes with free Internet access, so I can check my e-mail. What's the first thing I notice? The entire interface is no longer in English. Even worse is that the various menu items and buttons are moved around in this other language.

Even Google, known for being ahead of the curve when it comes to web services can't help but make the same mistakes. I'm sitting here looking at the menu on top of Blogger, wondering which one is login.

For Google this is a worse offense compared to other service providers, as I already was logged into their main site.

Google keeps their cookies set for all eternity (well, until the next time rollover disaster), and they know I always used Google in English. Now it sees me connecting from a different country than usual and thinks I want my language settings switched? Even after I set it to English on their main page, I have to figure out how to set it to English again on Blogger and YouTube?

What's really sad about all this is that every web browser sends each website as part of its request a "user agent", which tells the web server the name of the browser, a version number, operating system details, and language information. My browser is currently sending: "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11)". Notice the en-US? That tells the site the browser is in English, for the United States. If I downloaded a different version of Firefox, or installed a language package and switched Firefox to a different language, it would tell the web server that I did so. If one uses Windows in another language, Internet Explorer will also tell the web server the language Windows/Internet Explorer is in.

Why are these service providers ignoring browser information, and instead solely looking at geographical information? People travel all the times these days. Let us also not forget those in restrictive countries who use foreign proxy servers to access the internet.


However, common issues, such as annoying language support is hardly the end of the problems. In terms of online communication, virtually all of them suffer from variations of spam. Again, where is Google here? Every time I go read comments on Blogger, I see nothing but spam posts. Even when I go to cleanup my own site, the spam just fills up again a few days later.

Where's the flag as spam button? Where's the flag this user as solely a spammer button?

Sure Google as a site manager lets me block all comments on my site till I personally review them to see if they're spam, but in today's need for hi-speed communication is that really an option when you may have a hot topic on hand? Why can't readers flag posts on their own?

In terms of management, why doesn't Blogger's site management features include a list where I can check off posts and hit one mass delete, instead of having to click delete and "Yes I'm sure" on each and every spam post? Why can't I delete all posts from user X and ban that user from ever posting on my site again?


Okay, maybe this isn't so much an article why online services suck, but more about language and spam complaints, and mostly at Google for the moment. Jet-lag, and getting your E-mail interface in Gibberish does wonders for a friendly post. I'll try to come up something better for my next article.