
Saturday, February 15, 2014


HTTP 308 Incompetence Expected



Internet History

The Internet, from every angle, has always been a house of cards held together with defective duct tape. It's a miracle that anything works at all. Those who understand a lot of the technology involved generally hate it, but are at the same time astounded that, for end users, things usually seem to work rather well.

Today I'm going to point out some proposed changes being made to HTTP, the standard which the World Wide Web runs on. We'll see how not even the people behind the standards really know what they're doing anymore.

The World Wide Web began in the early 90s in a state of flux. The Internet Engineering Task Force, as well as major players like Netscape, released a bunch of quasi-standards to quickly build up a set of design rules and techniques, which were used until HTTP v1.0 came out in 1996. Almost immediately after, HTTP v1.1 was being worked on, and despite not being standardized until 1999, it was pretty well supported in 1996. This is around the same time Internet Explorer started development, and a lot of its initial work was basically duplicating functionality and mechanics from Netscape Navigator.

Despite standards and sane ways of doing things, implementers always deviate from them or come up with incorrect alternatives. Misunderstandings, and differing ideas about how things should work, are what shaped things in the early days.

Thankfully though, over the years, standards online have finally come to be more strictly adhered to, and bugs are being fixed. Precise specifications exist for many things, as well as test suites to ensure adherence to standards. Things like Internet Explorer 6 are now a distant memory for most (unless you're in China).

Existing Practice

A key point which led to many standards coming into existence was existing practice. Some browser or server would invent something, the others would jump on board, and a standard would be created. Those who deviated were told to fix their implementation to match either the majority, or whatever was correct and would cause the least amount of issues for the long-term stability of the World Wide Web.

Now we'll see how today's engineers want to throw existing practice out the window, loosen up standards to the point of meaninglessness, and basically bust the technology you're currently using to view this article.

HTTP Responses

One of the central design features of HTTP is that every response from a server carries a status code which identifies what the result is, and servers and clients should understand how to handle each particular response. The more precise the definition, the better an online experience we'll all have.

The early pre-1.0 HTTP drafts were in a constant state of fluctuation, but offered three basic kinds of page redirect: permanent, temporary, and one which wasn't fully specified and remained unclear. These were defined as status codes 301, 302, and 303 respectively:

Moved 301: The data requested has been assigned a new URI, the change is permanent.
Found 302: The data requested actually resides under a different URL, however, the redirection may be altered on occasion.
Method 303: Note: This status code is to be specified in more detail. For the moment it is for discussion only. 
Like the found response, this suggests that the client go try another network address. In this case, a different method may be used.

The explanations behind a permanent and a temporary redirect seem pretty straightforward. 303 is less clear, although it's the only one which mentions that the method used is allowed to change; "Method" is even the name associated with the response code.

Several HTTP methods exist, for different kinds of activities. GET is a method to say: hey, I want a page. POST is a method to say: hey, here's some data from me, like my name and my credit card number, go do something with it.

The idea with the different redirects essentially was that 303 should embody "your request was processed, please move on" (hence a POST request should now become a GET request), whereas 301 and 302 were to say "what you need is elsewhere (permanently or temporarily), please take your business there" (so a POST should remain a POST).
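
To make that concrete, here's a minimal sketch in Python (illustrative only, not any browser's actual logic) of how a client following that original intent would pick the method for the follow-up request:

# A minimal sketch of the original intent described above; illustrative only.
def method_after_redirect(status, original_method):
    """Which method should the client use when following the redirect?"""
    if status == 303:                 # "your request was processed, move on"
        return "GET"
    if status in (301, 302):          # "your business is elsewhere"
        return original_method
    raise ValueError("%d is not one of the early redirect codes" % status)

assert method_after_redirect(303, "POST") == "GET"
assert method_after_redirect(302, "POST") == "POST"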

In any case, the text here was not as clear as it could be, and developers were doing all kinds of things. HTTP v1.0 came out to set the record straight.

301 Moved Permanently

   The requested resource has been assigned a new permanent URL and
   any future references to this resource should be done using that
   URL. Clients with link editing capabilities should automatically
   relink references to the Request-URI to the new reference returned
   by the server, where possible. 
       Note: When automatically redirecting a POST request after
       receiving a 301 status code, some existing user agents will
       erroneously change it into a GET request. 
302 Moved Temporarily

   The requested resource resides temporarily under a different URL.
   Since the redirection may be altered on occasion, the client should
   continue to use the Request-URI for future requests.
       Note: When automatically redirecting a POST request after
       receiving a 302 status code, some existing user agents will
       erroneously change it into a GET request. 
HTTP v1.0, however, did not define 303 at all. Some developers, not understanding what a temporary redirect is supposed to be, thought it meant: hey, this was processed, now move on, but if you need something similar in the future, come here again. We can hardly blame developers at that point for misusing 302 and wanting 303 semantics.

HTTP v1.1 decided to rectify this problem once and for all. 302 was renamed to Found and a new note was added:

      Note: RFC 1945 and RFC 2068 specify that the client is not allowed
      to change the method on the redirected request.  However, most
      existing user agent implementations treat 302 as if it were a 303
      response, performing a GET on the Location field-value regardless
      of the original request method. The status codes 303 and 307 have
      been added for servers that wish to make unambiguously clear which
      kind of reaction is expected of the client.

Since 302 was being used in two different ways, two new codes were created, one for each technique, to ensure proper use in the future. 302 retained its definition, but with so many incorrect implementations out there, 302 should essentially never be used if you want to ensure correct semantics are followed; instead use 303 See Other (processed, move on...) or 307 Temporary Redirect (the real version of 302).

In all my experience working with HTTP over the past decade, I've found 301, 303, and 307 to be implemented and used correctly as defined in HTTP v1.1, with 302 still being used incorrectly as 303 (instead of 307 semantics), generally by PHP programmers. But as above, never use 302, as who knows what the browser will do with it.

Since existing practice today is that 301, 303, and 307 are used correctly pretty much everywhere, if someone misuses one, they should be told to correct their usage or handling. 302 is still so misused to this day that it's a lost cause.
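
As a concrete illustration of that advice, here's a minimal server-side sketch using only Python's standard library (the paths, port, and backup host are made up for the example): a processed POST is answered with 303 so the browser re-requests the result with GET, while a temporarily relocated resource is answered with 307 so the method is preserved.

# A minimal sketch using only the standard library; paths, port, and the
# backup host are made up for the example.
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectDemo(BaseHTTPRequestHandler):
    def do_POST(self):
        # (A real handler would read the request body from self.rfile first.)
        if self.path == "/order":
            # The order was processed; tell the browser to GET the result page.
            self.send_response(303)
            self.send_header("Location", "/thanks")
        else:
            # Temporarily hosted elsewhere; the browser must repeat the POST.
            self.send_response(307)
            self.send_header("Location", "https://backup.example.test" + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), RedirectDemo).serve_forever()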

HTTP2 Responses

Now, in their infinite wisdom, the new HTTP2 team (in practice the IETF httpbis working group, which is also revising HTTP/1.1's semantics) has decided to create problems. The 301 status definition now brilliantly includes the following:

      Note: For historical reasons, a user agent MAY change the request
      method from POST to GET for the subsequent request.  If this
      behavior is undesired, the 307 (Temporary Redirect) status code
      can be used instead.

Let me get this straight: you're now taking a situation which hasn't been a problem for over a decade, and inviting it to start happening anew by allowing 301 to act as a 303???

If you don't think that paragraph above was problematic, wait till you see this one:

 +-------------------------------------------+-----------+-----------+
 |                                           | Permanent | Temporary |
 +-------------------------------------------+-----------+-----------+
 | Allows changing the request method from   | 301       | 302       |
 | POST to GET                               |           |           |
 | Does not allow changing the request       | -         | 307       |
 | method from POST to GET                   |           |           |
 +-------------------------------------------+-----------+-----------+

301 is allowed to change the request method? Excuse me, I have to go vomit.

It was clear in the past that 301 was not allowed to change its method. But now, I don't even understand what this 301 is supposed to mean anymore. So I should permanently be using the new URI for GET requests. Where do my POSTs go? Are they processed? What the heck am I looking at?

To add insult to injury, they're adding the new 308 Permanent Redirect as the "I really, really mean I want true 301 semantics this time" code. So now you can use a new status code which older browsers won't know what to do with, or the old status code that you're now allowing new browsers to utterly butcher, for reasons I cannot fathom.

Here's how the status codes work with HTTP 1.1:

+------+-------------------------------------+-----------+-----------------+
| Code | Meaning                             | Duration  | Method Change   |
+------+-------------------------------------+-----------+-----------------+
| 301  | Permanent Redirect.                 | Permanent | No              |
| 302  | Temporary Redirect, misused often.  | Temporary | Only by mistake |
| 303  | Process and move on.                | Temporary | Yes             |
| 307  | The true 302!                       | Temporary | No              |
| 308  | Resume Incomplete, see below.       | Temporary | No              |
+------+-------------------------------------+-----------+-----------------+

So here's how the status codes will work now with the HTTP2 updates:

+------+------------------------------+-----------+---------------+
| Code | Meaning                      | Duration  | Method Change |
+------+------------------------------+-----------+---------------+
| 301  | Who the heck knows.          | Permanent | Surprise Me   |
| 302  | Who the heck knows.          | Temporary | Surprise Me   |
| 303  | Process and move on.         | Temporary | Yes           |
| 307  | The true 302!                | Temporary | No            |
| 308  | The true 301!                | Permanent | No            |
+------+------------------------------+-----------+---------------+



And here's how one will have to do a permanent redirect in the future:

 +------+----------------+----------------+
 | Code | Older Browsers | Newer Browsers |
 +------+----------------+----------------+
 | 301  | Correct.       | Who Knows?     |
 | 308  | Broken!!!      | Correct.       |
 +------+----------------+----------------+

This is how they want to alter things. Does this seem like a sane design to you?

If the new design decision of the HTTP2 team is to capitulate to rare mistakes made out there, what's to stop this? I can see some newbie developers reading about how 307 and 308 are for redirects, misunderstanding them, and then misusing them too. So in five years we'll have 309 and 310 as the "we really, really, really mean it this time" codes? The approach the HTTP2 team is taking is absurd. If you're going to invent new status codes each time you find an isolated instance of someone misusing one, where does it end?

HTTP 308 is already taken!

One last point. Remember how earlier I mentioned that a key point in the design of the Internet is to work with existing practice? 308 is in fact already used for something else: Resume Incomplete, for resumable uploading, which is used by Google, king of the Internet, and many others.
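
To see the clash, here's a small sketch contrasting the two meanings a client may now have to juggle (the header values below are illustrative): Google's documented resumable-upload convention answers a status probe with 308 plus a Range header saying how much has been stored so far, while the new spec's 308 carries a Location header pointing at the permanent home.

# Header values below are illustrative.
def interpret_308(headers):
    """Guess which 308 we were just handed, based on its headers."""
    if "Range" in headers:        # Resume Incomplete: e.g. "bytes=0-524287"
        return "upload incomplete, server has " + headers["Range"]
    if "Location" in headers:     # Permanent Redirect: re-send the request there
        return "moved permanently to " + headers["Location"]
    return "a 308 with neither Range nor Location; good luck"

print(interpret_308({"Range": "bytes=0-524287"}))
print(interpret_308({"Location": "https://new.example.test/upload"}))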


Conclusion

I'm now dubbing HTTP 308 as Incompetence Expected, as that's clearly the only meaning it has. Or maybe that should be the official name for HTTP2 and the team behind it, I'll let you decide.

Edit:
Thanks to those who read this article and sent in images. I added them where appropriate.

Sunday, October 16, 2011


Goodbye Google



Google announced they're shutting down Code Search.

I found Google Code Search to be invaluable in the kind of work I do. I saved tons of time by being able to find existing code where something tricky had already been invented. I could also compare multiple implementations to learn what different techniques exist for various operations, and learn immensely from their pros and cons. If you're reimplementing something yourself, it's also nice to be able to easily find test cases and other things with Google Code Search.

Now all that is going away. Are there any feasible alternatives? Do we need to start a competing search engine? What are programmers to do when Google cuts the number one online tool for researching code?

Saturday, April 9, 2011


The failure of fragmented security



With recent attacks against SSL/TLS and certificates, everyone has been thinking a lot about security. What can we do to prevent security problems in the future?

The problem really stems from the fact that our different security components are separate from one another, and don't entirely see eye to eye, leaving gaps for attackers to walk right on through. The current certificate system for certifying the identity of a website is flawed in theory, and in its implementation in many browsers.

The current system works as follows: an entity submits proof of ownership of the domain(s) it owns to one of the hundreds of certificate authorities out there, which follows some kind of verification process and then proceeds to issue a certificate identifying the site to that entity. This certificate is digitally signed by the certificate authority itself using its private keys. Since no one but the certificate authority has its private keys, it is the only one able to sign certificates in its own name. Browsers ship with a certificate bundle identifying the certificate authorities they trust. In this way, when you see a site with a certificate signed by a known certificate authority, you know it's the site you intended to visit.
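
Here's a minimal sketch of that client-side trust check, using Python's standard library. Note what it does and doesn't establish: reaching the end only means the chain led to some CA in the shipped bundle and the certificate's names cover the hostname.

import socket
import ssl

def validated_certificate_for(hostname, port=443):
    """Connect over TLS and return the peer certificate the client accepted."""
    context = ssl.create_default_context()   # trusts every CA in the shipped bundle
    with socket.create_connection((hostname, port)) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            # Getting here means some CA in the bundle vouched for a chain whose
            # certificate names cover `hostname`. Which CA did so was not up to
            # the site's owner, and the domain registry was never consulted.
            return tls.getpeercert()

print(validated_certificate_for("www.google.com").get("issuer"))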

Except there are some flaws with this idea. If terrorists wanted to, they could attack a certificate authority's physical headquarters and steal the private keys from its servers, then sign whatever they want for whichever domain they wish. Or hackers could break into machines remotely and perhaps get lucky and find some private keys on them. Or anyone could start their own certificate authority; it really isn't that hard. Once your new authority becomes trusted by the various browsers, you can proceed to generate certificates for any domain desired.

This entire system has multiple points of failure. Further compounding the issue is that several "trusted" certificate authorities are themselves also ISPs, or run various links in the vast Internet. Having both components in your control allows you to impersonate any site for any information passing through your systems. America Online, for example, is both a trusted certificate authority and an ISP, and anyone who works there and has access to their infrastructure and private keys can view all HTTPS-encrypted data passing through their network in the clear. Want to buy something with your credit card online? You might want to traceroute your connection first and ensure no one along the way is also a certificate authority your browser trusts.

In order to mitigate a certificate authority signing something it shouldn't have, they invented Certificate Revocation Lists, whereby an authority can revoke specific certificates it once signed, since every certificate also has an ID number associated with it. But some browsers don't even bother checking these lists. Further, some browsers which do make use of CRLs and their friends carry on as if nothing happened if they can't access a CRL for some reason. These CRLs are also subject to the same security problems just described for domains in general: how do I know this is indeed the real CRL? Also, browsers themselves don't have CRLs for the root certificates they ship with, so they are unable to revoke the certificate of a rogue CA if they need to.

But in reality, this entire system is flawed from the ground up. It's so flawed, it doesn't even make the slightest bit of sense. Imagine the following scenario where my boss asks me to inform him of all purchasing details for our web presence needs, and explain why they're needed.


<Me> Okay, we're going to need $35 a year to register all the domains we want for our company, such as company.com and company.net and so on.
<Him> Sure, that's fine, what else?
<Me> Then we're going to need $200 a year for each domain for certificates.
<Him> Why do we need these certificates?
<Me> To prove that we own the domain in question.
<Him> Prove it? Why?
<Me> Browsers like Internet Explorer and Firefox won't realize when they visit our domain that it's really our domain, and not some hacker out there trying to impersonate us.
<Him> So if we don't buy these certificates, hackers will be able to get the domain names registered as their own instead of ours?
<Me> No, the domain names are protected by a central authority, they know that we own them, and we tell them to point the domains at our servers, but hackers in between our customer's browser and our server can hijack the connection and make believe they're us without a certificate.
<Him> I don't get it, why can't our customer's browser just check the domain registry and make sure the server they reach is the one we told the domain registry about? Why do we need to buy something from a 3rd party?


This seems strange to you? He's absolutely right. Why can't the hierarchy for domain management also distribute the public keys for our servers? The systems will need to be modified to combine several components and have encryption at each level, but does anything else make an ounce of sense?

Imagine you wanted to buy some property. You have your lawyer, accountant, realtor, and other people directly related to the purchase. After everything is taken care of, and you submit forms to city hall and everything else, you then go down to Joe's House of Fine Refrigerators and have him give you a signed deed that you indeed own the property in question. Makes a lot of sense, right?

Now you call a construction company down to work on your new property, say to merge it with the property next door to it. They want proof you own both properties before beginning. What do you do? You pull out your deed from Joe's House of Fine Refrigerators.

This is the exact state of internet security today. This problem is even pervasive down to every level of infrastructure we use.

Take cookies, for example: the system they use to match domain names runs completely counter to how the domain name system works. It's actually impossible for any browser to properly know, for every pair of domains in existence, whether they belong together or not when it comes to handling cookies for them. It will either fail to submit cookies to some sites that it should, or submit cookies to some sites it shouldn't. Some browsers try to solve this problem with a massive hack: a list of domains telling the browser which ones are or aren't paired together, which is also incomplete and needs never-ending updates. Even with the list, the only difference is that the browser is wrong less often than without it.
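
A tiny sketch of why the matching rule can't work on its own (the helper and the host names are made up for illustration): plain suffix matching treats a shared public suffix like "co.uk" exactly the same as a site's own subdomains, which is the gap that hard-coded list tries to paper over.

def naive_can_share_cookie(host, cookie_domain):
    """RFC 6265-style domain match: host equals the domain or ends with ".domain"."""
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)

# Works as intended for one site's own subdomains...
print(naive_can_share_cookie("mail.example.com", "example.com"))  # True
# ...but the very same rule would let any .co.uk site set a cookie for every
# other .co.uk site, which is why browsers bolt on a hard-coded suffix list.
print(naive_can_share_cookie("victim.co.uk", "co.uk"))            # True, undesired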

Really, if the hackers were out there, we'd be in big trouble.

Saturday, October 30, 2010


This just in, 20% of enterprises and most IT people are idiots



So, who still runs Internet Explorer 6? I do, because sometimes, I'm a web developer.
Along with IE 6, I also run later versions of IE, as well as Firefox, Opera, Chrome, Safari, Arora, Maxthon, and Konqueror. So do my web developer friends and coworkers.

The reason we do so is simple. We want to test our products with every browser that has any sort of popularity, or that comes with an OS or environment with any sort of popularity. The same goes for browsers with a distinct engine.

By playing with so many browsers, we get a feel for things which seem to not be well known (even when documented well), or which are completely missed by the "pros" who write most of the noise on the subject at hand. After work, my coworkers and I sometimes like to get together and joke about how Google security researchers put out a security memo on IE citing no recourse, while the solution is clearly documented on MSDN, or any similar scenario.

Perhaps we're bad guys for not writing lengthy articles on every subject, and keeping knowledge to ourselves. On the other hand, we get to laugh at the general population on a regular basis. Something which I'm sure every geek at heart revels in.

Here's a small sampling of popular beliefs that we snicker at:


That last one is actually what prompted this article. 20% of enterprises say they can't upgrade to newer versions of Windows because they need to use IE 6. On top of this, almost every IT guy who had anything to say about this either believes in this situation or mentions virtualization as a way out. Heck, even Microsoft themselves say you need a special XP mode in Windows 7 for IE 6, as does every major article site on the net commenting on this situation.

Hilarious, considering that you can install and run IE 6 just fine in Windows 7. There are plenty of solutions out there besides that one too, and they've been around for several years.

Anyways, experts, pros, designers, IT staff and average Internet surfers out there, just keep on being clueless on every single topic, some of us are having a real laugh.

Monday, December 21, 2009


Happy "Holidays" From Google



If you use GMail, you probably got this message recently:


Happy Holidays from Google

Hello,

As we near the end of the year, we wanted to take a moment to thank you for the time, energy, commitment, and trust you've shared with us in 2009.

With sharing in mind, this year we've decided to do something a little different. We hope you'll find it fits the spirit of the holiday season.

We're looking forward to working with you to build lasting success in 2010.

Happy Holidays,
Your Google Team


While on the surface it seems like a nice gesture, wouldn't it be nice if these big companies actually put some thought into what they wrote?

The use of terms or "codewords" like "Happy Holidays" or "holiday season" is meant to be all inclusive of the various winter holidays celebrated by different religions or cultural groups, without singling out any one of them in particular. It's primarily meant to include minorities that celebrate Kwanzaa and Hanukkah.

But this letter was sent on December 21, after Hanukkah had already ended two days earlier. If they really wanted to be all inclusive, perhaps they should have sent it the first week of December, instead of waiting until just after Hanukkah was over, which comes across as antisemitic.

All these "codewords" are actually born out of "Political Correctness", a practice designed to discriminate against your average white male, while not actually caring about the minorities you're trying to protect. Isn't it nice to see another big company aim for Political Correctness, yet show they couldn't care less about those minorities?

On a similar note, a friend of mine tells me that he recently applied for a job at Google, and they sent him a form asking him to specify his Race on it. Wonder why?

Friday, October 23, 2009


Blogger Spam



If you remember, the other day I had a bit of a meltdown in terms of all the spam I saw piling up over here.

I only have ~30 articles here, yet I had over 300 comments which were spam, and it is quite an annoying task to go delete them one by one. Especially when a week later, I'll have to go delete them one by one yet again.

Instead of just throwing my hands up in the air, I found it was time to get insane - I went to check out Blogger's API. Looking it over, I found it's really easy to log in, and just about everything else after that gets annoying.

Blogger provides ways to get a list of articles, create new articles, delete articles, and manage their comments. But the support is kind of limited if you want to specify what kind of data you want to retrieve.

At first, I thought about analyzing each comment for spam, but I didn't want to run the risk of false positives, and figured my best bet for now is just to identify spammers. I identified 25 different spam accounts.

However, Blogger only offers deleting comments by comment ID, and then only one by one. The only way to retrieve a comment ID is to retrieve the comments for a particular article, which includes the comments themselves and a bunch of other data, all in a rather large XML file.

It would be rather easy to delete comments if Blogger provided a function like deleteCommentsOf(userId, blogId), or getCommentIdsOf(userId, blogId), or something similar. But no, one needs 4 steps just to get an XML file which contains the comment IDs along with a lot of other unnecessary data. This has to be repeated for each article.

It seems Blogger's API is really only geared towards providing various types of news feeds of a blog, and minimal remote management to allow others to create an interface for one to interact with blogger on a basic level. Nothing Blogger provides is geared towards en masse management.

Blogger also has the nice undocumented caveat that when retrieving a list of articles for a site, it includes all draft articles not published yet, if the requester is currently logged in.

But no matter; I create APIs wrapped around network requests and data parsing for a living. Using the libraries I created and use at work for this kind of thing, 200 lines later (including plenty of comments and whitespace) I had an API which allows me to delete all comments from a particular user on a Blogger site. I armed an application using my new API with the 25 users I identified, and a few minutes later, presto, they're all gone.
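
For the curious, here's a rough sketch of that flow in Python. This is not the author's code: the GData-era feed layout and "edit" link convention shown here are assumptions from memory (that API has since been retired), and the authenticated session and ID lists are placeholders.

# Rough sketch only. The feed URLs, the "edit" link convention, and the
# authenticated `session` are assumptions based on the old GData API.
import xml.etree.ElementTree as ET
import requests

ATOM = "{http://www.w3.org/2005/Atom}"

def spam_comment_edit_links(session, blog_id, post_id, spam_author_ids):
    """Yield the edit URL of every comment on one post left by a listed author."""
    feed_url = "https://www.blogger.com/feeds/%s/%s/comments/default" % (blog_id, post_id)
    feed = ET.fromstring(session.get(feed_url).content)
    for entry in feed.findall(ATOM + "entry"):
        author_uri = entry.findtext(ATOM + "author/" + ATOM + "uri", default="")
        if author_uri.rstrip("/").rsplit("/", 1)[-1] in spam_author_ids:
            for link in entry.findall(ATOM + "link"):
                if link.get("rel") == "edit":
                    yield link.get("href")

def purge_spammers(session, blog_id, post_ids, spam_author_ids):
    """Delete every comment from the listed spam accounts, one post at a time."""
    for post_id in post_ids:
        for edit_url in spam_comment_edit_links(session, blog_id, post_id,
                                                spam_author_ids):
            session.delete(edit_url)   # one HTTP DELETE per comment ID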

As of the time of this posting, there should be no spam in any of the articles here. I will have to rerun my application periodically, as well as update it with the user IDs of new spam accounts, but it shouldn't be a big deal any more.

Remember the old programming dictum: Annoyance+Laziness = Great Software. It surely beats deleting things by hand every couple of days.

Monday, October 19, 2009


Why online services suck



Does anyone other than me think online services suck?

The thing that annoys me the most is language settings. Online service designers one day had this great idea to look up the geographical location of the IP address the user visited their site from, and use it to automatically set the language to the native one of that country. While this sounds nice in theory, most people only know their mother tongue, and also go on vacation now and then, or visit some other country for business purposes.

So here I am, on business in a foreign country, and I connect my laptop into the Ethernet jack in my hotel room which comes with free Internet access, so I can check my e-mail. What's the first thing I notice? The entire interface is no longer in English. Even worse is that the various menu items and buttons are moved around in this other language.

Even Google, known for being ahead of the curve when it comes to web services, can't help but make the same mistakes. I'm sitting here looking at the menu on top of Blogger, wondering which item is login.

For Google this is a worse offense compared to other service providers, as I already was logged into their main site.

Google keeps their cookies set for all eternity (well, until the next time rollover disaster), and they know I have always used Google in English. Now it sees me connecting from a different country than usual and thinks I want my language settings switched? Even after I set it to English on their main page, I have to figure out how to set it to English again on Blogger and YouTube?

What's really sad about all this is that every web browser sends, as part of each request, a "user agent" string, which tells the web server the name of the browser, a version number, operating system details, and language information; browsers also send an Accept-Language header stating exactly which languages the user prefers. My browser is currently sending: "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11)". Notice the en-US? That tells the site the browser is in English, for the United States. If I downloaded a different version of Firefox, or installed a language package and switched Firefox to a different language, it would tell the web server that I did so. If one uses Windows in another language, Internet Explorer will also tell the web server the language Windows/Internet Explorer is in.
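
Honouring that information takes only a few lines. Here's a minimal sketch (the sample header value and the supported-language list are made up) of a server picking a language from Accept-Language instead of from GeoIP:

def pick_language(accept_language, supported=("en", "fr", "he")):
    """Return the best supported language from an Accept-Language header."""
    preferences = []
    for part in accept_language.split(","):
        language, _, quality = part.strip().partition(";q=")
        try:
            weight = float(quality) if quality else 1.0
        except ValueError:
            weight = 0.0
        preferences.append((weight, language.split("-")[0].lower()))
    for _, language in sorted(preferences, reverse=True):
        if language in supported:
            return language
    return supported[0]   # fall back to a default, not to GeoIP guessing

print(pick_language("en-US,en;q=0.8,he;q=0.5"))   # prints "en"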

Why are these service providers ignoring browser information, and instead looking solely at geographical information? People travel all the time these days. Let us also not forget those in restrictive countries who use foreign proxy servers to access the Internet.


However, common issues such as annoying language support are hardly the end of the problems. In terms of online communication, virtually all services suffer from variations of spam. Again, where is Google here? Every time I go to read comments on Blogger, I see nothing but spam posts. Even when I clean up my own site, the spam just fills up again a few days later.

Where's the flag as spam button? Where's the flag this user as solely a spammer button?

Sure, Google as a site manager lets me block all comments on my site until I personally review them to see if they're spam, but with today's need for high-speed communication, is that really an option when you may have a hot topic on hand? Why can't readers flag posts on their own?

In terms of management, why don't Blogger's site management features include a list where I can check off posts and hit one mass delete, instead of having to click delete and "Yes I'm sure" on each and every spam post? Why can't I delete all posts from user X and ban that user from ever posting on my site again?


Okay, maybe this isn't so much an article on why online services suck as it is a set of language and spam complaints, aimed mostly at Google for the moment. Jet lag, plus getting your e-mail interface in gibberish, does wonders for a friendly post. I'll try to come up with something better for my next article.