February 2, 2010

Playing with localStorage

This seems to be something of a well-kept secret. It appears that almost all newer web browsers support various types of client storage without the use of cookies. The web storage specification (part of HTML5) describes new facilities that web authors can use to store arbitrary data on the client using a simple key/value store.

The best part: instead of being limited to about 4K per cookie, each website can store megabytes of data on the client. (Implementations vary, but the draft specification recommends somewhere in the area of 5MB per origin.) This ultimately means that we'll start seeing a lot of web applications (e.g., Google Docs) storing data locally for offline access. So far, the specification describes two kinds of storage: session storage and local storage.

Two other kinds of client storage were once proposed but eventually dropped: global storage and database storage. Global storage is like session and local storage, except that the website could scope data along the domain name hierarchy. (See Mozilla's explanation for details.) Database storage is essentially an SQL engine inside the browser. The only browsers that support it are those based on recent versions of WebKit, namely Chrome and Safari.

Back to local storage, however. These are the browsers that I successfully tested with support for local storage:
  • Chrome 4.0.302.2
  • Firefox 3.5, 3.6
  • Safari for iPhone/iPod Touch 3.1.2
  • Internet Explorer 8
While these browsers are not yet the majority of the ones currently in use, they or their descendants will be soon. Users are finally starting to realize that it's in their best interest to stay reasonably up to date on their software, and some large web properties are beginning to drop support entirely for crusty old browsers (ahem). It's safe to say that local storage will be a big part of future web apps.
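
Since support still varies from browser to browser, it's worth checking for the feature before relying on it. Here is a minimal sketch of one way to do that, not the only one. The function name hasLocalStorage and the probe key are made up for illustration, and the window object is passed in as a parameter so the check can be tried against a stand-in object as well.

```javascript
// Returns true if the given window-like object exposes a usable localStorage.
// `win` would normally be the global `window`.
function hasLocalStorage(win) {
  try {
    var storage = win.localStorage;
    if (!storage) {
      return false; // browser has no localStorage at all
    }
    var probe = '__storage_probe__'; // hypothetical throwaway key
    storage.setItem(probe, '1');     // make sure writing actually works
    storage.removeItem(probe);
    return true;
  } catch (e) {
    // Accessing or writing to storage can throw, for example when the user
    // has disabled it or the quota is already used up.
    return false;
  }
}
```

In a page, you would call hasLocalStorage(window) and fall back to cookies (or to nothing) when it returns false.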

So, here's how it works. Items are stored as simple key/value pairs. Programmers know this as a hash table. The object in JavaScript to manipulate the storage is called, unsurprisingly, localStorage. This object has the following methods:
  • getItem(key) - retrieve a value via the key
  • setItem(key, value) - add or change a key/value pair
  • removeItem(key) - remove an item from storage
  • key(index) - retrieve the key via its index
  • clear() - empty the local storage
And one property:
  • length - the number of items in local storage
The above should be mostly self-explanatory. You can get the key for a specific item by specifying its index number, but note that since this is a hash table, the index-to-key mappings are not static. You can never presume that a particular index belongs to a certain key. The key() method is primarily meant to be used with the length property so that you can iterate over each item in storage.
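
To make the key()/length pairing concrete, here is a small sketch that walks over everything in storage. The helper name dumpStorage and the key names in the comments are made up for illustration. The storage object is passed in as a parameter (normally it would be localStorage itself) so the function can also be run against a stand-in outside a browser.

```javascript
// Copy every key/value pair out of a Storage-like object into a plain object.
function dumpStorage(storage) {
  var items = {};
  // key(i) and length are meant to be used together like this. The order of
  // the keys is up to the browser, so don't rely on it.
  for (var i = 0; i < storage.length; i++) {
    var key = storage.key(i);
    items[key] = storage.getItem(key);
  }
  return items;
}

// Typical calls in a browser console:
//   localStorage.setItem('color', 'blue');   // add or change a pair
//   localStorage.getItem('color');           // "blue"
//   localStorage.getItem('nope');            // null for a missing key
//   dumpStorage(localStorage);               // every pair currently stored
//   localStorage.removeItem('color');        // delete one pair
//   localStorage.clear();                    // delete everything
```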

Pretty simple, eh? This tiny feature adds a load of capability and, together with the rest of HTML5, will finally start to turn the web browser into a proper applications platform, even if it never will be a particularly efficient one.

A quick note on privacy and security: each origin (the combination of protocol, hostname, and port) that you visit has its own storage. For example, yahoo.com and google.com can never share storage. The same policy applies to subdomains, so mail.google.com and code.google.com can't access each other's storage either.
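
One way to reason about which pages end up sharing storage is to compare their origins. The sketch below is only an illustration: sameOrigin is a made-up helper name, and it assumes an environment that provides the standard URL constructor, which is newer than the browsers listed above.

```javascript
// Two pages see the same localStorage only when protocol, hostname, and port
// all match. The path plays no part in the comparison.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Different subdomains: separate storage.
sameOrigin('http://mail.google.com/', 'http://code.google.com/'); // false

// Same host, different paths: shared storage.
sameOrigin('http://example.com/app1', 'http://example.com/app2'); // true

// Same host, different protocol: separate storage.
sameOrigin('http://example.com/', 'https://example.com/'); // false
```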

I put together a demo that lets you directly play around with localStorage in your browser. Take a peek at the source code, save it, steal it, whatever. Everything except the jQuery library is contained within the one HTML file.


M.E. said...

That's interesting to note. I wonder, though--could sub-level sites within a single domain access each other's data? So for instance if you visited a site such as http://members.cox.net/homebusiness and it left data of various types on there, then visited http://members.cox.net/joeblow and they had a program to harvest all data from anything it could read on the domain name, could they in theory corrupt or steal the information from the other site(s) on that same domain, even though they have nothing to do with the other sites?

Seems like a potentially big security hole, to me.

charles said...

Calling it a security hole is a bit of a stretch. Web security has always revolved around the "origin" (the protocol, hostname, and port parts of the URL). This is called the same-origin policy. So, JavaScript and cookies on http://members.cox.net/joeblow can also modify data set by http://members.cox.net/homebusiness. Local storage follows the same strategy.

Web hosts and web authors simply need to be aware of the same-origin policy if their website contains dynamic content. If it does, they probably need their own domain or subdomain. (Notice how each Blogger blog has a unique subdomain.)