The State of HTML5 Local Data Storage

Posted by joshua


Of all the new features being implemented as part of HTML5, I think I’m most excited about offline storage. Despite increasingly ubiquitous Wi-Fi, and despite the ability to tether our laptops to our 3G mobile phones, the need to be online has often felt like the last barrier separating the web from the desktop, especially as web apps have become more and more sophisticated.

There are actually two kinds of offline storage on offer in HTML5. Client-side session and persistent storage is a simple, cookie-like key-value store. It is sometimes referred to by the vaguely misleading name “DOM Storage”, or simply “web storage”, and is exposed to scripts as sessionStorage and localStorage. Keys and values are stored as strings, but it’s possible to store more complex objects by serializing them as JSON.
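Here is a minimal sketch of the JSON round trip that shows the string-only rule in practice. It assumes nothing beyond the standard Web Storage `getItem`/`setItem` methods. `MemoryStorage`, `saveObject` and `loadObject` are hypothetical helpers. `MemoryStorage` stands in for `window.localStorage` so the example also runs outside a browser.

```javascript
// Sketch: keeping structured data in Web Storage by serializing it to JSON.
// `storage` is anything with the Web Storage getItem/setItem methods, such as
// window.localStorage or window.sessionStorage in a browser.
// MemoryStorage is a hypothetical in-memory stand-in for non-browser use.
function MemoryStorage() {
  this.items = new Map();
}
MemoryStorage.prototype.setItem = function (key, value) {
  // Like real Web Storage, coerce both key and value to strings.
  this.items.set(String(key), String(value));
};
MemoryStorage.prototype.getItem = function (key) {
  var k = String(key);
  return this.items.has(k) ? this.items.get(k) : null; // null for missing keys
};

function saveObject(storage, key, obj) {
  storage.setItem(key, JSON.stringify(obj));
}

function loadObject(storage, key) {
  var raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

var store = new MemoryStorage(); // in a browser: window.localStorage
store.setItem("count", 3);
console.log(typeof store.getItem("count")); // prints: string (3 came back as "3")
saveObject(store, "prefs", { theme: "dark", fontSize: 14 });
console.log(loadObject(store, "prefs").fontSize); // prints: 14 (a number again)
```

The stand-in mimics the browser’s string coercion. The same helpers should therefore behave identically when handed `localStorage` or `sessionStorage`.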

If you’ve worked with cookies before, you’ll probably get comfortable with this storage mechanism quickly. However, there are some significant differences between web storage and cookies:

The other kind of offline storage is far more powerful. It is variously called “JavaScript Database” (WebKit), “Storage” (Mozilla), and Web SQL storage or “webdb” (various). This is offline storage using real SQL: WebKit and Mozilla both embed the SQLite engine and expose it through their own client-side scripting interfaces.
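Here is a minimal, hypothetical sketch of the WebKit flavor. It assumes a browser that exposes the global `openDatabase()` function. The `notes` database, its table, and the `openNotesDb`/`saveNote`/`loadNotes` helpers are illustrative names, not part of any documented example. Every statement runs inside a transaction, and results arrive through callbacks.

```javascript
// Minimal sketch of the WebKit-style Web SQL API. Assumes a browser that
// exposes the global openDatabase(); the "notes" database and table are hypothetical.
function openNotesDb() {
  // Arguments: short name, version, display name, estimated size in bytes.
  return openDatabase("notes", "1.0", "Offline notes", 2 * 1024 * 1024);
}

// Every statement runs inside a transaction; results arrive via callbacks.
function saveNote(db, body, onDone) {
  db.transaction(function (tx) {
    tx.executeSql(
      "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT, created INTEGER)"
    );
    // "?" placeholders let SQLite bind the values instead of splicing strings.
    tx.executeSql(
      "INSERT INTO notes (body, created) VALUES (?, ?)",
      [body, Date.now()],
      function (tx, result) { onDone(null, result.insertId); },
      // Returning anything other than false rolls the transaction back.
      function (tx, error) { onDone(error); return true; }
    );
  });
}

function loadNotes(db, onDone) {
  db.readTransaction(function (tx) {
    tx.executeSql(
      "SELECT id, body FROM notes ORDER BY id",
      [],
      function (tx, result) {
        var notes = [];
        // result.rows is list-like: read it with item(i), not rows[i].
        for (var i = 0; i < result.rows.length; i++) {
          notes.push(result.rows.item(i));
        }
        onDone(null, notes);
      },
      function (tx, error) { onDone(error); return true; }
    );
  });
}
```

In a supporting browser, `saveNote(openNotesDb(), "Buy milk", callback)` would insert a row and pass its `insertId` to `callback`.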

Here’s an example taken from Apple’s developer documentation for Safari:

Unfortunately, browser vendors are less committed to standardizing on “webdb” than they are to “DOM Storage”. WebKit is fully on board and shipping, and Opera’s support is still forthcoming. Mozilla has been vocally skeptical of the whole idea and labels its API ‘unfrozen’, meaning it’s likely to change over time. There’s excellent documentation from both WebKit and Mozilla, but their APIs are drastically different.

Even worse than having two drastically different APIs in the two major supporting browsers, the wider developer community hasn’t quite bought into SQL storage yet. There are concerns that SQLite isn’t the most standards-compliant SQL implementation. Some had hoped to see the browser vendors adopt one of the modern, flexible “document model” database formats made popular by CouchDB, MongoDB and SimpleDB. For now, the W3C Working Group has officially declared SQL storage to be “at an impasse”.


In the meantime, there are some interesting alternatives. If SQL storage sounds suspiciously familiar, it might be because it’s largely based on Google’s Gears browser plugin. Gears has its issues, such as handling the case where it isn’t installed, or how to “encourage” users to install extra software (Gears is built into Chrome, but a separate install on other browsers). Still, it at least provides a consistent storage API across multiple browsers, and the additional Gears features (background threads, desktop integration) are pure gravy. Unfortunately, Gears development seems to have stalled, as its developers’ focus has shifted to providing native support for these features in Chrome.
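For comparison, here is a sketch of the Gears database module. It assumes two things. First, `gears_init.js` is loaded, which defines `google.gears.factory` only when the plugin is installed. Second, the database module is the one documented as `beta.database`, which returns cursor-style `ResultSet` objects. The database name and the `openGearsDb`/`gearsAddNote`/`gearsListNotes` helpers are hypothetical. Unlike WebKit’s API, these calls are synchronous. The `null` return is one way to handle the not-installed case.

```javascript
// Sketch of Gears' synchronous database API. Assumes gears_init.js has been
// loaded, which defines google.gears.factory when the plugin is installed.
// The "offline_notes" database and "notes" table are hypothetical.
function openGearsDb() {
  if (typeof google === "undefined" || !google.gears || !google.gears.factory) {
    return null; // Gears missing: prompt for an install or fall back to another store
  }
  var db = google.gears.factory.create("beta.database");
  db.open("offline_notes");
  db.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT, created INTEGER)");
  return db;
}

function gearsAddNote(db, body) {
  db.execute("INSERT INTO notes (body, created) VALUES (?, ?)", [body, Date.now()]);
}

function gearsListNotes(db) {
  // execute() returns a cursor-style ResultSet that is walked row by row and closed.
  var rs = db.execute("SELECT body FROM notes ORDER BY rowid");
  var bodies = [];
  try {
    while (rs.isValidRow()) {
      bodies.push(rs.field(0));
      rs.next();
    }
  } finally {
    rs.close();
  }
  return bodies;
}
```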

Another stop-gap solution is Paul Duncan’s PersistJS, which layers an abstract API on top of a variety of browser storage backends. PersistJS uses HTML5 native storage by default and Gears when available. It can fall back to Flash or, for older versions of IE, userData behaviors. (I should also mention that Dojo Storage has similar goals, but PersistJS seemed to cover more browsers and has a smaller footprint.) Unfortunately, the cost of cross-browser compatibility is that PersistJS’s interface is limited to the kind of simple key-value storage you get with DOM Storage. As with DOM Storage, it’s possible to store serialized JSON in PersistJS. Still, applications with more complicated data relationships will suffer from the lack of support for them.
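The layering idea itself is simple. The following is a hypothetical sketch, not PersistJS’s actual API. It tries a list of backends in priority order and probes each with a real write. Callers only ever see `get` and `set`. Only a Web Storage backend and an in-memory fallback are shown. `BACKENDS` and `openStore` are illustrative names.

```javascript
// Hypothetical sketch of the "one interface, many backends" idea behind
// libraries like PersistJS. This is NOT PersistJS's API; only a Web Storage
// backend and an in-memory fallback are shown.
var BACKENDS = [
  {
    name: "localStorage",
    available: function () {
      try {
        // Probe with a real write: storage can exist yet be disabled or full.
        localStorage.setItem("__probe__", "1");
        localStorage.removeItem("__probe__");
        return true;
      } catch (e) {
        return false; // missing (ReferenceError) or unusable
      }
    },
    create: function () {
      return {
        get: function (key) { return localStorage.getItem(key); },
        set: function (key, value) { localStorage.setItem(key, String(value)); }
      };
    }
  },
  {
    name: "memory", // last resort: nothing survives a page reload
    available: function () { return true; },
    create: function () {
      var items = new Map();
      return {
        get: function (key) {
          var k = String(key);
          return items.has(k) ? items.get(k) : null;
        },
        set: function (key, value) { items.set(String(key), String(value)); }
      };
    }
  }
];

// Pick the first backend that works; callers never see which one was chosen
// unless they inspect store.backend.
function openStore() {
  for (var i = 0; i < BACKENDS.length; i++) {
    if (BACKENDS[i].available()) {
      var store = BACKENDS[i].create();
      store.backend = BACKENDS[i].name;
      return store;
    }
  }
  throw new Error("no storage backend available");
}
```

A Gears, Flash, or `userData` backend would slot into the list the same way. The shared interface is exactly the lowest-common-denominator key-value store described above.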

I’ve been disappointed to discover that none of these options provides any mechanism for syncing data once the browser is back online! In today’s extremely social environment, the kinds of apps I imagine building with offline storage would need some way to push the user’s data up to the server when they’re back online. They would also need to pull any new data down to the browser. (Something like Caleb Crane’s Impel, but without the baggage.)
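One common shape for the upload half of syncing is an “outbox”. Changes are recorded while offline, then replayed in order when connectivity returns. This is a hypothetical sketch. `Outbox` and the `send` function are assumptions, where `send` uploads one change and calls back with an error or `null`. No particular server API is implied. A real version would also keep `pending` in local storage so queued changes survive a reload. It would still need a strategy for pulling new data down and resolving conflicts.

```javascript
// Hypothetical sketch of an "outbox" for syncing: changes made offline are
// queued, then pushed to the server when connectivity returns.
// `send(change, callback)` is an assumed uploader that calls back with an
// error or null; no particular server API is implied.
function Outbox(send) {
  this.send = send;
  this.pending = [];
  this.flushing = false;
}

Outbox.prototype.record = function (change) {
  this.pending.push(change);
};

// Upload queued changes one at a time, in order. Stop at the first failure
// so the remaining changes are retried on the next flush.
Outbox.prototype.flush = function (onDone) {
  var self = this;
  if (self.flushing) return; // a flush is already draining the queue
  self.flushing = true;
  (function next() {
    if (self.pending.length === 0) {
      self.flushing = false;
      if (onDone) onDone(null);
      return;
    }
    self.send(self.pending[0], function (err) {
      if (err) {
        self.flushing = false;
        if (onDone) onDone(err);
        return;
      }
      self.pending.shift(); // only drop a change once the server accepted it
      next();
    });
  })();
};

// In a browser, flush whenever connectivity returns, e.g.:
// window.addEventListener("online", function () { outbox.flush(); });
```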

I’m considering writing a JavaScript library that would layer jLinq on top of an implementation-agnostic storage API, but I suspect I’ll wait until Mozilla’s API reaches “frozen” status. I’d also want a library that provides some basic support for syncing, maybe something based on Thoughtbot’s Jester.

"Power Tube" by maury.mccown - http://flic.kr/p/SQx6r / CC BY-NC-ND 2.0
"Metro Woman" by Extra Medium - http://flic.kr/p/4doc3C / CC BY-NC-ND 2.0