Friday, 11 Sep 2009

Take All Your Stuff With You

The new Squarespace blog importer has officially been released, and it solves a long-standing problem: migrating your media assets along with your content when you decide to move your blog to a new platform. More importantly, the improved importer properly redirects requests for old URLs to the new Squarespace-hosted versions via 301 redirects, preserving references across the Internet and maintaining your ranking with search engines. All media assets (not just images, but also audio and video files) are copied over to Amazon S3, and link references within posts are rewritten to point to their new locations. As far as we can tell, this is the only deployed instance of an asset-preserving CMS importer. We looked far and wide for others to emulate and improve upon, but came up empty-handed. It's a big step towards fulfilling the promise of data portability.
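The post doesn't say how the link rewriting and redirects are implemented. As a rough illustration only, here is a minimal Python sketch assuming the importer has already built two hypothetical lookup tables: one from each asset's old URL to its copied S3 URL, and one from each old permalink path to its new Squarespace path. The function names are hypothetical, and the regex-based rewriter only touches quoted `src`/`href` attributes; a production importer would more likely use a real HTML parser.

```python
import re
from typing import Dict, Optional, Tuple

# Matches quoted src="..." / href='...' attribute values in post HTML.
_ATTR_URL = re.compile(
    r'''(?P<attr>\b(?:src|href)\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def rewrite_asset_links(html: str, asset_map: Dict[str, str]) -> str:
    """Point src/href values that were copied to S3 at their new URLs."""
    def swap(m: "re.Match[str]") -> str:
        url = m.group("url")
        new_url = asset_map.get(url, url)  # URLs not in the map are left alone
        return f'{m.group("attr")}{m.group("q")}{new_url}{m.group("q")}'
    return _ATTR_URL.sub(swap, html)


def resolve_old_url(path: str, redirects: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (status, Location): a permanent 301 if the old path is known, else 404."""
    new_path = redirects.get(path)
    if new_path is None:
        return 404, None
    return 301, new_path
```

Serving the redirect is then a matter of answering the old URL with status 301 and a `Location` header set to the returned path, which tells search engines the move is permanent.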

We currently support the major blogging platforms (Blogger, WordPress, Typepad and Movable Type) and may eventually integrate microblogging sites such as Tumblr. When work on the new importer started, we initially planned to cast a wider net and support more blogging platforms. But providing a consistent import experience and result for the platforms we already supported was trickier than expected, because each platform provided a different set of tools to export content. Blogger provided web services access via the GData API. WordPress provided both an XML-RPC API and an XML export file. Typepad and Movable Type provided a number of similar entry points (XML-RPC APIs, AtomPub and a non-XML export file), none of which was sufficient on its own to extract all of a user's blog content. Compared to Typepad and Movable Type, both Blogger and WordPress were a dream to work with (for our purposes, at least): their APIs were sufficient to extract blog entries, comments, tag information and any associated permalinks. Typepad and Movable Type, on the other hand, were a nightmare. We had to mash together their export file, which was missing permalinks, with their AtomPub web service, which was missing comment data.
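Combining the two partial sources is essentially a join. A minimal sketch, assuming both the export file and the AtomPub feed have already been parsed into the hypothetical `ExportEntry` and `AtomEntry` records below, and assuming that an entry's title plus its normalized publication date identifies it in both sources (real data may need fuzzier matching):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ExportEntry:
    """Parsed from the export file: has comments, but no permalink."""
    title: str
    date: str  # assumed already normalized to ISO 8601 by an earlier step
    body: str
    comments: List[str] = field(default_factory=list)


@dataclass
class AtomEntry:
    """Fetched via AtomPub: has a permalink, but no comments."""
    title: str
    date: str
    permalink: str


@dataclass
class MergedEntry:
    title: str
    date: str
    body: str
    comments: List[str]
    permalink: Optional[str]  # None when no AtomPub entry matched


def merge_entries(exported: List[ExportEntry], atom: List[AtomEntry]) -> List[MergedEntry]:
    # Index AtomPub permalinks by the (title, date) join key (an assumption).
    links: Dict[Tuple[str, str], str] = {
        (a.title.strip(), a.date): a.permalink for a in atom
    }
    merged = []
    for e in exported:
        key = (e.title.strip(), e.date)
        merged.append(MergedEntry(e.title, e.date, e.body, list(e.comments), links.get(key)))
    return merged
```

Entries left with `permalink=None` are the ones an importer would need to flag, since without a permalink no 301 redirect can be set up for them.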

The Blogger GData API and the WordPress.com XML-RPC API seem stable enough for us to count on. Those running their own WordPress installation might have to increase the amount of memory they dedicate to PHP, though: our web service queries for large chunks of content from self-hosted WordPress installations were frequently met with out-of-memory errors. Typepad and Movable Type still use an implementation that conforms to a draft of the AtomPub spec. I hope they maintain backwards compatibility if and when they move to the more recent AtomPub 1.0 specification, but I have a feeling they'll yank support for the draft implementation without notice and kill our importer integration with them.
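One way an importer could at least detect such a change, rather than failing in confusing ways, is to check which namespace the AtomPub service document declares before choosing a code path. This sketch assumes the draft implementation declares the pre-RFC namespace `http://purl.org/atom/app#`, whereas AtomPub 1.0 (RFC 5023) uses `http://www.w3.org/2007/app`; `atompub_version` is a hypothetical helper, not part of either platform's API.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the AtomPub service document.
APP_DRAFT_NS = "http://purl.org/atom/app#"     # pre-RFC drafts (assumed for the draft endpoints)
APP_RFC5023_NS = "http://www.w3.org/2007/app"  # AtomPub 1.0, RFC 5023


def atompub_version(service_doc: str) -> str:
    """Classify a service document as 'rfc5023', 'draft' or 'unknown'."""
    root = ET.fromstring(service_doc)
    # ElementTree spells a namespaced tag as "{namespace}localname".
    ns = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if ns == APP_RFC5023_NS:
        return "rfc5023"
    if ns == APP_DRAFT_NS:
        return "draft"
    return "unknown"
```

An "unknown" result is the signal to stop the import and report the problem, instead of silently producing a partial migration.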

The other half of this data portability equation is to create web-service APIs that allow users to export and consume all the content contained within their Squarespace sites. We already provide a complete XML dump of a site, but these snapshots must be made manually and can take some time, depending on how much content a site has. A web service implementation would allow users to acquire site data in real time. I've already started designing a web-service API that either builds upon an existing specification like AtomPub or goes in a separate direction like GData. Hopefully this won't take us too long.
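AtomPub collections are served as Atom feeds, and GData is also built on Atom, so either direction would likely expose entries in roughly this shape. A minimal sketch of the read side, serializing a site's entries as an Atom 1.0 feed with Python's standard library; the function name and the entry dictionary keys are hypothetical, not an existing Squarespace API:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM_NS = "http://www.w3.org/2005/Atom"


def _q(name: str) -> str:
    # ElementTree addresses namespaced elements as "{namespace}localname".
    return f"{{{ATOM_NS}}}{name}"


def entries_to_atom_feed(site_url: str, site_title: str, author_name: str,
                         updated: str, entries: List[Dict[str, str]]) -> str:
    """Serialize site entries (hypothetical dict shape) as an Atom 1.0 feed."""
    ET.register_namespace("", ATOM_NS)  # emit xmlns="..." instead of ns0: prefixes
    feed = ET.Element(_q("feed"))
    ET.SubElement(feed, _q("id")).text = site_url
    ET.SubElement(feed, _q("title")).text = site_title
    ET.SubElement(feed, _q("updated")).text = updated
    author = ET.SubElement(feed, _q("author"))
    ET.SubElement(author, _q("name")).text = author_name
    for e in entries:
        entry = ET.SubElement(feed, _q("entry"))
        ET.SubElement(entry, _q("id")).text = e["id"]
        ET.SubElement(entry, _q("title")).text = e["title"]
        ET.SubElement(entry, _q("updated")).text = e["updated"]
        ET.SubElement(entry, _q("link"), {"rel": "alternate", "href": e["permalink"]})
        # type="html" means the body travels as escaped HTML text; ElementTree escapes it.
        ET.SubElement(entry, _q("content"), {"type": "html"}).text = e["body"]
    return ET.tostring(feed, encoding="unicode")
```

Because the output is plain Atom, any existing feed reader or Atom library could consume it, which is much of the appeal of building on an existing specification.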

This newly refreshed importer is a sign of our commitment towards preserving and advancing data portability across the Internet. It also marks the beginning of our efforts to scale our highly specialized grid infrastructure with more utility services from the cloud.  It will certainly help keep hosting costs low for our users and it will enable us to implement new features in a shorter amount of time without compromising quality. 
