
Wednesday
Dec 16, 2009

SEO Friendly Stocking Stuffer

After announcing our updated blog importer back in September, we got a lot of positive comments about how we seamlessly (301) redirect requests for existing URLs of imported content to their new home on Squarespace. This ensured that all the Google link juice users had gathered over time was transferred over in an SEO-friendly way. The feature mainly lived in the deep recesses of our backend routing code, but starting today we're bringing it out of the dark and letting our users create their own SEO-friendly shortcuts.

Until today, users could only create URL shortcuts to their site's content via a simple URL rewriting method. It allowed users to create shorter, perhaps even more user-friendly URLs than the ones Squarespace generates. Requests for the friendly shortcut URL loaded the contents of the existing URL while preserving the requested URL in the browser navigation bar. Unfortunately, when search engines crawl other sites that link to either of these URLs, this technique ends up splitting the PageRank and other measures of link value between the two URLs. Not very good for SEO. One way to alleviate this problem would be to use the canonical link element (rel="canonical") to tell search engines which URL they should index for duplicated content. Originally intended for duplicate content within the same domain, the element is now also supported by Google for cross-domain duplication. It is only treated as a hint and not an absolute directive, though, and it is intended to supplement rather than replace a 301 redirect. Yahoo and MSN have yet to follow suit, though there have been grumblings that they have agreed to support it.
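
To make the distinction concrete, here is a minimal Python sketch of the URL rewriting approach. It is not our actual routing code; the shortcut table, the example paths, the domain and the rel="canonical" hint are all illustrative assumptions.

from wsgiref.simple_server import make_server

# Hypothetical shortcut -> existing content mapping.
REWRITES = {"/promo": "/blog/2009/12/16/seo-friendly-stocking-stuffer"}

def render(path):
    # Stand-in for a real template engine. The rel="canonical" link hints
    # to crawlers which URL should collect the link value.
    page = ('<html><head><link rel="canonical" href="http://example.com%s"/>'
            '</head><body>Content for %s</body></html>' % (path, path))
    return page.encode("utf-8")

def app(environ, start_response):
    requested = environ["PATH_INFO"]
    # Internal rewrite: serve the target's content while the browser's
    # address bar keeps the shortcut URL -- no redirect is issued.
    target = REWRITES.get(requested, requested)
    body = render(target)
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()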

In addition to the URL rewriting method, we have added the ability for a user to choose between on-domain 301 and 302 redirects. A 301 redirect signals to a search engine that the requested URL has moved permanently to a new URL. All three major search engines handle a 301 redirect the same way: they ignore the original URL and index the destination URL instead. The link value of any keywords contained in the original URL is transferred over to the new URL.

A 302 redirect is treated differently depending on the search engine. It essentially tells a search engine that the move is only temporary, and that the content at the original URL might still be valid in the future. When Google encounters a 302 redirect, it keeps all link value with the original URL. MSN/Bing, on the other hand, treats a 302 redirect exactly like a 301: it ignores the original URL and indexes the destination URL instead. Given the current Yahoo-Microsoft search deal, it follows that Yahoo's indexing behavior will soon match Microsoft's.
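
On the wire, the only difference between the two is the status line, as in this rough sketch of an on-domain shortcut handler. The shortcut table, paths and port are made-up placeholders, not our actual implementation.

from wsgiref.simple_server import make_server

# Each shortcut maps to a destination plus the redirect type the site
# owner picked. The 302 entry can be repointed at any time.
SHORTCUTS = {
    "/about-me": ("/pages/about.html", "301 Moved Permanently"),
    "/latest":   ("/news/2009/12/14/newest-post.html", "302 Found"),
}

def app(environ, start_response):
    path = environ["PATH_INFO"]
    if path in SHORTCUTS:
        target, status = SHORTCUTS[path]
        # The Location header tells browsers and crawlers where to go;
        # the status code tells crawlers how to treat the original URL.
        start_response(status, [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()

Repointing the "/latest" entry at a newer post is all a 302 shortcut needs in order to recycle a single friendly URL, which is exactly the use case described below.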

So how do you decide between our default URL rewrite method, our on-domain 301 redirect, or the often misunderstood on-domain 302 redirect?  If you don't care about SEO, then the default URL rewrite method will probably be a good, no-hassle choice.  It loads content the fastest among our three shortcut navigation methods, and it is the only method that preserves the shortcut URL in the browser address bar.  The 301 redirect is the best all-around option if you want consistent results across all search engines.  If you're not sure what to do, pick the 301 redirect.  The real question is when to use the 302 redirect.  The on-domain 302 redirect should be used if you want a URL to recycle among different posts/pages.  For example, a news blog following Tiger Woods' growing harem might use a 302 redirect to funnel readers to the latest news by creating a shortcut from

"http://www.tigersden.com/ladies-of-tiger-woods"

to

"http://www.tigersden.com/news/2009/12/10/tiger-woods-bones-waitress.html"

on one day and then repoint the shortcut to new content published at

"http://www.tigersden.com/news/2009/12/14/tiger-woods-bones-call-girl.html" 

a few days later.   A 302 redirect will allow a reader to google for "tiger woods ladies" and land on the page with Tiger's most recent conquest.  A contrived example, yes. But illustrative nonetheless. Anyway, Santa doesn't have any more feature enhancements in store for 2009. More fun features will be coming in the new year. Happy Holidays!

Friday
Sep 11, 2009

Take All Your Stuff With You

The new Squarespace blog importer has officially been released, and it solves a long-standing problem: migrating your media assets along with your content when you decide to move your blog to a new platform.  More importantly, this newly improved importer will properly redirect requests for old URLs to the new Squarespace-hosted version via 301 redirects, preserving references across the Internet and maintaining your SEO ranking with search engines.  All media assets, not just images but audio and video files as well, are copied over to Amazon S3, and link references within posts are rewritten to point to their new location.  As far as we can tell, this is the only deployed instance of an asset-preserving CMS importer.  We've looked far and wide for others to emulate and improve upon, but came up empty-handed.  It's a big step towards fulfilling the promise of data portability.
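
Conceptually, the asset step boils down to something like the following Python sketch. It is only an illustration of the idea, not our importer: the bucket name, URL pattern, regex and the use of boto3 are all assumptions made for the example.

import re
import urllib.request

import boto3  # used here only as a convenient S3 client for the sketch

S3_BUCKET = "example-imported-assets"  # hypothetical bucket
ASSET_RE = re.compile(r'(src|href)="(https?://[^"]+\.(?:jpg|png|gif|mp3|mp4))"')
s3 = boto3.client("s3")

def migrate_assets(post_html):
    """Copy each referenced media file to S3 and rewrite its URL in the post."""
    def replace(match):
        attr, old_url = match.group(1), match.group(2)
        key = old_url.rsplit("/", 1)[-1]
        # Stream the original file straight into the bucket.
        with urllib.request.urlopen(old_url) as source:
            s3.upload_fileobj(source, S3_BUCKET, key)
        new_url = "https://%s.s3.amazonaws.com/%s" % (S3_BUCKET, key)
        return '%s="%s"' % (attr, new_url)
    return ASSET_RE.sub(replace, post_html)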

We currently support the major blogging platforms of Blogger, Wordpress, Typepad and Movable Type, with plans to perhaps integrate microblogging sites such as Tumblr.  When work on the new importer started, we were initially planning on casting a wider net and supporting more blogging platforms, but providing a consistent import experience and result with the platforms we were already supporting was trickier than expected.  Each platform provided a different set of tools to export content.  Blogger provided web-service access via the GData API.  Wordpress provided both an XML-RPC API and an XML export file.  Typepad and Movable Type provided a number of similar entry points (XML-RPC APIs, AtomPub and a non-XML export file), none of which was sufficient to extract all of a user's blog content.  Compared to Typepad and Movable Type, both Blogger and Wordpress were a dream to work with (for our purposes at least): their APIs were sufficient to extract blog entries, comments, tag information and any associated permalinks.  Typepad and Movable Type, on the other hand, were a nightmare, requiring us to mash together their export file format, which was missing permalinks, and their AtomPub web service, which was missing comment data.
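
As a rough illustration of the Wordpress path, the standard metaWeblog XML-RPC interface is enough to pull entries and their permalinks. The endpoint, credentials and post count below are placeholders, and this is a simplified stand-in for what our importer actually does.

import xmlrpc.client

# Placeholder endpoint and credentials for a Wordpress blog.
endpoint = "http://example.wordpress.com/xmlrpc.php"
blog_id, username, password = 1, "user", "secret"

server = xmlrpc.client.ServerProxy(endpoint)
# metaWeblog.getRecentPosts returns each entry's title, body, categories
# and permalink -- the permalink is what a 301 redirect gets built from.
posts = server.metaWeblog.getRecentPosts(blog_id, username, password, 50)
for post in posts:
    print(post["title"], post.get("permaLink") or post.get("link"))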

The Blogger GData API and the Wordpress.com XML-RPC API seem stable enough for us to count on.  Those running their own installation of Wordpress might have to increase the amount of RAM they dedicate to PHP, though; our web-service queries for large chunks of content from self-hosted Wordpress installations were frequently met with out-of-memory errors.  Typepad and Movable Type are still using an implementation conformant to an AtomPub draft spec.  I hope they maintain backwards compatibility if and when they move to the more recent AtomPub 1.0 specification, but I wouldn't be surprised if they yanked the draft-spec implementation without notice and killed our importer integration with them.

The other half of this data portability equation is creating web-service APIs that allow users to export and consume all the content contained within their Squarespace sites.  We already provide a complete XML file dump of a site, but these snapshots must be made manually and can take some time depending on how much content a site has.  A web-service implementation would allow users to acquire site data in real time.  I've already started looking into designing a web-service API that either builds upon an existing specification like AtomPub or goes in a separate direction like GData.  Hopefully this won't take us too long.

This newly refreshed importer is a sign of our commitment to preserving and advancing data portability across the Internet.  It also marks the beginning of our efforts to scale our highly specialized grid infrastructure with more utility services from the cloud, which will help keep hosting costs low for our users and enable us to implement new features more quickly without compromising quality.