
Highly scalable web solutions

Last night's Amazon Web Services user group meeting drifted from the presenter's topic into a discussion of current approaches to highly scalable web architecture. The premise is simple: in the new world of Web/Facebook applications, where millions of users can be added virtually overnight, scalability has to be addressed from day one of application design.

To make a Web application scalable, the most important consideration is the foundation it is built upon: the choice of database architecture. I have now seen four radically different approaches:

  • Database clusters
  • Database replication
  • Federated data layer
  • Non-relational databases

Clusters require virtually no code changes and scale up to 32 nodes, but in the case of Oracle (I do not know of anyone else who did it right) they are prohibitively expensive for start-ups.

Database replication seems so appealing at the beginning… Basically, you forward all data update/write requests to the master and read data from the slaves. There is not much of a code change, cheap or free databases can be used (think MySQL), and any slave can become the master when needed… Unfortunately, as with clusters, you can only go so far: performance degrades to the point of being unusable after 10-15 slave nodes.
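To make the "writes to the master, reads from the slaves" idea concrete, here is a minimal sketch of the routing decision in Python. Everything in it is an assumption for illustration: the `ReplicatedRouter` class, the connection names, and the crude rule of classifying a statement by its first SQL keyword. A real setup would hand back actual database connections and would also have to deal with replication lag.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    # Statements starting with one of these verbs are treated as writes.
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if there are none.
        self._readers = itertools.cycle(replicas or [master])

    def target_for(self, sql):
        """Return the node that should execute this (non-empty) SQL statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)


router = ReplicatedRouter("master", ["replica-1", "replica-2"])
for query in (
    "INSERT INTO users (name) VALUES ('ann')",
    "SELECT * FROM users",
    "SELECT * FROM users WHERE id = 1",
):
    print(router.target_for(query), "<-", query)
```

Note that every write still lands on the single master, which is why adding more slaves only helps read-heavy workloads.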

A federated data layer is what Hi5 and many other smart companies are using today. The basic idea: split the data horizontally by assigning ranges of item ids to separate db instances, and have a “smart” data layer figure out where to go for each data request or update. Major problems: it is hard to implement “right”, and aggregate queries (selects or counts across large data sets, etc.) have to be executed against many data sources and then combined before the data is returned to the app. Any non-trivial scenario is rather challenging and labor-intensive to implement.
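A rough sketch of such a "smart" data layer, again in Python and under loud assumptions: the `RangeShardRouter` class, the id boundaries, and the use of plain dicts in place of db instances are all made up for illustration and say nothing about how Hi5 actually does it. It shows both halves of the problem: a single-item lookup touches one shard, while a count has to visit every shard and combine the partial results.

```python
import bisect


class RangeShardRouter:
    """Hypothetical data layer that maps an item id to the shard owning its range."""

    def __init__(self, upper_bounds, shards):
        # upper_bounds[i] is the exclusive upper id bound of shards[i];
        # the last shard takes every id at or above the final bound.
        if len(shards) != len(upper_bounds) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self.upper_bounds = upper_bounds
        self.shards = shards

    def shard_for(self, item_id):
        """Pick the shard whose id range contains item_id."""
        return self.shards[bisect.bisect_right(self.upper_bounds, item_id)]

    def count_all(self, predicate):
        """Scatter the count to every shard, then gather by summing the partials."""
        return sum(
            sum(1 for row in shard.values() if predicate(row))
            for shard in self.shards
        )


# Three in-memory dicts stand in for three separate db instances.
shards = [{}, {}, {}]
router = RangeShardRouter([1000, 2000], shards)

for item_id, country in [(5, "US"), (1000, "DE"), (1500, "US"), (2500, "US")]:
    router.shard_for(item_id)[item_id] = {"id": item_id, "country": country}

print([len(s) for s in shards])                             # [1, 2, 1]
print(router.count_all(lambda row: row["country"] == "US"))  # 3
```

A count is the easy case because partial counts simply add up. Sorting, paging, or joins across shards need far more merging logic, which is where the labor goes.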

The non-relational approach. Sort of the new kid on the block (or maybe the old fellow we used to call an object db?). Anyway, with the introduction of SimpleDB by Amazon, people have started taking it more seriously. The basic promise: you trade database features for scalability. It is perfect if the data store is used just to store and retrieve objects and their properties. But what to do about data set aggregation? Navigate through the whole set of objects? I guess we will have to wait a while for the answer…
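To illustrate the trade-off, here is a toy sketch in Python. The `ItemStore` class is a hypothetical in-memory stand-in and is not the SimpleDB API. It only assumes a store that can put and get an item by key and walk through all items. With nothing like GROUP BY available, the aggregation has to live in application code and touch every object.

```python
from collections import Counter


class ItemStore:
    """Hypothetical key/attribute store: put and get by key, nothing relational."""

    def __init__(self):
        self._items = {}

    def put(self, key, attributes):
        self._items[key] = dict(attributes)

    def get(self, key):
        return self._items.get(key)

    def scan(self):
        """Iterate over every (key, attributes) pair in the store."""
        return iter(self._items.items())


def count_by(store, attribute):
    """Client-side aggregation: walk the whole set and tally one attribute."""
    counts = Counter()
    for _key, attrs in store.scan():
        if attribute in attrs:
            counts[attrs[attribute]] += 1
    return counts


store = ItemStore()
store.put("user:1", {"name": "ann", "country": "US"})
store.put("user:2", {"name": "bob", "country": "DE"})
store.put("user:3", {"name": "cat", "country": "US"})

print(store.get("user:2"))         # cheap: a single lookup by key
print(count_by(store, "country"))  # expensive: reads every item
```

The lookup stays cheap no matter how large the store grows, while the cost of `count_by` grows with the number of objects, which is exactly the open question above.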

As usual – no silver bullets, no magic wands. Well, at least software architects have job security...
