Amazon Web Services user’s group meeting last night was slightly derailed from the presenter’s topic to the discussion about current approaches to highly scalable web architecture. Question is simple: in the new world of Web/Facebook applications when millions of users could be added virtually overnight, question of scalability should be addressed from the day one of application design.
In order to make scalable Web application the most important consideration is the foundation it is build upon – choice of the database architecture. I know have seen four radically different approaches:
- Database clusters
- Database replication
- Federated data layer
- Non-relational databases
Clusters require virtually no code change, scaled up to 32 nodes and in cases of Oracle (do not know anyone else who did it right) are prohibitively expensive for start-ups
Database replication seems so appealing at the beginning… Basically forward all data update/write requests to the master and read the data from slaves. Not much of a code change, cheap/free databases could be used (think MySQL), any slave could become the master when needed… Unfortunately like with the cluster you can go only so far with that – performance will start degrading to the level of inusability after 10-15 slave nodes.
Federated data layer is what Hi5 and many other smart companies are using today. Basic idea – split data horizontally by assigning range of item ids to separate db instances and have “smart” data layer figure out where to go for data request/updates. Major problems: hard to implement “right”, aggregated queries (like selects/count across large data sets, etc) should be executed against many data sources and then aggregated before returning data to the app. Rather challenging to implement and labor intensive any non-trivial scenarios.
Non-relational approach. Sort of new kid on the block (or may be old fellow we used to call object db?). Anyway, with introduction of SimpleDb by Amazon, people start taking it more serious. Basic promise – you trade database features for scalability. Perfect if data is used just to store and retrieve objects and their properties. But what to do about data set aggregation? Navigate through the whole set of objects? I guess, we have to wait for the answer for a while…As usual – no silver bullets, no magic wands. Well, at least software architects have job security...