TvE 2100

At 2100 feet above Santa Barbara

Automatic Scaling in Response to Load

One of the amazing benefits of EC2 is how easy it becomes to automatically scale the number of servers in operation as a function of load. If a whole bunch of users suddenly shows up in response to marketing activity or a Digg event, your site doesn’t have to slow to a crawl: you can simply fire up additional EC2 servers to handle the load.

The screencast below shows, in action, a simple scenario we encounter a lot. This is what I demoed at the end of March at the O’Reilly Emerging Technology Conference during the keynote by Amazon’s CTO, Werner Vogels. The demo lasts a mere 3 minutes and shows me enqueueing 100 music tracks on the SQS queue using a front-end web site. The jump in queue size is visible in the RightScale interface, and the 10 servers that RightScale starts get to work on the queue. A few minutes later the queue begins to shrink, and eventually the servers go idle and terminate themselves. Simple yet incredibly effective.

The setup for this demo is relatively simple. A web front-end server provides the user interface, and when the user kicks off an expensive operation (in this example, the transcoding of music tracks), it enqueues the operation as an item on an Amazon SQS queue. Each SQS queue item represents a unit of work and typically contains links to data files stored on Amazon S3. In this example, the music tracks to be transcoded as well as the resulting files are all stored on S3.
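To make the idea of a queue item concrete, here is a minimal sketch of what one unit of work might look like when serialized into a message body. This is an illustration, not the format used in the demo: the `TranscodeJob` class, its field names, and the `s3://example-tracks/...` locations are all assumptions, and the actual call that sends the body to SQS is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TranscodeJob:
    """One unit of work. The class and its field names are hypothetical."""
    source_url: str      # S3 location of the track to transcode
    result_url: str      # S3 location where the worker writes its output
    target_format: str   # e.g. "mp3"


def encode_job(job: TranscodeJob) -> str:
    """Serialize a job into the string that would become the queue message body."""
    return json.dumps(asdict(job), sort_keys=True)


def decode_job(body: str) -> TranscodeJob:
    """Rebuild the job on the worker side from a message body."""
    return TranscodeJob(**json.loads(body))


# Made-up bucket and key names, for illustration only.
job = TranscodeJob(
    source_url="s3://example-tracks/in/track-001.wav",
    result_url="s3://example-tracks/out/track-001.mp3",
    target_format="mp3",
)
body = encode_job(job)
```

The message carries only links, so the item stays small no matter how large the tracks are, and the front end and the workers never need to talk to each other directly.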

A separate array of worker servers runs a worker framework that repeatedly pulls an item off the SQS queue and performs the requested operation. If the queue remains empty for several minutes, the worker server terminates itself to reduce expenses. In the background, the RightScale service also monitors the queue to ensure that enough servers are running to keep the queue service time within the desired bounds. An “elasticity” function is set up on the RightScale site and determines how many servers need to be running at any point in time, as a function of either the size of the queue or the age of the items at the head of the queue. If RightScale detects that too few worker servers are running, it simply launches additional ones.
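The two halves of this design can be sketched in a few lines. What follows is a rough illustration under stated assumptions, not RightScale's actual elasticity function or worker framework: the function names, the 10-items-per-server ratio, the 5-minute age limit, and the 3-minute idle timeout are all made up, and an in-memory `queue.Queue` stands in for SQS so the sketch runs on its own.

```python
import math
import queue
import time


def desired_workers(queue_size, oldest_item_age_s, items_per_worker=10,
                    max_age_s=300.0, max_workers=10):
    """Hypothetical elasticity function: server count from queue size and head-item age."""
    if queue_size == 0:
        return 0
    needed = math.ceil(queue_size / items_per_worker)
    if oldest_item_age_s > max_age_s:
        needed += 1  # the head of the queue is getting stale, so add a server
    return min(max_workers, needed)


def servers_to_launch(running, queue_size, oldest_item_age_s):
    """Monitor side: only ever launches; idle workers shut themselves down."""
    return max(0, desired_workers(queue_size, oldest_item_age_s) - running)


def run_worker(work_queue, handle, idle_timeout_s=180.0, poll_interval_s=1.0):
    """Worker side: pull items until the queue has stayed empty for idle_timeout_s."""
    processed = 0
    last_work = time.monotonic()
    while time.monotonic() - last_work < idle_timeout_s:
        try:
            item = work_queue.get(timeout=poll_interval_s)
        except queue.Empty:
            continue  # nothing to do yet; keep waiting until the idle timeout
        handle(item)
        processed += 1
        last_work = time.monotonic()
    # Returning here stands in for the server terminating itself.
    return processed
```

Note the asymmetry: the monitor only decides when to add servers, while each worker decides for itself when to go away. That keeps scale-down safe, because a server never disappears in the middle of an item.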