Saturday 23 January 2016

Capacity Planning

A capacity planner seeks to meet the future demands on a system by providing the additional capacity to fulfill those demands. Many people equate capacity planning with system optimization (or performance tuning, if you like), but they are not the same. System optimization aims to get more production from the system components you have. Capacity planning measures the maximum amount of work that can be done using the current technology and then adds resources to do more work as needed. 

If system optimization occurs during capacity planning, that is all to the good; but capacity planning efforts focus on meeting demand. If that means the capacity planner must accept the inherent inefficiencies in any system, so be it.

Capacity planning is an iterative process with the following steps:
1. Determine the characteristics of the present system.
2. Measure the workload for the different resources in the system: CPU, RAM, disk, network, and so forth.
3. Load the system until it is overloaded, determine when it breaks, and specify what is required to maintain acceptable performance. Knowing when systems fail under load and what factor(s) is responsible for the failure is the critical step in capacity planning.
4. Predict the future based on historical trends and other factors (a simple forecasting sketch follows this list).
5. Deploy or tear down resources to meet your predictions.
6. Iterate Steps 1 through 5 repeatedly.
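Step 4 is the one that benefits most from a little automation. The following is a minimal sketch, in Python, of trend-based prediction: it fits a least-squares line to hypothetical daily peak CPU utilization figures and estimates how many days remain before the trend crosses a chosen ceiling. The sample data, the 80-percent threshold, and the function name are assumptions made purely for illustration.

# Hedged sketch of Step 4: forecast when a resource will exceed a chosen
# ceiling, assuming you already collect daily peak CPU utilization (Step 2).
# The history below is illustrative, not real measurement data.
def forecast_days_until_ceiling(history, ceiling=80.0):
    """Fit a least-squares line to (day, utilization) samples and estimate
    how many days remain until the trend crosses the ceiling."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # demand is flat or falling; no crossing predicted
    crossing_day = (ceiling - intercept) / slope
    return max(0.0, crossing_day - (n - 1))

daily_peak_cpu = [42, 44, 47, 45, 51, 53, 56, 58, 61, 63]  # hypothetical data
days_left = forecast_days_until_ceiling(daily_peak_cpu)
if days_left is None:
    print("No growth trend detected; no ceiling crossing predicted.")
else:
    print(f"Estimated days until the 80% CPU ceiling: {days_left:.0f}")

In practice you would feed in real measurements from Step 2 and repeat the forecast on every iteration of the cycle.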



Performance Monitoring

Performance should be monitored to understand the capacity of a system. Capacity planning must measure system-level statistics, determining what each system is capable of and how the resources of a system affect system-level performance.


Below is a list of some LAMP performance tools:
Alertra - Web site monitoring service
Cacti - Open source RRDTool graphing module
Collectd - System statistics collection daemon
Dstat - System statistics utility; replaces vmstat, iostat, netstat, and ifstat
Ganglia - Open source distributed monitoring system
RRDTool - Graphing and performance metrics storage utility
ZenOSS - Operations monitor, both open source and commercial versions
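If you want to see what these tools collect under the hood, the following is a minimal sketch of system-level sampling in Python, assuming the third-party psutil package is installed (it is not part of the standard library). The five-second interval and the one-line output format are arbitrary choices; a real deployment would push these numbers into RRDTool, Cacti, or collectd rather than print them.

import time
import psutil  # assumption: installed separately, e.g. pip install psutil

def sample(interval=5):
    """Print one line of CPU, memory, disk I/O, and network I/O statistics."""
    cpu = psutil.cpu_percent(interval=interval)   # % CPU averaged over the interval
    mem = psutil.virtual_memory().percent         # % RAM in use
    disk = psutil.disk_io_counters()              # cumulative disk byte counters
    net = psutil.net_io_counters()                # cumulative network byte counters
    print(f"{time.strftime('%H:%M:%S')} cpu={cpu:.0f}% mem={mem:.0f}% "
          f"disk_read={disk.read_bytes} disk_write={disk.write_bytes} "
          f"net_sent={net.bytes_sent} net_recv={net.bytes_recv}")

for _ in range(12):   # roughly one minute of samples at the default interval
    sample()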


Load testing

Examining your server under load for system metrics isn't going to give you enough information to do meaningful capacity planning. You need to know what happens to a system when the load increases. That is the aim of Load Testing. Load testing is also referred to as performance testing, reliability testing, stress testing, and volume testing.


If you have one production system running in the cloud, then overloading that system until it breaks isn't going to make you popular with your colleagues. However, cloud computing offers virtual solutions. One possibility is to create a clone of your single system and then use a tool to feed that clone a recorded set of transactions that represent your application workload.

Two examples of applications that can replay requests to Web servers are HTTPerf
(http://hpl.hp.com/research/linux/httperf/) and Siege (http://www.joedog.org/JoeDog/Siege), but both of these tools run the requests from a single client, which can be a resource limitation of its own. You can use Autobench (http://www.xenoclast.org/autobench/) to drive HTTPerf from multiple clients against your Web server, which is a better test.
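When none of those tools is at hand, even a short script can put concurrent load on a test clone and report latency percentiles. The sketch below uses only the Python 3 standard library; the target URL, concurrency, and request count are placeholders, and it should only be pointed at a clone you are allowed to overload.

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET = "http://localhost:8080/"   # assumption: a test clone, not production
CONCURRENCY = 20
REQUESTS = 500

def fetch(_):
    start = time.time()
    try:
        with urllib.request.urlopen(TARGET, timeout=10) as resp:
            resp.read()
            return resp.status, time.time() - start
    except Exception:
        return None, time.time() - start   # count failures as well as successes

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(fetch, range(REQUESTS)))

ok = sum(1 for status, _ in results if status == 200)
latencies = sorted(t for _, t in results)
print(f"success={ok}/{REQUESTS} "
      f"median={latencies[len(latencies) // 2] * 1000:.0f}ms "
      f"p95={latencies[int(len(latencies) * 0.95)] * 1000:.0f}ms")

Rerun it at increasing concurrency levels and watch the system metrics from the previous section to see where throughput stops climbing.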


You may want to consider these load generation tools as well:

• HP LoadRunner
• IBM Rational Performance Tester
• JMeter
• OpenSTA (http://opensta.org/)
• Micro Focus (Borland) SilkPerformer



Resource Ceiling

Consider a load test in which, for a particular server, the CPU, RAM, and Disk I/O utilization rates rise with increasing load but never reach their resource ceilings, while Network I/O reaches its maximum 100-percent utilization at about 50 percent of the tested load. In this instance, Network I/O is the current system resource ceiling.
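Finding the ceiling is mostly a matter of bookkeeping: for each load level, record per-resource utilization and note the first resource to saturate. Here is a minimal sketch of that bookkeeping; the utilization numbers are invented to mirror the example above, not real measurements.

samples = {
    # % of tested load -> utilization % per resource (illustrative values)
    25:  {"cpu": 20, "ram": 30, "disk_io": 15, "network_io": 55},
    50:  {"cpu": 35, "ram": 45, "disk_io": 30, "network_io": 100},
    75:  {"cpu": 50, "ram": 60, "disk_io": 45, "network_io": 100},
    100: {"cpu": 70, "ram": 75, "disk_io": 60, "network_io": 100},
}

ceiling_hits = {}
for load in sorted(samples):
    for resource, util in samples[load].items():
        if util >= 100 and resource not in ceiling_hits:
            ceiling_hits[resource] = load   # first load level at which it saturated

if ceiling_hits:
    resource, load = min(ceiling_hits.items(), key=lambda kv: kv[1])
    print(f"Current resource ceiling: {resource} (saturated at {load}% of tested load)")
else:
    print("No resource reached its ceiling in the tested range.")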


Network I/O is often a bottleneck in Web servers, and this is why, architecturally, Web sites prefer to scale out using many low-powered servers instead of scaling up with fewer but more powerful servers.

Let's consider a slightly more complicated case that you might encounter in MySQL database systems. Database servers are known to exhibit resource ceilings for either their file caches or their Disk I/O performance. To build high-performance applications, many developers replicate their master MySQL database and create a number of slave MySQL databases. All READ operations are performed on the slave MySQL databases, and all WRITE operations are performed on the master MySQL database. A master/slave database system therefore has two competing processes on the same server that share a common resource: READs and WRITEs to the master/slave databases, and replication traffic between the master database and all the slave databases.
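The READ/WRITE split is usually implemented in the application or in a proxy layer. The following is a minimal sketch of the routing idea, assuming DB-API-style connection objects from whatever MySQL client library you use; the routing rule (anything starting with SELECT goes to a slave) is deliberately simplistic.

import itertools

class ReadWriteRouter:
    def __init__(self, master_conn, slave_conns):
        self.master = master_conn
        self.slaves = itertools.cycle(slave_conns)   # round-robin over the slaves

    def execute(self, sql, params=()):
        # SELECTs go to a slave; INSERT/UPDATE/DELETE go to the master.
        is_read = sql.lstrip().upper().startswith("SELECT")
        conn = next(self.slaves) if is_read else self.master
        cur = conn.cursor()
        cur.execute(sql, params)
        return cur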

These types of servers reach failure when the amount of WRITE traffic in the form of INSERT, UPDATE, and DELETE operations to the master database overtakes the ability of the system to replicate data to the slave databases that are servicing SELECT (query/READ) operations.
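A practical early-warning signal for this failure mode is replication lag. The sketch below, assuming the pymysql package and placeholder host and credentials, reads Seconds_Behind_Master from SHOW SLAVE STATUS on a slave; a value that keeps climbing under WRITE load means replication is falling behind.

import pymysql  # assumption: any MySQL client library would work here

def replica_lag_seconds(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor(pymysql.cursors.DictCursor) as cur:
            cur.execute("SHOW SLAVE STATUS")
            row = cur.fetchone()
            # None means replication is not running or not configured
            return row["Seconds_Behind_Master"] if row else None
    finally:
        conn.close()

lag = replica_lag_seconds("slave1.example.com", "monitor", "secret")  # placeholders
print(f"Slave lag: {lag} seconds" if lag is not None else "No replication status available")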





Network Capacity

There are three aspects to assessing network capacity:
• Network traffic to and from the network interface at the server, be it a physical or virtual interface or server
• Network traffic from the cloud to the network interface
• Network traffic from the cloud through your ISP to your local network interface (your computer)

To measure network traffic at a server's network interface, you need to employ what is commonly known as a network monitor, which is a form of packet analyzer. Microsoft includes a utility called the Microsoft Network Monitor as part of its server utilities, and there are many third-party products in this area.
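A full packet analyzer shows you individual packets; if you only need throughput at the interface, the operating system's interface counters are enough. The following is a minimal sketch that samples those counters with psutil; the interface name eth0 and the five-second window are assumptions that vary from system to system.

import time
import psutil  # assumption: installed separately

def interface_throughput(nic="eth0", interval=5):
    before = psutil.net_io_counters(pernic=True)[nic]
    time.sleep(interval)
    after = psutil.net_io_counters(pernic=True)[nic]
    sent = (after.bytes_sent - before.bytes_sent) / interval   # bytes/sec out
    recv = (after.bytes_recv - before.bytes_recv) / interval   # bytes/sec in
    return sent, recv

sent, recv = interface_throughput()
print(f"eth0: {sent / 1024:.1f} KB/s out, {recv / 1024:.1f} KB/s in")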

Alternative names for network analyzer include packet analyzer, network traffic monitor, protocol analyzer, packet sniffer, and Ethernet sniffer, and for wireless networks, wireless or wifi sniffer, network detector, and so on.

Scaling

In capacity planning, after you have made the decision that you need more resources, you are faced with the fundamental choice of scaling your systems. You can either scale vertically (scale up) or scale horizontally (scale out), and each method is broadly suitable for different types of applications.

To scale vertically, you add resources to a system to make it more powerful. For example, during scaling up, you might replace a node in a cloud-based system that has a dual-processor machine instance equivalent with a quad-processor machine instance equivalent. You also can scale up when you add more memory, more network throughput, and other resources to a single node. Scaling up indefinitely eventually leads you to an architecture with a single powerful supercomputer.

Vertical scaling allows you to use a virtual system to run more virtual machines (operating system instances), run more daemons on the same machine instance, or take advantage of more RAM (memory) and faster compute times. Applications that benefit from being scaled up vertically include those that are processor-limited, such as rendering, or memory-limited, such as certain database operations (queries against an in-memory index, for example).

Horizontal scaling, or scale out, adds capacity to a system by adding more individual nodes. In a system where you have a dual-processor machine instance, you would scale out by adding more dual-processor machine instances or some other type of commodity system. Scaling out indefinitely leads you to an architecture with a large number of servers (a server farm), which is the model that many cloud and grid computer networks use.

Horizontal scaling allows you to run distributed applications more efficiently and makes more effective use of hardware, because it is easier both to pool resources and to partition them. Although your intuition might lead you to believe otherwise, the world's most powerful computers are currently built as clusters of computers aggregated using high-speed interconnect technologies such as InfiniBand or Myrinet. Scale out is most effective when you have an I/O resource ceiling and can eliminate the communications bottleneck by adding more channels. Web server connections are a classic example of this situation.
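The scale-out pattern for Web connections is easy to picture in code. The sketch below shows the simplest possible request distribution, plain round robin across a pool of identical commodity nodes; the addresses are placeholders, and a real deployment would put a load balancer such as HAProxy or a cloud load-balancing service in front of the pool rather than doing this in application code.

import itertools

nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical web servers
pool = itertools.cycle(nodes)

def route(request_id):
    """Return the node that should handle this request (plain round robin)."""
    return next(pool)

for i in range(6):
    print(f"request {i} -> {route(i)}")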







