DoC Computing Support Group


Summer 2012: DoC Private Cloud Hardware Investigations

During the summer of 2012, CSG investigated hardware that could be used to realise the DoC cloud in a scalable and expandable way, before purchasing the agreed cloud hardware in August 2012. These notes are largely of historical interest, and describe those summer 2012 investigations.

Susan expressed a preference for open source software running on commodity hardware. A strong advantage of this approach is that commodity hardware could be repurposed, even if the cloud project failed completely, or got scaled down to use a tiny proportion of the hardware.

However, Peter Pietzuch recommended that we also investigate commercial scalable storage systems - specifically NetApp "scalable NAS" units - which his colleagues at the University of Cambridge Computing Laboratory have used for years. CSG have discussed NetApps with colleagues in ICT and Martyn Johnson at the Cambridge Computing Lab, and have now got a realistic price for a NetApp model that we imagine is suitable (see below).

Looking at commodity hardware first, we identify two types of server for the DoC private cloud, a compute node and a storage node:

  • A compute node contains a large number of CPUs/cores. Its primary role in the cloud is one of computation (virtual machine hosting, distributed computing and the like).

  • A storage node contains a large number of locally attached disks providing a chunk of fault tolerant storage. Its primary role in the cloud is to provide storage (for VM images and associated research filesystems).

We envisage that multiple compute nodes and multiple storage nodes would be needed. The question of what software to run to "glue together" several storage nodes into some form of distributed multi-replica object store (along the lines of Amazon S3) or distributed POSIX filesystem remains open.

Alternatively, if we go for a NetApp storage system, we would still need the distinct compute nodes, but a single NetApp system (or ideally two, one per machine room) would (apparently) provide all the storage, performance and redundancy.

In addition, one should not overlook infrastructure (e.g. racks to house the servers, Uninterruptible Power Supplies to power them, etc.).

Infrastructure Requirements

The basic idea is that we buy two of everything and then have one rack in the Huxley 221 Data Centre containing half the equipment and the other rack in the ICT Mech Eng Data Centre containing the remaining half. All prices quoted below exclude VAT.

Item                              Quantity  Unit price  Comments
APC Smart-UPS RT 6000VA RM 230V   6         £2200       4200 Watts / 6000 VA UPS
ICT LOM switch                    2         £1000       (Est.) Remote-management
ICT 10GbE switch                  2         £5000       (Est.) Fast intra-cloud comms
1U Rack PDU                       12        £350        (Est.) Rack power & monitoring
Rittal 46U rack                   2         £1000       (Est.) Houses servers & UPSes

These infrastructure requirements total £31,400 (excluding VAT). Note that the specifications and numbers of UPSes and power distribution units have been chosen from the outset to cater for a full rack, so that the cloud can expand later without further infrastructure purchases.
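As a quick cross-check of the infrastructure total, the following Python sketch multiplies out the quantities and prices from the table above. It assumes that the Price column is a per-unit price excluding VAT, which is the reading that reproduces the £31,400 figure. The names INFRASTRUCTURE, infrastructure_total and per_rack_total are ours, chosen purely for illustration.

```python
# Line items copied from the infrastructure table:
# (item, quantity, unit price in GBP excluding VAT).
# Assumption: the table's "Price" column is a per-unit price.
INFRASTRUCTURE = [
    ("APC Smart-UPS RT 6000VA RM 230V", 6, 2200),
    ("ICT LOM switch", 2, 1000),
    ("ICT 10GbE switch", 2, 5000),
    ("1U Rack PDU", 12, 350),
    ("Rittal 46U rack", 2, 1000),
]


def infrastructure_total(items):
    """Return the sum of quantity * unit price over all line items."""
    return sum(qty * unit_price for _name, qty, unit_price in items)


def per_rack_total(items, racks=2):
    """Return the cost per rack, assuming the equipment is split evenly."""
    return infrastructure_total(items) / racks


if __name__ == "__main__":
    for name, qty, unit_price in INFRASTRUCTURE:
        print(f"{name:34s} {qty:3d} x GBP {unit_price:5d} = GBP {qty * unit_price:6d}")
    print(f"Total (ex VAT):    GBP {infrastructure_total(INFRASTRUCTURE)}")
    print(f"Per rack (ex VAT): GBP {per_rack_total(INFRASTRUCTURE):.0f}")
```

Since every quantity in the table is even, the split across the two data centre racks is exact, at £15,700 per rack.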

Compute Nodes

Commodity Intel architecture rack mount servers with the highest possible core density (and lowest price per core). These will host the VMs and provide the raw computing capability. Lloyd has done a lot of work getting quotes for commodity compute nodes, shown below:

Commodity Compute Node Details

Storage Nodes: Commodity Intel Architecture

Commodity Intel architecture rack mount servers with the highest possible storage density, ideally with the maximum number of disk spindles possible for performance. These will host the VM images and any associated data, possibly knitted together using some distributed multi-replica filesystem for performance and fault tolerance.

Lloyd has done a lot of work getting quotes for commodity storage nodes, shown below:

Commodity Storage Node Details

Storage: Commercial NetApp Network Attached Storage

Giuseppe and Duncan have done a lot of work investigating NetApp storage systems and here are the details:

NetApp Scalable NAS Details

Summary

  • It is hard to judge whether the NetApp FAS2240-2 option is a good one, as CSG have no experience of NetApps. It is clearly a well regarded set of products from a reliable company who pride themselves (via their partners) on performance and optimization. But it is hard to be sure of its suitability sight unseen.

  • On the commodity hardware front, Dell seems to offer the best prices, both for compute and commodity storage servers.
  • There are other costs involved, such as:
    • ICT networking.
    • Power feed organisation and cooling.
    • Software licenses (if needed).
  • It is possible to get different compute and storage node configurations for a given price target. Consider the following targets (inclusive of VAT and the infrastructure cost), given the Dell C6220 compute node at £20K + VAT, the Dell R720xd at £6K + VAT, and £31K + VAT infrastructure:

    • £100K inc VAT (£83K + VAT): two C6220s and two R720xds.
    • £150K inc VAT (£125K + VAT) (1): four C6220s and two R720xds.
    • £150K inc VAT (£125K + VAT) (2): two C6220s and eight R720xds.
    • £200K inc VAT (£166K + VAT): four C6220s and eight R720xds.
  • Similarly, if we go for the NetApp FAS2240-2 option at approx £50K + VAT, with the Dell C6220 compute node at £20K + VAT and £31K + VAT infrastructure, the options are:

    • £100K inc VAT (£83K + VAT): NO COMPUTE NODES, one NetApp FAS2240-2.

    • £150K inc VAT (£125K + VAT): two C6220s and one NetApp FAS2240-2.

    • £200K inc VAT (£166K + VAT): four C6220s and one NetApp FAS2240-2.

  • Of course, if there is only one NetApp, we could abandon the second data centre rack entirely, halving the infrastructure cost. This saves roughly £15.5K + VAT, most of the price of one additional C6220 compute server.
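The budget arithmetic in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the rounded ex-VAT prices quoted in the bullets above, it assumes a 20% VAT rate (consistent with £100K inc VAT being about £83K + VAT), and the helper names (ex_vat, config_cost, fits, max_compute_nodes) are ours rather than anything CSG actually used.

```python
# Assumption: VAT at 20%, so GBP 100K inc VAT is about GBP 83K ex VAT.
VAT_RATE = 0.20

# Rounded prices in GBP excluding VAT, as quoted in the summary bullets.
PRICES = {
    "C6220": 20_000,      # Dell C6220 compute node
    "R720xd": 6_000,      # Dell R720xd storage node
    "FAS2240-2": 50_000,  # NetApp FAS2240-2 (approximate)
}
INFRASTRUCTURE = 31_000   # two racks' worth of infrastructure


def ex_vat(budget_inc_vat):
    """Convert a VAT-inclusive budget to its ex-VAT equivalent."""
    return budget_inc_vat / (1 + VAT_RATE)


def config_cost(config, infrastructure=INFRASTRUCTURE):
    """Ex-VAT cost of a {item: count} configuration plus infrastructure."""
    return infrastructure + sum(PRICES[item] * n for item, n in config.items())


def fits(config, budget_inc_vat, infrastructure=INFRASTRUCTURE):
    """True if the configuration fits within the VAT-inclusive budget."""
    return config_cost(config, infrastructure) <= ex_vat(budget_inc_vat)


def max_compute_nodes(budget_inc_vat, fixed, infrastructure=INFRASTRUCTURE):
    """Number of C6220s affordable once the fixed items are paid for."""
    remaining = ex_vat(budget_inc_vat) - config_cost(fixed, infrastructure)
    return max(0, int(remaining // PRICES["C6220"]))


if __name__ == "__main__":
    # Commodity option: check one of the configurations listed above.
    print(fits({"C6220": 2, "R720xd": 2}, 100_000))
    # NetApp option: how many compute nodes each budget leaves room for.
    for budget in (100_000, 150_000, 200_000):
        print(budget, max_compute_nodes(budget, {"FAS2240-2": 1}))
```

With these rounded figures the sketch agrees with the node counts listed above for both the commodity and NetApp options. Passing infrastructure=15_500 (the single-rack case) raises the £200K NetApp figure from four to five C6220s but leaves the £100K and £150K figures unchanged, because the saving is somewhat less than the price of one C6220.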

 
 

project/privatecloud/hardware (last edited 2013-10-22 12:02:22 by dcw)