DoC Computing Support Group


Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2012-04-03 12:52:36
Size: 1634
Editor: dcw
Comment:
Revision 14 as of 2012-04-24 11:02:27
Size: 4231
Editor: dwm
Comment:
DoC Private Cloud

Services

Initially, the following services will be needed:

  • Virtual-machine hosting / automated provisioning facility.
  • Persistent backing-store for VM images.
  • High-performance POSIX file-store access / scratch areas.

Candidate software tooling includes:

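  • Storage:
    • Ceph (http://ceph.newdream.net/): distributed object store / block-device / filesystem.
    • OpenStack Swift (http://openstack.org/projects/storage/): distributed object store. (Object storage in the style of Amazon S3 only; no block-device or filesystem layer.)
    • Sheepdog (http://www.osrg.net/sheepdog/): distributed image storage.
    • Gluster (http://www.gluster.org/): distributed filesystem.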

We've been thinking along these lines for a while; see also: Storage-NG.

Upgrades to the network connections for existing NFS filers may also be warranted.

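  • Virtual-machines:
    • Xen (http://www.xen.org/): paravirtualization tools.
    • KVM (http://www.linux-kvm.org/): (para)-virtualization tools.
    • libvirt (http://libvirt.org/): VM abstraction and management layer.
    • Ganeti (http://code.google.com/p/ganeti/): VM management system.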

The virtual-machine management layer will need to support accounting for resource utilization by the VMs spawned for a given user or group, live migration of VMs from one host to another, and will likely need to support automated backups / snapshots of historical virtual-machine disk state. (Note that this differs from existing doctrine, which specifies that the machine-local OS data is expendable, and can be regenerated.)
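
As a rough illustration of the accounting requirement, the following Python sketch sums per-VM usage samples by owning group. The record fields (`group`, `cpu_hours`, `disk_gb`) and the sample figures are hypothetical; a real deployment would pull them from whichever management layer is eventually chosen, and no particular tool's API is assumed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VMUsage:
    """One usage sample for a single VM (all fields are illustrative)."""
    vm_name: str
    group: str        # owning user or research group
    cpu_hours: float  # CPU time consumed in the accounting period
    disk_gb: float    # disk space currently allocated to the VM


def usage_by_group(samples):
    """Return total CPU hours, disk allocation and VM count per group."""
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "disk_gb": 0.0, "vms": 0})
    for sample in samples:
        entry = totals[sample.group]
        entry["cpu_hours"] += sample.cpu_hours
        entry["disk_gb"] += sample.disk_gb
        entry["vms"] += 1
    return dict(totals)


# Made-up samples, purely to show the shape of the report.
EXAMPLE_SAMPLES = [
    VMUsage("node01", "group-a", 120.0, 40.0),
    VMUsage("node02", "group-a", 80.0, 40.0),
    VMUsage("web01", "group-b", 10.0, 20.0),
]

if __name__ == "__main__":
    for group, total in sorted(usage_by_group(EXAMPLE_SAMPLES).items()):
        print(f"{group}: {total['vms']} VMs, "
              f"{total['cpu_hours']:.1f} CPU hours, {total['disk_gb']:.1f} GB disk")
```

Keeping the aggregation separate from data collection means the same report could be fed from any of the candidate management layers once one is selected.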

The use of seed images, data de-duplication, and/or copy-on-write would also be valuable for minimising storage requirements.
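
To see why seed images and copy-on-write matter, here is a back-of-envelope Python sketch comparing full per-VM image copies with one shared seed image plus per-VM copy-on-write deltas. The function names and the figures in the example (40 VMs, a 10 GB seed image, 2 GB of changed blocks per VM) are made-up assumptions rather than measurements, and replication overhead in the backing store is ignored.

```python
def full_copy_gb(n_vms, image_gb):
    """Storage needed if every VM gets its own full copy of the image."""
    return n_vms * image_gb


def cow_gb(n_vms, image_gb, delta_gb_per_vm):
    """Storage needed with one shared seed image plus a per-VM CoW delta."""
    return image_gb + n_vms * delta_gb_per_vm


def saving_fraction(n_vms, image_gb, delta_gb_per_vm):
    """Fraction of storage saved by copy-on-write relative to full copies."""
    full = full_copy_gb(n_vms, image_gb)
    if full <= 0:
        raise ValueError("need at least one VM and a non-empty image")
    return 1.0 - cow_gb(n_vms, image_gb, delta_gb_per_vm) / full


if __name__ == "__main__":
    # Hypothetical figures: 40 VMs, 10 GB seed image, 2 GB delta per VM.
    n_vms, image_gb, delta_gb = 40, 10.0, 2.0
    print(f"full copies:   {full_copy_gb(n_vms, image_gb):.0f} GB")
    print(f"copy-on-write: {cow_gb(n_vms, image_gb, delta_gb):.0f} GB")
    print(f"saving:        {saving_fraction(n_vms, image_gb, delta_gb):.1%}")
```

The saving shrinks as each VM diverges from the seed image, so the benefit depends heavily on how much machine-local state the VMs accumulate.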

Background

Sometime in early 2012, Susan told DCW that DoC were thinking of hiring someone into CSG for 6 months, specifically tasked with building a DoC private cloud. Essentially, she said that the Exec Committee had found a significant pot of money which needs to be spent this financial year.

She explained the core idea was "virtualisation even for research clusters": at present, research groups buy clusters when they have money, CSG set them up, install "linux du jour" on them, configure fileservers (if part of cluster), tape backups (if part), processing node special software etc.

Then the servers age, the OS is essentially frozen (it's often difficult to persuade researchers that we should reinstall their fileservers, webservers and compute nodes). They become "fragile". Sometimes it's hard to even retire them on schedule (4/5/6 years or whatever). Also these clusters are often only accessible by members of that research group so the resource may not be fully utilised.

Susan's vision: set up a private cloud; researchers add hardware to that cloud's core resources, then create a VM for each cluster node, perhaps tied (1-1 at first) to their own hardware; CSG install that virtual cluster node's OS, and researchers work as before, but each node is encapsulated inside a VM. Later, these VMs could share resources, either when the group doesn't need 100% of its resources or when new, more powerful hardware is purchased.

Various discussions with PJM and AON followed: the post will be called "Cloud Manager", will be part of CSG, and will do non-cloud things too. It could be permanent, or it could be for 6 months in the first instance.

Most crucially: despite not knowing the exact spec or the services to provide, let alone how to implement them, we need to purchase all the kit and have it delivered in July 2012, before the Olympics. PJM added "build a private cloud like Amazon EC2 does"; AON suggested a budget of £100K, £150K or even £200K. We will provide possible plans for each of these price levels.

DWM has spent a lot of time evaluating Ceph as a possible S3 / Elastic Block Store-like storage system for supporting VM storage, and possibly very high-speed filesystems, e.g. staging areas for VM data (scale-out NAS with replication). So far: it's not there yet, at least as a fast POSIX filesystem. Alternatives need to be looked at as well.

Steering meetings

Meeting 1 - April 3rd 2012

 
 

project/privatecloud (last edited 2013-11-13 19:27:43 by dcw)