DoC Computing Support Group


Differences between revisions 2 and 20 (spanning 18 versions)
Revision 2 as of 2012-04-03 20:35:39
Size: 10340
Editor: dcw
Comment:
Revision 20 as of 2012-04-26 18:05:26
Size: 6462
Editor: dcw
Comment:
Line 1: Line 1:
= Wiki page for notes on Jan-April 2012 DoC private cloud discussions = = DoC Private Cloud =
Line 3: Line 3:
== Intro == == Services ==

Initially, something like the following services will be needed:

 * Virtual-machine hosting / automated provisioning facility (Amazon EC2-a-like).
 * Persistent backing-store for VM images (perhaps Amazon S3-style, perhaps Amazon Elastic Block Store-style, perhaps distributed filesystem).
 * High-performance POSIX file-store access / scratch areas (perhaps in-cloud or perhaps separate from the cloud).

In relation to scalable storage options, we've already been thinking about these for a while; see also: [[internal/project/Storage-NG|Storage-NG]].

Candidate software to investigate, solving all or part of the private IaaS cloud problem, includes:

 * OpenStack: claims to do most of the job - VM provisioning, interface with KVM, Xen etc (subproject "Nova") and S3-alike storage system (project "Swift"). It can also integrate with e.g. Ceph or GlusterFS for filesystem-based distributed storage, and Nova has an iSCSI client storage plugin.

 * CloudStack: Citrix open source project, donated to Apache Foundation, rather similar to the above.

 * VM seed/runtime image storage:
   * [[http://openstack.org/projects/storage/|OpenStack Swift]] distributed object store. (Implements Amazon S3 only)
   * [[http://www.osrg.net/sheepdog/|Sheepdog]] distributed image storage.
 
 * High-performance distributed POSIX file-stores:
   * [[http://ceph.newdream.net/|Ceph]] distributed object store / block-device / filesystem. DWM evaluated: Conclusion - not ready yet.
   * [[http://www.gluster.org/|Gluster]] distributed filesystem. DWM evaluating. Looking rather promising. No "storage head node", i.e. bottleneck!
   * [[http://www.moosefs.org/|MooseFS]] distributed filesystem.
   * [[http://www.fhgfs.com/cms/|Fraunhofer Parallel File System]] distributed filesystem, no storage head node, BUT NO REPLICATION, pure striping. Discovered by SM. Conclusion: not suitable.

At present, the options for scalable high-performance general purpose NFS filers suitable for storing home dirs, research volumes etc are either expensive or not yet mature.
The above POSIX filestores may well be suitable for the "inner problem" of VM storage.

Upgrading existing NFS fileservers to 10Gb is definitely also worth trying (first).

 * Virtualization software:
   * [[http://www.xen.org/|Xen]] paravirtualization tools.
   * [[http://www.linux-kvm.org/|KVM]] (para)-virtualization tools.
   * VMware ESX virtualization software - commercial.
   * [[http://libvirt.org/|libvirt]] VM abstraction and management layer.
   * [[http://code.google.com/p/ganeti/|Ganeti]] VM management system.
   * [[http://openstack.org/projects/compute/|OpenStack Nova]] VM management system [uses Xen, KVM or VMware under the hood].

The virtual-machine management layer will need to support accounting for resource utilization by the VMs spawned for a given user or group, live migration of VMs from one host to another, and will likely need to support automated backups / snapshots of at least SOME historical virtual-machine disk state. (Note that this differs from existing doctrine, which specifies that the machine-local OS data is expendable, and can be regenerated.)

The use of seed images, data de-duplication, and/or copy-on-write would also be valuable for minimising storage requirements.

== Background ==
Line 6: Line 49:
someone (Jeremy Cohen) for 6 months into CSG, specifically tasked with
building a DoC private cloud [definition unclear]. Essentially, Exec
Committee has found some money and needs to spend it quick!
someone for 6 months into CSG, specifically tasked with building a DoC
private cloud. Essentially she said that Exec Committee has found some
significant pot of money which needs to be spent this financial year.
Line 29: Line 72:
Suppose, for instance, the group needed N nodes x 100% of underlying VM host
x M months [and then less thereafter].
Various discussions with PJM and AON followed. Project will definitely
proceed. In two stages:
Line 32: Line 75:
Susan also added "and it should just scale without limits, manage itself magically",
which is less realistic:-) Saving RAs significant informal sysadmin time
is a goal.
 1. a 6 month phase to build a prototype cloud, recruiting a "Cloud Manager" person to join CSG, either temporary or permanent. The Department will spend some significant amount of money, perhaps in the £100-200K range.
Line 36: Line 77:
Various discussions with PJM and AON followed, Jeremy decided not to accept
the job, DoC still wants to hire a "Cloud Manager" as part of CSG.
Most crucially: the Dept decided it has money now, not next year,
and that (despite not knowing the exact spec, services to provide, let
alone how to implement them) we therefore needed to purchase all the kit
 2. assuming the prototype cloud is successful, it will move into production and the "Cloud Manager" becomes permanent. Researchers then have the option of adding hardware to the cloud. All members of CSG will become skilled in cloud-related topics, and the "Cloud Manager" will do non-cloud-related problem solving too.

Most crucially: (despite not knowing the exact spec, services to provide, let
alone how to implement them) we therefore need to purchase all the kit
Line 43: Line 83:
£150K or even £200K - we are tasked with providing possible plans for £150K or even £200K - we will provide possible plans for
Line 49: Line 89:
replication). So far: it's not there yet.  Alternatives need to be looked
at as well..
replication). So far: it's not there yet, at least as a fast POSIX filesystem.
Alternatives need to be looked at as well..
Line 52: Line 92:
== Working Group: 3rd April 2012 meeting == == Staff Working Group meetings ==
Line 54: Line 94:
A working group of academics has been set up, this met on 3rd April 2012
for the first time. Things discussed:
[[internal/project/privatecloud/meeting-2012-04-03|Staff Working Group Meeting 1 - April 3rd 2012]]
Line 57: Line 96:
- PJM/Susan: background (spend money now, define services later), acknowledged
  unusual approach.. added (PJM) idea that a group can have a VM per project
  per year if they need, so they build new apps on the latest supported OS,
  while maintaining the ability to run their old versions on the older OS,
  allows people to try old code on new OS releases without "big bang" server
  upgrade problems. old VMs can eventually wither away..

- PJM: start with concept of: every student gets a VM as they walk in through
  the door, keep while at College, have root access [need to fix/avoid NFS
  problem]. users should have the ability to create more VMs programmatically,
  both short term and long term ones.

- PJM: also, are we all agreed: it's got to be a reliable production system.

- JAMM: use cases - projects into cloud technologies, pervasive computing exercises
  could be made more flexible [not sure how], some of her research involves
  streaming data from sensors, need high-capacity filestores.

- PRP: EPSRC call "every research grant puts in for a small cluster" by the
  name "vanity clusters". EPSRC favouring shared resources (Dept, College,
  federated) - will allocate at most first £10K of equipment, then excess
  must have matching funds from Dept! favours (for example) shared services,
  grids, clouds and HPC.

- PRP added: VMs can really speed up provisioning of research project kit,
  instead of purchasing kit, waiting for it to arrive, installing and configuring
  it, continuing to maintain it, then (after project) decide what to do with
  it, can create 16 short term VMs bound to suitable hardware very quickly, do
  quick experiments and release the VMs resources. If spare hardware capacity
  is in hand, of course! Like Julie, Peter added that research into cloud and
  distributed systems performance could be improved if we had a cloud which we
  could monitor and tweak.

- JD: 2 important aspects of cloud here:
  1. easily provisioned VMs; 2. amortization of all resources over multiple
  projects. The latter requires that researchers don't require all of their
  "own" resources "all" of the time - otherwise none spare!

- PJM/Susan: the matching funds model allows Dept to demand up to 50% of
  these shared resources [on average over time, perhaps front-loaded so
  "owners" get the majority of time up front, release nearly all resources
  later for general use].

- CCADAR: will sometimes need exclusive access to all "your" cluster VMs on
  all your hardware for experiments - repeatability is especially important.
  => need ability to pin VMs onto particular classes of node.

- PRP: Yes, and sometimes experiments need to happen directly on the
  bare metal. but only a small minority!

- JAMM: performance monitoring very important.

- WJK: yes, including power monitoring of the physical VM hosts, a la picards.
  very useful.

- GCASALE: agreed, subtle point about frequency of monitoring being very
  different between cheap power mon and expensive power mon.. LDK discussing
  with him.

- SUSAN: Maja had mentioned that she makes a very large amount of use of
  Matlab, on Windows clusters, buying extra parallel licenses etc. PJM: why
  not use College standard license? DCW: believe extra modules and parallel
  licenses not included in College Matlab license. DCW added: Note that
  ICT HPC kit doesn't support Matlab for same reason!

- TORA: Lab are very interested in more continuous autotesting, need a better
  sandbox: like a short term VM to run student code in! Also very interested
  in scalable storage (didn't say why?)

- JD/SUSAN discussed: where are other Computing Depts with clouds? at any
  level (Dept, College, federated?) - answer seems to be: none known in
  production.

- DWM added that LESC had done lots of "cloud v1" - grid - related work,
  and mentioned the similarities between grids, private clouds and HPC.

- PRP said that we should make more use of ICT's HPC, as we're paying for it.
  Susan said: some use (PHJK, Kanwal), have found HPC team not very welcoming
  to DoC, sniffy about Java code. DCW said: yes, real programmers in HPC:-)
  DCW added: lots of money still going in though - let's use it. ICT also
  upgrading to VMware ESX 5, which "supports cloud" (buzzword alert).
  DCW added: HPC doesn't even let you access College home dirs cos they're
  "not fast enough".

- PJM asked re: this - does everyone want DoC home dirs and research volumes
  accessible from VMs? everyone agreed, and several people pointed out that
  existing fileservers can be saturated by Condor so need to scale more.
  => cloud storage needs to hold VM images and (some) scalable filesystem
  data too. not clear how much.

- DCW asked: what about Amazon S3 - simple distributed (key,value) storage
  system - do we need that? some people said "might be useful" but no one
  had a solid use case.

- WJK added that he'd love to do experiments using different speed storage
  eg. flash and raid levels.

- TORA added that a large scalable block storage system would be very useful,
  but neglected to say why.

- DWM said there seems to be a need for scalable storage at some level as
  part of the cloud, there are a variety of technologies - open source and
  commercial - to look at.

- PJM channeled PRP in saying that "commercial filers" should be looked into,
  think he meant NetApp/EMC stuff. Susan said DoC prefers open source if
  possible, PRP added that cloud storage is NetApp's bread and butter and
  their support and scalability was really good. DCW: look at.

- SUSAN reported that DR had initially said - CSG do everything his group
  needs, why need a cloud. However, when she asked him - want more scalable
  storage, his eyes lit up!

- DWM: so we conclude that scalable storage is very important?

- GCASALE asked: what type of cloud? private? DCW/PJM: yes. what about
  cloudbursting, Giuliano asked ? [what's that we said] - upload VMs to
  Amazon after development (or when need short term resources). PJM:
  useful if possible.

- "tall chap in green shirt": what about network bandwidth? 10Gb links?
  may also need bandwidth reservation in switch fabric. DWM: talking with
  ICT networking about 10Gb.

- "natasha's phd student in her place": their group are very interested in
  virtualizing algorithms and still using FPGAs and GPUs, and again more
  scalable storage is needed here.

- WL agrees, saying some VM hosts definitely need to have GPUs and FPGAs
  (he can provide details and costs). He added that he'd be very interested
  in "getting under the hood" and tweaking and monitoring how various aspects
  of the cloud operate. PJM said: may be contrary to production cloud - but
  perhaps a "sandpit cloud" could fork off the main cloud on occasion, grab
  some hardware etc. WJK agreed. DCW added that Amazon EC2 had VMs with
  access to GPUs and FPGAs etc in their pricing model.

- PJM talked about a cost accounting model, enforcing 50% maximum usage,
  WJK wondered whether anything that heavy was needed. (god knows how
  that's even implemented! perhaps logging use for post-analysis).

- JD asked: would we give access to people outside of DoC?
  DCW: no. PJM: might be open to sharing with ICT. JD: power of
  clouds - federating.
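
The notes above wonder how a 50% usage cap might be implemented and suggest logging use for post-analysis. A minimal sketch of such a check follows, assuming a hypothetical log of (party, node-hours) entries and a known total capacity for the period; the names `usage_share`, `log` and `capacity_hours` are illustrative and do not come from any cloud software mentioned here.

```python
def usage_share(log, party, capacity_hours):
    """Return the fraction of total capacity that `party` used.

    `log` is a list of (party, node_hours) tuples (a hypothetical format).
    `capacity_hours` is the node-hours available in the period.
    """
    if capacity_hours <= 0:
        raise ValueError("capacity_hours must be positive")
    used = sum(hours for who, hours in log if who == party)
    return used / capacity_hours


# Example period: 100 node-hours of capacity, shared by two parties.
log = [("owners", 30.0), ("dept", 20.0), ("owners", 10.0), ("dept", 40.0)]
CAP = 0.5  # the 50% figure discussed in the meeting

owners_share = usage_share(log, "owners", 100.0)
dept_share = usage_share(log, "dept", 100.0)
over_cap = [p for p in ("owners", "dept") if usage_share(log, p, 100.0) > CAP]
```

This only reports who went over the cap after the fact; it does not enforce anything at scheduling time, which matches the lighter-weight approach WJK seemed to prefer.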
  
[[internal/project/privatecloud/meeting-2012-04-25|Staff Working Group/Open Meeting 2 - April 25th 2012]]

DoC Private Cloud

Services

Initially, something like the following services will be needed:

  • Virtual-machine hosting / automated provisioning facility (Amazon EC2-a-like).
  • Persistent backing-store for VM images (perhaps Amazon S3-style, perhaps Amazon Elastic Block Store-style, perhaps distributed filesystem).
  • High-performance POSIX file-store access / scratch areas (perhaps in-cloud or perhaps separate from the cloud).

In relation to scalable storage options, we've already been thinking about these for a while; see also: Storage-NG.

Candidate software to investigate, solving all or part of the private IaaS cloud problem, includes:

  • OpenStack: claims to do most of the job - VM provisioning, interface with KVM, Xen etc (subproject "Nova") and S3-alike storage system (project "Swift"). It can also integrate with e.g. Ceph or GlusterFS for filesystem-based distributed storage, and Nova has an iSCSI client storage plugin.

  • CloudStack: Citrix open source project, donated to Apache Foundation, rather similar to the above.

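  • VM seed/runtime image storage:
    • OpenStack Swift distributed object store. (Implements Amazon S3 only)

    • Sheepdog distributed image storage.
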
  • High-performance distributed POSIX file-stores:
    • Ceph distributed object store / block-device / filesystem. DWM evaluated: Conclusion - not ready yet.

    • Gluster distributed filesystem. DWM evaluating. Looking rather promising. No "storage head node", i.e. bottleneck!

    • MooseFS distributed filesystem.

    • Fraunhofer Parallel File System distributed filesystem, no storage head node, BUT NO REPLICATION, pure striping. Discovered by SM. Conclusion: not suitable.

At present, the options for scalable high-performance general purpose NFS filers suitable for storing home dirs, research volumes etc are either expensive or not yet mature. The above POSIX filestores may well be suitable for the "inner problem" of VM storage.

Upgrading existing NFS fileservers to 10Gb is definitely also worth trying (first).

  • Virtualization software:
    • Xen paravirtualization tools.

    • KVM (para)-virtualization tools.

    • VMware ESX virtualization software - commercial.
    • libvirt VM abstraction and management layer.

    • Ganeti VM management system.

    • OpenStack Nova VM management system [uses Xen, KVM or VMware under the hood].

The virtual-machine management layer will need to support accounting for resource utilization by the VMs spawned for a given user or group, live migration of VMs from one host to another, and will likely need to support automated backups / snapshots of at least SOME historical virtual-machine disk state. (Note that this differs from existing doctrine, which specifies that the machine-local OS data is expendable, and can be regenerated.)
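
As a rough illustration of the accounting requirement, the sketch below sums vCPU-hours per owner (a user or group). The record format and the names `UsageRecord` and `cpu_hours_by_owner` are assumptions made for this example; they are not the API of OpenStack, Ganeti or libvirt.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One hypothetical accounting entry for a VM."""
    owner: str     # user or group the VM was spawned for
    vm_name: str
    vcpus: int
    hours: float   # wall-clock hours the VM was running


def cpu_hours_by_owner(records):
    """Sum vCPU-hours (vcpus * hours) for each owner."""
    totals = defaultdict(float)
    for record in records:
        totals[record.owner] += record.vcpus * record.hours
    return dict(totals)


records = [
    UsageRecord("groupA", "vm1", vcpus=4, hours=10.0),
    UsageRecord("groupA", "vm2", vcpus=2, hours=5.0),
    UsageRecord("groupB", "vm3", vcpus=8, hours=1.5),
]
totals = cpu_hours_by_owner(records)
```

A real management layer would also need to track memory, storage and network use; vCPU-hours are used here only to keep the example small.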

The use of seed images, data de-duplication, and/or copy-on-write would also be valuable for minimising storage requirements.
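
To show why seed images plus copy-on-write save space, here is a toy model in which each VM disk stores only the blocks it has changed and reads everything else from a shared, read-only seed image. The class `CowDisk` is purely illustrative and is not how any of the storage systems listed above is implemented.

```python
class CowDisk:
    """Toy copy-on-write disk layered over a shared seed image."""

    def __init__(self, seed_blocks):
        self._seed = seed_blocks   # shared between VMs, never modified
        self._overlay = {}         # block index -> this VM's private data

    def read(self, index):
        # Prefer the private copy; fall back to the shared seed block.
        return self._overlay.get(index, self._seed[index])

    def write(self, index, data):
        if not 0 <= index < len(self._seed):
            raise IndexError("block index out of range")
        # Only the written block is stored privately.
        self._overlay[index] = data

    def private_blocks(self):
        """Number of blocks this VM stores beyond the seed."""
        return len(self._overlay)


seed = (b"boot", b"root", b"swap")   # one seed image, stored once
vm_a = CowDisk(seed)
vm_b = CowDisk(seed)
vm_a.write(1, b"rootA")              # vm_a diverges in a single block
```

In this model two VMs cost one seed image plus one changed block, rather than two full images; de-duplication would aim at a similar saving across blocks that happen to be identical.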

Background

Sometime in early 2012, Susan told DCW that DoC were thinking of hiring someone for 6 months into CSG, specifically tasked with building a DoC private cloud. Essentially she said that Exec Committee has found some significant pot of money which needs to be spent this financial year.

She explained the core idea was "virtualisation even for research clusters": at present, research groups buy clusters when they have money, CSG set them up, install "linux du jour" on them, configure fileservers (if part of cluster), tape backups (if part), processing node special software etc.

Then the servers age, the OS is essentially frozen (it's often difficult to persuade researchers that we should reinstall their fileservers, webservers and compute nodes). They become "fragile". Sometimes it's hard to even retire them on schedule (4/5/6 years or whatever). Also these clusters are often only accessible by members of that research group so the resource may not be fully utilised.

Susan's vision: set up a private cloud, researchers add hardware to that cloud's core resources, then create a VM for each cluster node, perhaps tied (1-1 at first) to their own hardware, CSG install that virtual cluster node's OS, researchers work as before - but each node is encapsulated inside a VM. Later, these VMs could share resources - when the group don't need 100% resources, or new more powerful hardware is purchased.

Various discussions with PJM and AON followed. The project will definitely proceed, in two stages:

  1. a 6 month phase to build a prototype cloud, recruiting a "Cloud Manager" person to join CSG, either temporary or permanent. The Department will spend some significant amount of money, perhaps in the £100-200K range.
  2. assuming the prototype cloud is successful, it will move into production and the "Cloud Manager" becomes permanent. Researchers then have the option of adding hardware to the cloud. All members of CSG will become skilled in cloud-related topics, and the "Cloud Manager" will do non-cloud-related problem solving too.

Most crucially: despite not knowing the exact spec or the services to provide, let alone how to implement them, we need to purchase all the kit and have it delivered in July 2012, before the Olympics. PJM added "build a private cloud like Amazon EC2 does", AON suggested a budget of £100K, £150K or even £200K - we will provide possible plans for these price levels.

DWM has spent a lot of time evaluating Ceph as a possible S3/Elastic Block Store-like storage system for supporting VM storage and possibly very high speed filesystems, e.g. staging areas for VM data (scaleout NAS with replication). So far: it's not there yet, at least as a fast POSIX filesystem. Alternatives need to be looked at as well.

Staff Working Group meetings

Staff Working Group Meeting 1 - April 3rd 2012

Staff Working Group/Open Meeting 2 - April 25th 2012

 
 

project/privatecloud (last edited 2013-11-13 19:27:43 by dcw)