= NetApp maintenance procedures =

== Overview ==

This guide describes the most common procedures for maintaining a NetApp system.

   1. [[#phonehome|Notify NetApp People that some maintenance is being done]]
   2. [[#backupconf|Backup the current NetApp config before changing it]]
   3. [[#faileddisk|Change a failed disk]]
   4. [[#sslregen|Generate a new SSL certificate]]


<<Anchor(phonehome)>>
== 1. Notify NetApp People that some maintenance is being done ==

Firstly, remember that a NetApp system phones home to report
any problem.
Before you start any maintenance work, record it in the
log that NetApp support looks at:

{{{
  babbler01> options autosupport.doit "-------------YYYYMMDD_maintenance_START------------------"
}}}

After you finish the maintenance work, remember to log that as well:

{{{
  babbler01> options autosupport.doit "-------------YYYYMMDD_maintenance_END--------------------"
  babbler01> options autosupport
}}}
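
To avoid typing the date by hand, the two marker lines can be generated. The following is only a sketch: the helper name `autosupport_marker` is hypothetical, and it assumes the dashes around the label are purely cosmetic, so it pads both sides with the same number.

```python
from datetime import date


def autosupport_marker(phase, day=None, dashes=13):
    """Build the `options autosupport.doit` line for a maintenance marker.

    phase  -- "START" or "END", as used in this guide
    day    -- date to stamp into the marker (defaults to today)
    dashes -- width of the padding on each side (assumed cosmetic)
    """
    if phase not in ("START", "END"):
        raise ValueError("phase must be 'START' or 'END'")
    day = day or date.today()
    label = f"{day:%Y%m%d}_maintenance_{phase}"
    bar = "-" * dashes
    return f'options autosupport.doit "{bar}{label}{bar}"'


if __name__ == "__main__":
    # Print both lines, ready to paste at the filer prompt.
    for phase in ("START", "END"):
        print(autosupport_marker(phase))
```

The output is meant to be pasted at the `babbler01>` or `babbler02>` prompt; the script itself does not talk to the filer.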

Secondly, note that you can tweak the autosupport feature as follows:
{{{
  babbler01> options autosupport.partner.to netapp.support@qassociates.co.uk
  babbler01> options autosupport.enable on
}}}

Thirdly, remember to do this on both babbler01 and babbler02.


<<Anchor(backupconf)>>
== 2. Backup the current NetApp config before changing it ==

Before changing anything in the NetApp configuration, back up the current one.
For example, on babbler01:

{{{
  babbler01> config dump -v 20121102babbler01.cfg
}}}
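
The file name above follows a `YYYYMMDD<filer>.cfg` pattern. A minimal sketch that prints the dump command for each controller with today's date is shown below; the function name `config_dump_command` is hypothetical, and the naming pattern is inferred from the single example in this guide.

```python
from datetime import date

# Controller names as used in this guide.
FILERS = ("babbler01", "babbler02")


def config_dump_command(filer, day=None):
    """Return the prompt line that dumps the config to YYYYMMDD<filer>.cfg."""
    day = day or date.today()
    return f"{filer}> config dump -v {day:%Y%m%d}{filer}.cfg"


if __name__ == "__main__":
    for filer in FILERS:
        print(config_dump_command(filer))
```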

If you want to inspect the dump, it might be easier from oriole, where the file appears under /srv/babbler01_vol0/etc/configs:

{{{
  root@oriole:/srv/babbler01_vol0/etc/configs# wc -l 20121102babbler01.cfg
  2475 20121102babbler01.cfg

  root@oriole:/srv/babbler01_vol0/etc/configs# du -hs 20121102babbler01.cfg
  128K	20121102babbler01.cfg
}}}


<<Anchor(faileddisk)>>
== 3. Change a failed disk ==

Note that there is a distinction between a completely failed disk
and a partially failed one. The NetApp documentation describes it
as follows:

"A disk that is completely failed is no longer counted by Data ONTAP
as a usable disk, and you can immediately disconnect the disk from
the disk shelf. However, you should leave a partially failed
disk connected long enough for the Rapid RAID Recovery process
to complete"

Check whether any disks are marked as failed:
{{{
   babbler02> aggr status -f
   Broken disks (empty)
}}}

Determine the physical location of the disk you want to remove 
from the output of the command below:
{{{
  babbler02> aggr status -s 
  Spare disks
  RAID Disk	Device  	HA  SHELF BAY ... ...
  ---------	------  	------------- ... ...
  Spare disks for block checksum    
  spare   	0a.00.11	0a    0   11  ... ...
  spare   	0a.00.13	0a    0   13  ... ...
  ... ...
}}}

The location is shown in the columns labeled 
HA, SHELF, and BAY.
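
If the listing is long, the three location columns can be pulled out with a short script. This is a sketch only: `SAMPLE` is the listing above with the elided columns dropped, the function name `disk_locations` is hypothetical, and the regular expression assumes device names of the form `0a.00.11` as shown here.

```python
import re

# Listing copied from this guide, without the elided trailing columns.
SAMPLE = """\
Spare disks
RAID Disk\tDevice  \tHA  SHELF BAY
---------\t------  \t-------------
Spare disks for block checksum
spare   \t0a.00.11\t0a    0   11
spare   \t0a.00.13\t0a    0   13
"""

# role, device (adapter.shelf.bay), HA, SHELF, BAY; later columns are ignored.
ROW = re.compile(r"^\s*(\S+)\s+(\w+\.\d+\.\d+)\s+(\w+)\s+(\d+)\s+(\d+)")


def disk_locations(text):
    """Map each device name to its (HA, SHELF, BAY) location."""
    locations = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header, separator and title lines do not match
            _role, device, ha, shelf, bay = match.groups()
            locations[device] = (ha, int(shelf), int(bay))
    return locations


if __name__ == "__main__":
    for device, (ha, shelf, bay) in disk_locations(SAMPLE).items():
        print(f"{device}: HA {ha}, shelf {shelf}, bay {bay}")
```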


Remove the disk from the disk shelf and replace it with a new one. Suppose the replacement disk is 0a.00.20. The new disk has to be assigned to a controller:

{{{
babbler02> priv set advanced
babbler02> disk assign 0a.00.20 
}}}


Ref: http://seriousbirder.com/blogs/netapp-disk-failure-and-replacement-procedures/

<<Anchor(sslregen)>>
== 4. Generate a new SSL certificate ==

On the babbler01 Data ONTAP CLI (either the main CLI or the service processor can be used):
{{{
priv set advanced
secureadmin setup -f -q ssl t GB London "South Kensington" \
 "Imperial College of Science, Technology and Medicine" \
 "Computing Department" babbler01.doc.ic.ac.uk \
  help@doc.ic.ac.uk 1024
}}}
On babbler02:
{{{
priv set advanced
secureadmin setup -f -q ssl t GB London "South Kensington" \
 "Imperial College of Science, Technology and Medicine" \
 "Computing Department" babbler02.doc.ic.ac.uk \
 help@doc.ic.ac.uk 1024
}}}

Ref: http://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/OnCommand-System-Manager-recieves-error-500/td-p/93621/page/9