= NetApp maintenance procedures =

== Overview ==

This guide describes the most common procedures to maintain a NetApp system.

 1. [[#phonehome|Notify NetApp support that some maintenance is being done]]
 2. [[#backupconf|Back up the current NetApp config before changing it]]
 3. [[#faileddisk|Change a failed disk]]
 4. [[#sslregen|Generate a new SSL certificate]]

<<Anchor(phonehome)>>
== 1. Notify NetApp support that some maintenance is being done ==

Firstly, remember that a NetApp system phones home to report any problem. Before you start maintenance work, record it in the log that NetApp support looks at:

{{{
babbler01> options autosupport.doit "-------------YYYYMMDD_maintenance_START------------------"
}}}

After you finish the maintenance work, remember to log that as well:

{{{
babbler01> options autosupport.doit "-------------YYYYMMDD_maintenance_END--------------------"
babbler01> options autosupport
}}}

Secondly, note that you can tweak the autosupport feature as follows:

{{{
babbler01> options autosupport.partner.to netapp.support@qassociates.co.uk
babbler01> options autosupport.enable on
}}}

Thirdly, remember to do this on both babbler01 and babbler02.

<<Anchor(backupconf)>>
== 2. Back up the current NetApp config before changing it ==

Before changing anything in the NetApp configuration, back up the current one. For example, for babbler01:

{{{
babbler01> config dump -v 20121102babbler01.cfg
}}}

If you want to have a look at the dump, it might be easier from oriole:

{{{
root@oriole:/srv/babbler01_vol0/etc/configs# wc -l 20121102babbler01.cfg
2475 20121102babbler01.cfg
root@oriole:/srv/babbler01_vol0/etc/configs# du -hs 20121102babbler01.cfg
128K    20121102babbler01.cfg
}}}

<<Anchor(faileddisk)>>
== 3. Change a failed disk ==

Note that there is a distinction between a completely failed disk and a partially failed one. The NetApp documentation describes it as follows: "A disk that is completely failed is no longer counted by Data ONTAP as a usable disk, and you can immediately disconnect the disk from the disk shelf. However, you should leave a partially failed disk connected long enough for the Rapid RAID Recovery process to complete."

Check the status of the disks:

{{{
babbler02> aggr status -f

Broken disks (empty)
}}}

Determine the physical location of the disk you want to remove from the output of the command below:

{{{
babbler02> aggr status -s

Spare disks

RAID Disk       Device          HA  SHELF BAY   ... ...
---------       ------          -------------   ... ...
Spare disks for block checksum
spare           0a.00.11        0a    0   11    ... ...
spare           0a.00.13        0a    0   13    ... ...
...             ...
}}}

The location is shown in the columns labeled HA, SHELF, and BAY.

Remove the disk from the disk shelf and replace it with a new one. Suppose the replaced disk is 0a.00.20. The new disk has to be assigned to a controller:

{{{
babbler02> priv set advanced
babbler02> disk assign 0a.00.20
}}}

Ref: http://seriousbirder.com/blogs/netapp-disk-failure-and-replacement-procedures/

<<Anchor(sslregen)>>
== 4. Generate a new SSL certificate ==

On the babbler01 ONTAP CLI (either the main CLI or the service processor can be used):

{{{
priv set advanced
secureadmin setup -f -q ssl t GB London "South Kensington" \
    "Imperial College of Science, Technology and Medicine" \
    "Computing Department" babbler01.doc.ic.ac.uk \
    help@doc.ic.ac.uk 1024
}}}

On babbler02:

{{{
priv set advanced
secureadmin setup -f -q ssl t GB London "South Kensington" \
    "Imperial College of Science, Technology and Medicine" \
    "Computing Department" babbler02.doc.ic.ac.uk \
    help@doc.ic.ac.uk 1024
}}}

Ref: http://community.netapp.com/t5/OnCommand-Storage-Management-Software-Discussions/OnCommand-System-Manager-recieves-error-500/td-p/93621/page/9
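
The two `secureadmin setup` commands above differ only in the hostname, so they can be generated from a single template to avoid copy-and-paste slips. The Python sketch below is only an illustration under stated assumptions: the helper names `quote` and `secureadmin_ssl_command` are hypothetical, the field values are copied from the commands above, and each command is printed on a single line instead of using backslash continuations. It only prints text and does not connect to the filers or run anything on them.

```python
"""Print the secureadmin command for each controller (nothing is executed)."""

# Certificate subject fields, copied from the commands in this section.
COUNTRY = "GB"
STATE = "London"
LOCALITY = "South Kensington"
ORGANISATION = "Imperial College of Science, Technology and Medicine"
UNIT = "Computing Department"
EMAIL = "help@doc.ic.ac.uk"
DOMAIN = "doc.ic.ac.uk"
KEY_BITS = 1024


def quote(arg: str) -> str:
    """Wrap an argument in double quotes if it contains whitespace (hypothetical helper)."""
    return f'"{arg}"' if any(c.isspace() for c in arg) else arg


def secureadmin_ssl_command(host: str) -> str:
    """Return the one-line secureadmin command for the given controller (hypothetical helper)."""
    args = [
        "secureadmin", "setup", "-f", "-q", "ssl", "t",
        COUNTRY, STATE, LOCALITY, ORGANISATION, UNIT,
        f"{host}.{DOMAIN}", EMAIL, str(KEY_BITS),
    ]
    return " ".join(quote(a) for a in args)


if __name__ == "__main__":
    # Paste each printed line into the matching controller's CLI
    # after running "priv set advanced" there.
    for host in ("babbler01", "babbler02"):
        print(secureadmin_ssl_command(host))
```

Keeping the subject fields in one place means a later change (for example a new contact address) only has to be made once for both controllers.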