Sunday, February 24, 2013

Issues and Workarounds – vSphere Site Recovery Manager with the NetApp Storage Replication Adapter 2.01

I have just completed a project where I had to install and configure VMware vSphere Site Recovery Manager. Storage was provided by NetApp FAS and V-Series filers, so I had to use the NetApp-provided Storage Replication Adapter. At the time of writing, the latest version was 2.01. True to form, I ran into a couple of bugs that took some figuring out.

Unable to add a controller: “Error: SRA command ‘discoverArrays’ failed”

Run the following commands on each of your filers:

  • options httpd.admin.enable on
  • options httpd.enable on
  • options httpd.admin.ssl.enable off
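If you have several filers to check, a small helper can work out which of the three commands above a given filer still needs. The sketch below is hypothetical and assumes you have already read the current option values off each filer's console by hand (for example from `options httpd`). It does not talk to the filer and is not NetApp tooling. It simply compares those values against the settings listed above and prints the missing commands.

```python
# Settings this post recommends for the SRA's discoverArrays call to succeed.
REQUIRED_OPTIONS = {
    "httpd.admin.enable": "on",
    "httpd.enable": "on",
    "httpd.admin.ssl.enable": "off",
}


def missing_option_commands(current: dict[str, str]) -> list[str]:
    """Return the 'options' commands still needed on one filer.

    `current` maps option names to their current values as read manually
    from the filer (hypothetical input; this helper does not query the filer).
    Options that are absent from `current` are treated as needing to be set.
    """
    commands = []
    for name, wanted in REQUIRED_OPTIONS.items():
        if current.get(name, "").lower() != wanted:
            commands.append(f"options {name} {wanted}")
    return commands


if __name__ == "__main__":
    # Hypothetical filer where HTTP admin access is off and SSL admin is on.
    filer_state = {
        "httpd.admin.enable": "off",
        "httpd.enable": "on",
        "httpd.admin.ssl.enable": "on",
    }
    for cmd in missing_option_commands(filer_state):
        print(cmd)
```

For the example state, this prints the `httpd.admin.enable` and `httpd.admin.ssl.enable` commands, which you would then paste into that filer's console.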

Error when adding an Array Pair: “Internal error: std::exception 'class Dr::Xml::XmlValidateException' "Element 'SourceDevices' is not valid for content model: '(SourceDevice,)”

There are two workarounds for this issue:

  • Downgrade to NetApp SRA version 2.0.0.
  • Manually specify the list of volumes you want the SRA to discover. You'll need to do this on both controllers in the pair.

This is a documented bug.

Reprotect job fails after recovering to the Disaster Recovery site

The SRM/SRA timeouts seem a bit aggressive to me. This shows up when you reprotect a Protection Group that has been failed over. Part of the task sequence is to reverse the direction of replication, but this step fails consistently because SRM does not wait long enough for the reversal to complete.

You can kludge around it by either:

  • Re-running the reprotect until it works
  • Manually refreshing the Array Manager while the reprotect job is running
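The first kludge is just a retry loop with a pause between attempts. The sketch below shows that pattern in generic form. It assumes a hypothetical `run_once` callable that starts the reprotect and returns `True` on success; how you actually trigger a reprotect (GUI, or whatever automation you have) is outside its scope, and it does not use any SRM API. The pause gives the filers time to finish reversing replication before the next attempt, and the default of 60 seconds is an arbitrary placeholder, not a VMware or NetApp recommendation.

```python
import time
from typing import Callable


def rerun_until_success(run_once: Callable[[], bool],
                        max_attempts: int = 5,
                        delay_seconds: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run_once() until it returns True and return the successful attempt number.

    run_once is a hypothetical stand-in for "start the reprotect job and report
    whether it succeeded". Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once():
            return attempt
        if attempt < max_attempts:
            # Give the storage side time to finish reversing replication.
            sleep(delay_seconds)
    raise RuntimeError(f"reprotect still failing after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated reprotect that fails twice, then succeeds.
    outcomes = iter([False, False, True])
    n = rerun_until_success(lambda: next(outcomes), delay_seconds=0)
    print(f"reprotect succeeded on attempt {n}")
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; in normal use you would leave it at `time.sleep`.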

Recovered datastores have snap-xxx prefixes

This is more of a cosmetic irritant than a true bug, but I wanted it fixed nonetheless.

  1. Within SRM, right-click your site and select Advanced Settings
  2. Click StorageProvider
  3. Select the storageProvider.fixRecoveredDatastoresNames check box

Tips

I suggest increasing your SRM SAN provider timeout settings to something saner, such as double the default values. Instructions can be found here.

Also make sure the ALUA settings on your igroups match between the protected and recovery sites.
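With many igroups, a mismatch is easy to miss. The sketch below is a hypothetical cross-check, not NetApp tooling. It assumes you have already noted, for each site, which igroups have ALUA enabled (for example by reading `igroup show -v` on each filer). It also assumes that corresponding igroups share the same name at both sites. Given those two mappings, it reports any igroup whose ALUA setting differs, and any igroup that exists at only one site.

```python
def alua_mismatches(protected: dict[str, bool], recovery: dict[str, bool]) -> list[str]:
    """Describe igroups whose ALUA setting differs between the two sites.

    Each dict maps an igroup name to True if ALUA is enabled on it (hypothetical,
    manually gathered input). Assumes matching igroups use the same name at both
    sites; igroups found at only one site are reported as well.
    """
    def state(enabled: bool) -> str:
        return "on" if enabled else "off"

    problems = []
    for name in sorted(set(protected) | set(recovery)):
        if name not in protected:
            problems.append(f"{name}: only exists at the recovery site")
        elif name not in recovery:
            problems.append(f"{name}: only exists at the protected site")
        elif protected[name] != recovery[name]:
            problems.append(
                f"{name}: ALUA {state(protected[name])} at protected site, "
                f"{state(recovery[name])} at recovery site"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical igroup names and settings.
    protected = {"esx_cluster01": True, "esx_cluster02": True}
    recovery = {"esx_cluster01": True, "esx_cluster02": False}
    for line in alua_mismatches(protected, recovery):
        print(line)
```

An empty result means the two sites agree. Anything it prints is an igroup worth fixing before you run a recovery.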