Disaster Recovery

Hi,

I just want to throw something out there for opinions and viewpoints relating to a Disaster Recovery site.

Besides the live production environment, do you think a DR environment should include:

  • Pre-production environment
  • QA environment

...or would this be considered OTT and a waste of resources?

Thanks.
D.

Disaster Recovery should include whatever is deemed 'mission critical'.

For most businesses, that would mean getting the production systems back up and running. In that case, relying on backups for the pre-production and QA systems should suffice.

However, if your business is software development and support, then you would probably need Pre-Production and QA systems also.

In my thirty years of working with technology, whenever there has been a failure, all resources have gone towards restoring and supporting users and systems. Development takes a pause.

My first thoughts on the topic...

Kinda depends. Is your goal to keep the invoices and paychecks flowing for a few days or weeks while they restore power and mop the mud out of your server room, or to keep doing all or most aspects of your business for months or more while your main office complex and data center are rebuilt? :eek:

It depends who is paying for it and what they are prepared to pay for :wink:

  • If you have your own hardware at another site you own, that can be expensive, although it is convenient for testing. This would allow:
      • SAN replication, so a fully up-to-date restore
      • DR testing before and after major server upgrades or application changes
      • The ability to switch sites for building maintenance that requires a power-off
    However, it costs:
      • Hardware
      • Maintenance
      • Power
      • Routine checking (yes, I've been there when we went to a test and found the single boot device broken)
      • Off-site media handling from two sites
      • Network management
      • etc.
  • If you are hiring servers from a 3rd party, that can be cheaper but less convenient. It gives you:
      • Reduced cost
      • No hardware maintenance
      • Simpler network management (although there is still plenty to do)
    But it costs you:
      • Flexibility of testing time
      • The ability to move your workload elsewhere while you perform work on the building that would affect services (e.g. a power-off)

You have to work out where you sit and, based on that, you can then decide on your capacity needs. The recovery process should be the same whichever option you choose, otherwise you will never be sure what you are recovering. Aim for a process that is as simple as possible and generic across your servers.

If you are using a 3rd party provider, then often they will help you work things through. It is their business after all, so they want to help.

I hope that this helps,
Robin
Liverpool/Blackburn
UK