Microservices Expo: Article

BCP Lessons Learned and New Ideas for IT Infrastructure Continuity

Learn How to Justify the Creation of Disaster Recovery Facilities

Businesses in the southeastern United States have been hit hard by hurricanes in the last few years, and 2008 was no exception. As a project manager and CBCP for over 1,600 disaster recovery deployments, I can share real examples of how entire data centers were failed over to DR operations centers in preparation for hurricanes, while others (due to poor planning) did not have the same success. Those that were successful organized the RTOs of their communication servers, which helped them prioritize recovery efforts, and used creative testing procedures to avoid disrupting normal business activity. The first priority of a BCP is to ensure the safety of employees, but being able to communicate with the people you need is also an important step in successfully executing a BCP. Because of this preparedness, many businesses I have heard from were able to proactively evacuate their employees and still provide them remote access for business operations from almost anywhere. I will review a few examples of architecture, solutions and best practices for exercising controls during those events, and discuss how future technology may be used to better justify the creation of disaster recovery facilities.

10 Professional Practices for BCP
There are ten professional practices for business continuity planning, all equally important; if followed appropriately, they will allow you to create a solid foundation to build upon. For the purposes of this article I will summarize the professional practices, but for more information visit the Disaster Recovery Institute International (www.drii.org). DRII is an excellent resource for BCP and a consortium of business continuity professionals dedicated to setting industry standards and sharing knowledge around the practice of business continuity management.

The first step in building a BCP is Program Initiation and Management. This step is designed to establish executive approval, support and justification for a resiliency program. Start by building a dedicated team that is committed to supporting the BCP initiative, and select team members who can effectively manage the roles and responsibilities for their portion of the plan. Cost justification is often a hurdle in establishing the need for disaster recovery facilities, so one tip is to utilize your current assets, such as other offices or co-location facilities. You can also work with the IT department to tie the IT management budget into the BCP, so that you are not just providing continuity in the event of a disaster, but also high availability for day-to-day operational maintenance.

The next couple of steps are important in determining the risk your organization faces from natural or environmental disasters (risk evaluation) and then determining the business impact should one of those events occur (business impact analysis, or BIA). This will help you shape the business continuity strategy you design and implement to meet your defined recovery point objective (RPO) and recovery time objective (RTO). Once those objectives and controls are defined, you will need to integrate emergency response and operations in order to define the process by which a disaster is declared and what prompts the initiation of the BCP.
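
To make the RPO/RTO distinction concrete, here is a minimal Python sketch that checks a recovery event against both objectives. The objective values and timestamps are made up for illustration; real values come out of your BIA.

```python
from datetime import datetime, timedelta

# Hypothetical objectives -- in practice these come out of the BIA.
RPO = timedelta(minutes=15)   # maximum tolerable data loss
RTO = timedelta(hours=4)      # maximum tolerable downtime

def meets_objectives(last_replication: datetime,
                     outage_start: datetime,
                     service_restored: datetime) -> dict:
    """Compare one recovery event against the defined RPO and RTO."""
    data_loss = outage_start - last_replication   # work lost since last sync
    downtime = service_restored - outage_start    # time the service was down
    return {
        "rpo_met": data_loss <= RPO,
        "rto_met": downtime <= RTO,
        "data_loss": data_loss,
        "downtime": downtime,
    }

# Example event: last replication 10 minutes before the outage,
# service restored 2.5 hours after it began.
result = meets_objectives(
    last_replication=datetime(2008, 9, 12, 23, 50),
    outage_start=datetime(2008, 9, 13, 0, 0),
    service_restored=datetime(2008, 9, 13, 2, 30),
)
print(result["rpo_met"], result["rto_met"])  # True True
```

The point of the sketch is that RPO is measured backward from the outage (how much data you can afford to lose) while RTO is measured forward (how long you can afford to be down); a recovery strategy has to satisfy both independently.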

These previous steps are what allow you to design and implement a comprehensive strategy that meets the requirements of your company’s objectives. I have seen companies try to shortcut these steps and skip straight to implementing a solution, only to find out that their infrastructure doesn’t have enough power, bandwidth, resources or executive approval to support the controls being implemented. The lesson learned: don’t take shortcuts and jump into something you have never done before. Following these steps will allow you to proceed and will likely prevent challenges during the deployment and execution of your plan.

The next three steps include designing and implementing the BCP, generating awareness and training your organization on what to do in the event of a disaster, and then exercising those plans regularly. Exercising the BCP is typically recommended to be tied to your change control process, which means the plan should be reviewed any time there is a change within the organization that may affect it. (That can be anything from a software update to a business-critical server to a BCP member leaving the company.) Depending on the situation, exercises could take place as frequently as once a month, or at the very least two to three times per year, so that there is consistent awareness of the plan and procedures.

The last two practices, crisis communication and coordinating with external agencies, are really the culmination of the previous practices and will ultimately determine the success or failure of your plan. In the event of a disaster, communication is critical to coordinating with emergency responders and your own business continuity team to make sure evacuations and safety procedures are implemented effectively.

When Planning and Exercising is Done Right
Planning is your best friend when it comes to rolling out controls for a business continuity solution. From executive buy-in through budget, infrastructure, process, procedures, testing and ultimately execution, you can’t plan enough. And when it’s done right, deployments go smoothly. However, there is more than one way to go about this. As the saying goes, “Don’t eat the elephant all in one bite.” Breaking down your overall rollout plan into smaller projects will help you better manage the details as well as prioritize the order of the overall deployment. Here are some quotes from companies who did it right and were glad they did after Hurricane Ike made landfall:

  • “All is OK and thanks. Our files were mirrored to our Austin facility with no loss of data or applications. Winds tore a 30'x30' hole in the building roof. The water damage was bad. The computer servers were spared but a lot of workstations were soaked. Houston operations were running in Austin just before the hurricane hit and the transfer was seamless.”
  • “Thanks, our company is doing just fine. With our replicated data to one of our other locations, we were up and seeing patients once the patients could get to us. We appreciate your concern, and your overall support of our organization. On behalf of our organization, we want to say thank you!”
  • “Yes we did make it out alive; we activated our business contingency plan and relocated to Dallas. Luckily our solution allowed us to fail over and business continued.”

Exercising the business continuity plan on a regular basis helped these companies not only be prepared but assured that they were ready for anything. And with the adoption of new technologies for IT infrastructure, those plans are even easier to exercise while minimizing the impact on production operations. In previous years, testing a business continuity plan for the data center usually required shutting down the entire production facility and running through the restoration process. With the adoption of real-time replication software, co-location facilities and virtualization, testing can be accomplished with minimal impact on the production environment. If you have a dedicated disaster recovery facility with hot standby servers, you could simply segment the networks from each other and bring the site online. However, you have to be very careful to make sure the two sites aren’t talking to each other via domains or Active Directory services.
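
One practical precondition for a low-impact test is knowing which replicas are actually current before you segment the network and bring the DR site online. The Python sketch below illustrates the idea with hypothetical server names and a hard-coded status feed; in practice this data would come from your replication software’s own reporting.

```python
import time

# Hypothetical replica status feed -- real data would come from the
# replication software, not a hard-coded dict.
replica_status = {
    "mail-01":  {"last_sync": time.time() - 30},     # 30 seconds behind
    "erp-db":   {"last_sync": time.time() - 45},     # 45 seconds behind
    "file-srv": {"last_sync": time.time() - 7200},   # two hours behind
}

def stale_replicas(status: dict, max_lag_seconds: float = 300) -> list:
    """Return replicas whose last sync exceeds the allowed lag, so a DR
    exercise can be scoped (or postponed) without touching production."""
    now = time.time()
    return sorted(name for name, info in status.items()
                  if now - info["last_sync"] > max_lag_seconds)

print(stale_replicas(replica_status))  # ['file-srv']
```

A check like this, run before segmenting the networks, catches the replica that would have produced misleading test results, without any interruption to the production side.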

How Dynamic Infrastructure Is Being Used to Facilitate BCP Exercises

Dynamic Infrastructure is defined by some as ‘the ability to rapidly move and provision workloads with security and inherent protection’. It may be a new idea to you, but it is being adopted within the IT community with great success. Dynamic Infrastructure not only simplifies disaster recovery procedures for data center managers, but also provides the ability to use those same controls for day-to-day operations to keep your business available all the time - not just during disasters. With virtualization technologies saving costs on hardware, power and cooling, data center management budgets can be combined with the BCP budget to maximize infrastructure availability. These technologies also assist BCP exercises by simulating recovery servers and sites without bringing down production servers. Some solutions, like VMware® Site Recovery Manager, have this feature but also have some inherent issues. For instance, in the event of a real disaster the virtual solution doesn’t have any failback capability: once that process has been started, there is no turning back without a complete restoration, which could take days depending on the number of systems and the volume of data to be restored. Dynamic Infrastructure provides the functionality that others are missing, including rapid failback capabilities for smaller or “little d” disasters, which are more likely to impact a business-critical system.
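
The failback gap described above can be pictured as a one-way versus two-way state transition. The toy Python state machine below is purely an illustration (not any vendor’s API); in a real product each transition would drive replication resync and traffic redirection, but the shape of the problem is the same: a DR design is only as good as its path back to production.

```python
class Site:
    """Toy model of active-site state; 'production' and 'dr' are
    hypothetical labels, not real infrastructure."""

    def __init__(self):
        self.active = "production"
        self.log = []

    def failover(self):
        # One-way DR products stop here: traffic moves to the DR site
        # and returning requires a full restoration.
        if self.active != "production":
            raise RuntimeError("already failed over")
        self.active = "dr"
        self.log.append("failover")

    def failback(self):
        # Rapid failback is the missing second transition: resync the
        # deltas accumulated at the DR site, then flip traffic back.
        if self.active != "dr":
            raise RuntimeError("not running at the DR site")
        self.active = "production"
        self.log.append("failback")

site = Site()
site.failover()   # "little d" disaster hits a business-critical system
site.failback()   # fixed; return to production without a full restore
print(site.active)  # production
```

For a “little d” disaster affecting a single business-critical system, it is that cheap return transition, rather than the failover itself, that determines how disruptive the event really is.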

The Next Generation of BCP
With future technology delivering Dynamic Infrastructure, cloud computing and mobile communication devices, learning how they can protect IT infrastructure for business continuity planning has never been more important. Many management services are offering remote or mobile access for initiating some of these data center management functions. Imagine if you could initiate a failover of a server from your iPhone or BlackBerry®. The reality is that it isn’t very far off. It’s possible that many business-critical services could be run via cloud computing so that services are available anywhere they are needed - even if there were a disaster at the production facility.

However, this begs the question: who is protecting the cloud, and what is their business continuity plan?

More Stories By Brace Rennels

Brace Rennels is a passionate and experienced interactive marketing professional who thrives on building high-energy marketing teams to drive global web strategies, SEO, social media and online PR web marketing. He is recognized as an early adopter of technology, applying new techniques to creative marketing to drive brand awareness, lead generation and revenue. As a Sr. Manager of Global Website Strategies, his responsibilities included developing and launching global social media, SEO and web marketing initiatives and strategy. He is recognized for applying innovative solutions to unique problems and managing business relationships to effectively accomplish enterprise objectives. He is an accomplished writer, blogger and author for several publications on marketing, social media and technical subjects such as industry trends, cloud computing, virtualization, website marketing, disaster recovery and business continuity. Publications include CIO.com, Enterprise Storage Journal, TechNewsWorld, Sys-Con, eWeek and Peer to Peer Magazine. Follow more of Brace's writing on his blog: http://bracerennels.com
