Protecting business data and services must not be left solely to buying hardware and software. Technologists and business leaders must come together. Data and systems are critical components of most companies, and protecting them goes hand in hand with the ability to run a business. Without them, most companies cannot deliver services or products to their customers, and sooner rather than later the chances of failure increase. Recent studies report that a company that experiences a system outage and/or substantial data loss lasting more than ten days will find it very difficult to recover financially, and that fifty percent of such companies will be out of business within five years. For this reason, it is critical for organizations to understand the types of risk that can cause data loss and/or service disruption, so that they can implement cost-effective solutions that safeguard business operations and customer satisfaction.
A very common problem
Although business leaders recognize the importance of keeping data and systems safe, solutions are very often put in place without a proper understanding of the specific risks that threaten their availability. More often than not, such solutions are deployed without even a simple framework of processes to monitor how well the solution works and how it is operated. As a result, the first problem to address when implementing an effective disaster recovery solution is to understand the strategic importance of systems, services, and data in relation to business operations. Once this is understood, a more comprehensive analysis must follow to identify the specific threats that can disrupt critical business services.
Second, IT is often seen as an expense to the company, and businesses very often end up with a halfway solution that focuses solely on copying data from one place to another and ignores Recovery Time Objective (RTO) and Recovery Point Objective (RPO), two critical aspects of a successful disaster recovery solution. An effective recovery solution does not need to incur prohibitive costs for the company; it needs to weigh the risk and probability of an incident against the potential cost of downtime and/or data loss. Once this is understood, the costs associated with a disaster recovery solution begin to make a lot more sense and can be justified. This, in turn, provides a financial framework that guides the business to a reasonable and justifiable disaster recovery solution.
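To make the cost trade-off concrete, here is a minimal sketch of how expected downtime cost could be weighed against the cost of a recovery solution. The function names and every figure are hypothetical and chosen only for illustration, and the sketch assumes that downtime per incident with the solution in place is roughly the RTO the solution is designed to meet.

```python
# A rough sketch, not a financial model: all names and numbers are hypothetical.

def expected_annual_loss(incidents_per_year, hours_down_per_incident, cost_per_hour):
    """Expected yearly cost of downtime: frequency x duration x hourly cost."""
    return incidents_per_year * hours_down_per_incident * cost_per_hour


def is_justified(solution_annual_cost, loss_without, loss_with):
    """A solution pays for itself when the loss it avoids exceeds its cost."""
    return (loss_without - loss_with) > solution_annual_cost


# Hypothetical inputs: one incident every two years, $2,000 per hour of downtime.
loss_without = expected_annual_loss(0.5, 48, 2000)  # no plan: about 48 hours down
loss_with = expected_annual_loss(0.5, 4, 2000)      # with a solution: 4-hour RTO

print(f"Expected annual loss without a solution: ${loss_without:,.0f}")
print(f"Expected annual loss with a 4-hour RTO:  ${loss_with:,.0f}")
print("Solution costing $15,000/year justified:",
      is_justified(15000, loss_without, loss_with))
```

The same comparison can be repeated for data loss by replacing hours of downtime with hours of lost work, which is what the RPO bounds.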
So far, we have touched on the most basic and fundamental pieces of a disaster recovery solution: understanding the strategic business importance of specific systems and services, and effectively managing the probability of each threat and the impact it can have on critical business operations and services.
Because of this, it is extremely important to hire an IT support company, managed service provider, or IT professional/consultant that understands these principles. Very often, solutions consider only the technical side and fail to allocate resources where the business really needs them. The solution must extend beyond full, incremental, or differential backups and assess real risks, probabilities of occurrence, business impact, and total cost of ownership, as well as the related processes that ensure disaster recovery and business continuity.
So what is a risk assessment? A risk assessment is the exercise of determining the quantitative and/or qualitative value of risk in relation to a concrete situation and a recognized threat. Although it might sound complex, completing a risk assessment is easier than it sounds.
Conducting a risk assessment comes down to estimating the risk associated with each potential threat. For example, if your company's computer room or datacenter is located in South Florida, then hurricanes and tropical storms are among the most important threats to evaluate. To adequately estimate the risk posed by a threat, you need to consider the probability (P) of the event occurring and the impact (I) that it can have on the services provided by the company. Later in this post I will explain some of the most common practices for assigning an approximate risk value to a threat and how to come up with a risk factor for each potential threat.
Not long ago we conducted a risk assessment for a company, and its senior executives were surprised at the types of threats their infrastructure was facing and the potential operational and financial impact those threats could have on the business. There was no need for lengthy explanations; the report was an obvious wake-up call. The simplest threats, which had gone under the radar for many years, were the most obvious yet the most often neglected.
Some of the most common threats to IT infrastructures, data, and services include, but are not limited to: fire, loss of network connectivity (local or public), electrical failure, hurricanes, tornados, electric storms, floods, hardware failure, security exploits, and acts of terrorism. Although this list does not include every potential threat that IT infrastructures and services face, it is a good collection of the most frequent ones.
Assessing the risk
So far we have talked about the importance of strategically protecting business services and data, and the need to allocate resources efficiently and effectively to mitigate these risks. We then explained what a risk assessment is and the common threats faced by IT infrastructures, data, and critical business operations. Now that we have laid out the why and a general idea of the how, it is time to get down to the details: assessing the risks and producing a risk matrix that can help define the steps to efficiently and effectively protect the business.
The first step is to research and list the types of events/threats that can disrupt business services given the geographical location from which those services are rendered. Then you need to consider other threats, including but not limited to electrical power failure (EPF), internal and external security exploits/hacking, lack of failover mechanisms (active-active or active-passive clusters, load balancing, etc.), and acts of terrorism or sabotage. The intention here is not to go crazy and list every potential threat, but to list the most evident and realistic threats that can affect critical systems, data, and services.
After creating a column with the list of potential threats, we create two columns to the right to assess the probability (P) of occurrence of each threat and the level of impact (I) that it can have on business operations. For each threat, we assign a value between 1 and 10 for the probability (P) of occurrence (how likely it is to occur) and a value between 1 and 10 for the level of impact (I) on business operations. The higher the number, the greater the probability or impact. Once every threat has both values, we multiply Probability (P) by Impact (I) to come up with a risk factor for each potential threat, which therefore ranges from 1 to 100. It is important to note that these numbers do not have to be scientifically calculated, as they are intended to provide an idea of the level of risk (or risk factor) associated with each threat. Sorting the threats by risk factor gives you a very clear idea of where the focus should be.
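The three-column table described above can be sketched in a few lines of Python. The threat names come from the list earlier in this post, but the P and I scores are hypothetical values picked only to show the arithmetic, and the function names are ours.

```python
# Hypothetical scores for illustration only; use your own assessment values.
threats = [
    # (threat, probability P 1-10, impact I 1-10)
    ("Hurricane / tropical storm", 7, 9),
    ("Electrical power failure", 6, 7),
    ("Hardware failure", 5, 6),
    ("Fire", 2, 10),
]


def risk_factor(probability, impact):
    """Risk factor = P x I, with both values on the 1-10 scale."""
    for label, value in (("probability", probability), ("impact", impact)):
        if not 1 <= value <= 10:
            raise ValueError(f"{label} must be between 1 and 10, got {value}")
    return probability * impact


def rank_threats(threat_rows):
    """Return (threat, P, I, risk factor) rows, highest risk factor first."""
    scored = [(name, p, i, risk_factor(p, i)) for name, p, i in threat_rows]
    return sorted(scored, key=lambda row: row[3], reverse=True)


for name, p, i, factor in rank_threats(threats):
    print(f"{name:<28} P={p:>2}  I={i:>2}  risk factor={factor:>3}")
```

With these sample scores the hurricane row comes out on top with a risk factor of 63, which is the kind of signal the table is meant to give: it shows where to look first, not a precise measurement.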
Once we have come up with a risk factor for each potential threat, we need to generate what is called a Risk Treatment Matrix. A Risk Treatment Matrix is a visual representation of how each identified threat should be treated so that it can be effectively mitigated, minimized, and monitored. The Risk Treatment Matrix helps you define which threats should be prevented, which should be accepted, which should be contained, and which simply require knowing how to react when they occur. It provides a very clear and effective indication of how each risk should be treated, which avoids under- or over-allocating resources while still protecting critical business operations.
Effectively protecting a business from threats that could disrupt critical operations takes a lot more than replicating or moving data, implementing RAID solutions, and/or buying backup software. Protecting a business starts with understanding what is critical to its operations, and then effectively managing the probability of each threat and the impact it can have on the business. When doing so, it is critically important to use the services of knowledgeable IT consultants, IT service companies, and/or managed service providers that are experienced in conducting risk assessments. Remember, technology exists in a business to increase productivity and help the business achieve its goals, not to become a laboratory for technology lovers. Technology needs to make business sense.
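As a rough sketch, a Risk Treatment Matrix can be built by placing each threat in one of four quadrants according to its P and I values. The quadrant-to-treatment mapping and the threshold of 5 used below are assumptions made for illustration, following one common convention; the scores are hypothetical as well.

```python
# Assumed mapping (one common convention, not the only one):
#   high P, high I -> prevent      high P, low I -> contain
#   low P,  high I -> react        low P,  low I -> accept

def treatment(probability, impact, threshold=5):
    """Pick a treatment from the quadrant that the P and I scores fall in."""
    high_p = probability > threshold
    high_i = impact > threshold
    if high_p and high_i:
        return "prevent"
    if high_p:
        return "contain"
    if high_i:
        return "react"
    return "accept"


def build_matrix(threat_rows, threshold=5):
    """Group threat names under the treatment chosen for each of them."""
    matrix = {"prevent": [], "contain": [], "react": [], "accept": []}
    for name, p, i in threat_rows:
        matrix[treatment(p, i, threshold)].append(name)
    return matrix


# Hypothetical scores for illustration only.
sample_threats = [
    ("Hurricane / tropical storm", 7, 9),
    ("Loss of network connectivity", 6, 4),
    ("Fire", 2, 10),
    ("Electric storm", 3, 3),
]

for action, names in build_matrix(sample_threats).items():
    print(f"{action:<8} {', '.join(names) if names else '(none)'}")
```

Moving the threshold up or down shifts threats between quadrants, so it is worth agreeing on that value with the business before committing resources to any one treatment.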