Thursday, November 21, 2013

Applying military strategy to IT problems

The OODA loop concept was developed more than 50 years ago to aid military combat. Today, it makes perfect sense for solving complex systems issues.

As a U.S. Air Force pilot and veteran of the Korean and Vietnam wars, Colonel John Boyd is well known in military and business circles for his impact on strategic decision-making. Boyd developed a concept called the OODA loop (for observe, orient, decide, and act) to aid military combat. The idea came about as a way to handle decision-making under pressure. 

The basic premise was that victory could occur by creating situations in which one can make appropriate decisions faster and catch the opponent off guard. Harry Hillaker (chief designer of the F-16) said of the OODA theory: "Time is the dominant parameter. The pilot who goes through the OODA cycle in the shortest time prevails because his opponent is caught responding to situations that have already changed."

This is exactly the scenario that IT managers face today given the continuous change within cloud computing, virtual systems and modern, web-based applications.  The confluence of cloud computing and open source code has made it easier than ever to quickly deploy systems into production and iterate frequently. 

However, this agility often comes at a price: as the pace of development speeds up, IT operations staff are constantly playing catch-up with the current state of the system. Visibility and context suffer. Ops and Dev people become adversaries instead of allies. In modern IT operations, the OODA loop can help companies stay ahead of issues and react quickly and accurately. Here’s the construct:

Observe

By collecting early, detailed information about infrastructure and application changes, IT gains the foundation for making winning decisions that save the day. This relies on the ability to acquire quality data, all the time. In the past, the constraints of data engineering techniques and hardware made it difficult to obtain a high-quality data set without maxing out resources. The result was a low-resolution slideshow: a vague abstraction of the dynamic system it was meant to represent. Now, advances in computing power and the advent of low-cost, on-demand cloud computing and SaaS have made it viable to gather and analyze terabytes of real-time data. Adopting a modern monitoring tool is like buying the latest digital camera: you get a sharp, high-resolution picture of application and network behavior, second by second. This ultimately feeds the insights that are critical for managing performance.
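To make the "low-resolution slideshow" concrete, here is a minimal sketch of how averaging hides short-lived problems. The latency figures and the `downsample` helper are made up for illustration; they do not come from any particular monitoring product.

```python
from statistics import mean

def downsample(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical per-second latency readings (ms): steady 50 ms with a 5-second spike.
per_second = [50] * 60
per_second[30:35] = [2000] * 5

# The same minute as a one-minute poller would report it: a single averaged point.
per_minute = downsample(per_second, 60)

print("peak at 1-second resolution:", max(per_second))
print("value at 1-minute resolution:", per_minute[0])
```

At one-second resolution the spike stands out clearly. In the one-minute average it is diluted to a fraction of its real size, which is easy to miss on a dashboard.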

Orient

Orientation is the process of contextualizing data in a larger narrative to gain insight. In a typical monitoring system, this is the point where humans and machines interact. The system summarizes data as graphs, tables, dashboards and other visualizations. It’s the job of the operator to absorb these representations and construct the narrative behind them. The ability to review the data and see a clear timeline and storyline is a vital requirement of modern IT operations. It provides both the bigger picture and all the data points that relate to it. It takes experience to weave raw data together into the big picture, and the tools should augment that experience and guide operations professionals to the right conclusions.

Decide

As it grows, the narrative guides IT operations toward the contributing factors in a pending failure. An important point: there is no single root cause of application issues in modern, hybrid or cloud-based environments. The larger the failure, the more complex its causes typically are, which often leads to slower resolution. Therefore, when an issue is critical, it’s important to take immediate action based on the data you have now.

Act

As an example, suppose the operator decides to adjust the request timeouts in the Web tier to 1 second. The premise is that this adjustment will significantly lower the rate of request failures. The management system should provide real-time feedback as to whether the action you take is actually the right one. However, as in science, positive results are only tentative confirmation. The theory holds firm only through successive iterations of observation and orientation.
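The idea of treating a positive result as tentative can be sketched in a few lines. Everything here is an assumption made for illustration: the failure counts, the `min_windows` threshold of three, and the function names are hypothetical, not part of any real monitoring API.

```python
def failure_rate(failed, total):
    """Fraction of requests that failed in one observation window."""
    return failed / total if total else 0.0

def change_confirmed(baseline_rate, windows_after, min_windows=3):
    """Return True only if at least `min_windows` post-change windows exist
    and every one of them shows a lower failure rate than the baseline."""
    if len(windows_after) < min_windows:
        return False  # not enough observations yet; keep watching
    return all(failure_rate(f, t) < baseline_rate for f, t in windows_after)

# Hypothetical baseline: 80 of 1,000 requests failed before the timeout change.
baseline = failure_rate(80, 1000)

# (failed, total) counts for each window observed after the change.
after = [(30, 1000), (25, 1000)]
print(change_confirmed(baseline, after))  # two good windows: still tentative

after.append((28, 1000))
print(change_confirmed(baseline, after))  # improvement held across three windows
```

A single good window proves little. Requiring the improvement to hold across several successive observations is one simple way to keep the loop iterating before declaring the fix a success.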

Underpinning the four steps for completing the OODA loop is speed: the loop must run faster than the rate at which the system can change. If the feedback from action or inaction takes too long to observe or orient, then decisions will be made against either stale data or data resulting from a natural change in the system dynamics. 
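The speed requirement can be expressed as a simple comparison between the time one full loop takes and the interval over which the system changes. The sketch below uses hypothetical phase durations and a hypothetical 10-minute change interval; real values would have to be measured.

```python
from dataclasses import dataclass

@dataclass
class LoopTiming:
    """Seconds spent in each OODA phase (all values are hypothetical)."""
    observe: float
    orient: float
    decide: float
    act: float

    def cycle_time(self) -> float:
        """Total time for one pass through observe, orient, decide and act."""
        return self.observe + self.orient + self.decide + self.act

def loop_keeps_up(timing: LoopTiming, change_interval: float) -> bool:
    """True if a full loop completes before the system is expected to change again."""
    return timing.cycle_time() < change_interval

# A team relying on 5-minute polling versus one working from per-second data.
slow = LoopTiming(observe=300, orient=600, decide=120, act=60)
fast = LoopTiming(observe=1, orient=30, decide=20, act=5)

# Assume, for illustration, that traffic patterns shift roughly every 10 minutes.
print(loop_keeps_up(slow, change_interval=600))
print(loop_keeps_up(fast, change_interval=600))
```

When the check fails, the decision at the end of the loop is being made against a system that has already moved on.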

For instance, if IT takes action after traffic has already peaked for the day, the problems may resolve naturally regardless of the intervention. Worse, the action may deepen the problem, and without high-resolution data the regression may fly under the radar until traffic starts to rise again.

The convergent trends of cloud computing and open source software are principally about enabling agility when developing and deploying code in production. This newfound agility breaks the assumptions of existing monitoring solutions, which grew up in an era of static computing infrastructure. 

The future of monitoring in the cloud is high resolution, real-time and optimized for rapid integration of data into a narrative model. Part and parcel of the technology is organizing people around the cause. In resolving multifaceted performance issues, there’s rarely just a single person involved.

Enabling streamlined collaboration between different parties, preferably in the DevOps fashion, is imperative to supporting the OODA model and fostering business agility. Companies want to realize the benefits of cloud computing: cost savings, flexibility and driving innovation, to name a few. Making the best decisions rapidly can be a life-or-death matter for cloud computing in the modern IT operations environment.

About the author:  Cliff Moon is CTO and Co-founder of Boundary.
