Does intent-based networking require rip-and-replace?
- July 11, 2018
- Posted by: Sebastian Grabski
- Category: Network Automation, Networking, NFV
Intent-based networking has been hitting the headlines lately, mostly due to an intensive marketing campaign by Cisco. Cisco is an absolute behemoth in the networking landscape and we should pay attention to their messages, as sometimes they introduce real game changers like Application-Centric Infrastructure. What is this intent-based networking then? Andrew Lerner from Gartner defines the Intent-Based Networking Solution (IBNS) as a system which has the following traits (direct quote from the article):
- Translation and Validation – The system takes a higher-level business policy (what) as input from end users and converts it to the necessary network configuration (how). The system then generates and validates the resulting design and configuration for correctness.
- Automated Implementation – The system can configure the appropriate network changes (how) across existing network infrastructure. This is typically done via network automation and/or network orchestration.
- Awareness of Network State – The system ingests real-time network status for systems under its administrative control, and is protocol- and transport-agnostic.
- Assurance and Dynamic Optimization/Remediation – The system continuously validates (in real time) that the original business intent of the system is being met, and can take corrective actions (such as blocking traffic, modifying network capacity or notifying) when desired intent is not met.
When we analyze those four bullets and try to map them to what exists today, I have a feeling that a proper orchestrator, driven by a declarative language and combined with closed-loop orchestration (CLO), will meet the requirements of an IBNS. Moreover, if this is the case, there would be no need to rip-and-replace existing networks. All we need is to apply the proper orchestration discipline. Does this sound too good to be true? Let’s analyze it.
Let’s first zoom in on “Translation and Validation” and “Automated Implementation”. These are the two points that describe a declarative language-driven orchestrator. In a declarative language, we declare what we want to achieve and the orchestrator’s parser works out how it will be implemented. A good example to illustrate this is an application which needs to communicate north-south (N-S) and east-west (E-W). Let’s see how this works.
In a datacenter, we have a CRM application (crm) which listens on TCP 8888 and is hosted on a virtual machine (vm_host_1). This CRM application needs to communicate with an internal application (app_int) and an external application (app_ext). For the sake of this example we will consider communication with the internal application as “east-west” (E-W) and communication with the external application as “north-south” (N-S). The internal application listens on TCP 8080 and the external application listens on TCP 9000. In the N-S direction we have a firewall and in the E-W direction we have a router.
The fact that we have a firewall and a router implies there is a need to perform some device-specific configuration. In the case of the firewall, let’s say it is a firewall rule and in the case of the router, let’s say it is an access-list. The specific implementation is irrelevant and out of scope of this document. What is important is the fact that we have two different methods of implementation. Today this is a firewall rule and an access-list, and tomorrow they may be replaced by an ACI fabric or even something else. We need to find a way to describe/declare what we are trying to achieve. This is where a TOSCA-driven DSL can come in very handy.
TOSCA, in a nutshell, is a language which describes a service in the form of a topology based on nodes and relationships. It was built to describe cloud applications; however, its grammar applies superbly to network environments. After all, network environments are nothing more than a combination of nodes and relationships, aren’t they?
Let’s see how we can solve our problem with a TOSCA-based DSL. Cloudify’s DSL leverages the concept of nodes and relationships. Relationships are key, as each relationship reflects the intent and “hides” the imperative actions needed to implement a given connection:
In the model we find the following nodes:
Our CRM application definition:
- [tcp, 8888]
- type: cloudify.relationships.connected_ew
- type: cloudify.relationships.connected_ns
- type: cloudify.relationships.contained_in
As we can see, we’re declaring what we want to achieve: “I want the crm application to be connected to app_int and app_ext”. That’s it!
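As a rough illustration, the same intent can be mirrored in plain Python data and checked before anything touches the network. This is only a sketch under stated assumptions: the dictionary layout and the `validate` and `connection_intents` helpers are hypothetical and not part of Cloudify’s DSL, and the relationship targets are inferred from the description above (E-W to app_int, N-S to app_ext, hosted on vm_host_1). Only the node names, ports and relationship type names come from the article.

```python
# Hypothetical in-memory mirror of the model described in the text.
# Node names, ports and relationship types come from the article;
# the dict layout and both helpers are illustrative assumptions.
MODEL = {
    "crm": {
        "listens_on": ("tcp", 8888),
        "relationships": [
            {"type": "cloudify.relationships.connected_ew", "target": "app_int"},
            {"type": "cloudify.relationships.connected_ns", "target": "app_ext"},
            {"type": "cloudify.relationships.contained_in", "target": "vm_host_1"},
        ],
    },
    "app_int": {"listens_on": ("tcp", 8080), "relationships": []},
    "app_ext": {"listens_on": ("tcp", 9000), "relationships": []},
    "vm_host_1": {"relationships": []},
}


def validate(model):
    """Return a list of problems; every relationship target must be a declared node."""
    problems = []
    for name, node in model.items():
        for rel in node["relationships"]:
            if rel["target"] not in model:
                problems.append(f"{name}: unknown target {rel['target']}")
    return problems


def connection_intents(model):
    """Yield (source, target, protocol, port, direction) per connected_* relationship."""
    prefix = "connected_"
    for name, node in model.items():
        for rel in node["relationships"]:
            kind = rel["type"].rsplit(".", 1)[-1]
            if not kind.startswith(prefix):
                continue  # contained_in expresses placement, not connectivity
            proto, port = model[rel["target"]]["listens_on"]
            yield (name, rel["target"], proto, port, kind[len(prefix):])
```

The point of the sketch is that the “what” (crm talks to app_int and app_ext) is fully captured without naming a firewall, a router or any vendor; those only appear once each intent is handed to an implementation.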
Let’s take a minute to understand the relationship definition, as this is where the “secret sauce” is. The secret sauce is, in fact, that this is where we state how a given relationship is implemented:
As we can see, we’re leveraging the rtr_plugin, which implements the connection create and delete methods. In the future, we may need to change the router from vendor A to vendor B, and our model will still hold because only the implementation needs to change. This is a fantastic benefit, as it brings a proper abstraction of the infrastructure.
In short, we did nothing more than what is required by Gartner to create an IBNS: “Translation and Validation” through the Cloudify DSL and “Automated Implementation” through the Cloudify orchestrator.
How about the remaining two, “Awareness of Network State” and “Assurance and Dynamic Optimization/Remediation”? This is where the closed loop orchestration architecture comes into play. In order to get the network state, we need to collect metrics that represent that state, and we also need policy enforcement to dynamically change this state and provide remediation.
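To make the vendor-swap argument concrete, here is a minimal sketch of a relationship type bound to a pluggable implementation with create and delete operations. All class and function names (`ConnectionPlugin`, `VendorARouter`, `VendorBRouter`, `implementations`, `install`) are hypothetical and do not reflect the real rtr_plugin code, and the strings they return are made-up placeholders rather than real device syntax.

```python
from abc import ABC, abstractmethod


class ConnectionPlugin(ABC):
    """Hypothetical interface: how a connected_* relationship is realised."""

    @abstractmethod
    def create(self, source, target, port):
        ...

    @abstractmethod
    def delete(self, source, target, port):
        ...


class VendorARouter(ConnectionPlugin):
    # Returned strings are invented placeholders, not real vendor commands.
    def create(self, source, target, port):
        return f"vendor-a acl add permit tcp {source} {target} {port}"

    def delete(self, source, target, port):
        return f"vendor-a acl remove permit tcp {source} {target} {port}"


class VendorBRouter(ConnectionPlugin):
    def create(self, source, target, port):
        return f"vendor-b allow {source}->{target}:{port}/tcp"

    def delete(self, source, target, port):
        return f"vendor-b revoke {source}->{target}:{port}/tcp"


def implementations(router_plugin):
    """Bind the E-W relationship type to whichever router plugin is in use."""
    return {"cloudify.relationships.connected_ew": router_plugin}


def install(relationship_type, source, target, port, impl_map):
    """Run the create operation bound to the given relationship type."""
    return impl_map[relationship_type].create(source, target, port)
```

Calling `install(...)` with `implementations(VendorARouter())` and then with `implementations(VendorBRouter())` yields different device-level actions for the same relationship, which is the abstraction described above: the model stays fixed and only the binding changes.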
Closed loop orchestration
The purpose of closed loop orchestration (CLO), also known as a feedback loop, is to change system state in response to an event (or set of events). In order to change system state, very often we need to understand its model, and this is the role of a proper orchestrator.
Let’s take a closer look at the CLO concept. First, we need an orchestrated object: the IT system elements whose state we need to observe and potentially change. It can be as simple as a single host running some application, or more complex, such as a cluster of applications. It can be a single virtual firewall or a complete virtual security perimeter which includes multiple virtual firewalls, load balancers and virtual edge routers. Why are we discussing mostly virtual elements here? Because VNFs are more dynamic in nature than PNFs and CLO is more relevant to them. However, there may be cases where CLO is also relevant to PNFs.
Next we need something which collects metrics. Which metrics? That depends on the system. They can be as simple as CPU or number of VPN users – but also more complex “composite metrics” which are calculated based on multiple parameters. In a nutshell, metrics are a currency in which we measure performance of a given system. What is more and more common is big data analytics which calculate those metrics based on historical data and complex heuristics.
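A composite metric can be as simple as a weighted blend of two raw readings. The sketch below is only an assumption of what such a calculation might look like: the `composite_load` function, its parameters and the default weighting are hypothetical, chosen to combine the two simple metrics mentioned above (CPU and number of VPN users) into one score.

```python
def composite_load(cpu_percent, vpn_users, max_vpn_users, cpu_weight=0.5):
    """Blend CPU utilisation and VPN session occupancy into one score in [0, 1].

    All parameters are illustrative: cpu_percent is 0-100, max_vpn_users is
    the assumed session capacity, cpu_weight is the share given to CPU.
    """
    if max_vpn_users <= 0:
        raise ValueError("max_vpn_users must be positive")
    if not 0.0 <= cpu_weight <= 1.0:
        raise ValueError("cpu_weight must be between 0 and 1")
    # Clamp both inputs so a bad reading cannot push the score out of range.
    cpu_share = min(max(cpu_percent / 100.0, 0.0), 1.0)
    vpn_share = min(max(vpn_users / max_vpn_users, 0.0), 1.0)
    return cpu_weight * cpu_share + (1.0 - cpu_weight) * vpn_share
```

A real deployment would more likely derive such a score from historical data, as noted above; the fixed weights here simply keep the example readable.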
Once we have a metric, we need something that will decide what to do based on it. This element is called a policy. The policy engine observes/fetches metrics, processes them and enforces the action. Actions represent some lifecycle action which will change the state of the orchestrated object. Good examples from NFV can be “scale-out”/“scale-in”, “heal” or even “change placement”. The policy engine is not responsible for executing an action. It only enforces a given policy and tells the orchestrator what to do. It is the orchestrator which acts upon orchestrated objects and implements the given lifecycle action.
We can debate whether metrics collection and the policy engine should be part of an orchestration system. There are cases where this is possible and expected, but there’s no general guidance on that. In other cases they need to be decoupled to give depth and breadth of functionality. This is especially true for metrics collection. If our system is very complex and requires composite metrics which are calculated based on big data analysis, then it is better for this system to be external to the orchestrator. The same goes for the policy engine. There are many on the market, and if someone is used to a particular policy engine, why push them to use something different? What is important is OPENNESS. Good orchestrators should be capable of integrating with external metrics collection and policy engines.
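The division of labour described here, where the policy decides and the orchestrator acts, can be sketched in a few lines. Everything in this example is hypothetical: the `ThresholdPolicy` and `Orchestrator` classes, the threshold values and the idea of counting instances are assumptions made to keep the loop small, not a description of any real policy engine.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdPolicy:
    """Hypothetical policy: picks a lifecycle action from a metric value."""
    scale_out_above: float
    scale_in_below: float

    def decide(self, metric):
        if metric > self.scale_out_above:
            return "scale-out"
        if metric < self.scale_in_below:
            return "scale-in"
        return None  # intent is met, nothing to do


@dataclass
class Orchestrator:
    """Hypothetical orchestrator: the only component that changes state."""
    instances: int = 1
    min_instances: int = 1
    history: list = field(default_factory=list)

    def execute(self, action):
        if action == "scale-out":
            self.instances += 1
        elif action == "scale-in" and self.instances > self.min_instances:
            self.instances -= 1
        else:
            return  # unknown action, or already at the minimum size
        self.history.append(action)


def closed_loop(samples, policy, orchestrator):
    """Pass each metric sample through the policy; hand decisions to the orchestrator."""
    for metric in samples:
        action = policy.decide(metric)
        if action is not None:
            orchestrator.execute(action)
    return orchestrator.instances
```

Note that `ThresholdPolicy` never touches `instances` directly. It only names an action, which keeps it replaceable without changing how the orchestrated object is actually modified.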
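One way to picture this openness is an orchestrator loop written only against two small interfaces, so that either side can be swapped for an external product. This is a sketch under assumptions: the `MetricsSource` and `PolicyEngine` protocols, the stand-in classes and `poll_once` are all hypothetical names, and the 0.8 threshold is arbitrary.

```python
from typing import Optional, Protocol


class MetricsSource(Protocol):
    """Anything that can return the latest value of a named metric."""
    def latest(self, metric_name: str) -> float: ...


class PolicyEngine(Protocol):
    """Anything that can turn a metric reading into an action name (or None)."""
    def evaluate(self, metric_name: str, value: float) -> Optional[str]: ...


class StaticMetrics:
    """Stand-in for an external collector; any object with latest() works."""
    def __init__(self, values):
        self._values = values

    def latest(self, metric_name):
        return self._values[metric_name]


class CpuPolicy:
    """Stand-in for an external policy engine with one arbitrary rule."""
    def evaluate(self, metric_name, value):
        if metric_name == "cpu" and value > 0.8:
            return "scale-out"
        return None


def poll_once(source: MetricsSource, engine: PolicyEngine, metric_name: str):
    """One loop iteration, written only against the two interfaces."""
    return engine.evaluate(metric_name, source.latest(metric_name))
```

Because `poll_once` depends only on the method signatures, replacing `StaticMetrics` with an adapter for an external analytics system would not require changes to the loop itself.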
Cloudify has a good record of closed loop orchestration. A few years ago, when this topic was not that common, Cloudify implemented CLO in Cloudify Manager. The system was based on the Diamond agent and metrics streaming to the Riemann policy engine: https://docs.cloudify.co/4.3.0/about/manager_architecture/metrics-flow/. It was a very innovative approach back then. It allowed us to create dynamic systems where state is changed based on a given metric.
In summary, we have just demonstrated an Intent-Based Networking System (IBNS) using concepts that exist today. Why does this matter? Because we can apply intent-based networking concepts to existing networks. If we’re dealing with network elements which have decent APIs and programming interfaces, we can orchestrate them with declarative language-driven orchestration and manage their lifecycle in a feedback loop. For Telcos, this is priceless. We do not need to rip-and-replace existing networks to make them more intelligent. We just need intelligent systems to manage them. Sweat your assets, dear network owners!