How to track and improve network health with just one number
May 31, 2024
The industry has measured equipment effectiveness for decades. Now it’s time to measure network effectiveness, too. Here’s what to know about overall network effectiveness (ONE), a new metric inspired by and modeled after overall equipment effectiveness (OEE) but designed specifically for networks.
There’s great power in understanding the efficiency and effectiveness of manufacturing operations. Only when users understand productivity levels in real time can they evaluate the need for, and justify investments in, operational improvements.
Empowered with information about the performance of facilities and equipment compared to their full potential, you can take the right steps to reduce waste, minimize idle time and boost production volume.
When it comes to measuring manufacturing efficiency, OEE is considered the industry’s gold standard, used worldwide to measure availability, performance and quality in nearly every type of production process.
When customers know and understand their plant’s OEE KPI, then they understand how well a plant is operating and can compare it to peers to develop improvement targets.
Plant operations and productivity are impacted by equipment and the OT network
OEE was created to identify equipment-related productivity losses and benchmark productivity improvement — but equipment isn’t the only contributing factor to a plant’s success. IT (information technology) and OT (operational technology) networks can influence plant productivity and digitization efforts just as much as equipment and machinery.
Despite how critical a network is to plant operations, no single-number KPI exists similar to OEE that network administrators and OT technicians can use to gauge network performance. This makes it difficult for them to answer questions like:
• How’s the network performing?
• How healthy is the network?
• Is the network doing what they want it to?
• Is the network creating productivity or downtime issues?
• How much room do they have on their network to expand operations in the future?
Building networks into the plant-performance equation
Managing networks effectively is just as essential as managing equipment performance and health. The impacts of poor IT and OT network performance can be devastating, leading to unplanned downtime, slow cycles, quality issues and even safety hazards.
Through frequent discussions with plant managers and industry leaders, it became clear to the industrial communications experts at Belden: If plants don’t have a way to monitor the performance and health of their networks, including availability, capacity and quality, then how can they be expected to manage those networks effectively? What data do they use to determine how their network is running? What information do they rely on to make informed decisions?
Without monitoring these significant factors, there’s no way to:
• Qualify and quantify network performance
• Recognize network issues that require corrective action
• Identify potential network sources of lost time and lost production
• Determine the network’s overall effectiveness
• Distinguish patterns and fine-tune network infrastructure
Given how critical networks are to production efficiency, why is there no objective, consistent, one-number KPI that reports on network performance? Simply put, it’s because networks are complicated.
They involve many vendors, many stakeholders and many metrics. Because every network is unique, each one is made up of different device counts, designs, segmentations and topologies. This makes it difficult to determine what proper network performance might look like. Finally, networks are all designed to support different demands, data, applications and environments.
The creation of ONE: Overall network effectiveness
While networks are indeed complicated, complexity has never stopped industry innovation before. Believing that network administrators and OT technicians deserve a way to get a better handle on, and more control over, network performance, Belden decided to tackle the problem head-on. It was time to create a one-number metric for network health.
The result of that work is ONE: a one-number metric for overall network effectiveness. It’s inspired by and modeled after OEE, but it’s designed specifically for networks.
Because ONE is objectively scored, factors such as device manufacturer or vintage aren’t a barrier to the metric. The KPI can be applied to any type of network, regardless of its size, scale, industry, topology or OSI layer. It’s also simple to evaluate, so network administrators and OT technicians can track time-trended values and predict future performance.
This allows them to create countermeasures that prevent past problems and react to new ones as they arise — especially those that may otherwise escape unnoticed. ONE is sensitive to the variables and parameters that control, manage and define a network, bringing attention to them all by providing a one-number metric to focus on.
ONE’s formula considers network elements such as:
• Bandwidth capacity: how much data is able to be transmitted over a connection during a specific amount of time
• Channel interference: signal disrupted during transmission
• Connection availability: how often the network is fully operational
• Device uptime: how often a network device is fully operational
• Legacy devices: connections using outdated or unsupported parameters
• Network congestion: network oversaturated with load due to traffic
• Packet errors: corrupt packets flowing through the network
• Port utilization: consumption of an individual port's bandwidth
• Redundancies: traffic flow when a path is lost
• Speeds: how long it takes data to transfer back and forth
[Figure: The ONE formula]
A close look at the ONE formula
There are three factors that make up the ONE metric for IT and OT networks.
Network availability
Similar to the system’s availability factor in OEE, the network’s availability factor in ONE reflects how much time a certain network connection is expected to be available versus how much time it’s actually available.
Expected availability reflects the network’s level of availability when it’s needed, accounting for planned downtime (for example, perhaps users expect it to be available 99.99% of the time) and unused ports.
Actual availability reflects how often a link is able to transmit or receive data when it’s expected to. This allows the formula to capture events like unplanned downtime.
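As a minimal sketch of the availability factor described above, the ratio of actual to expected availability could be computed as follows. The function name, the use of seconds as the unit and the clamping to the range 0 to 1 are illustrative assumptions, not part of Belden's published definition.

```python
def availability_factor(expected_seconds: float, actual_seconds: float) -> float:
    """Ratio of actual to expected link availability, clamped to [0, 1].

    Assumption: both values are measured over the same reporting window,
    and planned downtime is already excluded from expected_seconds.
    """
    if expected_seconds <= 0:
        raise ValueError("expected_seconds must be positive")
    return max(0.0, min(1.0, actual_seconds / expected_seconds))


# Hypothetical example: a link expected to be up for a full day (86,400 s)
# that lost 15 minutes (900 s) to unplanned downtime.
day = 24 * 60 * 60
print(f"{availability_factor(day, day - 15 * 60):.4f}")  # 0.9896
```

In this hypothetical case, 15 minutes of unplanned downtime in one day is enough to bring the availability factor to roughly 99%.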
Network capacity
With factory equipment, the goal is to achieve as much production as possible in the shortest amount of time. But this doesn’t exactly carry over to network performance.
If equipment and processes are designed to process 1,000 pieces per hour, then users want to produce at least 1,000 pieces per hour. But this line of thinking does not hold true for network performance, where best practice is to reserve capacity for a “safety factor.”
For example, if a network has a 1 Gbps link, pushing 1 Gbps of traffic through that link all the time leaves no contingency when problems arise, offers no buffer for unexpected tasks and provides no room to grow the network or the systems that rely on it for communications.
Reserve capacity also protects mission-critical data transmission. Consider a congested network that runs at a high percentage of utilization. What happens when a new data stream is added by a technician, an update is distributed to a machine cell or a new production order comes in? If the network is running at maximum capacity, then the new traffic slows everything down — or causes a complete network collapse — and mission-critical messages don't arrive in time (especially those involved in motion control or safety systems).
While the concept of “equipment reserve” isn’t a consideration in OEE, it’s very important in ONE. Users don’t want links to be 0% utilized, of course, but they do want to have a large “reserve” in capacity.
This means that the network capacity factor is calculated based on the remaining capacity. First, port utilization is established: bits per second received over the negotiated bandwidth. Capacity is then “1 – port utilization” to show the remaining pipeline available.
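The capacity calculation described above can be sketched in a few lines. The function and parameter names are hypothetical, and clamping utilization to the range 0 to 1 is an added assumption to guard against counter anomalies.

```python
def capacity_factor(received_bps: float, negotiated_bps: float) -> float:
    """Remaining capacity of a port: 1 - port utilization.

    Port utilization is bits per second received divided by the
    negotiated bandwidth, as described in the text.
    """
    if negotiated_bps <= 0:
        raise ValueError("negotiated_bps must be positive")
    utilization = max(0.0, min(1.0, received_bps / negotiated_bps))
    return 1.0 - utilization


# Hypothetical example: a 1 Gbps link carrying 100 Mbps of traffic.
print(f"{capacity_factor(100e6, 1e9):.2f}")  # 0.90
# The same link saturated at 1 Gbps has no reserve left.
print(f"{capacity_factor(1e9, 1e9):.2f}")    # 0.00
```

Note how this inverts the OEE mindset: the saturated link scores lowest, because it has no reserve for new traffic.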
Network quality
ONE’s quality factor takes into account data-integrity issues and broken messages through data points that relate to detected cyclic redundancy check (CRC) errors (also known as checksum errors). These errors indicate when information within a message has been lost or corrupted. They can also point to packet collisions, where information is simultaneously transmitted and received across a connection. While this isn’t normally an issue for full-duplex connections, half-duplex connections (like those found in Wi-Fi networks, for example) are particularly susceptible.
This factor represents how many packets a network receives versus how many packets are actually good or usable.
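One plausible reading of this ratio is good packets divided by packets received. The sketch below assumes that CRC-error counts are the only source of bad packets and that an idle port (no packets received) scores 1.0. Both are assumptions made for illustration, and the names are hypothetical.

```python
def quality_factor(packets_received: int, packets_with_errors: int) -> float:
    """Share of received packets that are good (usable)."""
    if packets_received < 0 or packets_with_errors < 0:
        raise ValueError("packet counts must be non-negative")
    if packets_with_errors > packets_received:
        raise ValueError("errors cannot exceed packets received")
    if packets_received == 0:
        return 1.0  # assumption: no traffic means no evidence of poor quality
    return (packets_received - packets_with_errors) / packets_received


# Hypothetical example: 1,000,000 packets received, 2,500 with CRC errors.
print(f"{quality_factor(1_000_000, 2_500):.4f}")  # 0.9975
```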
Achieving a world-class ONE target
Just like OEE has a world-class target that represents the highest level of equipment effectiveness achievable (80% to 85%), ONE has its own world-class target.
The world-class target for network effectiveness falls between 75% and 90%, depending on a variety of factors. Similar to OEE, ONE is calculated by rolling up three individual scores that measure availability, capacity and quality.
• World-class availability score = Greater than 98%
• World-class capacity score = Greater than 90%
• World-class quality score = Greater than 85%
These three scores roll up into a one-number score, which reveals the ONE KPI.
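The article does not spell out how the three scores are combined. Since ONE is modeled after OEE, which multiplies its three factors, the sketch below assumes the same multiplicative roll-up. That assumption is at least consistent with the stated targets: the three world-class thresholds multiply out to roughly 75%, the lower end of the world-class range.

```python
def one_score(availability: float, capacity: float, quality: float) -> float:
    """Per-port ONE, assuming an OEE-style product of the three factors."""
    for name, value in (("availability", availability),
                        ("capacity", capacity),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return availability * capacity * quality


# The three world-class thresholds from the list above.
print(f"{one_score(0.98, 0.90, 0.85):.4f}")  # 0.7497
```

A multiplicative roll-up also means a single weak factor drags the whole score down, which is the behavior OEE users are already familiar with.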
It’s important to note that ONE is established on a per-port basis — the metric is calculated for every individual link. In networks with thousands of connections (or more), the values within the metric must be rolled up in a way that does not obscure the performance or impact of an individual link.
For example, the first step in rolling up the port-level ONE number may be to average the collection or pool of values for the ports of interest. For a single switch, the ONE value for the ports of that switch would be used.
Weighting comes into play when dealing with connections of varying importance. To make sure that the rolled-up ONE best represents what is most critical to operations, links can be categorized to reflect use and criticality. Weights can be assigned to categories and applied to the rolled-up ONE metric as a weighted average. Moreover, customers can use as many or as few weight categories as they want.
For example, a weight can be assigned to the average to bias the ONE metric toward certain connectivity types at the:
• End-device layer (connections between a switch and end device)
• Aggregation layer (switch-to-switch connections)
• Backbone layer (switch-to-switch connections that are adjacent to aggregation connections on both ends)
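The weighted roll-up described above might look like the following sketch. The category names and weight values are hypothetical. In practice they would be chosen by the plant to reflect which links matter most to operations.

```python
def weighted_one(port_scores, weights):
    """Weighted average of per-port ONE values.

    port_scores: iterable of (category, one_value) pairs
    weights:     dict mapping each category to its weight
    """
    total = 0.0
    weight_sum = 0.0
    for category, score in port_scores:
        weight = weights[category]
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("at least one port with a non-zero weight is required")
    return total / weight_sum


# Hypothetical ports and weights that bias the result toward the backbone.
ports = [("end_device", 0.80), ("end_device", 0.90),
         ("aggregation", 0.70), ("backbone", 0.60)]
weights = {"end_device": 1, "aggregation": 2, "backbone": 3}

plain_average = sum(score for _, score in ports) / len(ports)
print(f"{plain_average:.3f}")                 # 0.750
print(f"{weighted_one(ports, weights):.3f}")  # 0.700
```

Here the weaker backbone and aggregation links pull the weighted score below the plain average, which is the intended effect of the bias.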
Of course, simply aggregating the pool of values via an average, no matter how creative a weighting scheme may be, can obscure the impact of individual links — especially as the scope of the pooled ONE values grows.
An option to account for this could be to subtract the standard deviation of the pooled links from the average. This would allow ONE to reflect the impact of potential issues in even minor links, while still accurately describing the state of the network as a whole.
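To illustrate the idea, the sketch below subtracts the standard deviation of the pooled values from their average. The choice of population standard deviation (rather than sample standard deviation) and the floor at zero are assumptions, since the text does not specify either.

```python
from statistics import mean, pstdev


def penalized_one(port_scores):
    """Average ONE of the pooled ports minus their standard deviation."""
    if not port_scores:
        raise ValueError("at least one port score is required")
    # Floor at zero so a very uneven pool cannot yield a negative score.
    return max(0.0, mean(port_scores) - pstdev(port_scores))


# Four uniformly healthy links: no spread, so no penalty.
print(f"{penalized_one([0.9, 0.9, 0.9, 0.9]):.3f}")  # 0.900
# One poor link among four: the average alone would report 0.750.
print(f"{penalized_one([0.9, 0.9, 0.9, 0.3]):.3f}")  # 0.490
```

In the second case a simple average would still report 0.750, while the penalized value drops to about 0.490 and makes the outlier link hard to miss.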
When and where to use ONE
Because ONE is a single-number metric, there are many ways it can be used, represented and incorporated. For example, it could be:
1. Used on its own, which may be helpful for a command line interface (CLI). The user can log into a switch or device and see a number that accurately represents network health.
2. Tracked, documented and trended over time to benchmark performance and monitor for possible problems. For example, visually representing ONE in a chart or graph over time can help users quickly identify cyclical patterns or periodic changes that could indicate the need to look more closely at network performance.
3. Turned into a graphical element to be displayed on screens. This can help users track not only ONE, but also the elements that make up the single-number metric, all in one place.
The ONE number is intended to be an easy-to-digest network measurement for plant and operations managers. It can also be a tool to help solicit investments and then measure the impact of those investments on the network.
Just as plant managers measure OEE, Belden envisions network administrators and technicians measuring ONE in near real time to evaluate network health and performance, then leveraging it to make informed decisions about IT and OT systems.
Consultants who assess network performance levels, along with service providers who need a metric to help define service level agreements, will also find value in this one-number KPI. It provides a consistent, objective score that can be used to assess performance.
Belden has already begun to integrate ONE metrics into some of its products so it can be included in data analysis engines, network management systems and cybersecurity packages, and even embedded into network infrastructure operating systems through switches and other networked devices.
This will help network administrators and technicians not only understand how their networks are performing from one moment to the next, but also prepare their networks for future-forward Industry 4.0, Industry 5.0 and digital-transformation initiatives.
About Belden
To learn how Belden can help a network achieve its maximum potential effectiveness, visit their website. Belden Inc. delivers the infrastructure that makes the digital journey simpler, smarter and more secure. They are moving beyond connectivity, from what they make to what they make possible through a performance-driven portfolio, forward-thinking expertise and purpose-built solutions. With a legacy of quality and reliability spanning 120-plus years, Belden has a strong foundation to continue building the future. They are headquartered in St. Louis, Missouri, and they have manufacturing capabilities in North America, Europe, Asia and Africa.