10 Gigabit Ethernet: Integrating a Standard Protocol into High-Speed Real-Time Systems
When it comes to 10 Gigabit Ethernet, embedded applications such as real-time record and playback systems impose requirements beyond those of mass-market network applications. Close attention to hardware capabilities and software features is essential.
ROB KRAFT, ADVANCEDIO
Thanks to the universal nature of Ethernet, the 10 Gigabit Ethernet (10GbE) standard promises developers new levels of portability and easier software maintenance at a speed that has never been experienced before. This high-speed, ubiquitous technology has formed a new baseline, and many developers are looking for ways to introduce this technology into a number of high-performance real-time applications.
One such application, as an interconnect for high-bandwidth sensors, is a data plane domain where earlier generations of Ethernet were unable to compete against the likes of Serial Front Panel Data Port (sFPDP) and Fibre Channel. Now, as 10GbE, with its superior speed and facility for bidirectional, full-duplex communications, pushes into these applications, the equipment used along with these sensor networks, such as high-speed real-time data record and playback systems, requires 10GbE interfaces.
10GbE interfaces are somewhat more complex than earlier generations of Ethernet. Due to the interface’s sheer speed, a host processor running the 10GbE protocol stacks in software would be almost entirely consumed by that task. Because of this, most 10GbE implementations use some form of protocol offload coprocessor to achieve high data transfer rates and reduce processor utilization. The offload coprocessor, such as an ASIC or FPGA, performs the heavy lifting of the protocol, reducing the burden on the CPU. Integrating a 10GbE interface into a real-time record and playback system therefore largely means integrating an offload coprocessor.
High-speed real-time data record and playback systems capture relatively large quantities of data from systems or sensors. The data typically is used for offline analysis or is replayed into a system for training, testing, simulation, or development. Figure 1 contains a block diagram of a typical system. The record and playback systems have some unique functional and performance requirements, which in turn impose some requirements on the 10GbE interface. Notably, these 10GbE interface requirements differ significantly in several key application-level and hardware-level areas from the requirements typical to 10GbE interfaces used in the large server-based markets.
Time-Stamping and Playback
A common requirement in high-speed recording systems, especially multichannel systems, is to time-tag the data with sufficient accuracy. The tagging is used during offline analysis to enable the alignment of data recorded from multiple sensors. It is also of use during playback, to re-inject the data into the system with the same timing fidelity with which it was originally captured.
Accomplishing this time-tagging task presents two challenges. The first is to find a means of tagging the packets with sufficient accuracy and precision. When the time-stamp accuracy requirements are sufficiently high (a few microseconds or less), it is not satisfactory to stamp the packets at the application software layer, which is the most straightforward access layer. By the time they reach this layer, they have already made a trip through a PCI-X bus or PCIe fabric, processor memory, possibly a processor cache, and an operating system. This trip is subject to variable latency of a magnitude exceeding a few microseconds, which makes application-layer stamping unacceptable.
Assuming recorded packets have been accurately time-stamped, a second challenge hinges on meeting precise timing constraints during playback. A typical approach, in which a processor pulls the data from system memory, passes it through the protocol stack, and sends it out over a 10GbE interface, suffers from the same non-deterministic latency that affects time-stamping at the application software layer. This approach will not achieve the required precise playback timing.
Instead, the recording time-stamp precision and accuracy challenge is solved by providing an interface that stamps the packets immediately after they arrive over the 10GbE wire. There they can be deterministically tagged before reaching any buses, fabrics, or processors. The playback precision challenge is solved by placing an interface and memory in the outbound path immediately before the 10GbE wire. Outbound packets can be moved from the recorder and staged in that memory ahead of time, and their release onto the wire precisely gated.
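The gating idea can be illustrated with a minimal sketch, assuming each recorded packet carries a hardware arrival time stamp in nanoseconds. The function name `release_schedule` and the sample values are hypothetical and are not taken from any particular product. The sketch only computes the release times that a hardware gate would enforce; it does not itself achieve deterministic timing.

```python
from typing import List


def release_schedule(recorded_ns: List[int], playback_start_ns: int) -> List[int]:
    """Map recorded arrival time stamps to playback release times.

    Each packet is released at the same offset from the start of playback
    that it had from the first recorded packet, so the original
    inter-packet gaps are preserved.
    """
    if not recorded_ns:
        return []
    first = recorded_ns[0]
    return [playback_start_ns + (t - first) for t in recorded_ns]


# Hypothetical hardware time stamps (ns) for four recorded packets.
recorded = [1_000_000, 1_001_200, 1_004_800, 1_004_900]
schedule = release_schedule(recorded, playback_start_ns=50_000_000)

# The gaps between release times match the gaps between arrival times.
gaps_in = [b - a for a, b in zip(recorded, recorded[1:])]
gaps_out = [b - a for a, b in zip(schedule, schedule[1:])]
print(schedule)
print(gaps_in == gaps_out)
```

In a real system the schedule would be handed to the staging memory ahead of time, so that the release of each packet does not depend on host software latency.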
Note that ASICs designed for commodity 10GbE NICs generally do not support hardware interfaces for packet time-stamping or playback staging. This leaves access for tagging at the socket layer only, which is insufficiently accurate.
Recording Full Ethernet Frame
Some Ethernet record and playback applications record only the payload data, which is transported using the standard protocol. Another class, however, uses “raw sockets” to capture the entire Ethernet frame, which includes the packet headers as well as the payload. Capturing the full frame is common in intelligence or security applications, where the nature of the traffic is of interest, or in applications where the traffic is collected to be replayed later into a simulation, training, or hardware-in-the-loop test system in place of live data sources.
For these applications to achieve the required data rates, the software for handling these “raw sockets” must be optimized in its interaction downward to the Ethernet interface hardware and upward to the recorder application. Aspects that can be optimized include minimizing instructions and memory copying when passing incoming buffers up to the application, and giving the user application efficient control over manipulating and freeing received input buffers. This level of optimization requires driver software designed with intimate knowledge of the 10GbE interface hardware. Integrators planning to port drivers for 10GbE interfaces into data record/playback applications should consider these aspects for achieving performance.
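As a small illustration of full-frame handling with minimal copying, the following sketch parses the 14-byte Ethernet header of a captured frame and exposes the payload as a zero-copy view into the original buffer. It assumes an untagged frame (no VLAN tag) that has already been delivered without its preamble and frame check sequence. The names `FrameView` and `parse_frame` are illustrative only, and a production driver would do this work in a lower-level language.

```python
import struct
from typing import NamedTuple

# Destination MAC, source MAC, EtherType (network byte order).
ETH_HEADER = struct.Struct("!6s6sH")


class FrameView(NamedTuple):
    dst: str
    src: str
    ethertype: int
    payload: memoryview  # view into the caller's buffer, not a copy


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_frame(buf) -> FrameView:
    """Split a raw frame into header fields and a zero-copy payload view."""
    view = memoryview(buf)
    if len(view) < ETH_HEADER.size:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = ETH_HEADER.unpack_from(view, 0)
    return FrameView(format_mac(dst), format_mac(src), ethertype,
                     view[ETH_HEADER.size:])


# Hypothetical frame: broadcast destination, locally administered source,
# EtherType 0x0800 (IPv4), followed by placeholder payload bytes.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"sensor-data"
parsed = parse_frame(frame)
print(parsed.dst, parsed.src, hex(parsed.ethertype), bytes(parsed.payload))
```

Slicing a `memoryview` does not copy the underlying bytes, which mirrors on a small scale the goal of passing received buffers up to the recorder without extra memory copies.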
Recording “Corrupt” Packets
Typically, a recorded stream of sampled and digitized sensor data is destined for signal processing algorithms such as filtering, FFT, or decoding, or for other processing required for detailed analysis. Some of these algorithms are able to correct for or tolerate a number of scattered errors, but are not amenable to a consecutive swath of missing data. As a simple example, consider your own tolerance for missing a few [sic] lttrs fom ths sentnc verss msing an entire gr.
Ethernet protocol stacks, however, are designed to discard individual packets or entire messages if an error is detected in a checksum, or if a message arrives incomplete. Depending on the type of error and the layer at which it occurs, this means discarding anywhere from a single packet of 1500 or 9000 bytes (depending on the MTU size) to a message of up to 64000 consecutive bytes. Ironically, the source of this error could be just a small change in a packet’s header, which does not affect the integrity of the payload data of interest. While this behavior makes good sense in most network applications of Ethernet, where higher protocol or application layers deal with it, it is not ideal for these kinds of recording and sensor processing applications.
Therefore, when it comes to high-speed recording of sensor data, a solution that permits the option of recording all packets, even those with errors, can be essential. Note that 10GbE ASICs, which are designed for the large majority of network applications that do not require this specialized option, typically do not implement it.
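The difference between dropping and keeping errored packets can be sketched in software as follows. This is a simplified model that assumes each captured frame still carries its 4-byte frame check sequence, computed as the standard CRC-32 (as provided by `zlib.crc32`) and appended least significant byte first. The names `fcs_ok` and `record` are hypothetical; in practice this decision is made in the interface hardware or offload stack, not in application code.

```python
import struct
import zlib
from typing import Iterable, List, Tuple


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Return True if the trailing 4-byte FCS matches the frame contents."""
    if len(frame_with_fcs) < 4:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


def record(frames: Iterable[bytes],
           keep_errors: bool = True) -> List[Tuple[bool, bytes]]:
    """Store frames with a validity flag.

    With keep_errors=True, errored frames are flagged and kept, so the
    analysis stage sees scattered errors instead of a gap. With
    keep_errors=False the behavior mimics a conventional stack that drops.
    """
    stored = []
    for frame in frames:
        ok = fcs_ok(frame)
        if ok or keep_errors:
            stored.append((ok, frame))
    return stored


# Build one good frame, then corrupt a single header byte in a copy of it.
body = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"samples"
good = body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
bad = bytes([good[0] ^ 0x01]) + good[1:]

print(len(record([good, bad], keep_errors=True)))   # both frames stored
print(len(record([good, bad], keep_errors=False)))  # errored frame dropped
```

In the corrupted frame the payload bytes are intact and only a header byte has changed, which is the case the article describes as being needlessly discarded.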
Hardware Interface Challenge and Solution
Recording real-time sensor data over 10GbE often brings with it the prospect of receiving multiple consecutive packets burst at full line speed, without the option of controlling the flow by asking the sensor to pause its transmission. This lack of control occurs when sensors have simplified 10GbE interfaces that do not respond to flow control or simply because they do not have resources to buffer the samples—characteristics unique to real-time embedded applications and foreign to the world of servers for which most 10GbE ASICs and NICs were designed.
In sensor fusion applications, where several sensors transmit simultaneously over the network to the same recorder 10GbE node, the amount of consecutive line-rate burst data increases significantly. A 10GbE interface used for this application must be designed with sufficient local data buffering to accommodate the line-rate burst data without relying on access (typically over PCI-X or PCIe) to the host system memory located externally to the 10GbE card. There is often at least momentary contention for access to system memory, and if the 10GbE card has insufficient local buffering, its buffer overflows and the card is forced to drop incoming packets.
It is important to recognize that this overflow can occur at arbitrarily low sustained data rates. For instance, radar sensors receive a burst of signals for a short duration, followed by a longer period of quiet. If such a recorder receives 100 Kbytes at 10 Gigabit line rate every 100 ms, the sustained rate is only 1 Mbyte/s. But for the duration of that 100 Kbyte burst (800,000 bits at 10 Gbit/s, or 80 µs), the card must be able to handle all of the incoming data even if it cannot access the system memory. The equations in Figure 2 help determine the burst size that can be accommodated based on buffering and input and output rates.
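The arithmetic in this example can be checked with a short sketch. Figure 2 is not reproduced in this text, so the buffer relations below are not the article's equations. They come from a simple fill-and-drain model assumed here for illustration: the buffer fills at line rate and drains toward host memory at a constant, lower rate. All function names are hypothetical, and 100 Kbytes is taken as 100,000 bytes.

```python
LINE_RATE_BPS = 10e9  # 10 Gbit/s line rate


def burst_duration_s(burst_bytes: float,
                     line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Time the burst occupies the wire at line rate."""
    return burst_bytes * 8 / line_rate_bps


def sustained_rate_bytes_per_s(burst_bytes: float, period_s: float) -> float:
    """Average data rate when one burst arrives every period."""
    return burst_bytes / period_s


def peak_buffer_bytes(burst_bytes: float, drain_bps: float,
                      line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Peak local buffer occupancy for one burst (assumed model)."""
    drain_bps = min(drain_bps, line_rate_bps)
    return burst_bytes * (1 - drain_bps / line_rate_bps)


def max_burst_bytes(buffer_bytes: float, drain_bps: float,
                    line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Largest line-rate burst a buffer can absorb (assumed model)."""
    if drain_bps >= line_rate_bps:
        return float("inf")
    return buffer_bytes / (1 - drain_bps / line_rate_bps)


burst = 100_000  # bytes, the 100 Kbyte burst from the text
print(burst_duration_s(burst))                 # burst duration in seconds
print(sustained_rate_bytes_per_s(burst, 0.1))  # average bytes per second
print(peak_buffer_bytes(burst, drain_bps=0))   # host memory unavailable
```

With no access to host memory during the burst (a drain rate of zero), the model says the card must buffer the entire 100 Kbyte burst locally, even though the sustained rate is only about 1 Mbyte/s.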
High-performance embedded real-time applications such as high-speed record and playback place some unique demands in their use of 10GbE for the real-time data plane. To effectively address these demands, products architected for this space must implement features including interfaces for precision time-stamping, the placement of memory to accommodate large full-rate inbound bursts and outbound data staging, and the ability to customize stack behavior for receiving real-time sensor data (Figure 3).
Generally, commodity NICs and the ASICs they are based on are optimized for a different set of application requirements, and therefore do not implement all of the required features. However, carefully designed 10GbE products can facilitate the use of the latest incarnation of widespread Ethernet in high-performance real-time applications (Figure 4).