Updates for the Back-to-Back Frame Benchmark in RFC 2544

A. Morton
AT&T Labs
200 Laurel Avenue South
Middletown, NJ 07748
United States of America
Phone: +1 732 420 1571
Email: acmorton@att.com

Keywords: Buffer size, Buffer delay, Correction Factor

Abstract

Fundamental benchmarking methodologies for network interconnect
devices of interest to the IETF are defined in RFC 2544. This memo
updates the procedures of the test to measure the Back-to-Back Frames
benchmark of RFC 2544, based on further experience.

This memo updates Section 26.4 of RFC 2544.

Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9004.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.
Table of Contents
1.  Introduction
2.  Requirements Language
3.  Scope and Goals
4.  Motivation
5.  Prerequisites
6.  Back-to-Back Frames
  6.1.  Preparing the List of Frame Sizes
  6.2.  Test for a Single Frame Size
  6.3.  Test Repetition and Benchmark
7.  Benchmark Calculations
8.  Reporting
9.  Security Considerations
10. IANA Considerations
11. References
  11.1.  Normative References
  11.2.  Informative References
Acknowledgments
Author's Address
1. Introduction

The IETF's fundamental benchmarking methodologies are defined in
[RFC2544], supported by the terms and definitions in [RFC1242].
[RFC2544] actually obsoletes an earlier specification, [RFC1944].
Over time, the benchmarking community has updated [RFC2544] several
times, including the Device Reset benchmark [RFC6201] and the
important Applicability Statement [RFC6815] concerning use outside
the Isolated Test Environment (ITE) required for accurate
benchmarking. Other specifications implicitly update [RFC2544], such
as the IPv6 benchmarking methodologies in [RFC5180].

Recent testing experience with the Back-to-Back Frame test and
benchmark in Section 26.4 of [RFC2544] indicates that an update is
warranted [OPNFV-2017] [VSPERF-b2b]. In particular, analysis of the
results indicates that buffer size matters when compensating for
interruptions of software-packet processing, and this finding
increases the importance of the Back-to-Back Frame characterization
described here. This memo provides additional rationale and the
updated method.

[RFC2544] provides its own requirements language consistent with
[RFC2119], since [RFC1944] (which it obsoletes) predates [RFC2119].
All three memos share common authorship.
Today, [RFC8174] clarifies the usage of requirements language, so the
requirements language in this memo is expressed in accordance with
[RFC8174]. The requirements are intended for those
performing/reporting laboratory tests to improve clarity and
repeatability, and for those designing devices that facilitate these
tests.

2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 [RFC2119] [RFC8174]
when, and only when, they appear in all capitals, as shown here.
3. Scope and Goals

The scope of this memo is to define an updated method to
unambiguously perform tests, measure the benchmark(s), and report the
results for Back-to-Back Frames (as described in Section 26.4 of
[RFC2544]).

The goal is to provide more efficient test procedures where possible
and expand reporting with additional interpretation of the results.
The tests described in this memo address the cases in which the maximum
frame rate of a single ingress port cannot be transferred to
an egress port without loss (for some frame sizes of interest).

Benchmarks as described in [RFC2544] rely on test conditions with
constant frame sizes, with the goal of understanding what network-device
capability has been tested. Tests with the smallest size stress the
header-processing capacity, and tests with the largest size stress the
overall bit-processing capacity. Tests with sizes in between may
determine the transition between these two capacities.
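As an illustration (not part of the method), the following Python
sketch uses an assumed header-processing rate on an assumed 10 Gb/s
Ethernet link to show where the bottleneck transitions between header
processing and bit processing; all values are hypothetical:

   # Illustrative sketch: find the frame sizes where the bottleneck
   # shifts from header processing (frames/s) to bit processing
   # (bits/s).  The header_rate_fps value is a hypothetical DUT limit.

   LINK_RATE_BPS = 10e9        # 10 Gb/s Ethernet (assumed link speed)
   ETH_OVERHEAD_BYTES = 20     # preamble (8) + min interframe gap (12)

   def line_rate_fps(frame_size_bytes: int) -> float:
       """Max Theoretical Frame Rate for one size on this link."""
       return LINK_RATE_BPS / (8 * (frame_size_bytes
                                    + ETH_OVERHEAD_BYTES))

   header_rate_fps = 5e6       # hypothetical header-processing limit

   for size in (64, 128, 256, 512, 1024, 1518):
       limited = "header" if line_rate_fps(size) > header_rate_fps \
                 else "bits"
       print(f"{size:5d} octets: line rate "
             f"{line_rate_fps(size):12,.0f} f/s, bottleneck: {limited}")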
However, test conditions that simultaneously send a mixture of
Internet (IMIX) frame sizes, such as those described in [RFC6985],
MUST NOT be used in Back-to-Back Frame testing.

[RFC8239] describes buffer-size testing
for physical networking devices in a data center. Those methods measure buffer latency directly with traffic
on multiple ingress ports that overload an egress port on the Device
Under Test (DUT) and are not subject to the revised calculations
presented in this memo. Likewise, the methods of [RFC8239] SHOULD be
used for test cases where the egress-port buffer is the known point
of overload.

4. Motivation

Section 26.4 of [RFC2544] describes the rationale for
the Back-to-Back Frames benchmark. To summarize, there are several
reasons that devices on a network produce bursts of frames at the
minimum allowed spacing; and it is, therefore, worthwhile to understand
the DUT limit on the length of such bursts in
practice. The same document also states:
Tests of this parameter are intended to determine the extent
of data buffering in the device.
Since this test was defined, there have been occasional discussions
of the stability and repeatability of the results, both over time and
across labs. Fortunately, the Open Platform for Network Function
Virtualization (OPNFV) project on Virtual Switch Performance (VSPERF) Continuous Integration (CI)
testing routinely repeats Back-to-Back Frame
tests to verify that test functionality has been maintained through
development of the test-control programs. These tests were used as a
basis to evaluate stability and repeatability, even across lab setups
when the test platform was migrated to new DUT hardware at the end of
2016.

When the VSPERF CI results were examined [VSPERF-b2b], several
aspects of the results were considered notable:

The Back-to-Back Frame benchmark was very consistent for some fixed
frame sizes, and somewhat variable for other frame sizes.
The number of Back-to-Back Frames with zero loss reported for
large frame sizes was unexpectedly long (translating to 30 seconds
of buffer time), and no explanation or measurement limit condition
was indicated. It was important that the buffering time calculations
were part of the referenced testing and analysis [VSPERF-b2b],
because the calculated buffer time of
30 seconds for some frame sizes was clearly wrong or highly
suspect. On the other hand, a result expressed only as a large
number of Back-to-Back Frames does not permit such an easy
comparison with reality.
Calculation of the extent of buffer time in the DUT helped to
explain the results observed with all frame sizes. For example,
tests with some frame sizes cannot exceed the frame-header-processing
rate of the DUT; thus, no buffering occurs. Therefore,
the results depended on the test equipment and not the DUT.
It was found that a better estimate of the DUT buffer time could
be calculated using measurements of both the longest burst in frames
without loss and results from the Throughput tests conducted
according to Section 26.1 of [RFC2544]. It is
apparent that the DUT's frame-processing rate empties the buffer
during a trial and tends to increase the "implied" buffer-size
estimate (measured according to [RFC2544], because many frames have departed the buffer when
the burst of frames ends). A calculation using the Throughput
measurement can reveal a "corrected" buffer-size estimate.
Further, if the Throughput tests of [RFC2544] are conducted as a prerequisite, the number of
frame sizes required for Back-to-Back Frame benchmarking can be reduced
to one or more of the small frame sizes, or the results for large frame
sizes can be noted as invalid in the results if tested anyway. These are
the larger frame sizes for which the Back-to-Back Frame rate cannot
exceed the frame-header-processing rate of the DUT and little or no
buffering occurs.

The material below provides the details of the calculation to
estimate the actual buffer storage available in the DUT, using results
from the Throughput tests for each frame size and the Max
Theoretical Frame Rate for the DUT links (which constrain the minimum
frame spacing).

In reality, there are many buffers and packet-header-processing steps
in a typical DUT. The simplified model used in these calculations for
the DUT includes a packet-header-processing function with a limited
rate of operation, preceded by a buffer.

So, in the Back-to-Back Frame testing (see the sketch after this
list):
The ingress burst arrives at Max Theoretical Frame Rate, and
initially the frames are buffered.
The packet-header-processing function (HeaderProc) operates at
the "Measured Throughput" (), removing frames from the buffer (this is the
best approximation we have, another acceptable approximation is the received frame rate
during Back-to-back Frame testing, if Measured Throughput is
not available).
Frames that have been processed are clearly not in the buffer, so
the Corrected DUT Buffer Time equation (see Section 7) estimates and
removes the frames that the DUT forwarded on egress during the
burst. We define buffer time as the number of frames occupying the
buffer divided by the Max Theoretical Frame Rate (on ingress)
for the frame size under test.
A helpful concept is the buffer-filling rate, which is the
difference between the Max Theoretical Frame Rate (ingress) and the
Measured Throughput (HeaderProc on egress). If the actual buffer
size in frames is known, the time to fill the buffer during a
measurement can be calculated using the filling rate, as a check on
measurements. However, the buffer in the model represents many
buffers of different sizes in the DUT data path.
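To make the model concrete, the following minimal Python sketch
(using assumed example rates and a hypothetical buffer size, not
measured values) simulates a burst against the model and shows how
the correction recovers the buffer size that the implied estimate
overstates:

   # Minimal simulation of the simplified model above: a burst
   # arrives at the Max Theoretical Frame Rate while HeaderProc
   # drains the buffer at the Measured Throughput.  All values are
   # assumed examples (64-octet frames on a 10 Gb/s link).

   max_theoretical_fps = 14_880_952      # ingress arrival rate
   measured_throughput_fps = 11_000_000  # HeaderProc (egress) rate
   actual_buffer_frames = 10_000         # hypothetical buffer size

   # Buffer-filling rate: arrival rate minus departure rate.
   filling_fps = max_theoretical_fps - measured_throughput_fps

   # The longest loss-free burst ends just as the buffer fills.
   longest_burst = (actual_buffer_frames * max_theoretical_fps
                    / filling_fps)

   implied_time = longest_burst / max_theoretical_fps
   # The correction removes frames forwarded during the burst.
   corrected_time = implied_time * (1 - measured_throughput_fps
                                    / max_theoretical_fps)

   print(f"longest loss-free burst: {longest_burst:,.0f} frames")
   print(f"implied buffer time:   {implied_time * 1e6:,.1f} us")
   print(f"corrected buffer time: {corrected_time * 1e6:,.1f} us")
   # corrected_time * max_theoretical_fps ~= actual_buffer_frames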
Knowledge of approximate buffer storage size (in time or bytes) may
be useful in estimating whether frame losses will occur if DUT forwarding
is temporarily suspended in a production deployment due to an
unexpected interruption of frame processing (an interruption of duration
greater than the estimated buffer time would certainly cause lost
frames). In Section 7, the calculations for the correct buffer time
use the
combination of offered load at Max Theoretical Frame Rate and header-processing speed at 100% of Measured Throughput. Other combinations are
possible, such as changing the percent of Measured Throughput to account
for other processes reducing the header-processing rate.

The presentation of OPNFV VSPERF evaluation and development of
enhanced search algorithms was given and discussed at IETF 102
[VSPERF-BSLV]. The enhancements are intended to compensate for
transient
processor interrupts that may cause loss at near-Throughput levels of offered
load. Subsequent analysis of the results indicates that buffers within
the DUT can compensate for some interrupts, and this finding increases
the importance of the Back-to-Back Frame characterization described
here.

5. Prerequisites

The test setup MUST be consistent with Figure 1 of [RFC2544], or
Figure 2 of that document when the tester's sender and receiver are
different devices. Other mandatory testing aspects described in
[RFC2544] MUST be included, unless explicitly modified in the next
section.

The ingress and egress link speeds and link-layer protocols MUST be
specified and used to compute the Max Theoretical Frame Rate when
respecting the minimum interframe gap (a computation sketch follows
the notes below).

The test results for the Throughput benchmark conducted according to
[RFC2544] for all frame sizes RECOMMENDED by [RFC2544] MUST be
available to reduce
the tested-frame-size list or to note invalid results for individual
frame sizes (because the burst length may be essentially infinite for
large frame sizes).

Note that:
the Throughput and the Back-to-Back Frame measurement-configuration traffic characteristics (unidirectional or
bidirectional, and number of flows generated) MUST match.
the Throughput measurement MUST be taken under zero-loss conditions,
according to Section 26.1 of [RFC2544].
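For illustration, a minimal Python sketch of the Max Theoretical
Frame Rate computation follows; the 20 octets of per-frame overhead
(preamble plus minimum interframe gap) apply to Ethernet link layers,
and other link layers need their own overhead values:

   # Sketch: Max Theoretical Frame Rate for full-duplex Ethernet,
   # respecting the minimum interframe gap.  Link speed is a
   # parameter; the overhead shown is specific to Ethernet.

   PREAMBLE_BYTES = 8    # preamble + start-of-frame delimiter
   MIN_IFG_BYTES = 12    # minimum interframe gap

   def max_theoretical_frame_rate(link_bps: float,
                                  frame_size_octets: int) -> float:
       """Frames per second with minimum legal spacing on the wire."""
       bits_per_frame = 8 * (frame_size_octets + PREAMBLE_BYTES
                             + MIN_IFG_BYTES)
       return link_bps / bits_per_frame

   # Example: RECOMMENDED frame sizes on a 10 Gb/s link
   for size in (64, 128, 256, 512, 1024, 1280, 1518):
       print(f"{size:5d} octets -> "
             f"{max_theoretical_frame_rate(10e9, size):12,.0f} f/s")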
The Back-to-Back Benchmark described in Section 26.4 of [RFC2544]
MUST be measured directly by the tester, where buffer
size is inferred from Back-to-Back Frame bursts and associated packet-loss measurements. Therefore, sources of frame loss that are unrelated
to consistent evaluation of buffer size SHOULD be identified and removed
or mitigated. Example sources include:
On-path active components that are external to the DUT
Shared-resource contention between the DUT and other off-path
component(s) impacting the DUT's behavior, sometimes called the
"noisy neighbor" problem with virtualized network functions.
Mitigations applicable to some of the sources above are discussed in
Section 6.2, with the other measurement requirements described below
in Section 6.

6. Back-to-Back Frames

Objective: To characterize the ability of a DUT to process
Back-to-Back Frames as defined in [RFC1242].

The procedure follows.

6.1. Preparing the List of Frame Sizes

From the list of RECOMMENDED frame sizes (Section 9 of [RFC2544]),
select the subset of frame sizes whose Measured
Throughput (during prerequisite testing) was less than the Max
Theoretical Frame Rate of the DUT/test setup. These are the only
frame sizes where it is possible to produce a burst of frames that
cause the DUT buffers to fill and eventually overflow, producing one
or more discarded frames.

6.2. Test for a Single Frame Size

Each trial in the test requires the tester to send a burst of
frames (after idle time) with the minimum interframe gap and to
count the corresponding frames forwarded by the DUT.

The duration of the trial includes three REQUIRED components (see the
sketch after this list):
The time to send the burst of frames (at the back-to-back
rate), determined by the search algorithm.
The time to receive the transferred burst of frames (at the
Throughput rate), possibly truncated by
buffer overflow, and certainly including the latency of the
DUT.
At least 2 seconds not overlapping the time to receive the
burst (Component 2, above), to ensure that DUT buffers have depleted. Longer times
MUST be used when conditions warrant, such as when buffer times
>2 seconds are measured or when burst sending times are >2
seconds, but care is needed, since this time component directly
increases trial duration, and many trials and tests comprise a
complete benchmarking study.
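The following Python sketch combines the three components into a
conservative trial-duration estimate; the function name and input
values are illustrative assumptions, not part of the method:

   # Sketch: estimating trial duration from the three REQUIRED
   # components above.  Rates are in frames per second.

   def estimate_trial_duration_s(burst_frames: int,
                                 max_theoretical_fps: float,
                                 measured_throughput_fps: float,
                                 guard_time_s: float = 2.0) -> float:
       send_time = burst_frames / max_theoretical_fps          # 1
       receive_time = burst_frames / measured_throughput_fps   # 2
       # Component 3: at least 2 seconds that do not overlap the
       # receive time, so that DUT buffers have depleted.  Summing
       # all three gives a conservative upper bound, since sending
       # and receiving overlap in practice.
       return send_time + receive_time + guard_time_s

   # Example: a 38,344-frame burst of 64-octet frames at 10 Gb/s
   print(estimate_trial_duration_s(38_344, 14_880_952, 11_000_000))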
The upper search limit for the time to send each burst MUST
be configurable to values as high as 30 seconds (buffer time results
reported at or near the configured upper limit are likely invalid, and
the test MUST be repeated with a higher search limit).

If all frames have been received, the tester increases the length
of the burst according to the search algorithm and performs another
trial.

If the received frame count is less than the number of frames in
the burst, then the limit of DUT processing and buffering may have
been exceeded, and the burst length for the next trial is determined by the search
algorithm (the burst length is typically reduced,
but see below).

Classic search algorithms have been adapted for use in
benchmarking, where the search requires discovery of a pair of
outcomes, one with no loss and another with loss, at load conditions
within the acceptable tolerance or accuracy. Conditions encountered
when benchmarking the infrastructure for network function
virtualization require algorithm enhancement. Fortunately, the
adaptation of Binary Search, and an enhanced Binary Search with Loss
Verification, have been specified in Clause 12.3 of [TST009]. These
algorithms can easily be used for
Back-to-Back Frame benchmarking by replacing the offered load level
with burst length in frames. Annex B of [TST009] describes
the theory behind the enhanced Binary Search with Loss Verification
algorithm.

There are also promising works in progress that may prove useful in
Back-to-Back Frame benchmarking; [MLRsearch] and [PLRsearch] are two
such examples.

Either the Binary Search or Binary Search with Loss Verification
algorithm MUST be used, and the input parameters to the algorithm(s)
MUST be reported (a sketch of the basic search follows below).

The tester usually imposes a (configurable) minimum step size for
burst length, and the step size MUST be reported with the results (as
this influences the accuracy and variation of test results).

The original definition in [RFC2544] is stated below:

   The back-to-back value is the number of frames in the longest
   burst that the DUT will handle without the loss of any frames.
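For illustration only, a minimal Python sketch of the plain Binary
Search over burst length follows; dut_trial is a hypothetical
stand-in for the tester, and the Loss Verification variant of
[TST009] additionally repeats trials at each step to screen out
transient losses unrelated to the buffer:

   # Sketch of plain Binary Search adapted as described above:
   # the offered-load level is replaced by burst length in frames.

   def search_longest_burst(dut_trial, upper_limit_frames: int,
                            min_step_frames: int) -> int:
       """Longest burst (frames) forwarded without loss."""
       lo, hi = 0, upper_limit_frames    # lo: loss-free; hi: lossy
       while hi - lo > min_step_frames:  # min step bounds accuracy
           burst = (lo + hi) // 2
           if dut_trial(burst):          # True if no frames lost
               lo = burst                # no loss: try longer bursts
           else:
               hi = burst                # loss: try shorter bursts
       return lo

   # Hypothetical DUT that handles bursts up to 38,344 frames
   result = search_longest_burst(lambda b: b <= 38_344,
                                 upper_limit_frames=1_000_000,
                                 min_step_frames=100)
   print(result)   # within min_step_frames of 38,344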
6.3. Test Repetition and Benchmark

On this topic, Section 26.4 of [RFC2544] requires:
The trial length MUST be at least 2 seconds and SHOULD be
repeated at least 50 times with the average of the recorded values
being reported.
Therefore, the Back-to-Back Frame benchmark is the average of burst-length values over repeated tests to determine the longest burst of
frames that the DUT can successfully process and buffer without frame
loss. Each of the repeated tests completes an independent search
process.

In this update, the test MUST be repeated N times (the number of
repetitions is now a variable that must be reported) for each frame
size in the subset list, and each Back-to-Back Frame value MUST be made
available for further processing (below).

7. Benchmark Calculations

For each frame size, calculate the following summary statistics for
the longest Back-to-Back Frame values over the N tests (a computation
sketch follows this list):
Average (Benchmark)
Minimum
Maximum
Standard Deviation
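A minimal Python sketch of these calculations, with assumed example
results:

   # Sketch: summary statistics over N repeated test results
   # (example values; N = 5).

   from statistics import mean, stdev

   b2b_results_frames = [26_000, 25_500, 27_000, 26_200, 25_800]

   average = mean(b2b_results_frames)   # the benchmark
   minimum = min(b2b_results_frames)
   maximum = max(b2b_results_frames)
   std_dev = stdev(b2b_results_frames)  # sample standard deviation

   print(average, minimum, maximum, round(std_dev, 1))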
Further, calculate the Implied DUT Buffer Time and the Corrected
DUT Buffer Time in seconds, as follows:
Implied DUT Buffer Time =
    Average number of Back-to-Back Frames / Max Theoretical Frame Rate
The formula above is simply expressing the burst of frames
in units of time.

The next step is to apply a correction factor that accounts for the
DUT's frame forwarding operation during the test (assuming the simple
model of the DUT composed of a buffer and a forwarding function,
described in Section 4).

Corrected DUT Buffer Time =
    Implied DUT Buffer Time
    - ( Implied DUT Buffer Time * ( Measured Throughput /
                                    Max Theoretical Frame Rate ) )

where:
The "Measured Throughput" is the Throughput Benchmark for the frame size tested,
as augmented by methods including the Binary Search with Loss
Verification algorithm in [TST009] where applicable, and it MUST be
expressed in frames per second in this
equation.
The "Max Theoretical Frame Rate" is a calculated
value for the interface speed and link-layer technology used, and it
MUST be expressed in frames per second in this equation.
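A Python sketch of both calculations, using assumed example values,
follows:

   # Sketch: Implied and Corrected DUT Buffer Time, with both rates
   # in frames per second as required above.  Inputs are examples.

   def implied_buffer_time_s(avg_b2b_frames: float,
                             max_theoretical_fps: float) -> float:
       return avg_b2b_frames / max_theoretical_fps

   def corrected_buffer_time_s(avg_b2b_frames: float,
                               measured_throughput_fps: float,
                               max_theoretical_fps: float) -> float:
       implied = implied_buffer_time_s(avg_b2b_frames,
                                       max_theoretical_fps)
       # Subtract the buffer time attributable to frames the DUT
       # forwarded on egress while the burst was arriving.
       return implied - implied * (measured_throughput_fps /
                                   max_theoretical_fps)

   # Example: 64-octet frames on a 10 Gb/s link (assumed values)
   print(corrected_buffer_time_s(26_000, 11_000_000, 14_880_952))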
The term on the far right in the formula for Corrected DUT Buffer
Time accounts for all the frames in the burst that were transmitted by
the DUT while the burst of frames was sent in. So, these frames are
not in the buffer, and the buffer size is more accurately estimated by
excluding them. If Measured Throughput is not available,
an acceptable approximation is the received frame rate (see
"Forwarding Rate" in [RFC2889]) measured during Back-to-Back Frame
testing.

8. Reporting

The Back-to-Back Frame results SHOULD be reported in the format of a
table with a row for each of the tested frame sizes. There SHOULD be
columns for the frame size and the resultant average frame count for
each type of data stream tested.

The number of tests averaged for the benchmark, N, MUST be
reported.

The minimum, maximum, and standard deviation across all complete
tests SHOULD also be reported (they are referred to as
"Min,Max,StdDev" in Table 1).

The Corrected DUT Buffer Time SHOULD also be reported.

If the tester operates using a limited maximum burst length in
frames, then this maximum length SHOULD be reported.
   +-------------+------------+----------------+----------------+
   | Frame Size, | Ave B2B    | Min,Max,StdDev | Corrected Buff |
   | octets      | Length,    |                | Time, Sec      |
   |             | frames     |                |                |
   +-------------+------------+----------------+----------------+
   | 64          | 26000      | 25500,27000,20 | 0.00004        |
   +-------------+------------+----------------+----------------+

              Table 1: Back-to-Back Frame Results
Static and configuration parameters (reported with Table 1):
Number of test repetitions, N
Minimum Step Size (during searches), in frames.
If the tester has a specific (actual) frame rate of interest (less
than the Throughput rate), it is useful to estimate the buffer time at
that actual frame rate:

Actual Buffer Time =
    Corrected DUT Buffer Time * ( Max Theoretical Frame Rate /
                                  Actual Frame Rate )

and report this value, properly labeled.

9. Security Considerations

Benchmarking activities as described in this memo are limited to
technology characterization using controlled stimuli in a laboratory
environment, with dedicated address space and the other constraints
of [RFC2544].

The benchmarking network topology will be an independent test setup
and MUST NOT be connected to devices that may forward the test traffic
into a production network or misroute traffic to the test management
network. See [RFC6815].

Further, benchmarking is performed on an "opaque-box" (a.k.a.
"black-box") basis, relying solely on measurements observable external
to the Device or System Under Test (SUT).

The DUT developers are commonly independent from the personnel and
institutions conducting benchmarking studies. DUT developers might have
incentives to alter the performance of the DUT if the test conditions
can be detected. Special capabilities SHOULD NOT exist in the DUT/SUT
specifically for benchmarking purposes. Procedures described in this
document are not designed to detect such activity. Additional testing
outside of the scope of this document would be needed and has been used
successfully in the past to discover such malpractices.

Any implications for network security arising from the DUT/SUT SHOULD
be identical in the lab and in production networks.

10. IANA Considerations

This document has no IANA actions.

11. References

11.1. Normative References

[RFC1242]  Bradner, S., "Benchmarking Terminology for Network
           Interconnection Devices", RFC 1242, July 1991.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
           Network Interconnect Devices", RFC 2544, March 1999.

[RFC6985]  Morton, A., "IMIX Genome: Specification of Variable Packet
           Sizes for Additional Testing", RFC 6985, July 2013.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, May 2017.

[RFC8239]  Avramov, L. and J. Rapp, "Data Center Benchmarking
           Methodology", RFC 8239, August 2017.

[TST009]   ETSI, "Network Functions Virtualisation (NFV) Release 3;
           Testing; Specification of Networking Benchmarks and
           Measurement Methods for NFV", ETSI GS NFV-TST 009
           (Rapporteur: A. Morton).

11.2. Informative References

[MLRsearch]
           Cisco Systems, "Multiple Loss Ratio Search for Packet
           Throughput (MLRsearch)", Work in Progress.

[PLRsearch]
           Cisco Systems, "Probabilistic Loss Ratio Search for Packet
           Throughput (PLRsearch)", Work in Progress.

[OPNFV-2017]
           Intel Corp., Spirent Communications, and AT&T Labs,
           "Dataplane Performance, Capacity, and Benchmarking in
           OPNFV".

[RFC1944]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
           Network Interconnect Devices", RFC 1944, May 1996.

[RFC2889]  Mandeville, R. and J. Perser, "Benchmarking Methodology
           for LAN Switching Devices", RFC 2889, August 2000.

[RFC5180]  Popoviciu, C., Hamza, A., Van de Velde, G., and D.
           Dugatkin, "IPv6 Benchmarking Methodology for Network
           Interconnect Devices", RFC 5180, May 2008.

[RFC6201]  Asati, R., Pignataro, C., Calabria, F., and C. Olvera,
           "Device Reset Characterization", RFC 6201, March 2011.

[RFC6815]  Bradner, S., Dubray, K., McQuaid, J., and A. Morton,
           "Applicability Statement for RFC 2544: Use on Production
           Networks Considered Harmful", RFC 6815, November 2012.

[VSPERF-b2b]
           AT&T Labs, "Back2Back Testing Time Series (from CI)".

[VSPERF-BSLV]
           Spirent Communications and AT&T Labs, "Evolution of
           Repeatability in Benchmarking: Fraser Plugfest (Summary
           for IETF BMWG)".

[VSPERF-CI]
           Intel Corporation, "OPNFV VSPERF CI".

Acknowledgments

Thanks to the members of the VSPERF
project for many contributions to the early testing [VSPERF-b2b].
Others also investigated the topic and made useful suggestions, and
provided many comments and suggestions based on extensive integration
testing and resulting search-algorithm proposals -- the most
up-to-date feedback possible. Further reviews provided comments and
support for the document, improved readability in several key
passages, improved the clarity and configuration advice on trial
duration, and suggested additional text on DUT design cautions in the
Security Considerations section.

Author's Address

A. Morton
AT&T Labs
200 Laurel Avenue South
Middletown, NJ 07748
United States of America

Phone: +1 732 420 1571
Email: acmorton@att.com