< draft-ietf-quic-recovery-20.txt   draft-ietf-quic-recovery-21.txt >
QUIC J. Iyengar, Ed. QUIC J. Iyengar, Ed.
Internet-Draft Fastly Internet-Draft Fastly
Intended status: Standards Track I. Swett, Ed. Intended status: Standards Track I. Swett, Ed.
Expires: October 25, 2019 Google Expires: January 9, 2020 Google
April 23, 2019 July 08, 2019
QUIC Loss Detection and Congestion Control QUIC Loss Detection and Congestion Control
draft-ietf-quic-recovery-20 draft-ietf-quic-recovery-21
Abstract Abstract
This document describes loss detection and congestion control This document describes loss detection and congestion control
mechanisms for QUIC. mechanisms for QUIC.
Note to Readers Note to Readers
Discussion of this draft takes place on the QUIC working group Discussion of this draft takes place on the QUIC working group
mailing list (quic@ietf.org), which is archived at mailing list (quic@ietf.org), which is archived at
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 25, 2019. This Internet-Draft will expire on January 9, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 23 skipping to change at page 2, line 23
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4 2. Conventions and Definitions . . . . . . . . . . . . . . . . . 4
3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5 3. Design of the QUIC Transmission Machinery . . . . . . . . . . 5
3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5 3.1. Relevant Differences Between QUIC and TCP . . . . . . . . 5
3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6 3.1.1. Separate Packet Number Spaces . . . . . . . . . . . . 6
3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6 3.1.2. Monotonically Increasing Packet Numbers . . . . . . . 6
3.1.3. No Reneging . . . . . . . . . . . . . . . . . . . . . 6 3.1.3. Clearer Loss Epoch . . . . . . . . . . . . . . . . . 6
3.1.4. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7 3.1.4. No Reneging . . . . . . . . . . . . . . . . . . . . . 7
3.1.5. Explicit Correction For Delayed Acknowledgements . . 7 3.1.5. More ACK Ranges . . . . . . . . . . . . . . . . . . . 7
3.1.6. Explicit Correction For Delayed Acknowledgements . . 7
4. Generating Acknowledgements . . . . . . . . . . . . . . . . . 7 4. Generating Acknowledgements . . . . . . . . . . . . . . . . . 7
4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . . . 7 4.1. Crypto Handshake Data . . . . . . . . . . . . . . . . . . 8
4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 8 4.2. ACK Ranges . . . . . . . . . . . . . . . . . . . . . . . 8
4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8 4.3. Receiver Tracking of ACK Frames . . . . . . . . . . . . . 8
4.4. Measuring and Reporting Host Delay . . . . . . . . . . . 8 4.4. Measuring and Reporting Host Delay . . . . . . . . . . . 8
5. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 9 5. Estimating the Round-Trip Time . . . . . . . . . . . . . . . 9
5.1. Generating RTT samples . . . . . . . . . . . . . . . . . 9 5.1. Generating RTT samples . . . . . . . . . . . . . . . . . 9
5.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 10 5.2. Estimating min_rtt . . . . . . . . . . . . . . . . . . . 10
5.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 10 5.3. Estimating smoothed_rtt and rttvar . . . . . . . . . . . 10
6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 11 6. Loss Detection . . . . . . . . . . . . . . . . . . . . . . . 11
6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 11 6.1. Acknowledgement-based Detection . . . . . . . . . . . . . 11
6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 12 6.1.1. Packet Threshold . . . . . . . . . . . . . . . . . . 12
6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 12 6.1.2. Time Threshold . . . . . . . . . . . . . . . . . . . 12
6.2. Crypto Retransmission Timeout . . . . . . . . . . . . . . 13 6.2. Crypto Retransmission Timeout . . . . . . . . . . . . . . 13
6.2.1. Retry and Version Negotiation . . . . . . . . . . . . 14 6.3. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 14
6.2.2. Discarding Keys and Packet State . . . . . . . . . . 14 6.3.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 14
6.3. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 15 6.3.2. Sending Probe Packets . . . . . . . . . . . . . . . . 15
6.3.1. Computing PTO . . . . . . . . . . . . . . . . . . . . 15 6.3.3. Loss Detection . . . . . . . . . . . . . . . . . . . 16
6.3.2. Sending Probe Packets . . . . . . . . . . . . . . . . 16 6.4. Retry and Version Negotiation . . . . . . . . . . . . . . 16
6.3.3. Loss Detection . . . . . . . . . . . . . . . . . . . 17 6.5. Discarding Keys and Packet State . . . . . . . . . . . . 17
6.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . 17 6.6. Discussion . . . . . . . . . . . . . . . . . . . . . . . 17
7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 17 7. Congestion Control . . . . . . . . . . . . . . . . . . . . . 17
7.1. Explicit Congestion Notification . . . . . . . . . . . . 17 7.1. Explicit Congestion Notification . . . . . . . . . . . . 18
7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 18 7.2. Slow Start . . . . . . . . . . . . . . . . . . . . . . . 18
7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 18 7.3. Congestion Avoidance . . . . . . . . . . . . . . . . . . 18
7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 18 7.4. Recovery Period . . . . . . . . . . . . . . . . . . . . . 18
7.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 18 7.5. Ignoring Loss of Undecryptable Packets . . . . . . . . . 19
7.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 18 7.6. Probe Timeout . . . . . . . . . . . . . . . . . . . . . . 19
7.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 19 7.7. Persistent Congestion . . . . . . . . . . . . . . . . . . 19
7.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 20 7.8. Pacing . . . . . . . . . . . . . . . . . . . . . . . . . 20
7.9. Under-utilizing the Congestion Window . . . . . . . . . . 20 7.9. Under-utilizing the Congestion Window . . . . . . . . . . 21
8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21
8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 21 8.1. Congestion Signals . . . . . . . . . . . . . . . . . . . 21
8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 21 8.2. Traffic Analysis . . . . . . . . . . . . . . . . . . . . 21
8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 21 8.3. Misreporting ECN Markings . . . . . . . . . . . . . . . . 22
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 22
10.1. Normative References . . . . . . . . . . . . . . . . . . 22 10.1. Normative References . . . . . . . . . . . . . . . . . . 22
10.2. Informative References . . . . . . . . . . . . . . . . . 22 10.2. Informative References . . . . . . . . . . . . . . . . . 23
10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24 10.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 24 Appendix A. Loss Recovery Pseudocode . . . . . . . . . . . . . . 24
A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 24 A.1. Tracking Sent Packets . . . . . . . . . . . . . . . . . . 25
A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 24 A.1.1. Sent Packet Fields . . . . . . . . . . . . . . . . . 25
A.2. Constants of interest . . . . . . . . . . . . . . . . . . 25 A.2. Constants of interest . . . . . . . . . . . . . . . . . . 25
A.3. Variables of interest . . . . . . . . . . . . . . . . . . 25 A.3. Variables of interest . . . . . . . . . . . . . . . . . . 26
A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 26 A.4. Initialization . . . . . . . . . . . . . . . . . . . . . 27
A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 27 A.5. On Sending a Packet . . . . . . . . . . . . . . . . . . . 27
A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 27 A.6. On Receiving an Acknowledgment . . . . . . . . . . . . . 28
A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 29 A.7. On Packet Acknowledgment . . . . . . . . . . . . . . . . 29
A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 29 A.8. Setting the Loss Detection Timer . . . . . . . . . . . . 30
A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 31 A.9. On Timeout . . . . . . . . . . . . . . . . . . . . . . . 32
A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 31 A.10. Detecting Lost Packets . . . . . . . . . . . . . . . . . 32
Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 32 Appendix B. Congestion Control Pseudocode . . . . . . . . . . . 33
B.1. Constants of interest . . . . . . . . . . . . . . . . . . 32 B.1. Constants of interest . . . . . . . . . . . . . . . . . . 33
B.2. Variables of interest . . . . . . . . . . . . . . . . . . 33 B.2. Variables of interest . . . . . . . . . . . . . . . . . . 34
B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 34 B.3. Initialization . . . . . . . . . . . . . . . . . . . . . 35
B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 34 B.4. On Packet Sent . . . . . . . . . . . . . . . . . . . . . 35
B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 34 B.5. On Packet Acknowledgement . . . . . . . . . . . . . . . . 35
B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 35 B.6. On New Congestion Event . . . . . . . . . . . . . . . . . 36
B.7. Process ECN Information . . . . . . . . . . . . . . . . . 35 B.7. Process ECN Information . . . . . . . . . . . . . . . . . 36
B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 36 B.8. On Packets Lost . . . . . . . . . . . . . . . . . . . . . 37
Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 36 Appendix C. Change Log . . . . . . . . . . . . . . . . . . . . . 37
C.1. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 36 C.1. Since draft-ietf-quic-recovery-20 . . . . . . . . . . . . 37
C.2. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 37 C.2. Since draft-ietf-quic-recovery-19 . . . . . . . . . . . . 37
C.3. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 37 C.3. Since draft-ietf-quic-recovery-18 . . . . . . . . . . . . 38
C.4. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 38 C.4. Since draft-ietf-quic-recovery-17 . . . . . . . . . . . . 38
C.5. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 38 C.5. Since draft-ietf-quic-recovery-16 . . . . . . . . . . . . 39
C.6. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 38 C.6. Since draft-ietf-quic-recovery-14 . . . . . . . . . . . . 39
C.7. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 39 C.7. Since draft-ietf-quic-recovery-13 . . . . . . . . . . . . 40
C.8. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 39 C.8. Since draft-ietf-quic-recovery-12 . . . . . . . . . . . . 40
C.9. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 39 C.9. Since draft-ietf-quic-recovery-11 . . . . . . . . . . . . 40
C.10. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 39 C.10. Since draft-ietf-quic-recovery-10 . . . . . . . . . . . . 40
C.11. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 39 C.11. Since draft-ietf-quic-recovery-09 . . . . . . . . . . . . 40
C.12. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 39 C.12. Since draft-ietf-quic-recovery-08 . . . . . . . . . . . . 41
C.13. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 40 C.13. Since draft-ietf-quic-recovery-07 . . . . . . . . . . . . 41
C.14. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 40 C.14. Since draft-ietf-quic-recovery-06 . . . . . . . . . . . . 41
C.15. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 40 C.15. Since draft-ietf-quic-recovery-05 . . . . . . . . . . . . 41
C.16. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 40 C.16. Since draft-ietf-quic-recovery-04 . . . . . . . . . . . . 41
C.17. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 40 C.17. Since draft-ietf-quic-recovery-03 . . . . . . . . . . . . 41
C.18. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 40 C.18. Since draft-ietf-quic-recovery-02 . . . . . . . . . . . . 41
C.19. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 40 C.19. Since draft-ietf-quic-recovery-01 . . . . . . . . . . . . 41
C.20. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 40 C.20. Since draft-ietf-quic-recovery-00 . . . . . . . . . . . . 42
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 41 C.21. Since draft-iyengar-quic-loss-recovery-01 . . . . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 42
1. Introduction 1. Introduction
QUIC is a new multiplexed and secure transport atop UDP. QUIC builds QUIC is a new multiplexed and secure transport atop UDP. QUIC builds
on decades of transport and security experience, and implements on decades of transport and security experience, and implements
mechanisms that make it attractive as a modern general-purpose mechanisms that make it attractive as a modern general-purpose
transport. The QUIC protocol is described in [QUIC-TRANSPORT]. transport. The QUIC protocol is described in [QUIC-TRANSPORT].
QUIC implements the spirit of existing TCP loss recovery mechanisms, QUIC implements the spirit of existing TCP loss recovery mechanisms,
described in RFCs, various Internet-drafts, and also those prevalent described in RFCs, various Internet-drafts, and also those prevalent
skipping to change at page 4, line 41 skipping to change at page 4, line 43
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
Definitions of terms that are used in this document: Definitions of terms that are used in this document:
ACK-only: Any packet containing only one or more ACK frame(s). ACK-only: Any packet containing only one or more ACK frame(s).
In-flight: Packets are considered in-flight when they have been sent In-flight: Packets are considered in-flight when they have been sent
and neither acknowledged nor declared lost, and they are not ACK- and are not ACK-only, and they are not acknowledged, declared
only. lost, or abandoned along with old keys.
Ack-eliciting Frames: All frames besides ACK or PADDING are Ack-eliciting Frames: All frames besides ACK or PADDING are
considered ack-eliciting. considered ack-eliciting.
Ack-eliciting Packets: Packets that contain ack-eliciting frames Ack-eliciting Packets: Packets that contain ack-eliciting frames
elicit an ACK from the receiver within the maximum ack delay and elicit an ACK from the receiver within the maximum ack delay and
are called ack-eliciting packets. are called ack-eliciting packets.
Crypto Packets: Packets containing CRYPTO data sent in Initial or Crypto Packets: Packets containing CRYPTO data sent in Initial or
Handshake packets. Handshake packets.
skipping to change at page 6, line 43 skipping to change at page 6, line 43
ambiguity about which packet is acknowledged when an ACK is received. ambiguity about which packet is acknowledged when an ACK is received.
Consequently, more accurate RTT measurements can be made, spurious Consequently, more accurate RTT measurements can be made, spurious
retransmissions are trivially detected, and mechanisms such as Fast retransmissions are trivially detected, and mechanisms such as Fast
Retransmit can be applied universally, based only on packet number. Retransmit can be applied universally, based only on packet number.
This design point significantly simplifies loss detection mechanisms This design point significantly simplifies loss detection mechanisms
for QUIC. Most TCP mechanisms implicitly attempt to infer for QUIC. Most TCP mechanisms implicitly attempt to infer
transmission ordering based on TCP sequence numbers - a non-trivial transmission ordering based on TCP sequence numbers - a non-trivial
task, especially when TCP timestamps are not available. task, especially when TCP timestamps are not available.
3.1.3. No Reneging 3.1.3. Clearer Loss Epoch
QUIC ends a loss epoch when a packet sent after loss is declared is
acknowledged. TCP waits for the gap in the sequence number space to
be filled, and so if a segment is lost multiple times in a row, the
loss epoch may not end for several round trips. Because both should
reduce their congestion windows only once per epoch, QUIC will do it
correctly once for every round trip that experiences loss, while TCP
may only do it once across multiple round trips.
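The once-per-epoch window reduction described above can be sketched as follows. This is a minimal illustration, not the draft's own pseudocode: the class name, the method names, and the halving of the window are assumptions made for the example, and times are plain numbers in arbitrary units.

```python
class LossEpochSketch:
    """Hypothetical helper: reduce the congestion window once per loss epoch."""

    def __init__(self, cwnd_bytes):
        self.cwnd_bytes = cwnd_bytes
        # Time at which the current loss epoch began; None before any loss.
        self.epoch_start = None

    def covered_by_epoch(self, sent_time):
        # A packet sent at or before the epoch start belongs to that epoch.
        return self.epoch_start is not None and sent_time <= self.epoch_start

    def on_packet_lost(self, sent_time, now):
        """Return True if this loss opened a new epoch and reduced the window."""
        if self.covered_by_epoch(sent_time):
            return False  # the window was already reduced for this epoch
        self.epoch_start = now
        self.cwnd_bytes //= 2  # assumed reduction factor of one half
        return True

    def ack_ends_epoch(self, acked_sent_time):
        # The epoch ends once a packet sent after loss was declared is acked.
        return self.epoch_start is not None and acked_sent_time > self.epoch_start
```

In this sketch a second loss among packets sent before the epoch began leaves the window untouched, while a loss of a packet sent after the epoch began opens a new epoch and reduces the window again.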
3.1.4. No Reneging
QUIC ACKs contain information that is similar to TCP SACK, but QUIC QUIC ACKs contain information that is similar to TCP SACK, but QUIC
does not allow any acked packet to be reneged, greatly simplifying does not allow any acked packet to be reneged, greatly simplifying
implementations on both sides and reducing memory pressure on the implementations on both sides and reducing memory pressure on the
sender. sender.
3.1.4. More ACK Ranges 3.1.5. More ACK Ranges
QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In QUIC supports many ACK ranges, as opposed to TCP's 3 SACK ranges. In
high loss environments, this speeds recovery, reduces spurious high loss environments, this speeds recovery, reduces spurious
retransmits, and ensures forward progress without relying on retransmits, and ensures forward progress without relying on
timeouts. timeouts.
3.1.5. Explicit Correction For Delayed Acknowledgements 3.1.6. Explicit Correction For Delayed Acknowledgements
QUIC endpoints measure the delay incurred between when a packet is QUIC endpoints measure the delay incurred between when a packet is
received and when the corresponding acknowledgment is sent, allowing received and when the corresponding acknowledgment is sent, allowing
a peer to maintain a more accurate round-trip time estimate (see a peer to maintain a more accurate round-trip time estimate (see
Section 4.4). Section 4.4).
4. Generating Acknowledgements 4. Generating Acknowledgements
An acknowledgement SHOULD be sent immediately upon receipt of a An acknowledgement SHOULD be sent immediately upon receipt of a
second ack-eliciting packet. QUIC recovery algorithms do not assume second ack-eliciting packet. QUIC recovery algorithms do not assume
skipping to change at page 8, line 40 skipping to change at page 8, line 49
of 1 RTT of reordering. In cases with ACK frame loss and reordering, of 1 RTT of reordering. In cases with ACK frame loss and reordering,
this approach does not guarantee that every acknowledgement is seen this approach does not guarantee that every acknowledgement is seen
by the sender before it is no longer included in the ACK frame. by the sender before it is no longer included in the ACK frame.
Packets could be received out of order and all subsequent ACK frames Packets could be received out of order and all subsequent ACK frames
containing them could be lost. In this case, the loss recovery containing them could be lost. In this case, the loss recovery
algorithm may cause spurious retransmits, but the sender will algorithm may cause spurious retransmits, but the sender will
continue making forward progress. continue making forward progress.
4.4. Measuring and Reporting Host Delay 4.4. Measuring and Reporting Host Delay
An endpoint measures the delay incurred between when a packet is An endpoint measures the delays intentionally introduced between when
received and when the corresponding acknowledgment is sent. The an ACK-eliciting packet is received and the corresponding
endpoint encodes this host delay for the largest acknowledged packet acknowledgment is sent. The endpoint encodes this delay for the
in the Ack Delay field of an ACK frame (see Section 19.3 of largest acknowledged packet in the Ack Delay field of an ACK frame
[QUIC-TRANSPORT]). This allows the receiver of the ACK to adjust for (see Section 19.3 of [QUIC-TRANSPORT]). This allows the receiver of
any host delays, which is important for delayed acknowledgements, the ACK to adjust for any intentional delays, which is important for
when estimating the path RTT. In certain deployments, a packet might delayed acknowledgements, when estimating the path RTT. A packet
be held in the OS kernel or elsewhere on the host before being might be held in the OS kernel or elsewhere on the host before being
processed by the QUIC stack. Where possible, an endpoint MAY include processed. An endpoint SHOULD NOT include these unintentional delays
these delays when populating the Ack Delay field in an ACK frame. when populating the Ack Delay field in an ACK frame.
An endpoint MUST NOT excessively delay acknowledgements of ack- An endpoint MUST NOT excessively delay acknowledgements of ack-
eliciting packets. The maximum ack delay is communicated in the eliciting packets. The maximum ack delay is communicated in the
max_ack_delay transport parameter, see Section 18.1 of max_ack_delay transport parameter; see Section 18.1 of
[QUIC-TRANSPORT]. max_ack_delay implies an explicit contract: an [QUIC-TRANSPORT]. max_ack_delay implies an explicit contract: an
endpoint promises to never delay acknowledgments of an ack-eliciting endpoint promises to never delay acknowledgments of an ack-eliciting
packet by more than the indicated value. If it does, any excess packet by more than the indicated value. If it does, any excess
accrues to the RTT estimate and could result in spurious accrues to the RTT estimate and could result in spurious
retransmissions from the peer. retransmissions from the peer. For Initial and Handshake packets, a
max_ack_delay of 0 is used.
5. Estimating the Round-Trip Time 5. Estimating the Round-Trip Time
At a high level, an endpoint measures the time from when a packet was At a high level, an endpoint measures the time from when a packet was
sent to when it is acknowledged as a round-trip time (RTT) sample. sent to when it is acknowledged as a round-trip time (RTT) sample.
The endpoint uses RTT samples and peer-reported host delays The endpoint uses RTT samples and peer-reported host delays
(Section 4.4) to generate a statistical description of the (Section 4.4) to generate a statistical description of the
connection's RTT. An endpoint computes the following three values: connection's RTT. An endpoint computes the following three values:
the minimum value observed over the lifetime of the connection the minimum value observed over the lifetime of the connection
(min_rtt), an exponentially-weighted moving average (smoothed_rtt), (min_rtt), an exponentially-weighted moving average (smoothed_rtt),
skipping to change at page 10, line 33 skipping to change at page 10, line 42
the smoothed_rtt based entirely on what it observes (see the smoothed_rtt based entirely on what it observes (see
Section 5.3), and limits potential underestimation due to Section 5.3), and limits potential underestimation due to
erroneously-reported delays by the peer. erroneously-reported delays by the peer.
5.3. Estimating smoothed_rtt and rttvar 5.3. Estimating smoothed_rtt and rttvar
smoothed_rtt is an exponentially-weighted moving average of an smoothed_rtt is an exponentially-weighted moving average of an
endpoint's RTT samples, and rttvar is the endpoint's estimated endpoint's RTT samples, and rttvar is the endpoint's estimated
variance in the RTT samples. variance in the RTT samples.
smoothed_rtt uses path latency after adjusting RTT samples for peer- The calculation of smoothed_rtt uses path latency after adjusting RTT
reported host delays (Section 4.4). A peer limits any delay in samples for host delays (Section 4.4). For packets sent in the
ApplicationData packet number space, a peer limits any delay in
sending an acknowledgement for an ack-eliciting packet to no greater sending an acknowledgement for an ack-eliciting packet to no greater
than the advertised max_ack_delay transport parameter. Consequently, than the value it advertised in the max_ack_delay transport
when a peer reports an Ack Delay that is greater than its parameter. Consequently, when a peer reports an Ack Delay that is
max_ack_delay, the delay is attributed to reasons out of the peer's greater than its max_ack_delay, the delay is attributed to reasons
control, such as scheduler latency at the peer or loss of previous out of the peer's control, such as scheduler latency at the peer or
ACK frames. Any delays beyond the peer's max_ack_delay are therefore loss of previous ACK frames. Any delays beyond the peer's
considered effectively part of path delay and incorporated into the max_ack_delay are therefore considered effectively part of path delay
smoothed_rtt estimate. and incorporated into the smoothed_rtt estimate.
When adjusting an RTT sample using peer-reported acknowledgement When adjusting an RTT sample using peer-reported acknowledgement
delays, an endpoint: delays, an endpoint:
o MUST ignore the Ack Delay field of the ACK frame for packets sent
in the Initial and Handshake packet number space.
o MUST use the lesser of the value reported in Ack Delay field of o MUST use the lesser of the value reported in Ack Delay field of
the ACK frame and the peer's max_ack_delay transport parameter the ACK frame and the peer's max_ack_delay transport parameter
(Section 4.4). (Section 4.4).
o MUST NOT apply the adjustment if the resulting RTT sample is o MUST NOT apply the adjustment if the resulting RTT sample is
smaller than the min_rtt. This limits the underestimation that a smaller than the min_rtt. This limits the underestimation that a
misreporting peer can cause to the smoothed_rtt. misreporting peer can cause to the smoothed_rtt.
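The adjustment rules listed above (as they read in the -21 column) might be sketched like this. The function name, the parameter names, the string labels for packet number spaces, and the use of integer milliseconds are assumptions made for the example, not part of the draft.

```python
def adjust_rtt_sample(latest_rtt_ms, min_rtt_ms, ack_delay_ms,
                      max_ack_delay_ms, pn_space):
    """Return the RTT sample after accounting for the peer-reported Ack Delay.

    pn_space is a hypothetical label: "initial", "handshake", or "application".
    """
    # Ack Delay is ignored for the Initial and Handshake packet number spaces.
    if pn_space in ("initial", "handshake"):
        ack_delay_ms = 0
    # Use the lesser of the reported Ack Delay and the peer's max_ack_delay.
    ack_delay_ms = min(ack_delay_ms, max_ack_delay_ms)
    adjusted = latest_rtt_ms - ack_delay_ms
    # Do not apply the adjustment if the result would fall below min_rtt.
    if adjusted < min_rtt_ms:
        return latest_rtt_ms
    return adjusted
```

For example, with a 100 ms sample, a min_rtt of 50 ms, and a max_ack_delay of 25 ms, a reported delay of 40 ms is clamped to 25 ms, giving an adjusted sample of 75 ms.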
On the first RTT sample in a connection, the smoothed_rtt is set to On the first RTT sample in a connection, the smoothed_rtt is set to
the latest_rtt. the latest_rtt.
skipping to change at page 12, line 29 skipping to change at page 12, line 40
(kPacketThreshold) is 3, based on best practices for TCP loss (kPacketThreshold) is 3, based on best practices for TCP loss
detection [RFC5681] [RFC6675]. detection [RFC5681] [RFC6675].
Some networks may exhibit higher degrees of reordering, causing a Some networks may exhibit higher degrees of reordering, causing a
sender to detect spurious losses. Implementers MAY use algorithms sender to detect spurious losses. Implementers MAY use algorithms
developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's developed for TCP, such as TCP-NCR [RFC4653], to improve QUIC's
reordering resilience. reordering resilience.
6.1.2. Time Threshold 6.1.2. Time Threshold
Once a later packet has been acknowledged, an endpoint SHOULD declare Once a later packet within the same packet number space has
an earlier packet lost if it was sent a threshold amount of time in been acknowledged, an endpoint SHOULD declare an earlier packet lost
the past. The time threshold is computed as kTimeThreshold * if it was sent a threshold amount of time in the past. To avoid
max(SRTT, latest_RTT). If packets sent prior to the largest declaring packets as lost too early, this time threshold MUST be set
acknowledged packet cannot yet be declared lost, then a timer SHOULD to at least kGranularity. The time threshold is:
be set for the remaining time.
The RECOMMENDED time threshold (kTimeThreshold), expressed as a kTimeThreshold * max(SRTT, latest_RTT, kGranularity)
round-trip time multiplier, is 9/8.
If packets sent prior to the largest acknowledged packet cannot yet
be declared lost, then a timer SHOULD be set for the remaining time.
Using max(SRTT, latest_RTT) protects against the following two cases: Using max(SRTT, latest_RTT) protects against the following two cases:
o the latest RTT sample is lower than the SRTT, perhaps due to o the latest RTT sample is lower than the SRTT, perhaps due to
reordering where the acknowledgement encountered a shorter path; reordering where the acknowledgement encountered a shorter path;
o the latest RTT sample is higher than the SRTT, perhaps due to a o the latest RTT sample is higher than the SRTT, perhaps due to a
sustained increase in the actual RTT, but the smoothed SRTT has sustained increase in the actual RTT, but the smoothed SRTT has
not yet caught up. not yet caught up.
An endpoint might consistently record RTT samples as 0 in extremely The RECOMMENDED time threshold (kTimeThreshold), expressed as a
low latency networks, leading to a smoothed_rtt of 0. Consequently, round-trip time multiplier, is 9/8.
the endpoint could declare all earlier packets as lost immediately
upon receiving an acknowledgement for a later packet. That is, the
endpoint would not provide any reordering tolerance. To avoid
declaring packets as lost too early, the time threshold MUST be set
to at least kGranularity (defined in Appendix A.2).
Implementations MAY experiment with absolute thresholds, thresholds Implementations MAY experiment with absolute thresholds, thresholds
from previous connections, adaptive thresholds, or including RTT from previous connections, adaptive thresholds, or including RTT
variance. Smaller thresholds reduce reordering resilience and variance. Smaller thresholds reduce reordering resilience and
increase spurious retransmissions, and larger thresholds increase increase spurious retransmissions, and larger thresholds increase
loss detection delay. loss detection delay.
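As a minimal sketch of the time threshold described above, the following Python follows the newer form of the formula, kTimeThreshold * max(SRTT, latest_RTT, kGranularity). The function name, the use of milliseconds, and the 1 ms value for kGranularity are assumptions made for illustration; kTimeThreshold is the RECOMMENDED 9/8.

```python
from fractions import Fraction

K_TIME_THRESHOLD = Fraction(9, 8)  # RECOMMENDED round-trip time multiplier
K_GRANULARITY_MS = 1               # assumed timer granularity, in milliseconds


def time_threshold_ms(smoothed_rtt_ms, latest_rtt_ms):
    """Return the reordering time threshold in milliseconds.

    Including kGranularity inside max() keeps the threshold non-zero
    even when every RTT sample is recorded as 0.
    """
    base = max(smoothed_rtt_ms, latest_rtt_ms, K_GRANULARITY_MS)
    return K_TIME_THRESHOLD * base
```

With both RTT values at 0, the sketch still yields a small positive threshold, so some reordering tolerance remains.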
6.2. Crypto Retransmission Timeout 6.2. Crypto Retransmission Timeout
Data in CRYPTO frames is critical to QUIC transport and crypto Data in CRYPTO frames is critical to QUIC transport and crypto
skipping to change at page 13, line 28 skipping to change at page 13, line 36
The initial crypto retransmission timeout SHOULD be set to twice the The initial crypto retransmission timeout SHOULD be set to twice the
initial RTT. initial RTT.
At the beginning, there are no prior RTT samples within a connection. At the beginning, there are no prior RTT samples within a connection.
Resumed connections over the same network SHOULD use the previous Resumed connections over the same network SHOULD use the previous
connection's final smoothed RTT value as the resumed connection's connection's final smoothed RTT value as the resumed connection's
initial RTT. If no previous RTT is available, or if the network initial RTT. If no previous RTT is available, or if the network
changes, the initial RTT SHOULD be set to 500ms, resulting in a 1 changes, the initial RTT SHOULD be set to 500ms, resulting in a 1
second initial handshake timeout as recommended in [RFC6298]. second initial handshake timeout as recommended in [RFC6298].
A connection MAY use the delay between sending a PATH_CHALLENGE and
receiving a PATH_RESPONSE to seed initial_rtt for a new path, but the
delay SHOULD NOT be considered an RTT sample.
When a crypto packet is sent, the sender MUST set a timer for twice When a crypto packet is sent, the sender MUST set a timer for twice
the smoothed RTT. This timer MUST be updated when a new crypto the smoothed RTT. This timer MUST be updated when a new crypto
packet is sent and when an acknowledgement is received which computes packet is sent and when an acknowledgement is received which computes
a new RTT sample. Upon timeout, the sender MUST retransmit all a new RTT sample. Upon timeout, the sender MUST retransmit all
unacknowledged CRYPTO data if possible. The sender MUST NOT declare unacknowledged CRYPTO data if possible. The sender MUST NOT declare
in-flight crypto packets as lost when the crypto timer expires. in-flight crypto packets as lost when the crypto timer expires.
On each consecutive expiration of the crypto timer without receiving On each consecutive expiration of the crypto timer without receiving
an acknowledgement for a new packet, the sender MUST double the an acknowledgement for a new packet, the sender MUST double the
crypto retransmission timeout and set a timer for this period. crypto retransmission timeout and set a timer for this period.
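The timeout rules in this section (twice the initial or smoothed RTT, doubled on each consecutive expiration) can be sketched as follows. This is an illustration under stated assumptions, not the draft's own pseudocode: the function name, the millisecond units, the 1 ms kGranularity floor, and the use of a crypto_count argument to count consecutive expirations are all hypothetical choices.

```python
K_INITIAL_RTT_MS = 500  # initial RTT used when no earlier sample is available
K_GRANULARITY_MS = 1    # assumed timer granularity, in milliseconds


def crypto_timeout_ms(smoothed_rtt_ms, crypto_count):
    """Return the crypto retransmission timeout in milliseconds.

    smoothed_rtt_ms is 0 until the first RTT sample arrives, in which
    case the initial RTT is used instead. crypto_count is the number of
    consecutive timer expirations without a new acknowledgement.
    """
    rtt = smoothed_rtt_ms if smoothed_rtt_ms > 0 else K_INITIAL_RTT_MS
    timeout = max(2 * rtt, K_GRANULARITY_MS)
    # Double the timeout once for each consecutive expiration.
    return timeout * (2 ** crypto_count)
```

With no RTT sample and no expirations, the sketch returns 1000 ms, matching the 1 second initial handshake timeout mentioned above.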
skipping to change at page 14, line 4 skipping to change at page 14, line 16
then all unacknowledged CRYPTO data sent in Initial packets should be then all unacknowledged CRYPTO data sent in Initial packets should be
retransmitted. If no data can be sent, then no alarm should be armed retransmitted. If no data can be sent, then no alarm should be armed
until data has been received from the client. until data has been received from the client.
Because the server could be blocked until more packets are received, Because the server could be blocked until more packets are received,
the client MUST ensure that the crypto retransmission timer is set if the client MUST ensure that the crypto retransmission timer is set if
there is unacknowledged crypto data or if the client does not yet there is unacknowledged crypto data or if the client does not yet
have 1-RTT keys. If the crypto retransmission timer expires before have 1-RTT keys. If the crypto retransmission timer expires before
the client has 1-RTT keys, it is possible that the client may not the client has 1-RTT keys, it is possible that the client may not
have any crypto data to retransmit. However, the client MUST send a have any crypto data to retransmit. However, the client MUST send a
new packet, containing only PING or PADDING frames if necessary, to new packet, containing only PADDING frames if necessary, to allow the
allow the server to continue sending data. If Handshake keys are server to continue sending data. If Handshake keys are available to
available to the client, it MUST send a Handshake packet, and the client, it MUST send a Handshake packet, and otherwise it MUST
otherwise it MUST send an Initial packet in a UDP datagram of at send an Initial packet in a UDP datagram of at least 1200 bytes.
least 1200 bytes.
Because packets only containing PADDING do not elicit an
acknowledgement, they may never be acknowledged, but they are removed
from bytes in flight when the client gets Handshake keys and the
Initial keys are discarded.
The crypto retransmission timer is not set if the time threshold The crypto retransmission timer is not set if the time threshold
(Section 6.1.2) loss detection timer is set. The time threshold loss (Section 6.1.2) loss detection timer is set. The time threshold loss
detection timer is expected to both expire earlier than the crypto detection timer is expected to both expire earlier than the crypto
retransmission timeout and be less likely to spuriously retransmit retransmission timeout and be less likely to spuriously retransmit
data. The Initial and Handshake packet number spaces will typically data. The Initial and Handshake packet number spaces will typically
contain a small number of packets, so losses are less likely to be contain a small number of packets, so losses are less likely to be
detected using packet-threshold loss detection. detected using packet-threshold loss detection.
When the crypto retransmission timer is active, the probe timer When the crypto retransmission timer is active, the probe timer
(Section 6.3) is not active. (Section 6.3) is not active.
6.2.1. Retry and Version Negotiation
A Retry or Version Negotiation packet causes a client to send another
Initial packet, effectively restarting the connection process and
resetting congestion control and loss recovery state, including
resetting any pending timers. Either packet indicates that the
Initial was received but not processed. Neither packet can be
treated as an acknowledgment for the Initial.
The client MAY however compute an RTT estimate to the server as the
time period from when the first Initial was sent to when a Retry or a
Version Negotiation packet is received. The client MAY use this
value to seed the RTT estimator for a subsequent connection attempt
to the server.
6.2.2. Discarding Keys and Packet State
When packet protection keys are discarded (see Section 4.9 of
[QUIC-TLS]), all packets that were sent with those keys can no longer
be acknowledged because their acknowledgements cannot be processed
anymore. The sender MUST discard all recovery state associated with
those packets and MUST remove them from the count of bytes in flight.
Endpoints stop sending and receiving Initial packets once they start
exchanging Handshake packets (see Section 17.2.2.1 of
[QUIC-TRANSPORT]). At this point, recovery state for all in-flight
Initial packets is discarded.
When 0-RTT is rejected, recovery state for all in-flight 0-RTT
packets is discarded.
If a server accepts 0-RTT, but does not buffer 0-RTT packets that
arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent.
It is expected that keys are discarded after packets encrypted with
them would be acknowledged or declared lost. Initial secrets however
might be destroyed sooner, as soon as handshake keys are available
(see Section 4.10 of [QUIC-TLS]).
6.3. Probe Timeout 6.3. Probe Timeout
A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data A Probe Timeout (PTO) triggers a probe packet when ack-eliciting data
is in flight but an acknowledgement is not received within the is in flight but an acknowledgement is not received within the
expected period of time. A PTO enables a connection to recover from expected period of time. A PTO enables a connection to recover from
loss of tail packets or acks. The PTO algorithm used in QUIC loss of tail packets or acks. The PTO algorithm used in QUIC
implements the reliability functions of Tail Loss Probe [TLP] [RACK], implements the reliability functions of Tail Loss Probe [TLP] [RACK],
RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout RTO [RFC5681] and F-RTO algorithms for TCP [RFC5682], and the timeout
computation is based on TCP's retransmission timeout period computation is based on TCP's retransmission timeout period
[RFC6298]. [RFC6298].
skipping to change at page 17, line 14 skipping to change at page 16, line 36
6.3.3. Loss Detection 6.3.3. Loss Detection
Delivery or loss of packets in flight is established when an ACK Delivery or loss of packets in flight is established when an ACK
frame is received that newly acknowledges one or more packets. frame is received that newly acknowledges one or more packets.
A PTO timer expiration event does not indicate packet loss and MUST A PTO timer expiration event does not indicate packet loss and MUST
NOT cause prior unacknowledged packets to be marked as lost. When an NOT cause prior unacknowledged packets to be marked as lost. When an
acknowledgement is received that newly acknowledges packets, loss acknowledgement is received that newly acknowledges packets, loss
detection proceeds as dictated by packet and time threshold detection proceeds as dictated by packet and time threshold
mechanisms, see Section 6.1. mechanisms; see Section 6.1.
6.4. Discussion 6.4. Retry and Version Negotiation
A Retry or Version Negotiation packet causes a client to send another
Initial packet, effectively restarting the connection process and
resetting congestion control and loss recovery state, including
resetting any pending timers. Either packet indicates that the
Initial was received but not processed. Neither packet can be
treated as an acknowledgment for the Initial.
The client MAY however compute an RTT estimate to the server as the
time period from when the first Initial was sent to when a Retry or a
Version Negotiation packet is received. The client MAY use this
value to seed the RTT estimator for a subsequent connection attempt
to the server.
6.5. Discarding Keys and Packet State
When packet protection keys are discarded (see Section 4.9 of
[QUIC-TLS]), all packets that were sent with those keys can no longer
be acknowledged because their acknowledgements cannot be processed
anymore. The sender MUST discard all recovery state associated with
those packets and MUST remove them from the count of bytes in flight.
Endpoints stop sending and receiving Initial packets once they start
exchanging Handshake packets (see Section 17.2.2.1 of
[QUIC-TRANSPORT]). At this point, recovery state for all in-flight
Initial packets is discarded.
When 0-RTT is rejected, recovery state for all in-flight 0-RTT
packets is discarded.
If a server accepts 0-RTT, but does not buffer 0-RTT packets that
arrive before Initial packets, early 0-RTT packets will be declared
lost, but that is expected to be infrequent.
It is expected that keys are discarded after packets encrypted with
them would be acknowledged or declared lost. Initial secrets however
might be destroyed sooner, as soon as handshake keys are available
(see Section 4.9.1 of [QUIC-TLS]).
6.6. Discussion
The majority of constants were derived from best common practices The majority of constants were derived from best common practices
among widely deployed TCP implementations on the internet. among widely deployed TCP implementations on the internet.
Exceptions follow. Exceptions follow.
A shorter delayed ack time of 25ms was chosen because longer delayed A shorter delayed ack time of 25ms was chosen because longer delayed
acks can delay loss recovery and for the small number of connections acks can delay loss recovery and for the small number of connections
where less than one packet per 25ms is delivered, acking every packet is where less than one packet per 25ms is delivered, acking every packet is
beneficial to congestion control and loss recovery. beneficial to congestion control and loss recovery.
skipping to change at page 18, line 9 skipping to change at page 18, line 23
Experienced codepoint in the IP header as a signal of congestion. Experienced codepoint in the IP header as a signal of congestion.
This document specifies an endpoint's response when its peer receives This document specifies an endpoint's response when its peer receives
packets with the Congestion Experienced codepoint. As discussed in packets with the Congestion Experienced codepoint. As discussed in
[RFC8311], endpoints are permitted to experiment with other response [RFC8311], endpoints are permitted to experiment with other response
functions. functions.
7.2. Slow Start 7.2. Slow Start
QUIC begins every connection in slow start and exits slow start upon QUIC begins every connection in slow start and exits slow start upon
loss or upon increase in the ECN-CE counter. QUIC re-enters slow loss or upon increase in the ECN-CE counter. QUIC re-enters slow
start anytime the congestion window is less than ssthresh, which start anytime the congestion window is less than ssthresh, which only
typically only occurs after a PTO. While in slow start, QUIC occurs after persistent congestion is declared. While in slow start,
increases the congestion window by the number of bytes acknowledged QUIC increases the congestion window by the number of bytes
when each acknowledgment is processed. acknowledged when each acknowledgment is processed.
7.3. Congestion Avoidance 7.3. Congestion Avoidance
Slow start exits to congestion avoidance. Congestion avoidance in Slow start exits to congestion avoidance. Congestion avoidance in
NewReno uses an additive increase multiplicative decrease (AIMD) NewReno uses an additive increase multiplicative decrease (AIMD)
approach that increases the congestion window by one maximum packet approach that increases the congestion window by one maximum packet
size per congestion window acknowledged. When a loss is detected, size per congestion window acknowledged. When a loss is detected,
NewReno halves the congestion window and sets the slow start NewReno halves the congestion window and sets the slow start
threshold to the new congestion window. threshold to the new congestion window.
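The slow start and congestion avoidance rules above can be sketched in Python as follows. This is a simplified illustration, not the draft's pseudocode: the class name, the 1200-byte maximum packet size, the minimum window of two packets, and the integer division are assumptions, and the recovery-period handling that prevents repeated reductions within one round trip is omitted.

```python
K_MAX_DATAGRAM_SIZE = 1200                # assumed maximum packet size, bytes
K_MINIMUM_WINDOW = 2 * K_MAX_DATAGRAM_SIZE  # assumed lower bound on the window


class NewRenoWindow:
    """Congestion window that grows per ACK and halves on loss."""

    def __init__(self, initial_window):
        self.congestion_window = initial_window
        self.ssthresh = float("inf")  # no threshold until the first loss

    def on_ack(self, acked_bytes):
        if self.congestion_window < self.ssthresh:
            # Slow start: grow by the number of bytes acknowledged.
            self.congestion_window += acked_bytes
        else:
            # Congestion avoidance: about one maximum packet size per
            # congestion window of acknowledged data.
            self.congestion_window += (
                K_MAX_DATAGRAM_SIZE * acked_bytes
            ) // self.congestion_window

    def on_loss(self):
        # Halve the window, then set ssthresh to the new window.
        self.congestion_window = max(
            self.congestion_window // 2, K_MINIMUM_WINDOW
        )
        self.ssthresh = self.congestion_window
```

After a loss the window equals ssthresh, so the sketch continues in congestion avoidance rather than slow start.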
skipping to change at page 22, line 15 skipping to change at page 22, line 38
9. IANA Considerations 9. IANA Considerations
This document has no IANA actions. Yet. This document has no IANA actions. Yet.
10. References 10. References
10.1. Normative References 10.1. Normative References
[QUIC-TLS] [QUIC-TLS]
Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
QUIC", draft-ietf-quic-tls-20 (work in progress), April QUIC", draft-ietf-quic-tls-21 (work in progress), July
2019. 2019.
[QUIC-TRANSPORT] [QUIC-TRANSPORT]
Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
Multiplexed and Secure Transport", draft-ietf-quic- Multiplexed and Secure Transport", draft-ietf-quic-
transport-20 (work in progress), April 2019. transport-21 (work in progress), July 2019.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
skipping to change at page 22, line 45 skipping to change at page 23, line 22
<https://www.rfc-editor.org/info/rfc8311>. <https://www.rfc-editor.org/info/rfc8311>.
10.2. Informative References 10.2. Informative References
[FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement: [FACK] Mathis, M. and J. Mahdavi, "Forward Acknowledgement:
Refining TCP Congestion Control", ACM SIGCOMM, August Refining TCP Congestion Control", ACM SIGCOMM, August
1996. 1996.
[RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK: [RACK] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "RACK:
a time-based fast loss detection algorithm for TCP", a time-based fast loss detection algorithm for TCP",
draft-ietf-tcpm-rack-04 (work in progress), July 2018. draft-ietf-tcpm-rack-05 (work in progress), April 2019.
[RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
2003, <https://www.rfc-editor.org/info/rfc3465>. 2003, <https://www.rfc-editor.org/info/rfc3465>.
[RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton, [RFC4653] Bhandarkar, S., Reddy, A., Allman, M., and E. Blanton,
"Improving the Robustness of TCP to Non-Congestion "Improving the Robustness of TCP to Non-Congestion
Events", RFC 4653, DOI 10.17487/RFC4653, August 2006, Events", RFC 4653, DOI 10.17487/RFC4653, August 2006,
<https://www.rfc-editor.org/info/rfc4653>. <https://www.rfc-editor.org/info/rfc4653>.
skipping to change at page 24, line 36 skipping to change at page 25, line 13
mechanisms described in Section 6. mechanisms described in Section 6.
A.1. Tracking Sent Packets A.1. Tracking Sent Packets
To correctly implement congestion control, a QUIC sender tracks every To correctly implement congestion control, a QUIC sender tracks every
ack-eliciting packet until the packet is acknowledged or lost. It is ack-eliciting packet until the packet is acknowledged or lost. It is
expected that implementations will be able to access this information expected that implementations will be able to access this information
by packet number and crypto context and store the per-packet fields by packet number and crypto context and store the per-packet fields
(Appendix A.1.1) for loss recovery and congestion control. (Appendix A.1.1) for loss recovery and congestion control.
After a packet is declared lost, it SHOULD be tracked for an amount After a packet is declared lost, the endpoint can track it for an
of time comparable to the maximum expected packet reordering, such as amount of time comparable to the maximum expected packet reordering,
1 RTT. This allows for detection of spurious retransmissions. such as 1 RTT. This allows for detection of spurious
retransmissions.
Sent packets are tracked for each packet number space, and ACK Sent packets are tracked for each packet number space, and ACK
processing only applies to a single space. processing only applies to a single space.
A.1.1. Sent Packet Fields A.1.1. Sent Packet Fields
packet_number: The packet number of the sent packet. packet_number: The packet number of the sent packet.
ack_eliciting: A boolean that indicates whether a packet is ack- ack_eliciting: A boolean that indicates whether a packet is ack-
eliciting. If true, it is expected that an acknowledgement will eliciting. If true, it is expected that an acknowledgement will
skipping to change at page 26, line 29 skipping to change at page 27, line 4
largest_acked_packet[kPacketNumberSpace]: The largest packet number largest_acked_packet[kPacketNumberSpace]: The largest packet number
acknowledged in the packet number space so far. acknowledged in the packet number space so far.
latest_rtt: The most recent RTT measurement made when receiving an latest_rtt: The most recent RTT measurement made when receiving an
ack for a previously unacked packet. ack for a previously unacked packet.
smoothed_rtt: The smoothed RTT of the connection, computed as smoothed_rtt: The smoothed RTT of the connection, computed as
described in [RFC6298]. described in [RFC6298].
rttvar: The RTT variance, computed as described in [RFC6298]. rttvar: The RTT variance, computed as described in [RFC6298].
min_rtt: The minimum RTT seen in the connection, ignoring ack delay. min_rtt: The minimum RTT seen in the connection, ignoring ack delay.
max_ack_delay: The maximum amount of time by which the receiver max_ack_delay: The maximum amount of time by which the receiver
intends to delay acknowledgments, in milliseconds. The actual intends to delay acknowledgments for packets in the
ack_delay in a received ACK frame may be larger due to late ApplicationData packet number space. The actual ack_delay in a
timers, reordering, or lost ACKs. received ACK frame may be larger due to late timers, reordering,
or lost ACKs.
loss_time[kPacketNumberSpace]: The time at which the next packet in loss_time[kPacketNumberSpace]: The time at which the next packet in
that packet number space will be considered lost based on that packet number space will be considered lost based on
exceeding the reordering window in time. exceeding the reordering window in time.
sent_packets[kPacketNumberSpace]: An association of packet numbers sent_packets[kPacketNumberSpace]: An association of packet numbers
in a packet number space to information about them. Described in in a packet number space to information about them. Described in
detail above in Appendix A.1. detail above in Appendix A.1.
A.4. Initialization A.4. Initialization
skipping to change at page 27, line 12 skipping to change at page 27, line 32
At the beginning of the connection, initialize the loss detection At the beginning of the connection, initialize the loss detection
variables as follows: variables as follows:
loss_detection_timer.reset() loss_detection_timer.reset()
crypto_count = 0 crypto_count = 0
pto_count = 0 pto_count = 0
latest_rtt = 0 latest_rtt = 0
smoothed_rtt = 0 smoothed_rtt = 0
rttvar = 0 rttvar = 0
min_rtt = 0 min_rtt = 0
max_ack_delay = 0
time_of_last_sent_ack_eliciting_packet = 0 time_of_last_sent_ack_eliciting_packet = 0
time_of_last_sent_crypto_packet = 0 time_of_last_sent_crypto_packet = 0
for pn_space in [ Initial, Handshake, ApplicationData ]: for pn_space in [ Initial, Handshake, ApplicationData ]:
largest_acked_packet[pn_space] = 0 largest_acked_packet[pn_space] = infinite
loss_time[pn_space] = 0 loss_time[pn_space] = 0
A.5. On Sending a Packet A.5. On Sending a Packet
After a packet is sent, information about the packet is stored. The After a packet is sent, information about the packet is stored. The
parameters to OnPacketSent are described in detail above in parameters to OnPacketSent are described in detail above in
Appendix A.1.1. Appendix A.1.1.
Pseudocode for OnPacketSent follows: Pseudocode for OnPacketSent follows:
skipping to change at page 27, line 51 skipping to change at page 28, line 30
SetLossDetectionTimer() SetLossDetectionTimer()
A.6. On Receiving an Acknowledgment A.6. On Receiving an Acknowledgment
When an ACK frame is received, it may newly acknowledge any number of When an ACK frame is received, it may newly acknowledge any number of
packets. packets.
Pseudocode for OnAckReceived and UpdateRtt follows: Pseudocode for OnAckReceived and UpdateRtt follows:
OnAckReceived(ack, pn_space): OnAckReceived(ack, pn_space):
largest_acked_packet[pn_space] = if (largest_acked_packet[pn_space] == infinite):
max(largest_acked_packet[pn_space], ack.largest_acked) largest_acked_packet[pn_space] = ack.largest_acked
else:
largest_acked_packet[pn_space] =
max(largest_acked_packet[pn_space], ack.largest_acked)
// Nothing to do if there are no newly acked packets. // Nothing to do if there are no newly acked packets.
newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space) newly_acked_packets = DetermineNewlyAckedPackets(ack, pn_space)
if (newly_acked_packets.empty()): if (newly_acked_packets.empty()):
return return
// If the largest acknowledged is newly acked and // If the largest acknowledged is newly acked and
// at least one ack-eliciting packet was newly acked, update the RTT. // at least one ack-eliciting packet was newly acked, update the RTT.
if (sent_packets[pn_space][ack.largest_acked] && if (sent_packets[pn_space][ack.largest_acked] &&
IncludesAckEliciting(newly_acked_packets)) IncludesAckEliciting(newly_acked_packets))
latest_rtt = latest_rtt =
now - sent_packets[pn_space][ack.largest_acked].time_sent now - sent_packets[pn_space][ack.largest_acked].time_sent
UpdateRtt(ack.ack_delay) ack_delay = 0
if pn_space == ApplicationData:
ack_delay = ack.ack_delay
UpdateRtt(ack_delay)
// Process ECN information if present. // Process ECN information if present.
if (ACK frame contains ECN information): if (ACK frame contains ECN information):
ProcessECN(ack) ProcessECN(ack)
for acked_packet in newly_acked_packets: for acked_packet in newly_acked_packets:
OnPacketAcked(acked_packet.packet_number, pn_space) OnPacketAcked(acked_packet.packet_number, pn_space)
DetectLostPackets(pn_space) DetectLostPackets(pn_space)
crypto_count = 0 crypto_count = 0
pto_count = 0 pto_count = 0
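Two details of the newer OnAckReceived pseudocode can be isolated in a short Python sketch: the "infinite" sentinel for largest_acked_packet and the restriction of ack_delay to the ApplicationData space. The function names and the use of None as the sentinel are assumptions for illustration. A sentinel is useful because packet number 0 is valid, so an initial value of 0 cannot distinguish "nothing acknowledged yet" from "packet 0 acknowledged".

```python
def update_largest_acked(current, ack_largest_acked):
    """Return the new largest acknowledged packet number.

    None stands in for the pseudocode's "infinite" sentinel, meaning
    no packet has been acknowledged in this packet number space yet.
    """
    if current is None:
        return ack_largest_acked
    return max(current, ack_largest_acked)


def effective_ack_delay(pn_space, reported_ack_delay):
    """Return the ack delay to pass to the RTT update.

    The reported delay is only used for the ApplicationData space;
    for the other spaces it is treated as 0.
    """
    if pn_space == "ApplicationData":
        return reported_ack_delay
    return 0
```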
skipping to change at page 32, line 6 skipping to change at page 33, line 6
SetLossDetectionTimer() SetLossDetectionTimer()
A.10. Detecting Lost Packets A.10. Detecting Lost Packets
DetectLostPackets is called every time an ACK is received and DetectLostPackets is called every time an ACK is received and
operates on the sent_packets for that packet number space. operates on the sent_packets for that packet number space.
Pseudocode for DetectLostPackets follows: Pseudocode for DetectLostPackets follows:
DetectLostPackets(pn_space): DetectLostPackets(pn_space):
assert(largest_acked_packet[pn_space] != infinite)
loss_time[pn_space] = 0 loss_time[pn_space] = 0
lost_packets = {} lost_packets = {}
loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt) loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)
// Minimum time of kGranularity before packets are deemed lost. // Minimum time of kGranularity before packets are deemed lost.
loss_delay = max(loss_delay, kGranularity) loss_delay = max(loss_delay, kGranularity)
// Packets sent before this time are deemed lost. // Packets sent before this time are deemed lost.
lost_send_time = now() - loss_delay lost_send_time = now() - loss_delay
// Packets with packet numbers before this are deemed lost.
lost_pn = largest_acked_packet[pn_space] - kPacketThreshold
foreach unacked in sent_packets[pn_space]: foreach unacked in sent_packets[pn_space]:
if (unacked.packet_number > largest_acked_packet[pn_space]): if (unacked.packet_number > largest_acked_packet[pn_space]):
continue continue
// Mark packet as lost, or set time when it should be marked. // Mark packet as lost, or set time when it should be marked.
if (unacked.time_sent <= lost_send_time || if (unacked.time_sent <= lost_send_time ||
unacked.packet_number <= lost_pn): largest_acked_packet[pn_space] >=
unacked.packet_number + kPacketThreshold):
sent_packets[pn_space].remove(unacked.packet_number) sent_packets[pn_space].remove(unacked.packet_number)
if (unacked.in_flight): if (unacked.in_flight):
lost_packets.insert(unacked) lost_packets.insert(unacked)
else: else:
if (loss_time[pn_space] == 0): if (loss_time[pn_space] == 0):
loss_time[pn_space] = unacked.time_sent + loss_delay loss_time[pn_space] = unacked.time_sent + loss_delay
else: else:
loss_time[pn_space] = min(loss_time[pn_space], loss_time[pn_space] = min(loss_time[pn_space],
unacked.time_sent + loss_delay) unacked.time_sent + loss_delay)
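The portion of DetectLostPackets shown above can be turned into a runnable Python sketch, using the newer form of the packet threshold comparison. The SentPacket class, the function signature, the use of None for an unset loss time, the value 3 for kPacketThreshold, and the 1 ms kGranularity are assumptions for illustration. The sketch returns its results instead of updating connection state or notifying the congestion controller.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3   # assumed reordering threshold, in packets
K_TIME_THRESHOLD = 9 / 8  # RECOMMENDED round-trip time multiplier
K_GRANULARITY = 0.001     # assumed timer granularity, in seconds


@dataclass
class SentPacket:
    packet_number: int
    time_sent: float
    in_flight: bool = True


def detect_lost_packets(sent_packets, largest_acked,
                        latest_rtt, smoothed_rtt, now):
    """Return (lost_packets, loss_time) for one packet number space.

    sent_packets maps packet numbers to SentPacket entries; packets
    declared lost are removed from it. loss_time is None when no
    packet is waiting on the time threshold.
    """
    assert largest_acked is not None
    loss_delay = max(K_TIME_THRESHOLD * max(latest_rtt, smoothed_rtt),
                     K_GRANULARITY)
    lost_send_time = now - loss_delay
    lost_packets = []
    loss_time = None
    # sorted() copies the keys, so entries can be deleted while looping.
    for pn in sorted(sent_packets):
        packet = sent_packets[pn]
        if pn > largest_acked:
            continue
        if (packet.time_sent <= lost_send_time or
                largest_acked >= pn + K_PACKET_THRESHOLD):
            del sent_packets[pn]
            if packet.in_flight:
                lost_packets.append(packet)
        else:
            candidate = packet.time_sent + loss_delay
            loss_time = (candidate if loss_time is None
                         else min(loss_time, candidate))
    return lost_packets, loss_time
```

Writing the comparison as largest_acked >= pn + kPacketThreshold avoids subtracting from largest_acked, which matters in languages where packet numbers are unsigned.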
skipping to change at page 36, line 13 skipping to change at page 37, line 13
CongestionEvent(sent_packets[ack.largest_acked].time_sent) CongestionEvent(sent_packets[ack.largest_acked].time_sent)
B.8. On Packets Lost B.8. On Packets Lost
Invoked from DetectLostPackets when packets are deemed lost. Invoked from DetectLostPackets when packets are deemed lost.
InPersistentCongestion(largest_lost_packet): InPersistentCongestion(largest_lost_packet):
pto = smoothed_rtt + max(4 * rttvar, kGranularity) + pto = smoothed_rtt + max(4 * rttvar, kGranularity) +
max_ack_delay max_ack_delay
congestion_period = pto * kPersistentCongestionThreshold congestion_period = pto * kPersistentCongestionThreshold
// Determine if all packets in the window before the // Determine if all packets in the time period before the
// newest lost packet, including the edges, are marked // newest lost packet, including the edges, are marked
// lost // lost
return IsWindowLost(largest_lost_packet, congestion_period) return AreAllPacketsLost(largest_lost_packet,
congestion_period)
OnPacketsLost(lost_packets): OnPacketsLost(lost_packets):
// Remove lost packets from bytes_in_flight. // Remove lost packets from bytes_in_flight.
for (lost_packet : lost_packets): for (lost_packet : lost_packets):
bytes_in_flight -= lost_packet.size bytes_in_flight -= lost_packet.size
largest_lost_packet = lost_packets.last() largest_lost_packet = lost_packets.last()
CongestionEvent(largest_lost_packet.time_sent) CongestionEvent(largest_lost_packet.time_sent)
// Collapse congestion window if persistent congestion // Collapse congestion window if persistent congestion
if (InPersistentCongestion(largest_lost_packet)): if (InPersistentCongestion(largest_lost_packet)):
congestion_window = kMinimumWindow congestion_window = kMinimumWindow
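The persistent congestion check above can be sketched in Python as follows. The source does not define AreAllPacketsLost, so the helper here is one plausible reading of its comment and is hypothetical, as are the (time_sent, lost) history representation, the millisecond units, the 1 ms kGranularity, and the value 3 for kPersistentCongestionThreshold.

```python
K_GRANULARITY_MS = 1                   # assumed timer granularity, ms
K_PERSISTENT_CONGESTION_THRESHOLD = 3  # assumed multiplier


def congestion_period_ms(smoothed_rtt_ms, rttvar_ms, max_ack_delay_ms):
    """Return the persistent congestion period in milliseconds."""
    pto = (smoothed_rtt_ms
           + max(4 * rttvar_ms, K_GRANULARITY_MS)
           + max_ack_delay_ms)
    return pto * K_PERSISTENT_CONGESTION_THRESHOLD


def are_all_packets_lost(history, largest_lost_time_sent, period_ms):
    """Hypothetical helper: check the period ending at the newest loss.

    history is a list of (time_sent_ms, lost) pairs. Returns True when
    every packet sent within the period, including both edges, is
    marked lost.
    """
    start = largest_lost_time_sent - period_ms
    return all(lost for time_sent, lost in history
               if start <= time_sent <= largest_lost_time_sent)
```

For example, with a smoothed RTT of 100 ms, an rttvar of 10 ms, and a max_ack_delay of 25 ms, the PTO is 165 ms and the congestion period is 495 ms.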
Appendix C. Change Log Appendix C. Change Log
*RFC Editor's Note:* Please remove this section prior to *RFC Editor's Note:* Please remove this section prior to
publication of a final version of this document. publication of a final version of this document.
Issue and pull request numbers are listed with a leading octothorp. Issue and pull request numbers are listed with a leading octothorp.
C.1. Since draft-ietf-quic-recovery-19 C.1. Since draft-ietf-quic-recovery-20
o Path validation can be used as initial RTT value (#2644, #2687)
o max_ack_delay transport parameter defaults to 0 (#2638, #2646)
o Ack Delay only measures intentional delays induced by the
implementation (#2596, #2786)
C.2. Since draft-ietf-quic-recovery-19
o Change kPersistentThreshold from an exponent to a multiplier
(#2557)
o Send a PING if the PTO timer fires and there's nothing to send o Send a PING if the PTO timer fires and there's nothing to send
(#2624) (#2624)
o Set loss delay to at least kGranularity (#2617) o Set loss delay to at least kGranularity (#2617)
o Merge application limited and sending after idle sections. Always o Merge application limited and sending after idle sections. Always
limit burst size instead of requiring resetting CWND to initial limit burst size instead of requiring resetting CWND to initial
CWND after idle (#2605) CWND after idle (#2605)
skipping to change at page 37, line 10 skipping to change at page 38, line 22
packet is ack-eliciting but the largest_acked is not (#2592) packet is ack-eliciting but the largest_acked is not (#2592)
o Don't arm the handshake timer if there is no handshake data o Don't arm the handshake timer if there is no handshake data
(#2590) (#2590)
o Clarify that the time threshold loss alarm takes precedence over o Clarify that the time threshold loss alarm takes precedence over
the crypto handshake timer (#2590, #2620) the crypto handshake timer (#2590, #2620)
o Change initial RTT to 500ms to align with RFC6298 (#2184) o Change initial RTT to 500ms to align with RFC6298 (#2184)
C.2. Since draft-ietf-quic-recovery-18 C.3. Since draft-ietf-quic-recovery-18
o Change IW byte limit to 14720 from 14600 (#2494) o Change IW byte limit to 14720 from 14600 (#2494)
o Update PTO calculation to match RFC6298 (#2480, #2489, #2490) o Update PTO calculation to match RFC6298 (#2480, #2489, #2490)
o Improve loss detection's description of multiple packet number o Improve loss detection's description of multiple packet number
spaces and pseudocode (#2485, #2451, #2417) spaces and pseudocode (#2485, #2451, #2417)
o Declare persistent congestion even if non-probe packets are sent o Declare persistent congestion even if non-probe packets are sent
and don't make persistent congestion more aggressive than RTO and don't make persistent congestion more aggressive than RTO
verified was (#2365, #2244) verified was (#2365, #2244)
o Move pseudocode to the appendices (#2408) o Move pseudocode to the appendices (#2408)
o What to send on multiple PTOs (#2380) o What to send on multiple PTOs (#2380)
C.3. Since draft-ietf-quic-recovery-17 C.4. Since draft-ietf-quic-recovery-17
o After Probe Timeout discard in-flight packets or send another o After Probe Timeout discard in-flight packets or send another
(#2212, #1965) (#2212, #1965)
o Endpoints discard initial keys as soon as handshake keys are o Endpoints discard initial keys as soon as handshake keys are
available (#1951, #2045) available (#1951, #2045)
o 0-RTT state is discarded when 0-RTT is rejected (#2300) o 0-RTT state is discarded when 0-RTT is rejected (#2300)
o Loss detection timer is cancelled when ack-eliciting frames are in o Loss detection timer is cancelled when ack-eliciting frames are in
skipping to change at page 38, line 5 skipping to change at page 39, line 15
controller (#2138, #2187) controller (#2138, #2187)
o Process ECN counts before marking packets lost (#2142) o Process ECN counts before marking packets lost (#2142)
o Mark packets lost before resetting crypto_count and pto_count o Mark packets lost before resetting crypto_count and pto_count
(#2208, #2209) (#2208, #2209)
o Congestion and loss recovery state are discarded when keys are o Congestion and loss recovery state are discarded when keys are
discarded (#2327) discarded (#2327)
C.4. Since draft-ietf-quic-recovery-16 C.5. Since draft-ietf-quic-recovery-16
o Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP o Unify TLP and RTO into a single PTO; eliminate min RTO, min TLP
and min crypto timeouts; eliminate timeout validation (#2114, and min crypto timeouts; eliminate timeout validation (#2114,
#2166, #2168, #1017) #2166, #2168, #1017)
o Redefine congestion avoidance in terms of when the period o Redefine congestion avoidance in terms of when the period
starts (#1928, #1930) starts (#1928, #1930)
o Document what needs to be tracked for packets that are in flight o Document what needs to be tracked for packets that are in flight
(#765, #1724, #1939) (#765, #1724, #1939)
skipping to change at page 38, line 36 skipping to change at page 39, line 46
o Limit ack_delay by max_ack_delay (#2060, #2099) o Limit ack_delay by max_ack_delay (#2060, #2099)
o Initial keys are discarded once Handshake keys are available (#1951, o Initial keys are discarded once Handshake keys are available (#1951,
#2045) #2045)
o Reorder ECN and loss detection in pseudocode (#2142) o Reorder ECN and loss detection in pseudocode (#2142)
o Only cancel loss detection timer if ack-eliciting packets are in o Only cancel loss detection timer if ack-eliciting packets are in
flight (#2093, #2117) flight (#2093, #2117)
C.5. Since draft-ietf-quic-recovery-14 C.6. Since draft-ietf-quic-recovery-14
o Used max_ack_delay from transport params (#1796, #1782) o Used max_ack_delay from transport params (#1796, #1782)
o Merge ACK and ACK_ECN (#1783) o Merge ACK and ACK_ECN (#1783)
C.6. Since draft-ietf-quic-recovery-13 C.7. Since draft-ietf-quic-recovery-13
o Corrected the lack of ssthresh reduction in CongestionEvent o Corrected the lack of ssthresh reduction in CongestionEvent
pseudocode (#1598) pseudocode (#1598)
o Considerations for ECN spoofing (#1426, #1626) o Considerations for ECN spoofing (#1426, #1626)
o Clarifications for PADDING and congestion control (#837, #838, o Clarifications for PADDING and congestion control (#837, #838,
#1517, #1531, #1540) #1517, #1531, #1540)
o Reduce early retransmission timer to RTT/8 (#945, #1581) o Reduce early retransmission timer to RTT/8 (#945, #1581)
o Packets are declared lost after an RTO is verified (#935, #1582) o Packets are declared lost after an RTO is verified (#935, #1582)
C.7. Since draft-ietf-quic-recovery-12 C.8. Since draft-ietf-quic-recovery-12
o Changes to manage separate packet number spaces and encryption o Changes to manage separate packet number spaces and encryption
levels (#1190, #1242, #1413, #1450) levels (#1190, #1242, #1413, #1450)
o Added ECN feedback mechanisms and handling; new ACK_ECN frame o Added ECN feedback mechanisms and handling; new ACK_ECN frame
(#804, #805, #1372) (#804, #805, #1372)
C.8. Since draft-ietf-quic-recovery-11 C.9. Since draft-ietf-quic-recovery-11
No significant changes. No significant changes.
C.9. Since draft-ietf-quic-recovery-10 C.10. Since draft-ietf-quic-recovery-10
o Improved text on ack generation (#1139, #1159) o Improved text on ack generation (#1139, #1159)
o Make references to TCP recovery mechanisms informational (#1195) o Make references to TCP recovery mechanisms informational (#1195)
o Define time_of_last_sent_handshake_packet (#1171) o Define time_of_last_sent_handshake_packet (#1171)
o Added signal from TLS that the data it includes needs to be sent in a o Added signal from TLS that the data it includes needs to be sent in a
Retry packet (#1061, #1199) Retry packet (#1061, #1199)
o Minimum RTT (min_rtt) is initialized with an infinite value o Minimum RTT (min_rtt) is initialized with an infinite value
(#1169) (#1169)
C.10. Since draft-ietf-quic-recovery-09 C.11. Since draft-ietf-quic-recovery-09
No significant changes. No significant changes.
C.11. Since draft-ietf-quic-recovery-08 C.12. Since draft-ietf-quic-recovery-08
o Clarified pacing and RTO (#967, #977) o Clarified pacing and RTO (#967, #977)
C.12. Since draft-ietf-quic-recovery-07 C.13. Since draft-ietf-quic-recovery-07
o Include Ack Delay in RTO (and TLP) computations (#981) o Include Ack Delay in RTO (and TLP) computations (#981)
o Ack Delay in SRTT computation (#961) o Ack Delay in SRTT computation (#961)
o Default RTT and Slow Start (#590) o Default RTT and Slow Start (#590)
o Many editorial fixes. o Many editorial fixes.
C.13. Since draft-ietf-quic-recovery-06 C.14. Since draft-ietf-quic-recovery-06
No significant changes. No significant changes.
C.14. Since draft-ietf-quic-recovery-05 C.15. Since draft-ietf-quic-recovery-05
o Add more congestion control text (#776) o Add more congestion control text (#776)
C.15. Since draft-ietf-quic-recovery-04 C.16. Since draft-ietf-quic-recovery-04
No significant changes. No significant changes.
C.16. Since draft-ietf-quic-recovery-03 C.17. Since draft-ietf-quic-recovery-03
No significant changes. No significant changes.
C.17. Since draft-ietf-quic-recovery-02 C.18. Since draft-ietf-quic-recovery-02
o Integrate F-RTO (#544, #409) o Integrate F-RTO (#544, #409)
o Add congestion control (#545, #395) o Add congestion control (#545, #395)
o Require connection abort if a skipped packet was acknowledged o Require connection abort if a skipped packet was acknowledged
(#415) (#415)
o Simplify RTO calculations (#142, #417) o Simplify RTO calculations (#142, #417)
C.18. Since draft-ietf-quic-recovery-01 C.19. Since draft-ietf-quic-recovery-01
o Overview added to loss detection o Overview added to loss detection
o Changes initial default RTT to 100ms o Changes initial default RTT to 100ms
o Added time-based loss detection and fixes early retransmit o Added time-based loss detection and fixes early retransmit
o Clarified loss recovery for handshake packets o Clarified loss recovery for handshake packets
o Fixed references and made TCP references informative o Fixed references and made TCP references informative
C.19. Since draft-ietf-quic-recovery-00 C.20. Since draft-ietf-quic-recovery-00
o Improved description of constants and ACK behavior o Improved description of constants and ACK behavior
C.20. Since draft-iyengar-quic-loss-recovery-01 C.21. Since draft-iyengar-quic-loss-recovery-01
o Adopted as base for draft-ietf-quic-recovery o Adopted as base for draft-ietf-quic-recovery
o Updated authors/editors list o Updated authors/editors list
o Added table of contents o Added table of contents
Acknowledgments Acknowledgments
Authors' Addresses Authors' Addresses
Jana Iyengar (editor) Jana Iyengar (editor)
Fastly Fastly
Email: jri.ietf@gmail.com Email: jri.ietf@gmail.com
 End of changes. 73 change blocks. 
194 lines changed or deleted 237 lines changed or added
