GROW Status Pages
Global Routing Operations (Active WG)
Ops Area: Robert Wilton, Warren Kumari | 2003-May-02 —Chairs:
IETF-110 grow minutes
Session 2021-03-08 1300-1500: Room 3 - grow chatroom
GROW is starting 1 hour late due to a scheduling conflict: 14:00 CET instead of 13:00 CET. Job Snijders kicks off at 14:00 CET.
* Network Telemetry with BMP && YANG - Thomas Graf
  * https://datatracker.ietf.org/meeting/110/materials/slides-110-grow-network-telemetry-with-bmpyang-hackathon-results-00
* BMP Seamless Session (draft-tppy-bmp-seamless-session-01)
  * https://datatracker.ietf.org/meeting/110/materials/slides-110-grow-draft-tppy-bmp-seamless-session-00-00
* draft-ietf-grow-as-path-prepending-03 - Michael McBride
  * https://datatracker.ietf.org/meeting/110/materials/slides-110-grow-draft-ietf-grow-as-path-prepending-03-00

Charter update conversation:
* Refresh produced
* Traffic on the highway was heavy :(
* Work still required to word-wrangle for wider audiences
* 'ongoing process may take a few more months'

Thomas Graf - BMP/YANG Hackerz
* Focus on performance, not functionality
  * loc-rib performance and cpu impact
  * verification that bmp / rib match closely
  * measure bgp propagation delay in transit routers
* YANG slide (missed context)
* Hackathon network topology slide
* Automated test setup for bgp/bmp/collection of metrics + cpu/memory usage
* Can now measure bgp/bmp prefix drops && cpu/memory
* current focus appears to be via ExaBGP
* PMACCT collector integrated with new features from the hackathon
* throughput measurements included in slides
* Huawei VRP - manages bmp/bgp/yang without issue in test setup
* Improving timestamp accuracy/inclusion in next hackathon
* Test results included in slides as well
  * bmp flapping
  * bgp prefix floods
* Lessons learned on slide 14
* Questions:
  * Jeff Haas - bmp / bgp convergence questions about test infra
    * Looks like BGP is preferred on the CPU over BMP
  * Jeff Haas - timestamp info
    * Junos has second-level accuracy, with an increment for events in the same second
    * Notes that different vendors have different ideas about timestamps...
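One of the hackathon goals noted above is measuring BGP propagation delay in transit routers. A minimal sketch of that arithmetic follows, assuming each collected record carries a router name, a prefix, and a timestamp. The record layout and every name in the code are hypothetical and are not taken from the hackathon tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical collector record: when a router first reported a prefix."""
    router: str
    prefix: str
    timestamp: float  # seconds since epoch, as reported by the router


def propagation_delays(observations, ingress, egress):
    """Return {prefix: egress_ts - ingress_ts} for prefixes seen on both routers.

    Only the earliest observation per (router, prefix) pair is used.
    """
    first_seen = {}
    for obs in observations:
        key = (obs.router, obs.prefix)
        if key not in first_seen or obs.timestamp < first_seen[key]:
            first_seen[key] = obs.timestamp

    delays = {}
    for (router, prefix), ingress_ts in first_seen.items():
        if router != ingress:
            continue
        egress_ts = first_seen.get((egress, prefix))
        if egress_ts is not None:
            delays[prefix] = egress_ts - ingress_ts
    return delays


if __name__ == "__main__":
    sample = [
        Observation("r1", "192.0.2.0/24", 100.0),
        Observation("r2", "192.0.2.0/24", 102.0),
        Observation("r1", "198.51.100.0/24", 105.0),  # never seen on r2
    ]
    print(propagation_delays(sample, ingress="r1", egress="r2"))
```

If a router only reports timestamps with second-level accuracy, as discussed in the questions above, delays shorter than one second cannot be resolved by this kind of subtraction.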
Thomas Graf - BMP Seamless Session
* Overview about Swisscom telemetry/data-collection
  * notes about scale of rates of collection
  * notes about BMP behavior not being smooth, rather quite spiky
  * 600k peak, 8k avg
* Problems with this scale
  * data duplication concerns
  * with more and more metrics, the amount of information collected is quite high
* Current BMP re-pushes full RIB on establish/re-establishment
* Proposal to use tcp-fast-open and slow close bmp waiting for TCP to re-establish
  * configurable timeout
  * clients must buffer messages for timeout time (or buffer full)
  * configure buffer as well
* Maintain BMP as unidirectional collection protocol
* existing extensions for fast-open
* buffer exists already
* Comments/questions to the list please
* Question: Job - continue work please, the impact of flapping is painful
  * there are many other collection/instrumentation systems which have similar problems/solutions
* Question: Lars Prehn - What does the current buffer cover? How much change is required?
  * buffer can't be forever, of course, so provide a method to set max buffer and fail if full.

Questions to Implementers about TCP Fast-Open usage today?
* Jeff Haas - not sure about support for fast-open today in Junos
  * comment to Thomas about 'most of this state is already in the kernel to manage'
  * this shouldn't cost more, really... (off to read RFC for jeff)
  * Hiding the state in the application is the key, if you can keep the state change to just the kernel you won't incur impact.
* Job - suggests that the proposal amounts to using existing open-source tooling
  * linux kernel already has fast-open support
  * applications can already store this state properly(?)
  * obviously interactions with graceful restart may happen, so testing required(tm)
* Jeff Haas - "graceful restart should not be an issue...phew!"
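The buffering behavior discussed above (hold messages for a configurable timeout, cap the buffer, and fail when it is full) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class, its method names, and the fall-back-to-full-resync behavior are hypothetical and are not taken from draft-tppy-bmp-seamless-session or from any implementation.

```python
import time
from collections import deque


class ReplayBuffer:
    """Hypothetical sender-side buffer used while the TCP session is down."""

    def __init__(self, max_messages, timeout_seconds, clock=time.monotonic):
        self.max_messages = max_messages        # configurable buffer size
        self.timeout_seconds = timeout_seconds  # configurable timeout
        self._clock = clock
        self._queue = deque()
        self._disconnected_at = None
        self._failed = False

    def _expired(self):
        return self._clock() - self._disconnected_at > self.timeout_seconds

    def on_disconnect(self):
        """Record when the session dropped and start buffering."""
        if self._disconnected_at is None:
            self._disconnected_at = self._clock()

    def offer(self, message):
        """Buffer one message; return False once the buffer has failed."""
        if self._disconnected_at is None:
            raise RuntimeError("session is up; send the message directly")
        if self._failed or self._expired() or len(self._queue) >= self.max_messages:
            # Timeout passed or buffer full: give up on a seamless resume.
            self._failed = True
            self._queue.clear()
            return False
        self._queue.append(message)
        return True

    def on_reconnect(self):
        """Return (resume, messages).

        resume is False when the caller should fall back to a full re-push.
        """
        if self._disconnected_at is None:
            return True, []
        resume = not (self._failed or self._expired())
        messages = list(self._queue) if resume else []
        self._queue.clear()
        self._disconnected_at = None
        self._failed = False
        return resume, messages
```

The injectable `clock` argument exists only so the timeout path can be exercised without sleeping. A caller would invoke `on_disconnect()` when the session drops, `offer()` for each message generated meanwhile, and `on_reconnect()` to decide between replaying the buffer and re-sending everything.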
Michael McBride - ASPath Prepending Update
* draft update from previous IETF (on -03 version as of today)
* background: presentations about prepending from various operator fora
  * Requests at said fora asked for 'bcp for prepending?'
  * draft aims to cover said BCP requirement
* coverage of use-cases includes
  * primary/backup isp
  * path capacity signaling
  * no RFC 1997 (BGP communities) support from isp
  * ...
* Lists problems as well, with examples and how to use prepending to shortcut the problems
* Considerations covered in the draft include impacts on peers (memory/cpu/etc)
* Alternatives to prepending are covered in the draft as well (usage of alternate BGP attributes)
* Best Practices included in the draft now as well
  * (please read draft)
* Request for Comments about the draft/etc
* Comment - Jeff Haas - communities are only useful one AS hop away...
  * Jeff Haas comment on comment - Using communities doesn't always work; if the adjacent ISP isn't helping you must use a different tool (prepending, origin)... and prepending really has been the best option for a long time.
* Comment - Ben Maddison - "clarification of 'scope of communities'" - Some ASNs scrub communities very carefully, some customers MAY send 'alternate transit' communities, and those may pass through. Immediate upstream MAY NOT scrub ALL communities, possibly worth adding to text?
  * Points out that localpref to support customer over all-else should be strongly worded
* Comment - Job Snijders - comments about audiences of such drafts - not all folk read drafts/rfcs/BCPs.
* Comment - Ben Maddison - "when you run up against a brick wall, you should probably stop banging your head on it"
* Comment - Lars Prehn - Explain how 5 is correct for as-path avg length?
  * Michael McBride - Generally speaking, prepending more than 5 times seems unlikely to help, based upon public data sources which show 5 as the average.
  * Job Snijders - perhaps do not prepend more times than the count of peers you have?
* Comment - Alexander Azimov - Idea is to limit propagation, generally, of a leaked route. Prepending more than 2-3 times does not help this cause. (missed the brunt of next comment) Many layers of localpref can be complex, and may not be as helpful to networks as they believe.
* Comment - Randy Bush - if there is not a clear/simple algorithm for how/whom to prepend, it's hard to see why we should discuss this. "Get to the point" meaning, please define the algorithm clearly and distinctly.
  * Job Snijders - Agree, lots of discussion without clear algorithm is not terrific
* Comment - Ben Maddison - In a perfect world a magic number would be great - Perhaps there is not a one-size-fits-all? The perspective of the viewer matters a bunch... Is the '5 is enough (in 2021)' acceptable?
  * Side comment about origin-code ... this is silly and not actually helpful. If people started using Origin we would all just reset that value.
* Comment - Jared Mauch - Agrees with Randy/Ben - Internet changes shape, and a rule today is not effective tomorrow (perhaps). There are a wide variety of provider capabilities available (major networks do not offer steering communities even in 2021). Some folk also limit as-path max to accept based on vendor faults and protections for individual network(s).
  * it seems hard to make a long term guess here... for GROW.
* Comment - Randy Bush - Question/Comment to Ben: "if the 'where is waldo' is part of the algorithm, that's also fine". Pushing for clear and concise guidance, because that SHOULD be possible to do still.
* Comment - Jeff Haas - Worth pointing out that the draft MAY also want to cover prior art bugs... Specifically as-path prepending has caused real problems in the real world :(
  * possibly origin code problems (says Michael)
  * Job Snijders - the comment that drove origin MAY be that "you may have other knobs"
* Summary from Job - 'Effort may be best spent on finding the algorithm'

End of time Has Arrived!
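The discussion above about how many prepends are useful can be illustrated with a toy model of the primary/backup use case. It is a deliberate simplification: it assumes the remote network chooses purely on AS-path length, whereas real BGP evaluates local preference before path length. The ASNs come from the documentation range and the function names are hypothetical.

```python
def path_via(upstream_asn, origin_asn, prepends):
    """AS path seen beyond the upstream.

    The upstream ASN comes first, then the origin ASN once plus
    `prepends` extra copies added by the origin.
    """
    return [upstream_asn] + [origin_asn] * (1 + prepends)


def preferred(paths):
    """Pick the shortest AS path; ties go to the first candidate."""
    return min(paths, key=len)


ORIGIN = 64496
PRIMARY_ISP, BACKUP_ISP = 64500, 64501

primary = path_via(PRIMARY_ISP, ORIGIN, prepends=0)
for count in (1, 3, 10):
    backup = path_via(BACKUP_ISP, ORIGIN, prepends=count)
    best = preferred([primary, backup])
    # Columns: prepend count, backup path length, first ASN of the winning path
    print(count, len(backup), best[0])
```

In this model the primary path wins in every iteration: once the backup path is longer than the primary, further prepends do not change the outcome and only lengthen the path.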