Add more control over loadgen/latencies#394

Open
drebelsky wants to merge 3 commits into stellar:main from drebelsky:configure-delay

@drebelsky drebelsky commented Jun 9, 2026

Contributor

To support experimentation with measuring tx e2e latency, this PR does the following:

  • Adds a new pubnet data format that
    • Specifies per-edge latency (so different latency models can be tested by pre-processing the graph differently, instead of requiring a supercluster change)
    • Specifies which nodes generate load (for MaxTPS and MinBlockTime tests)
  • Allows configuring PEER_AUTHENTICATION_TIMEOUT (so we can simulate large network delays)
  • Gates the metrics introduced in stellar-core#5330 ("OverlayV2: Add metrics for tracking tx e2e latency") behind a flag, so we only pay the cost on the nodes we use to generate load

@drebelsky

Contributor Author

I think using the delays from the network survey is probably better than using the artificial latency calculated from geolocations. Before this PR, nodes whose location we don't know are simply assigned one of the locations of the nodes we do know about, which may be overly optimistic. The network survey RTTs do include processing delay, but if the processing delay is substantial enough to shift the RTT, it might also be worth simulating.

@marta-lokhova

Contributor

> The network survey RTTs do include processing delay, but if the processing delay is substantial enough to shift the RTT, it might also be worth simulating.

We can't do this. The simulation adds its own processing delay, so you'd end up double-counting latency. We should use geolocations, because going forward we'll also want to experiment with manually assigned geos to measure the impact on latency.

This feels a bit like going down a rabbit hole. Stepping back a bit, can we please start with a one-pager proposal on what the e2e setup is? Specifically, we need clarity on:

  • How's e2e metric in core calculated
  • What topology is selected (with parameters like total nodes, average tier1 distance and fanout, etc)
  • Which nodes generate load (and hop count to Tier1)
  • Geolocations of all nodes, and how delay is computed
  • Some sort of test harness measuring e2e latency with a realistic setup, and outputting an actionable result.

Let me know if we need to hash out the requirements more.

@drebelsky

Contributor Author

TL;DR

> How's e2e metric in core calculated

Time from HerderImpl::recvTransaction to HerderImpl::processExternalized (still open to discussion: should it be time to when meta would be generated, instead?)

> What topology is selected (with parameters like total nodes, average tier1 distance and fanout, etc)

For the baseline measurement, a trimmed form of the most recent pubnet survey. (particular parameters not listed here since the survey is semi-sensitive)

> Which nodes generate load (and hop count to Tier1)

For the baseline measurement, the nodes corresponding to Horizon instances. Open question as to whether there are other nodes we should pick/how.

> Geolocations of all nodes, and how delay is computed

For the baseline measurement, we can sidestep the issue by using some modified form of the network survey delays. Otherwise, we can use a topology that only includes the nodes we have geo data for and use the same existing delay model. It's probably worth checking both delay models to see how much they differ.

> Some sort of test harness measuring e2e latency with a realistic setup, and outputting an actionable result.

As an initial pass, we can just do a manual max tps and/or min block run and look at the metric. Open question: is this sufficient, or should we, e.g., have min block report the P50/P75/P99/etc... latencies?

Proposal for e2e latency measurement

At a high level, I would like to set up supercluster/core so that we can easily tune the realism of the simulation of the e2e latency without having to change supercluster/core, just the graph file. In particular, I'd like to have the pubnet graph file include

  • Whether the node is a tier1 (already present)
  • Whether the node is a load generator (not currently present)
  • The per-edge latency (not currently present)
    • Putting this in the graph file allows us to easily switch between different latency models. We can have a simple (not part of supercluster) script that takes in a graph file and assigns the delays based on the current geo location calculation, but we can also change the preprocessing, so e.g., we categorize the sort of path to get a better estimate of the RTT. Consider this comment from the current latency model

      // Empirical slowdown is surprisingly variable: some paths (even
      // long-distance ones) run at nearly 80% of the ideal speed of light in
      // fibre; others barely 20%. We consider 50% or a 2x slowdown as "vaguely
      // normal" with significant outliers; unfortunately to get much better you
      // really need to know the path (eg. trans-atlantic happens to be very fast,
      // but classifying a path as trans-atlantic vs. non is .. complicated!)
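
As a rough illustration of the preprocessing idea, the sketch below assigns one-way per-edge delays from geolocations using great-circle distance and a fixed 2x slowdown. The graph shape (`nodes` mapping a name to latitude/longitude, `edges` as name pairs), the function names, and the 200 km/ms fibre speed are assumptions for illustration, not supercluster's actual format or delay model.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: light in fibre covers roughly 200 km per millisecond (about 2/3 c).
FIBRE_KM_PER_MS = 200.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def one_way_delay_ms(a, b, slowdown=2.0):
    """Ideal fibre propagation time scaled by an empirical slowdown factor."""
    return slowdown * great_circle_km(a, b) / FIBRE_KM_PER_MS


def assign_edge_delays(nodes, edges, slowdown=2.0):
    """Return {(u, v): delay_ms} for every edge, rounded to whole milliseconds."""
    return {(u, v): round(one_way_delay_ms(nodes[u], nodes[v], slowdown))
            for u, v in edges}
```

Swapping latency models then only means swapping `one_way_delay_ms` (for example, for a lookup into survey data) while the rest of the pipeline consumes the same per-edge map.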
      

Core measurement

It's a little artificial, but it is easiest to measure same-node e2e latency. It's pretty simple to have the nodes that do load generation have a metric that measures the time from HerderImpl::recvTransaction (which would be called by the /tx endpoint and is called by load generation) to HerderImpl::processExternalized for each transaction. This is a little bit of an undercount, so it might be worth switching from processExternalized to when the meta for a transaction is (/would be) generated.
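
To make the same-node measurement concrete, here is a minimal sketch of the bookkeeping: record a timestamp when a transaction is first received, then emit a sample when it is externalized. The class and method names are hypothetical and only mirror the roles of `recvTransaction` and `processExternalized`; this is not the core implementation.

```python
import time


class E2ELatencyTracker:
    """Tracks receive-to-externalize latency for locally submitted transactions."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the tracker can be tested
        self._received_at = {}       # tx hash -> receive time in seconds
        self.samples_ms = []         # completed latency samples

    def on_receive(self, tx_hash):
        # Only the first receive time counts; resubmissions do not reset it.
        if tx_hash not in self._received_at:
            self._received_at[tx_hash] = self._clock()

    def on_externalize(self, tx_hashes):
        # Transactions we never received locally are ignored.
        now = self._clock()
        for tx_hash in tx_hashes:
            start = self._received_at.pop(tx_hash, None)
            if start is not None:
                self.samples_ms.append((now - start) * 1000.0)
```

Moving the end point to meta generation would only change where `on_externalize` is called from, not the bookkeeping itself.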

Realistic baseline

To get the baseline number, using a topology based on the network survey seems reasonable to me.

Tier1 Nodes

We can just set the tier1 nodes based on the current tier1.

Load Generators

Unfortunately, as far as I can tell, neither RPC nor Horizon list the public keys for the stellar-core nodes they use. A brief search also didn't show any of the Horizon/RPC providers listing their core public keys. So, as a first pass, we can identify the SDF horizon nodes (by looking at the logs and matching the public keys to the graph file) and use those as load generators. We could add a few more load generators by identifying nodes that are connected in a similar way.

Latency Model

I think the most realistic number would come from using the survey latencies. It's true that the survey latencies include processing delay, but I think it makes sense to reflect this in the simulation: if a node was slow (e.g., because it was underpowered), then having the simulated latency be long makes sense. It's true that there will be some added latency from the simulation itself, but this also happens with geo locations, and we can subtract the average from the per-edge latencies if necessary (see footnote 1).

I believe the ping time is based on messages that are at a lower priority than SCP, so the delay may be a little artificially bigger. However, I do think these numbers are closer to representing the true network baseline than the geolocation-based numbers. We also get the benefit that we don't have to assign a fake geolocation to each node.

It's probably worth comparing this latency model to the geolocation-based one (we can prune the graph to only include nodes that we have the geolocations for); having the per-edge latencies allows us to swap between the models by just preprocessing the graph differently.
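
A small sketch of how survey RTTs could be turned into per-edge one-way delays, with an optional correction for the simulation's own processing delay. It assumes symmetric paths (one-way is half the RTT) and a single measured average overhead; both the function names and these assumptions are illustrative only.

```python
def survey_rtt_to_one_way_ms(rtt_ms, sim_overhead_ms=0.0):
    """Halve the RTT, subtract the simulation overhead, and never go below zero."""
    return max(0.0, rtt_ms / 2.0 - sim_overhead_ms)


def adjust_edge_delays(survey_rtts_ms, sim_overhead_ms=0.0):
    """Map {(u, v): rtt_ms} to {(u, v): one_way_delay_ms}."""
    return {edge: survey_rtt_to_one_way_ms(rtt, sim_overhead_ms)
            for edge, rtt in survey_rtts_ms.items()}
```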

Pruning

The recent survey was much bigger than previous surveys, so we can do the same first pass tuning that we've done in internal. We can also prune to only include survey respondents (and optionally prune edges/nodes with delays that are excessively large (> 1000ms)).
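
The pruning described above could look roughly like the following. The inputs (a map from edge to delay in milliseconds and a set of survey respondents) are an assumed representation, not the survey's real schema.

```python
def prune_graph(edges_ms, respondents, max_delay_ms=1000):
    """Keep edges between survey respondents whose delay is at most max_delay_ms.

    Returns (nodes, edges), where nodes contains only endpoints of kept edges.
    """
    kept = {
        (u, v): delay
        for (u, v), delay in edges_ms.items()
        if u in respondents and v in respondents and delay <= max_delay_ms
    }
    nodes = {name for edge in kept for name in edge}
    return nodes, kept
```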

Experiments

Having the realistic baseline will let us evaluate various models for latency (e.g., we can compare to the current geo loc 2x slowdown model). For non-topology dissemination changes, we can either re-use the same graph as for the baseline or (especially for small tests) we can prune it further (e.g., include only the tier1 nodes and the nodes along the shortest path(s) between the load generators and tier1). For topology changes, we can do a before/after that doesn't (necessarily) use the survey graph: having the graph specify the latencies, tier1s, and load generators allows easy ad-hoc changes to the modeling assumptions (e.g., for latency).
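
For the smaller test graphs mentioned above, one way to find the nodes to keep is a multi-source breadth-first search from tier1, followed by walking each load generator back toward tier1. This sketch keeps one shortest path per load generator (not all of them) and assumes an undirected adjacency map; the names are hypothetical.

```python
from collections import deque


def nodes_on_shortest_paths(adj, tier1, load_generators):
    """Return tier1 plus the nodes on one shortest path from each load generator."""
    # BFS from all tier1 nodes at once; parent[v] is one hop closer to tier1.
    parent = {t: None for t in tier1}
    queue = deque(tier1)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)

    keep = set(tier1)
    for gen in load_generators:
        node = gen
        # Unreachable generators are skipped because they have no parent entry.
        while node is not None and node in parent:
            keep.add(node)
            node = parent[node]
    return keep
```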

Test Harness

To get a baseline, just running max tps/min block time at some fixed value and looking at a dashboard seems reasonable.

Min Block Time

To make this more general, we can add it to the results that min block time prints. I don't think it makes sense to use it as one of the criteria for min block time (since the latency may not be monotonically increasing with block time). Doing it this way (one number at the end instead of looking at a dashboard) may change how we want to store the e2e latency, though. The current core change just uses a Medida timer, but since the histogram percentiles are reset every 30 seconds, it's harder to get a sense of the overall P50/75/99/etc.
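
If the run-wide percentiles are wanted, one option is to keep the raw samples for the run and summarize them once at the end, rather than relying on a windowed histogram. The sketch below uses the nearest-rank definition of a percentile; the function names and the choice of P50/P75/P99 are illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms, percentiles=(50, 75, 99)):
    """Return a {"P50": ..., "P75": ..., "P99": ...} style summary for one run."""
    return {f"P{p}": percentile(samples_ms, p) for p in percentiles}
```

Keeping every sample costs memory proportional to the number of transactions, which is usually acceptable for a bounded test run but is a trade-off compared with a fixed-size histogram.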

Footnotes

  1. In the actual survey, e.g., the SDF <-> SDF latencies are only 1 ms, so the processing delay in simulation may be small enough that we don't really have to worry about it. It's probably worth doing a small test without latency installed to get a sense for the simulation processing delay.

@marta-lokhova

Contributor

> Time from HerderImpl::recvTransaction to HerderImpl::processExternalized (still open to discussion: should it be time to when meta would be generated, instead?)

Full e2e until data is available to downstream clients, like RPC. So this means the ledger is applied and meta is emitted. I think this also means we should start measuring from the /tx endpoint, which is what RPC interacts with.

> It's true that there will be some added latency from the simulation itself, but this also happens with geo locations

Are you sure? With geolocations, we compute networking latency only. The ping RTT then includes that synthetic latency plus time inside core (scheduling, response time, etc.). I don't see how this is the same. By using survey latency, we'll be double-counting processing time, so you'll get overly pessimistic results (I also don't think we need to replicate super slow nodes that would just further skew the simulation).

> Unfortunately, as far as I can tell, neither RPC nor Horizon list the public keys for the stellar-core nodes they use. A brief search also didn't show any of the Horizon/RPC providers listing their core public keys. So, as a first pass, we can identify the SDF horizon nodes (by looking at the logs and matching the public keys to the graph file) and use those as load generators. We could add a few more load generators by identifying nodes that are connected in a similar way.

Watchers don't have public keys (they're rotated randomly on restart), so I think this will be tricky. What we can do is add a parameter "txSubHopCount" that assigns loadgen nodes depending on how far they are from Tier1. This will also tell us how much hop count impacts e2e latency.
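
A minimal sketch of how a "txSubHopCount"-style parameter might pick load generators: compute each node's hop distance to the nearest tier1 node and select the nodes at exactly the requested distance. The adjacency-map input and the function names are assumptions; only the parameter idea comes from the comment above.

```python
from collections import deque


def hops_to_tier1(adj, tier1):
    """Multi-source BFS: hop distance from every reachable node to the nearest tier1."""
    dist = {t: 0 for t in tier1}
    queue = deque(tier1)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pick_load_generators(adj, tier1, tx_sub_hop_count):
    """Return the sorted names of nodes exactly tx_sub_hop_count hops from tier1."""
    dist = hops_to_tier1(adj, tier1)
    return sorted(n for n, d in dist.items() if d == tx_sub_hop_count)
```

Running the same test at hop counts 1, 2, 3, and so on would then show how e2e latency changes with distance from tier1.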

> The recent survey was much bigger than previous surveys, so we can do the same first pass tuning that we've done in internal. We can also prune to only include survey respondents (and optionally prune edges/nodes with delays that are excessively large (> 1000ms)).

I recommend a minimal amount of pruning, so we actually end up with a realistic view of the network. Removing suspected "fake nodes" on the outer rim should be good enough.

@drebelsky drebelsky changed the title from "DRAFT/Experimental: more realistic pubnet modeling" to "Add more control over loadgen/latencies" Jun 25, 2026
@drebelsky drebelsky marked this pull request as ready for review June 26, 2026 00:50
Copilot AI review requested due to automatic review settings June 26, 2026 00:50

Copilot AI left a comment

Contributor

Pull request overview

This PR extends supercluster’s pubnet simulation and load-generation controls to better support experimentation with end-to-end transaction latency, including a new pubnet data format that can carry per-edge one-way delays and explicit “generates load” node selection, plus new mission/config knobs to simulate large network delays and gate loadgen latency metrics to only the relevant nodes.

Changes:

  • Add a new --pubnet-data-delay mode that loads per-edge one-way delays and per-node generatesLoad from pubnet data, and plumbs edgeDelays / generatesLoad through CoreSetOptions.
  • Add mission/config support for PEER_AUTHENTICATION_TIMEOUT and gated LOADGEN_MEASURE_TX_LATENCY_FOR_TESTING.
  • Update MaxTPS/MinBlockTime to optionally select load-generator nodes explicitly via generatesLoad.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/FSLibrary/StellarNetworkDelays.fs | Refactors delay command generation to accept precomputed per-peer delays; adds pubnet-delay-mode script path. |
| src/FSLibrary/StellarNetworkData.fs | Introduces delay-format pubnet JSON parsing, edge-delay extraction/validation, and propagates edgeDelays + generatesLoad into CoreSet options. |
| src/FSLibrary/StellarMissionContext.fs | Adds mission context fields for delay-format pubnet data, e2e latency flag, and peer auth timeout. |
| src/FSLibrary/StellarCoreSet.fs | Extends CoreSetOptions with edgeDelays and generatesLoad. |
| src/FSLibrary/StellarCoreCfg.fs | Adds gating for loadgen e2e latency metrics and optional PEER_AUTHENTICATION_TIMEOUT. |
| src/FSLibrary/MinBlockTimeTest.fs | Uses explicit generatesLoad selection when present, otherwise preserves old behavior. |
| src/FSLibrary/MaxTPSTest.fs | Uses explicit generatesLoad selection when present, otherwise preserves old behavior. |
| src/FSLibrary/json-type-samples/sample-network-data-delay.json | Adds a sample delay-format pubnet data file for type inference. |
| src/FSLibrary.Tests/Tests.fs | Updates test MissionContext defaults and adjusts tc-command test for new delay API. |
| src/App/Program.fs | Adds CLI options and validation for delay-format pubnet data, e2e metrics flag, and peer auth timeout. |


let getNetworkDelayCommands (loc1: GeoLoc) (locsAndNames: (GeoLoc * PeerDnsName) array) (delay: int option) : ShCmd =
let getPeerDelays (loc1: GeoLoc) (locsAndNames: (GeoLoc * PeerDnsName) array) : (int * PeerDnsName) array =
// Get the one-way delays from loc1 to the locations in locsAndNames
Comment on lines +366 to +371
      | None ->
          if self.missionContext.flatNetworkDelay.IsNone then
              failwith
                  "Failed to construct network delay script: no preferred peers map or flat network delay"
          else
              [||]
Comment on lines +412 to +417
      let (allPubnetNodes: PubnetNode array, edgeDelays: Map<string * string, int> option) =
          if context.pubnetDataDelay then
              if newNodes.Length > 0 then
                  failwith "--pubnet-data-delay cannot be used with --tier1-orgs-to-add or --non-tier1-nodes-to-add"

              let nodes = PubnetNodeDelayJSON.Load context.pubnetData.Value
3 participants