Split resume session relays#1399
Conversation
Coverage Report for CI Build 27162641746Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage increased (+0.01%) to 85.369%Details
Uncovered Changes
Coverage Regressions504 previously-covered lines in 17 files lost coverage.
Coverage Stats
💛 - Coveralls |
arminsabouri
left a comment
There was a problem hiding this comment.
CAck cf8b0b7
big pickle? Cool.
Should we / do we already ensure that there is > 2 ohttp relays ?
|
This PR also includes the commit from #1390, is it meant to supersede that PR? I left a comment there that I think we should address holistically (i.e. is ohttp_keys "cache" config field relevant at all since we refetch every time anyways?) |
|
I should add blocked as this simply builds on top of #1390 This PR would have simply needed to also add the commit from #1390 as the ohttp_keys have been "cached" by that point so relay selection is skipped and resume would just choose |
cf8b0b7 to
5062488
Compare
bc1cindy
left a comment
There was a problem hiding this comment.
tACK 5062488
minimal fix, each resumed session gets a fresh RelayManager, isolating relay state between sessions
but I think removing state from RelayManager and moving randomization into the struct as a method could simplify the design further, as commented in #1385 (comment)
|
Reviewed the fix, giving each resumed session a fresh
|
| let mut app = self_clone; | ||
| app.relay_manager = Arc::new(Mutex::new(RelayManager::new())); |
There was a problem hiding this comment.
This seems a bit off to me. Can add a getter to god app struct which randomizes which relay is use instead of cloning the app and mutating?
Maybe as simple as:
pub(crate) fn relay_manager (&self) -> RelayManager {
RelayManager::new()
}
Thanks for resurfacing that. Would a warning string be enough? Eitherway would like to get this PR through. This can be follow up work |
5062488 to
582259e
Compare
| &self, | ||
| sender: Sender<WithReplyKey>, | ||
| persister: &SenderPersister, | ||
| relay_manager: &mut RelayManager, |
There was a problem hiding this comment.
Why do we need to pass a mut reference if we can get a relay manager from self.relay_manager() ?
Not requesting a change, just a question
There was a problem hiding this comment.
No you're right the current behavior would I think just return back to caching the relay when it finds one that works during any long polling No actually I think I am getting mixed up, I'm jumping ahead towards per-request logic. I don't think the logic fundamentally changes though I think its unnecessary to bubble it up to these calls instead of right before unwrap_ohttp_keys_or_else_fetch
There was a problem hiding this comment.
@benalleng per your last comment. Are you still making this change?
There was a problem hiding this comment.
I think I misspoke earlier. the failures I was having the other day were due to the RelayManager::new() not being given a default to work without any context of the unwrap_ohttp_keys_or_else_fetch level the RelayManager state was falling back to the payjo.in default thus causing a failure in our nix checks
There was a problem hiding this comment.
Final followup, the mut reference is needed here as long as we are still using the same relay per session. This can be removed but really only once we move to per-request relay selection. Though I want to move towards that I think this is not the PR to do it yet.
d06ba2e to
a9dfc78
Compare
4931cac to
b86ecd4
Compare
b86ecd4 to
db9a904
Compare
| if relays.len() < 2 { | ||
| tracing::warn!( | ||
| "Only one OHTTP relay configured. Add more ohttp_relays to improve privacy." | ||
| ); | ||
| } |
There was a problem hiding this comment.
This belongs in config validation, not in fetch_ohttp_keys - otherwise it re-emits on every call (multiple sessions, retries).
db9a904 to
07dd4da
Compare
This commit ensures that each resumed payjoin-cli session is using a separate instance of the RelayManager to then check the ohttp connection independently. This fixes a bug where resuming would converge all existing sessions to one ohttp relay.
07dd4da to
36e1f6b
Compare
|
tACK 36e1f6b Tested on this branch via [v2]
pj_directory = "https://payjo.in"
ohttp_relays = [
"https://pj.benalleng.com",
"https://pj.bobspacebkk.com",
"https://payjoin.achow101.com",
]To isolate relay selection I locally patched the code to skip the bitcoind connection at startup and skip creating a receiver session in the payjoin directory, plus added a Works as expected: -> % RUST_LOG=debug payjoin-cli receive 1000
Selected OHTTP relay: https://pj.bobspacebkk.com/
DEBUG reqwest::connect: proxy(https://pj.bobspacebkk.com/) intercepts 'https://payjo.in/'
-> % RUST_LOG=debug payjoin-cli receive 1000
Selected OHTTP relay: https://pj.benalleng.com/
DEBUG reqwest::connect: proxy(https://pj.benalleng.com/) intercepts 'https://payjo.in/'Side note - fetching OHTTP keys from the achow101 relay errors out for some reason. Unrelated to this PR, opened #1614. |
|
Still in the process of reviewing, but maybe additional relays should be added so we don't have a bunch of warnings about only one relay being configured when running the see warnings |
There was a problem hiding this comment.
Sorry to completely derail this PR, but if the ultimate intention is to randomize the relay chosen for each request I'm not understanding the additional utility of this PR.
Additionally it seems much simpler to just update the RelayManger implementation to something like below, which only allows the use of choose_relay which always picks a random relay and would only touch a couple places in the existing code base that would need to be updated.
Is there a reason to store the currently selected relay when we always want it to be random?
#[derive(Debug, Clone)]
pub struct RelayManager {
config: Config,
failed_relays: Vec<Url>,
}
impl RelayManager {
pub fn new(config: Config) -> Self { RelayManager { config, failed_relays: Vec::new() } }
pub fn add_failed_relay(&mut self, relay: Url) { self.failed_relays.push(relay); }
pub fn choose_relay(&self) -> Result<Url> {
use payjoin::bitcoin::secp256k1::rand::prelude::SliceRandom;
let relays = self.config.v2()?.ohttp_relays.clone();
let remaining_relays: Vec<_> =
relays.iter().filter(|r| !self.failed_relays.contains(r)).cloned().collect();
if remaining_relays.is_empty() {
return Err(anyhow!("No valid relays available"));
}
match remaining_relays.choose(&mut payjoin::bitcoin::key::rand::thread_rng()) {
Some(relay) => Ok(relay.clone()),
None => Err(anyhow!("Failed to select from remaining relays")),
}
}
}| let ohttp_relay = self | ||
| .unwrap_relay_or_else_fetch(Some(&session.pj_uri().extras.endpoint()), relay_manager) | ||
| .await?; |
There was a problem hiding this comment.
Unrelated to this PR, but shouldn't this relay selection be inside the loop so that a different relay can be used for each poll?
There was a problem hiding this comment.
Yes, though with a lot of the discussion from #919 I do want to be sure that this is not going to be immediately overwritten or cause some additional complexity in the refactor so I am trying to keep these PRs as minimal as possible before we transition to the AS-aware relay selection.
|
Responding to #1399 (review) it was my attempt to keep this PR as minimal and focused on the #1391 issue at hand instead of creating a full payjoin-cli refactor here. Immediately after merging this is incoming https://github.com/va-an/rust-payjoin/tree/refactor/relay-manager which will merge all the functions relevant to RelayManager into methods. After which I think it is perhaps appropriate to address better randomization. |
|
Superseded by #1628 |
Closes #1391
This commit ensures that each resumed payjoin-cli session is using a
separate instance of the RelayManager to then check the ohttp connection
independently. This fixes a bug where resuming would converge all
existing sessions to one ohttp relay.
Also, the incompatible_msrv allow seems not needed anymore?
Big pickle wrote most of this code. The free open weight models aren't too shabby now either. Though it did want to recreate the entire config instead of just making the app mutable. I thought this seemed a bit easier to parse as its clear everything else is the same.
Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.