[DNM][Test CEDA NGINX] Reduce maxthreads#335
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main     #335   +/-   ##
=======================================
  Coverage   93.13%   93.13%
=======================================
  Files           7        7
  Lines         685      685
=======================================
  Hits          638      638
  Misses         47       47
I think we have to start being clearer about what is stressing what. PyActiveStorage threads are one thing, but from the NGINX storage point of view, all it sees is standard range-GETs from a client ... albeit a lot at a time. We need to make that clear to the storage folks: while the ultimate client might be PyActiveStorage, what they are seeing (and where their load balancing is failing) is legitimate range-GETs (from Reductionist), and this could well replicate what ESGF-NG life looks like, with or without Reductionist.
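To make "standard range-gets" concrete for the storage folks, here is a minimal sketch of the kind of `Range` headers a client sends when it reads chunks of a file. The helper name `range_headers` and the chunk layout are hypothetical and only for illustration; this is not how PyActiveStorage or Reductionist builds its requests, just the plain HTTP byte-range form (inclusive at both ends).

```python
def range_headers(offsets_and_sizes):
    """Build one HTTP Range header per (offset, size) chunk.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    return [
        {"Range": f"bytes={offset}-{offset + size - 1}"}
        for offset, size in offsets_and_sizes
    ]


# Hypothetical chunk layout: three 1 MiB chunks stored back to back.
MIB = 1024 * 1024
chunks = [(i * MIB, MIB) for i in range(3)]
headers = range_headers(chunks)
```

From the server side, each of these is an ordinary GET with a `Range` header; nothing in the request identifies the ultimate client as anything special.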
Mapping the CEDA NGINX threaded concurrency issue
See GHA https://github.com/NCAS-CMS/PyActiveStorage/actions/runs/26487146543 (the right-hand tab records reruns: click Latest batch 4 to open the list of previous (re-)runs). Full run: 10 jobs running GETs on the same files on the NGINX store, with 100 max threads/requests per file. That figure is, of course, theoretical: we only request 100 threads, and in practice a fair few fewer get sent:
- 100 maxthreads (10-13 min for `pytest -n 2 tests/test_real_https.py`)
- Reducing maxthreads to 10 (10-13 min for `pytest -n 2 tests/test_real_https.py`)
- Upping maxthreads to 20 (8-11 min for `pytest -n 2 tests/test_real_https.py`)
- Upping maxthreads to 40 (8-9 min for `pytest -n 2 tests/test_real_https.py`)
- Upping maxthreads to 60 (9-11 min for `pytest -n 2 tests/test_real_https.py`)

The CEDA NGINX storage starts to show signs of failure at around 400-odd concurrent GETs per file, whereas the DKRZ storage is a lot more resilient and only starts to crumble at 2-3x that number of GETs per file. That said, using anything more than `max_threads = 30` is probably overkill for a number of reasons: file chunking should be a lot coarser, no effective Python threads are gained beyond 30 or so, and, above all, we can always make `max_threads` an `Active` parameter.

Varsiha's test script from #333 shows the same behaviour: things start to cook at around 50 threads, and at 100 things are completely broken; see https://github.com/NCAS-CMS/PyActiveStorage/actions/runs/26521834459
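As a rough illustration of what capping `max_threads` does to the number of requests in flight, here is a minimal sketch using only the standard library. `run_capped` and `FakeStore` are hypothetical names made up for this example; the store is a stand-in that sleeps instead of doing a real GET, and none of this is the actual PyActiveStorage code path.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_capped(n_requests, max_threads, fetch):
    """Run n_requests calls of fetch(i) with at most max_threads in flight."""
    # Never start more workers than there are requests to send.
    workers = max(1, min(max_threads, n_requests))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, range(n_requests)))


class FakeStore:
    """Stand-in for a remote store; records the peak number of concurrent 'GETs'."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def get(self, i):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.005)  # pretend network latency
        with self._lock:
            self.in_flight -= 1
        return i


store = FakeStore()
results = run_capped(n_requests=200, max_threads=30, fetch=store.get)
```

After the run, `store.peak` is the highest number of simultaneous fake GETs the store saw; it can never exceed `max_threads`, and it is often lower, which matches the observation above that the requested thread count is only a theoretical ceiling.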
attn: @varsiha-sothilingam @bnlawrence