Skip to content

Logging interface#7

Merged
lkdvos merged 23 commits intomainfrom
ld-debug
Nov 8, 2025
Merged

Logging interface#7
lkdvos merged 23 commits intomainfrom
ld-debug

Conversation

@lkdvos
Copy link
Copy Markdown
Collaborator

@lkdvos lkdvos commented Oct 16, 2025

Very WIP and rough start of getting this on the road. I've basically spent the entire day toying around with a bunch of different strategies and approaches, based on as many different frameworks as I could find, and wasn't really happy with most of them.
I think I'm starting to settle on some parts I like, but this still needs a lot of work.

(this is not ready for review, just wanted to post some progress here to let you know that I am actually looking into it)

Some comments on what I have:

  • I somehow wanted to replace debug with logging, since I get confused about the meaning of the word debug, which for me means figuring out things that arent working. Happy to change that if this is too radical, just a suggestion.
  • Lots and lots of bikeshedding about names
  • The logging system as a whole is painful because there are too many ways that all look very equivalent to achieve the same thing.

Some vague goals:

  • separate out generating log records and processing them, in order to unify printing and recording data
  • have a system which is extensible, so a specific algorithm may introduce new logging events
  • have a default system which works in most of the cases
  • have a system which is compatible with subsolvers

I've started by migrating some of the Manopt debug things. I am for now trying to avoid having to wrap many things, although I might go back on this. Having problem, alg, state also makes it not an obvious choice what to wrap.

Notably missing from the things I've started porting is the logic for deciding when to print something. I have some vague intuition/notion that this should be handled by the logger, although I have not figured out a satisfying way to achieve this.

Also missing is the formatting parts. In order to unify logging and recording, the default implementation does not return strings, rather it produces data. I'm hoping to be able to tie unique id's to these logs, which in turn the logger could use to format into a string if it wants to print, or to push to a vector of data if it wants to record.

I am probably going to try use LoggingExtras.jl to do some of the routing, ie having a logger that splits to both a file and an io and a recording device sounds important

The factories probably will have to return both a logger to handle the events, as well as a set of action to generate the events.


all of this is already sounding incredibly involved, and I'm slightly scared about overengineering it to a point where it becomes virtually unusable, but for now I'm hopeful and will at least continue a little bit to see how it shakes out. Take everything with a grain of salt, it might be necessary to discard everything in the end.

@kellertuer
Copy link
Copy Markdown
Member

Using Logging instead of Debug in naming is totally fine. Debug might only be one reason one wants to see prints :)

@kellertuer
Copy link
Copy Markdown
Member

My system also only very slowly evolved into the current state, so I do see that starting that anew from scratch might look like quite some work. I can take a look at the sub solver ideas once you have settled your ideas a bit as well as the factory.

About the “what-to-warp” I think that is a state-property to know wether to log or not; neither problem nor algorithm are changed.

The Manopt subsolver debug there is the “sending side”

https://github.com/JuliaManifolds/Manopt.jl/blob/392ee0e31e61beeffcca1e27a71796d71e6e7ddd/src/plans/debug.jl#L196-L202

as well as the “receiving side”

https://github.com/JuliaManifolds/Manopt.jl/blob/392ee0e31e61beeffcca1e27a71796d71e6e7ddd/src/plans/debug.jl#L841

and sure that wraps a certain debugger inside. - and the communicate via the set_parameter! mechanism I used in several areas of Manopt, since that is even designed to remember a chain where to pass things to (here the substates debug).

Comment thread src/interface/interface.jl Outdated
@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 5, 2025

So, while it took me some time to actually write some updates, this has actually been in the back of my mind for quite some time.
The main issue was that I couldn't really find a solution that got all of my goals in a satisfactory way, but I think I finally found something I quite like.

One of the main reasons I was drawn to the logging system is that it does not require any wrapping, and I was hoping to find a way to not have to pollute the actual algorithms too much.
However, I kept getting annoyed by some pretty fundamental problems with that approach:

  • What do I do with logging events that are not from an algorithm
  • How do I have no (or very little) overhead by default
    The first issue could be solved by manually wrapping a parent logger, and then at every algorithm start using a with_logger approach, which then filters out the correct messages and dispatches the algorithm events correctly.
    I tried this out, but didn't really like the end result.
    The fact that this basically requires that you always wrap the algorithm, or deal with the fact that you get log messages into the logging pipeline that are "unhandled" really felt wrong.
    The same can be said for that second issue, where it is easy to turn off the logging with very little overhead, but this is not the case by default.
    An alternative solution I played around with a bit was to emit custom LogEvent structs, and attempt to use dispatch in combination with the logging system to handle this.
    Unfortunately, the logging system is pretty much designed to not specialize as much as possible, and I couldn't find a way to not run into method ambiguities.
    While this could be manually patched for the base loggers, every single new logger would have the same issue, so this was also off the table.

What kept me kind of fixated on this is that it felt like it really almost works the way I wanted it to, so finally I bit the bullet and realized I can just partially rewrite the entire logging system myself, copying what I like and not copying what I don't.
So here is my new attempt, which is in spirit very similar to Manopt, with the primary difference being that I used a scoped value.
The idea being that this should quite naturally be composable for subsolvers, having their own scope, doesn't interfere with other logging machinery, still has full flexibility, and the default case should have very little overhead.

I hope it is slightly clear what I'm trying to achieve, I still have some work cut out for things like factories, and properly writing some docs for this, but I wanted to already push this and perhaps get an opinion or two?

@kellertuer
Copy link
Copy Markdown
Member

Overall the newest commit looks very nice. I already had one comment on the code, see above, that one could reduce overhead also by having a wrapper for the algorithm state, is that without that wrapper no logging functions are ever called. This is what I do with the DebugState in Manopt.

@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 6, 2025

Hmmm, I would really like to avoid having to do additional wrapping, since the wrapping is already happening in the scoped value.
I did some minimal benchmarks, using the root-finding problem in our testsuite.
For a 1000 iterations of the newton method, which is realistically the worst-case scenario for something you would ever use an iterative solver for, as the computations are very cheap, I get these timings before and after:

julia> @benchmark solve($problem, $algorithm1)

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  128.583 μs   6.155 ms  ┊ GC (min  max): 0.00%  96.95%
 Time  (median):     142.584 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   147.027 μs ± 97.708 μs  ┊ GC (mean ± σ):  2.41% ±  4.95%

              ▄▇▆▃▄▄▇█▅▂
  ▂▄▄▃▄▄▆▆▇▅▅███████████▇▇▇▆▅▅▅▅▄▄▄▄▃▃▂▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  129 μs          Histogram: frequency by time          175 μs <

 Memory estimate: 156.30 KiB, allocs estimate: 10002.

julia> @benchmark solve($problem, $algorithm1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  164.042 μs    6.906 ms  ┊ GC (min  max): 0.00%  97.27%
 Time  (median):     174.250 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   179.881 μs ± 100.043 μs  ┊ GC (mean ± σ):  2.00% ±  4.83%

       ▁▅▇█▆▂▂▂▃▃▃
  ▁▁▂▃▆█████████████▇▇▆▆▇▇▆▆▅▅▄▃▃▃▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  164 μs           Histogram: frequency by time          207 μs <

 Memory estimate: 156.30 KiB, allocs estimate: 10002.

In other words, the overhead of the logging infrastructure when there is no logging happening is about 30-40 ns per iteration. I think I find this acceptable, but that's probably because of my use-cases far exceeding these values. I don't know what you think?

(as a sidenote, this looks like a lot of allocations are happening, which is definitely something to investigate)

@kellertuer
Copy link
Copy Markdown
Member

Hm, my idea of the wrapper was that there is not even that code in the default case, but you might be right that this super simple newton is already an extreme case and the overhead does look ok, though we would always have case 2 in your scenario?
I do not have a super strong reason for the wrapper, just my default way to do stuff, so I am also maybe just used to that paradigm.
The plain iteration looks a bit nicer without the logs calls,...but again, not a strong push for that, just personal reference.

For the allocs, maybe just how Newton is implemented there.

@codecov
Copy link
Copy Markdown

codecov Bot commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 95.74468% with 2 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@4a60118). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/logging.jl 97.56% 1 Missing ⚠️
src/stopping_criterion.jl 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main       #7   +/-   ##
=======================================
  Coverage        ?   56.36%           
=======================================
  Files           ?        5           
  Lines           ?      220           
  Branches        ?        0           
=======================================
  Hits            ?      124           
  Misses          ?       96           
  Partials        ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 6, 2025

So, I've updated the default implementation, the overhead is now completely negligible and it looks like the compiler can union-split where it needs to. (There is some variability on the benchmarks depending on what my computer is doing at the same time I think)

This clearly has less overhead, but the iteration now looks a bit more complicated, and we lose the ability to enable or disable the logging after the solving has started, which was previously actually possible in concurrent environments.
I think I'm fine with either, I don't mind the additional 4 lines in our default solve! implementation, and I like that there is only a single entry point if I ever would overload something.

julia> @benchmark solve($problem, $algorithm1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  134.250 μs    7.939 ms  ┊ GC (min  max): 0.00%  97.87%
 Time  (median):     149.375 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   155.963 μs ± 123.466 μs  ┊ GC (mean ± σ):  2.95% ±  4.94%

                 ▆█▃▁▁▁▁▁
  ▁▁▁▁▁▁▁▁▁▂▂▂▂▄█████████▇▇▇▆▆▄▄▃▃▂▂▂▂▂▃▄▂▂▂▂▂▁▁▁▁▁▁▁▁▁▂▂▂▁▁▁▁▁ ▃
  134 μs           Histogram: frequency by time          178 μs <

 Memory estimate: 156.30 KiB, allocs estimate: 10002.

julia> @benchmark solve($problem, $algorithm1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min  max):  141.750 μs    9.444 ms  ┊ GC (min  max): 0.00%  98.15%
 Time  (median):     148.667 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   155.532 μs ± 128.421 μs  ┊ GC (mean ± σ):  2.92% ±  4.94%

     ▃██▃  ▁▁▁▁
  ▁▂▅████████████▇▆▆▅▄▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▃▅▄▂▁▂▂▂▁▁▁▁▁▁▁▁▁▁ ▃
  142 μs           Histogram: frequency by time          180 μs <

 Memory estimate: 156.30 KiB, allocs estimate: 10002.

@kellertuer
Copy link
Copy Markdown
Member

Oh, the argument to interactively add logs sounds like an argument for your approach. In my wrap idea one would have to stop, wrap, and restart-solve which might recompile a few things.

I have not checked what LOGGING_ENABLED is but the current solution looks like a good mix of both, while keeping your flexibility during the iterations. So lets keep it like it is now

@kellertuer
Copy link
Copy Markdown
Member

kellertuer commented Nov 6, 2025

also good it is no longer log!, both because we have that function on manifolds and the new name sounds a bit better :)

edit: checked the overall PR and understand the variables now, neat :)

@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 6, 2025

Ok, let me try and get some tests, some factories and the docs in, shoud make things more clear too

@kellertuer
Copy link
Copy Markdown
Member

The extended docs look very neat! I think for a few of the actions and stopping criteria I would also like to work on a “convenience layer”, that you can just say something like debug = [:Time,] or so to generate the three actions to measure time (same for the stopping criteria) – so having some factories for these so the end user has to think a bit less :).
But I like the fine granularity one could do by hand every time as well.

@kellertuer
Copy link
Copy Markdown
Member

The docs are indeed very nice.

I have a comment on indicates_conovergence: In Manopt I have a second function by now, that sounds a bit less formal: has_converges. The difference is as follows (and Herre even nicer to divide):

  • indicate convergence is the static state, whether a stopping criterion would always stop due to a convergence certificate.
    i.e. stopping after an iteration would never indicate that, stop when stale would. their or would not, their and would
  • has converged would be the dynamic thing, for an actual algorithm that ran. Then the first thing is that it actually has finished. Afterwards the difference would be for the or case: Depending on which one “fired”, it would return true even (if stale has indicated)

One could also try to interface some status report to some enums - but that is a bit tricky maybe with the or/_and_s ;)

I can work on this after this PR when I tackle a stripping criterion factory (to handle a few kwargs automatically like maxiter or maxtime) and turn them into stopping criteria

Comment thread docs/src/logging.md Outdated
@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 7, 2025

@kellertuer
Copy link
Copy Markdown
Member

Nice!

You have a good tendency to confuse me with function names: log! is the in-place logarithmic map and you used that for a while, now there is also GroupAction, cf. https://juliamanifolds.github.io/LieGroups.jl/stable/interface/actions/#LieGroups.GroupAction. Yeah its is slowly getting late here - so besides naming - these are great tutorials!

They go in full detail with all technical stuff necessary. I hope (aas mentioned) to do some them on a convenience layer a bit shorter / more accessible, but that would then be code/part of a tutorial before (with a bit less technical detail). Great work!

@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 7, 2025

The best part is that I actually realized this the moment I wrote it down, that group action has a very different meaning 🙃 I'm happy to change that though, just wanted to get everything more or less in and stable. (Also, I realized that in the current implementation, we could even have with_algorithmlogger(:PreStep => action1, :PreStep => action2) to automatically do this, or again use & or some other syntax, in which case we wouldn't even need to export this :)

I think I would propose the following plan:

  • I'm finalizing this PR by adding some tests, after which I would like to make some final changes based on your feedback, surrounding naming, wrapping, wording etc.
  • I 200% agree that the current actions are too bare bones, and we can and should add a number of convenience builtins that make life a lot easier. However, in an attempt to make this PR not too big, I'd maybe save that for a follow-up, to make reviewing and seeing what happens easier?
  • Writing the example for the docs has made me realize that there are some minor points in the interface that we might be able to improve upon. Since we aren't at a release stage though, I'd also like to bring that up in a separate issue, after getting this in.
  • I'm not sure I understood your comments about indicates_convergence, and the interplay between stopping criteria and logging is still a bit fuzzy to me. I'd love to formalize that part a bit more, since I'd argue that this is definitely an important aspect, but it might again be easier to reason about this after already merging this PR.

@lkdvos lkdvos changed the title [WIP] Logging interface Logging interface Nov 7, 2025
@kellertuer
Copy link
Copy Markdown
Member

since GroupAction is an action on a group and here we have a group of actions, what to you think about ActionGroup ?

I like the idea with the pairs :Context => Action that looks nice. And within a context building a group, & is ok, but we could also use some other binary operator is we have one that seems fitting?

Your next steps sound great!

For the convergence, maybe have the following in mind that is even easier to split here (and naming of functions can surely be improved)

  • indicates_convergence(sc) is more a mathematical and static property to ask. So this would here only depend on a StoppingCriterion and state, whether (if this indicates to stop) it indicates we converged to something. StopAfterIteration(10) does not indicate convergence but a StopWhenGradientNormLess indicates convergence (at least to a critical point). This is static - no algorithm needs to run for this.
  • has_converged(sc, sc_state) is a thing depending on the StoppingCriterionState and could in for the simple cases like StopWhenGradientLess just be indicates_convergence(sc) && is_finihsed(...)

For the StopWhenAny the difference is very subtle. As a static guarantee (does it always indicate convergence?) all its components have to answer true for the. For has_convergeed on the other hand, among those that indicated is_finished, one has have converged.

@lkdvos
Copy link
Copy Markdown
Collaborator Author

lkdvos commented Nov 8, 2025

Ah, I think I'm following what you mean with the distinction now, that does make sense indeed.

Copy link
Copy Markdown
Member

@kellertuer kellertuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though there is one minor remark I have for the docs, I already mark it as approved, since that is not an urgent one.

Thanks for the detailed technical docs. It helped me to see how nice the Logging ended up being and to also see a nice explanation for the stopping criteria.

Comment thread docs/src/interface.md Outdated
Co-authored-by: Ronny Bergmann <git@ronnybergmann.net>
@lkdvos lkdvos merged commit c0ad8e3 into main Nov 8, 2025
11 checks passed
@lkdvos lkdvos deleted the ld-debug branch November 8, 2025 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants