Observability

Librsvg supports basic, mostly ad-hoc logging with an RSVG_LOG=1 environment variable. This has not been very effective in letting me, the maintainer, know what went wrong when someone reports a bug about librsvg doing the wrong thing in an application. Part of it is because the code could be more thorough about logging (e.g. log at all error points), but also part of it is that there is no logging about what API calls are made into the library. On each bug reported on behalf of a particular application, my thought process goes something like this:

  • What was the app doing?

  • Can I obtain the problematic SVG?

  • Does the bug reporter even know what the problematic SVG was?

  • Was the app rendering with direct calls to the librsvg API?

  • Or was it using the gdk-pixbuf loader, and thus has very little control of how librsvg is used?

  • If non-pixbuf, what was the Cairo state when librsvg was called?

  • What sort of interesting API calls are being used? Stylesheet injection? Re-rendering single elements with different styles?

And every time, I must ask the bug reporter for information related to that, or to point me to the relevant source code where they were using librsvg… which is not terribly useful, since building their code and reproducing the bug with it is A Yak That Should Not Have To Be Shaved.

Desiderata

Know exactly what an application did with librsvg:

  • All API calls and their parameters.

  • State of the Cairo context at entry.

  • “What SVG?” - be careful and explicit about exfiltrating SVG data to the logs.

  • Basic platform stuff? Is the platform triple enough? Distro ID?

  • Versions of dependencies.

  • Version of librsvg itself.

Internals of the library:

  • Regular debug tracing. We may have options to enable/disable tracing domains: parsing, cascading, referencing elements, temporary surfaces during filtering, render tree, etc.

  • Log all points where an error is detected/generated, even if it will be discarded later (e.g. invalid CSS values are silently ignored, per the spec).

Enabling logging

It may be useful to be able to enable logging in various ways:

  • Programmatically, for when one has control of the source code of the problematic application. Enable logging at the problem spot, for the SVG you know that exhibits the problem, and be done with it. This can probably be at the individual RsvgHandle level, not globally. For global logging within a single process, see the next point.

  • For a single process which one can easily launch via the command line; e.g. with an environment variable. This works well for non-sandboxed applications. Something like RSVG_LOG_CONFIG=my_log_config.toml.

  • With a configuration file, a la ~/.config/librsvg.toml. Many programs use librsvg and you don’t want logs for all of them; allow the configuration file to specify a process name, or maybe other ways of determining when to log. For session programs like gnome-shell, you can’t easily set an environment variable to enable logging - hence, a configuration file that only turns on logging from the gnome-shell process.

All of the above should be well documented, and then we can deprecate RSVG_LOG.

Which SVG caused a crash?

Every once in a while, a bug report comes in like “$application crashed in librsvg”. The application renders many SVGs, often indirectly via gdk-pixbuf, and it is hard to know exactly which SVG caused the problem. Think of gnome-shell or gnome-software.

For applications that call librsvg directly, if they pass the filename or a GFile then it is not hard to find out the source SVG.

But for those that feed bytes into librsvg, including those that use it indirectly via gdk-pixbuf, librsvg has no knowledge of the filename. We need to use the base_uri then, or see if the pixbuf loader can be modified to propagate this information (is it even available from the GdkPixbufLoader machinery?).

If all else fails, we can have an exfiltration mechanism. How can we avoid logging all the SVG data that gnome-shell renders, for example? Configure the logger to skip the first N SVGs, and hope that the order is deterministic? We can’t really “log only if there is a crash during rendering”.

Log only the checksums of SVGs or data lengths, and use that to find which SVG caused the crash? I.e. have the user use a two-step process to find a crash: get a log (written synchronously) of all SVG checksums/lengths, and then reconfigure the logger to only exfiltrate the last one that got logged - presumably that one caused the crash.

Which dynamically-created SVG caused a problem?

Consider a bug like https://gitlab.gnome.org/GNOME/gnome-shell/-/issues/5415 where an application dynamically generates an SVG and feeds it to librsvg. That bug was not a crash; it was about incorrect values returned from an librsvg API function. For those cases it may be useful to be able to exfiltrate an SVG and its stylesheets only if it matches a user-provided substring.

Global configuration

$(XDG_CONFIG_HOME)/librsvg.toml - for startup-like processes like gnome-shell, for which it is hard to set an environment variable:

Per-process configuration

RSVG_LOG_CONFIG=my_log_config.toml my_process

Programmatic API

FIXME

Configuration format

[logging]
enabled=true
process=gnome-shell              # mandatory for global config - don't want to log all processes - warn to g_log if key is not set
output=/home/username/rsvg.log   # if missing, log to g_log only - or use a output_to_g_log=true instead?

API logging

Log cr state at entry, surface type, starting transform.

Log name/base_uri of rendered document.

Can we know if it is a gresource? Or a byte buffer? Did it come from gdk-pixbuf?

Implementation

There is currently the start of a Session type woven throughout the source code, with the idea of it being the thing that records logging events, it may be better to plug into the tracing ecosystem:

https://crates.io/crates/tracing

Initial ideas:

  • See the “In libraries” section in tracing’s README; it shows how to create spans for API calls.

  • How would we capture from gnome-shell? tracing-journald? Or would things be easier for casual users if we logged to a file?

  • Maybe later, have a tracing-sysprof crate to send the events to sysprof?

Log contents

/home/username/rsvg.log - json doesn’t have comments; put one of these in a string somehow:

******************************************************************************
* This log file exists because you enabled logging in ~/.config/librsvg.toml *
* for the "gnome-shell" process.                                             *
*                                                                            *
* If you want to disable this kind of log, please turn it off in that file   *
* or delete that file entirely.                                              *
******************************************************************************

******************************************************************************
* This log file exists because you enabled logging with                      *
* RSVG_LOG_CONFIG=config.toml for the "single-process-name" process.         *
*                                                                            *
* If you want to disable this kind of log, FIXME                             */
******************************************************************************

To-do list

  • Audit code for GIO errors; log there.

  • Audit code for Cairo calls that yield errors; log there.

  • Log the entire ancestry of the element that caused the error? Is that an insta-reproducer?