Render tree
For historical reasons, librsvg’s code flow during rendering is as
follows. The rendering code traverses the SVG tree of elements, and
for each one, its ::draw()
method is called; its signature looks
like this (some arguments omitted):
pub fn draw(
&self,
...
draw_ctx: &mut DrawingCtx,
) -> Result<BoundingBox, RenderingError> { ... }
The draw() methods perform the actual rendering as side effects on the
draw_ctx
, and return a BoundingBox
. That is, the bounding box of
an element is computed at the same time that it is rendered. This is
suboptimal for several reasons:
Many things that happen during rendering depend on knowing the bounding box. For example, gradients, patterns, and filters with units set to
objectBoundingBox
need to know the bounds. The rendering code in drawing_ctx.rs is cluttered because it must resolve bounding boxes very late.This is especially problematic for filters, since a Cairo surface needs to be created before rendering, and that surface should have a size relative to the bounding box of the element being filtered! Bug #1 is precisely about this: librsvg instead creates a temporary surface as big as the document’s toplevel viewport and filters it, but this doesn’t work well for filters like Gaussian blur that should actually reference pixels outside of the document’s area (think of a shape that extends past the document’s area, which then gets blurred).
The way for an element to signal that it is not drawable (e.g.
<defs>
is by returning an empty bounding box and not rendering anything. This is awkward.When rendering to a temporary surface for filtering or masking, there is a set of affine transformations that needs to be maintained carefully: an affine for the clipping path outside the temporary surface, an affine for drawing inside the surface, an affine to composite the surface into the final result. This is hard to understand and hard to test.
These problems can be solved by having a render tree.
What is a render tree?
As of 2022/Oct/06, librsvg does not compute a render tree data
structure prior to rendering. Instead, in a very 2000s fashion, it
walks the tree of elements and calls a .draw()
method for each
one. Each element then calls whatever methods it needs from
DrawingCtx
to draw itself. Elements which don’t produce graphical
output (e.g. <defs>
or <marker>
) simply have an empty
draw()
method.
Over time we have been refactoring that in the direction of actually being able to produce a render tree. What would that look like? Consider an SVG document like this:
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
<defs>
<rect id="TheRect" x="10" y="10" width="20" height="20" fill="blue"/>
</defs>
<g>
<use href="#TheRect" stroke="red" stroke-width="2"/>
<circle cx="50" cy="50" r="20" fill="yellow"/>
</g>
</svg>
A render tree would be a list of nested instructions like this:
group { # refers to the toplevel SVG
width: 100
height: 100
establishes_viewport: true # because it is an <svg> element
children {
group { # refers to the <g>
establishes_viewport: false # because it is a simple <g>
children {
shape {
path="the <rect> above but resolved to path commands"
# note how the following is the cascaded style and the <use> semantics
fill: blue
stroke: red
stroke-width: 2
}
shape {
path="the <circle> above but resolved to path commands"
fill: yellow
}
}
}
}
}
That is, we take the high-level SVG instructions and “lower” them to a few possible drawing primitives like path-based shapes that can be grouped. All the primitives have everything that is needed to draw them, like their set of computed values for styles, and their coordinates resolved to their user-space coordinate system.
Browser engines produce render trees more or less similar to the above (they don’t always call them that), and get various benefits:
The various recursively-nested subtrees can be rendered concurrently.
Having low-level primitives makes it easier to switch to another rendering engine in the future.
The tree can be re-rendered without recomputation, or subtrees can be recomputed efficiently if e.g. an animated element changes a few of its properties.
Why did librsvg not do that since the beginning?
Librsvg was originally written in the early 2000s, when several things were happening at the same time:
libxml2 (one of the early widely-available parsers for XML) had recently gotten a SAX API for parsing XML. This lets an application stream in the parsed XML elements and process them one by one, without having to build a tree of elements+attributes first. In those days, memory was at a premium and “not producing a tree” was seen as beneficial.
The SVG spec itself was being written, and it did not have all of the features we know now. In particular, maybe at some point it didn’t have elements that worked by referencing others, like
<use>
or<filter>
. The CSS cascade could be done on the fly for the XML elements being streamed in, and one could emit rendering commands for each element to produce the final result.
That is, at that time, it was indeed feasible to do this: stream in parsed XML elements one by one as produced by libxml2, and for each element, compute its CSS cascade and render it.
This scheme probably stopped working at some point when SVG got
features that allowed referencing elements that have not been declared
yet (think of <use href="#foo"/>
but with the <defs> <path
id="foo" .../> </defs>
declared until later in the document). Or
elements that referenced others, like <rect filter="url(#blah)">
.
In both cases, one needs to actually build an in-memory tree of parsed
elements, and then resolve the references between them.
That is where much of the complexity of librsvg’s code flow comes from:
AcquiredNodes
is the thing that resolves references when needed. It also detects reference cycles, which are an error.ComputedValues
often get resolved until pretty late, by passing theCascadedValues
state down to children as they are drawn.DrawingCtx
was originally a giant ball of mutable state, but we have been whittling it down and moving part of that state elsewhere.
Summary of the SVG rendering model
In the SVG2 spec, this has been offloaded to the “Order of graphical operations” section of the Compositing and Blending Level 1 spec. Once the render tree is resolved, each node is painted like this, conceptually to a transparent, temporary surface:
Paint the shape/text/etc.
Filters.
Clip paths.
Masks.
Blend/composite the temporary surface onto the result.
The most critical function in librsvg is probably DrawingCtx::with_discrete_layer; it implements this drawing model.
Current state (2023/03/30)
layout.rs
has the beginnings of the render tree. It’s probably mis-named? It contains this:
A LayerKind with primitives for path-based shapes, text, and images.
A stacking context, which indicates each layer’s opacity/clip/mask/filters.
A Layer which composes the previous two. The StackingContext provides the compositing/masking/filtering parameters, while the LayerKind determines the primitive contents of the layer.
Various ancillary structures that try to have only user-space coordinates (e.g. a number of CSS pixels instead of
5cm
) and no references to other things.
The last point is not yet fully realized. For example,
StackingContext.clip_in_user_space
has a reference to an element,
which will be used as the clip path — that one needs to be normalized
to user-space coordinates in the end. Also,
StackingContext.filter
is a filter list as parsed from the SVG,
not a FilterSpec
that has been resolved to user space.
It would be good to resolve everything as early as possible to allow lowering concepts to their final renderable form. Whenever we have done this via refactoring, it has simplified the code closer to the actual rendering via Cairo.
Major subprojects
Path based shapes (layout::Shape
) and text primitives
(layout::Text
) are almost done. The only missing thing for shapes
would be to “explode” their markers into the actual primitives that
would be rendered for them. However…
There is no primitive for groups yet. Every SVG element that allows
renderable children must produce a group primitive of some sort:
svg
, g
, use
, marker
, etc. Among those, use
and
marker
are especially interesting since they must explode their
referenced subtree into a shadow DOM, which librsvg doesn’t support
yet for CSS cascading purposes (the reference subtree gets rendered
properly, but the full semantics of shadow DOM are not implemented
yet).
Elements that establish a viewport (svg
, symbol
, image
,
marker
, pattern
) need to carry information about this
viewport, which is a viewBox
plus preserveAspectRatio
and
overflow
. See #298 for a somewhat
obsolete description of the refactoring work needed to unify this
logic.
The layout::StackingContext
struct should contain another field,
probably called layer
, with something like this:
struct StackingContext {
// ... all its current fields
layer: Layer
}
enum Layer {
Shape(Box<Shape>),
Text(Box<Text>),
StackingContext(Box<StackingContext>)
}
That is, every stacking context should contain the thing that it will draw, and that thing may be a shape/text or another stacking context!
As of 2023/03/30, the “current viewport” is no longer part of
DrawingCtx’s mutable state. Instead, a Viewport struct is passed
down the call chain via a function argument. This is not complete
yet, since the code modifies the current cr’s transform apart from
the current viewport’s transform. The goal is to have the current
viewport actually have the full transform to be applied to the object
being rendered. This should simplify gnarly code paths like the one
for rendering <pattern>
.
Bounding boxes
SVG depends on the objectBoundingBox
of an element in many places:
to resolve a gradient’s or pattern’s units, to determine the size of
masks and clips, to determine the size of the filter region.
The current big bug to solve is #778, which requires
knowing the objectBoundingBox
of an element before rendering
it, so that a temporary surface of the appropriate size can be created
for rendering the element if it has isolated opacity or masks/filters.
Currently librsvg creates a temporary surface with the size and
position of the toplevel viewport, and this is wrong for shapes that
fall outside the viewport.
The problem is that librsvg computes bounding boxes at the time of
rendering, not before that. However, now layout::Shape
and
layout::Text
already know their bounding box beforehand. Work
needs to be done to do the same for a layout::Group
or whatever
that primitive ends up being called (by taking the union of its
children’s bounding boxes, so e.g. that a group with a filter can
create a temporary surface to be able to render all of its children
and then filter the surface).
Being able to compute the objectBoundingBox
of an element before
rendering it would open the door to fixing bug #1 (yeah, really):
currently, the temporary surface used for filtering has the size of
the toplevel viewport, but this doesn’t work well when one tries to
Gaussian-blur an element that lies partially outside that viewport.
The filter should apply to the element’s extents plus the filter
region, which takes into account the extra space needed for a Gaussian
blur to work around a shape. Since librsvg cannot render the full
shape if it lies partially outside of the toplevel viewport, the
blurred result shows up with a halo near the image’s edge, since
transparent pixels get “blurred in” with the shape’s pixels.