PDF Post-processing
Chromium’s print-to-PDF produces valid but bloated files. Every page is rendered independently, so identical assets (background images, SVG patterns, transparency masks) are embedded as separate PDF objects on every page. A 365-day planner can easily be 10x larger than necessary.
Pyplanner provides two post-processing utilities that shrink the output and add navigation.
Optimization lives in its own module (pdfopt) with a simple
bytes-in/bytes-out API. It is not called inside Planner.pdf(). This
separation keeps the rendering path focused on Jinja2 and Playwright, and lets
callers choose whether to optimize. The CLI exposes --opt/--no-opt; library
users call optimize() explicitly.
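As a rough illustration of keeping optimization outside the rendering path, the sketch below shows how a paired --opt/--no-opt flag could gate a bytes-in/bytes-out step. Only the flag names come from the text above. The names build_parser, post_process and optimize_fn are hypothetical, and the default of True is an assumption, not a statement about the real CLI.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --opt/--no-opt pair is taken from the docs."""
    parser = argparse.ArgumentParser(prog="example")
    # BooleanOptionalAction (Python 3.9+) generates both --opt and --no-opt.
    parser.add_argument(
        "--opt",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumption: the real default is not stated here
        help="post-process the rendered PDF with an optimizer",
    )
    return parser


def post_process(
    pdf_bytes: bytes,
    optimize_fn: Callable[[bytes], bytes],
    enabled: bool,
) -> bytes:
    """Apply optimize_fn only when enabled; rendering is never touched."""
    return optimize_fn(pdf_bytes) if enabled else pdf_bytes
```

Because the optimizer is passed in as a plain callable, the rendering code never needs to import it, which mirrors the separation described above.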
Optimizing PDF size
optimize() accepts raw PDF bytes, rewrites the internal
structure and returns optimized bytes:
from pyplanner.pdfopt import optimize

with open("planner.pdf", "rb") as f:
    raw = f.read()

optimized = optimize(raw)

with open("planner-opt.pdf", "wb") as f:
    f.write(optimized)

# Fraction of the original size that was removed
saved = 1 - len(optimized) / len(raw)
print(f"Saved {saved:.0%}")
The optimization passes are:
Image deduplication - identical Image XObjects are merged into a single canonical copy. The full PDF object graph is walked to rewire all references. This handles images inside Form XObjects, tiling Patterns and transparency masks.
ProcSet stripping - obsolete /ProcSet arrays are removed from every /Resources dictionary. These have been ignored by PDF readers since PDF 1.4 (2001).
Form XObject deduplication - after image dedup, Form XObjects that previously differed only by which copy of an image they referenced now have identical content streams. They are merged the same way as images.
Recompression - the file is re-serialized with object stream packaging and Flate recompression.
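The interaction between image and Form XObject deduplication can be sketched on a toy object graph. This is not the pdfopt implementation and does not use pikepdf. It assumes a simplified model in which each object is a dict with a kind, a data payload and a list of referenced object ids. The names dedupe and build_example are hypothetical.

```python
import hashlib


def dedupe(objects: dict, kind: str) -> int:
    """Merge objects of one kind that have identical data and references.

    Keeps the lowest-numbered copy as canonical, rewires every reference
    in the graph, deletes the duplicates, and returns how many were removed.
    """
    canonical = {}  # (content digest, refs) -> id of the first object seen
    remap = {}      # duplicate id -> canonical id
    for obj_id in sorted(objects):
        obj = objects[obj_id]
        if obj["kind"] != kind:
            continue
        key = (hashlib.sha256(obj["data"]).digest(), tuple(obj["refs"]))
        if key in canonical:
            remap[obj_id] = canonical[key]
        else:
            canonical[key] = obj_id
    # Walk the whole graph so that every holder of a duplicate is rewired.
    for obj in objects.values():
        obj["refs"] = [remap.get(ref, ref) for ref in obj["refs"]]
    for dup_id in remap:
        del objects[dup_id]
    return len(remap)


def build_example() -> dict:
    """Two pages, each with its own copy of the same image and form."""
    return {
        1: {"kind": "image", "data": b"LOGO", "refs": []},
        2: {"kind": "image", "data": b"LOGO", "refs": []},
        3: {"kind": "form", "data": b"q /Im0 Do Q", "refs": [1]},
        4: {"kind": "form", "data": b"q /Im0 Do Q", "refs": [2]},
        5: {"kind": "page", "data": b"", "refs": [3]},
        6: {"kind": "page", "data": b"", "refs": [4]},
    }


objects = build_example()
removed_images = dedupe(objects, "image")  # forms 3 and 4 now both point at image 1
removed_forms = dedupe(objects, "form")    # so the forms themselves can be merged
print(removed_images, removed_forms, sorted(objects))
```

In this model, running the form pass before the image pass finds nothing to merge, because the two forms still reference different image copies. That is why the order of the passes matters.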
Adding bookmarks
add_bookmarks() inserts outline entries (a
table-of-contents sidebar in PDF viewers):
from pyplanner.pdfbookmarks import add_bookmarks

with open("planner.pdf", "rb") as f:
    pdf_bytes = f.read()

# Top-level bookmarks: (title, 0-based page number)
pdf_bytes = add_bookmarks(pdf_bytes, [("2026", 1)])

# Nested bookmarks under "2026":
pdf_bytes = add_bookmarks(pdf_bytes, [
    ("January", 2),
    ("February", 33),
], parent=["2026"])

with open("planner-bookmarked.pdf", "wb") as f:
    f.write(pdf_bytes)
The parent argument is a list of bookmark titles forming a path from the
root to the desired parent node.
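To make the path semantics concrete, here is a minimal in-memory model of an outline tree. It is only a sketch of how a title path can be resolved, not the pdfbookmarks implementation. Node and add_items are hypothetical names, and matching the first child with a given title is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Sequence, Tuple


@dataclass
class Node:
    title: str
    page: Optional[int] = None
    children: list = field(default_factory=list)


def add_items(
    root: Node,
    items: Sequence[Tuple[str, int]],
    parent: Optional[Iterable[str]] = None,
) -> None:
    """Append (title, page) entries under the node reached by following parent."""
    node = root
    for title in parent or ():
        # Descend one level per title; the first matching child wins.
        match = next((c for c in node.children if c.title == title), None)
        if match is None:
            raise ValueError(f"bookmark {title!r} not found in outline")
        node = match
    node.children.extend(Node(title, page) for title, page in items)


outline = Node("root")
add_items(outline, [("2026", 1)])
add_items(outline, [("January", 2), ("February", 33)], parent=["2026"])
add_items(outline, [("Week 1", 3)], parent=["2026", "January"])
```

Each call adds siblings at one level only, so a deeper outline is built by repeated calls with progressively longer paths.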
Note
When rendering via Planner.pdf(), bookmarks
are added automatically from .page element IDs in the HTML. You only
need add_bookmarks() if you are building a
custom pipeline.
Putting it together
A typical post-processing pipeline after Planner.pdf():
from pyplanner import Calendar, Planner
from pyplanner.pdfopt import optimize

cal = Calendar()
planner = Planner("planners/ff-2026/ff-2026.html", calendar=cal)

pdf_bytes = planner.pdf()  # bookmarks added automatically
pdf_bytes = optimize(pdf_bytes)

with open("ff-2026.pdf", "wb") as f:
    f.write(pdf_bytes)
This matches what the CLI does when you run
pyplanner planners/ff-2026 --pdf.
API reference
- pyplanner.pdfopt.optimize(pdf_bytes: bytes) -> bytes
Deduplicate images, strip ProcSets, and merge identical Forms.
Opens the given PDF bytes with pikepdf, applies all optimization passes described in the module docstring, and returns the re-serialized PDF bytes.
- Parameters:
pdf_bytes – Raw PDF file content.
- Returns:
Optimized PDF file content.
- pyplanner.pdfbookmarks.add_bookmarks(pdf_bytes: bytes, items: Sequence[tuple[str, int]], parent: Iterable[str] | None = None) -> bytes
Insert bookmarks into a PDF.
Each call appends one or more sibling bookmark entries under the specified parent node. Call multiple times to build a multi-level outline incrementally.
- Parameters:
pdf_bytes – Raw PDF content.
items – (title, page_number) pairs to insert. Page numbers are 0-based.
parent – Path of bookmark titles from the root to the desired parent node, e.g. ["2026", "January"] inserts under the “January” child of “2026”. None (default) or an empty iterable inserts at the top level.
- Returns:
PDF bytes with bookmarks added.
- Raises:
ValueError – If any title in parent is not found in the existing outline.