PDF Post-processing

Chromium’s print-to-PDF produces valid but bloated files. Every page is rendered independently so identical assets (background images, SVG patterns, transparency masks) are embedded as separate PDF objects on every page. A 365-day planner can easily be 10x larger than necessary.

Pyplanner provides two post-processing utilities that shrink the output and add navigation.

Optimization lives in its own module (pdfopt) with a simple bytes-in/bytes-out API. It is not called inside Planner.pdf(). This separation keeps the rendering path focused on Jinja2 and Playwright, and lets callers choose whether to optimize. The CLI exposes --opt/--no-opt; library users call optimize() explicitly.

Optimizing PDF size

optimize() accepts raw PDF bytes, rewrites the internal structure and returns optimized bytes:

from pyplanner.pdfopt import optimize

with open("planner.pdf", "rb") as f:
    raw = f.read()

optimized = optimize(raw)

with open("planner-opt.pdf", "wb") as f:
    f.write(optimized)

saved = 1 - len(optimized) / len(raw)
print(f"Saved {saved:.0%}")

The optimization passes are:

  1. Image deduplication - identical Image XObjects are merged into a single canonical copy. The full PDF object graph is walked to rewire all references. This handles images inside Form XObjects, tiling Patterns and transparency masks.

  2. ProcSet stripping - obsolete /ProcSet arrays are removed from every /Resources dictionary. These have been ignored by PDF readers since PDF 1.4 (2001).

  3. Form XObject deduplication - after image dedup, Form XObjects that previously differed only by which copy of an image they referenced now have identical content streams. They are merged the same way as images.

  4. Recompression - the file is re-serialized with object stream packaging and Flate recompression.

Adding bookmarks

add_bookmarks() inserts outline entries (a table-of-contents sidebar in PDF viewers):

from pyplanner.pdfbookmarks import add_bookmarks

with open("planner.pdf", "rb") as f:
    pdf_bytes = f.read()

# Top-level bookmarks: (title, 0-based page number)
pdf_bytes = add_bookmarks(pdf_bytes, [("2026", 1),])

# Nested bookmarks under "2026":
pdf_bytes = add_bookmarks(pdf_bytes, [
    ("January", 2),
    ("February", 33),
], parent=["2026"])

with open("planner-bookmarked.pdf", "wb") as f:
    f.write(pdf_bytes)

The parent argument is a list of bookmark titles forming a path from the root to the desired parent node.

Note

When rendering via Planner.pdf(), bookmarks are added automatically from .page element IDs in the HTML. You only need add_bookmarks() if you are building a custom pipeline.

Putting it together

A typical post-processing pipeline after Planner.pdf():

from pyplanner import Calendar, Planner
from pyplanner.pdfopt import optimize

cal = Calendar()
planner = Planner("planners/ff-2026/ff-2026.html", calendar=cal)

pdf_bytes = planner.pdf()     # bookmarks added automatically
pdf_bytes = optimize(pdf_bytes)

with open("ff-2026.pdf", "wb") as f:
    f.write(pdf_bytes)

This matches what the CLI does when you run pyplanner planners/ff-2026 --pdf.

API reference

pyplanner.pdfopt.optimize(pdf_bytes: bytes) bytes

Deduplicate images, strip ProcSets, and merge identical Forms.

Opens the given PDF bytes with pikepdf, applies all optimization passes described in the module docstring, and returns the re-serialized PDF bytes.

Parameters:

pdf_bytes – Raw PDF file content.

Returns:

Optimized PDF file content.

pyplanner.pdfbookmarks.add_bookmarks(pdf_bytes: bytes, items: Sequence[tuple[str, int]], parent: Iterable[str] | None = None) bytes

Insert bookmarks into a PDF.

Each call appends one or more sibling bookmark entries under the specified parent node. Call multiple times to build a multi-level outline incrementally.

Parameters:
  • pdf_bytes – Raw PDF content.

  • items(title, page_number) pairs to insert. Page numbers are 0-based.

  • parent – Path of bookmark titles from the root to the desired parent node, e.g. ["2026", "January"] inserts under the “January” child of “2026”. None (default) or empty iterable inserts at the top level.

Returns:

PDF bytes with bookmarks added.

Raises:

ValueError – If any title in parent is not found in the existing outline.