PDF Post-processing
===================

Chromium's print-to-PDF produces valid but bloated files. Every page is
rendered independently so identical assets (background images, SVG
patterns, transparency masks) are embedded as separate PDF objects on
every page. A 365-day planner can easily be 10x larger than necessary.

Pyplanner provides two post-processing utilities that shrink the output
and add navigation.

Optimization lives in its own module (``pdfopt``) with a simple
bytes-in/bytes-out API. It is *not* called inside ``Planner.pdf()``.
This separation keeps the rendering path focused on Jinja2 and
Playwright, and lets callers choose whether to optimize. The CLI exposes
``--opt/--no-opt``; library users call ``optimize()`` explicitly.

Optimizing PDF size
-------------------

:func:`~pyplanner.pdfopt.optimize` accepts raw PDF bytes, rewrites the
internal structure and returns optimized bytes:

.. code-block:: python

    from pyplanner.pdfopt import optimize

    with open("planner.pdf", "rb") as f:
        raw = f.read()

    optimized = optimize(raw)

    with open("planner-opt.pdf", "wb") as f:
        f.write(optimized)

    saved = 1 - len(optimized) / len(raw)
    print(f"Saved {saved:.0%}")

The optimization passes are:

1. **Image deduplication** - identical Image XObjects are merged into a
   single canonical copy. The full PDF object graph is walked to rewire
   all references. This handles images inside Form XObjects, tiling
   Patterns and transparency masks.
2. **ProcSet stripping** - obsolete ``/ProcSet`` arrays are removed from
   every ``/Resources`` dictionary. These have been ignored by PDF
   readers since PDF 1.4 (2001).
3. **Form XObject deduplication** - after image dedup, Form XObjects
   that previously differed only by which copy of an image they
   referenced now have identical content streams. They are merged the
   same way as images.
4. **Recompression** - the file is re-serialized with object stream
   packaging and Flate recompression.

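
The interplay between passes 1 and 3 can be illustrated with a toy model.
The sketch below is *not* the ``pdfopt`` implementation and uses no PDF
library: the ``dedupe`` function and the ``(payload, refs)`` object layout
are hypothetical stand-ins for PDF objects and their indirect references.
It only shows why merging duplicate images can make previously distinct
Form XObjects identical, so that a second round of merging finds them.

.. code-block:: python

    import hashlib


    def dedupe(objects):
        """Merge objects that share the same payload and references.

        ``objects`` maps an object id to a ``(payload, refs)`` tuple,
        where ``refs`` is a tuple of ids the object points at (a toy
        stand-in for indirect references in a PDF).
        """
        objects = dict(objects)
        while True:
            canonical = {}  # fingerprint -> first id seen with it
            remap = {}      # duplicate id -> canonical id
            for obj_id in sorted(objects):
                payload, refs = objects[obj_id]
                key = (hashlib.sha256(payload).hexdigest(), refs)
                if key in canonical:
                    remap[obj_id] = canonical[key]
                else:
                    canonical[key] = obj_id
            if not remap:
                return objects
            # Drop the duplicates and rewire every remaining reference.
            objects = {
                obj_id: (payload, tuple(remap.get(r, r) for r in refs))
                for obj_id, (payload, refs) in objects.items()
                if obj_id not in remap
            }


    objects = {
        1: (b"image", ()),      # the same image, embedded twice
        2: (b"image", ()),
        3: (b"form", (1,)),     # forms differ only in which copy they use
        4: (b"form", (2,)),
        5: (b"page 1", (3,)),
        6: (b"page 2", (4,)),
    }
    result = dedupe(objects)
    print(sorted(result))  # [1, 3, 5, 6]

In the first round only the two images match. Once object 4 is rewired to
image 1, it becomes identical to object 3 and is merged in the second
round, which mirrors the ordering of the passes described above.
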
Adding bookmarks
----------------

:func:`~pyplanner.pdfbookmarks.add_bookmarks` inserts outline entries
(a table-of-contents sidebar in PDF viewers):

.. code-block:: python

    from pyplanner.pdfbookmarks import add_bookmarks

    with open("planner.pdf", "rb") as f:
        pdf_bytes = f.read()

    # Top-level bookmarks: (title, 0-based page number)
    pdf_bytes = add_bookmarks(pdf_bytes, [("2026", 1)])

    # Nested bookmarks under "2026":
    pdf_bytes = add_bookmarks(pdf_bytes, [
        ("January", 2),
        ("February", 33),
    ], parent=["2026"])

    with open("planner-bookmarked.pdf", "wb") as f:
        f.write(pdf_bytes)

The ``parent`` argument is a list of bookmark titles forming a path
from the root to the desired parent node.

.. note::

   When rendering via ``Planner.pdf()``, bookmarks are added
   automatically from ``.page`` element IDs in the HTML. You only need
   :func:`~pyplanner.pdfbookmarks.add_bookmarks` if you are building a
   custom pipeline.

Putting it together
-------------------

A typical post-processing pipeline after ``Planner.pdf()``:

.. code-block:: python

    from pyplanner import Calendar, Planner
    from pyplanner.pdfopt import optimize

    cal = Calendar()
    planner = Planner("planners/ff-2026/ff-2026.html", calendar=cal)

    pdf_bytes = planner.pdf()  # bookmarks added automatically
    pdf_bytes = optimize(pdf_bytes)

    with open("ff-2026.pdf", "wb") as f:
        f.write(pdf_bytes)

This matches what the CLI does when you run
``pyplanner planners/ff-2026 --pdf``.

API reference
-------------

.. autofunction:: pyplanner.pdfopt.optimize

.. autofunction:: pyplanner.pdfbookmarks.add_bookmarks