Is there a way to generate HTML and PDFs with selectable text in pandoc?

Question

I am writing some documentation in Pandoc Markdown, and I would like a diagram to have selectable text in both the HTML and PDF output. I made the diagram in draw.io, so I can export it as a PDF or SVG.

Suppose I have the following Pandoc Markdown document:

# Introduction

![diagram](diagram.svg)

If I compile this with pandoc doc.md -o doc.html, the diagram's text is not selectable in the resulting HTML page.

SVG diagram in HTML Site

The SVG by itself does have selectable text though.

SVG diagram in browser

If I compile with pandoc doc.md -o doc.pdf, the diagram's text is not selectable in the resulting PDF.

SVG diagram in PDF

On the other hand, if I export the diagram as a PDF, the diagram's text is selectable in the HTML page, but only through an embedded PDF reader, which makes it kind of ugly.

PDF diagram in HTML site

The text is selectable in the PDF with no issues though.

I was thinking of writing a Pandoc filter to use the svg tag instead of the img tag in the HTML, but that still wouldn't solve it not being selectable in the PDF. Does anyone know if this is possible?

chrism99 · Accepted Answer · 2025-09-16 23:19:06Z

In case anyone else is trying to do this: I was able to make the text selectable in the HTML output by writing a pandoc filter that emits an object tag instead of an img tag for SVG images. I wrote it in Haskell, but it could probably be converted to Lua or Python without much trouble.

#!/usr/bin/env runhaskell

-- svg.hs
--    Use the `<object>` tag instead of `<img>` tag for svgs. Using `<object>`
--    allows for interactive SVGs.
--
-- Do `cabal install --lib pandoc-types pandoc` to install the Haskell libraries:
--    `Text.Pandoc.JSON` is in pandoc-types, `Text.Pandoc.Shared` is in pandoc.
--
-- Compile with `ghc -package pandoc -package text svg.hs`
--
-- Then `pandoc --filter ./svg <rest of options>`

{-# LANGUAGE OverloadedStrings #-}

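import Data.Text (isSuffixOf)        -- explicit import avoids name clashes with Prelude
import Text.Pandoc.JSON              -- toJSONFilter and the pandoc AST types
import Text.Pandoc.Shared (stringify)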

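-- Replace an SVG image with a raw HTML `<object>` element; the alt text
-- becomes the fallback content. All other inlines pass through unchanged.
-- Note: `attr` (e.g. width/height) is not carried over to the `<object>`.
imgToObject :: Inline -> Inline
imgToObject (Image attr alt (src, title))
    | ".svg" `isSuffixOf` src =
        RawInline "html" $
            "<object data=\"" <> src <> "\" type=\"image/svg+xml\" title=\"" <> title <> "\">"
            <> stringify alt
            <> "</object>"
    | otherwise = Image attr alt (src, title)
imgToObject x = x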

main :: IO ()
main = toJSONFilter imgToObject
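
One caveat with the filter above: it emits a raw HTML inline whatever the output format is, so when the same filter is used for a LaTeX/PDF run the image would most likely be dropped from the output. Below is a minimal sketch of a variant that only rewrites SVG images when the target format looks like HTML, and that also escapes the attribute values. It relies on `toJSONFilter` accepting a function whose first argument is `Maybe Format` (the target format pandoc passes to the filter). The file name `svg-html-only.hs` and the helpers `escapeHtml`, `objectTag`, and `isHtml` are hypothetical names introduced for illustration; treat this as an untested assumption-laden sketch, not as the original author's method.

```haskell
#!/usr/bin/env runhaskell
{-# LANGUAGE OverloadedStrings #-}

-- svg-html-only.hs (hypothetical file name)
--    Like svg.hs, but leaves images untouched unless the output is HTML.

import Data.Text (Text, isSuffixOf)
import qualified Data.Text as T
import Text.Pandoc.JSON
import Text.Pandoc.Shared (stringify)

-- Escape characters that are unsafe inside a double-quoted HTML attribute
-- or in element content. (Hypothetical helper, not part of pandoc.)
escapeHtml :: Text -> Text
escapeHtml = T.concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = T.singleton c

-- Build the <object> element from source path, title, and fallback text.
objectTag :: Text -> Text -> Text -> Text
objectTag src title fallback =
    "<object data=\"" <> escapeHtml src
        <> "\" type=\"image/svg+xml\" title=\"" <> escapeHtml title <> "\">"
        <> escapeHtml fallback
        <> "</object>"

-- Assumption: HTML-like writers are named "html", "html4", "html5", ...
isHtml :: Format -> Bool
isHtml (Format name) = "html" `T.isPrefixOf` T.toLower name

-- Rewrite SVG images only when the target format is HTML; otherwise
-- (e.g. "latex" for PDF output) the Image is returned unchanged.
imgToObject :: Maybe Format -> Inline -> Inline
imgToObject (Just fmt) (Image _ alt (src, title))
    | isHtml fmt && ".svg" `isSuffixOf` src =
        RawInline "html" (objectTag src title (stringify alt))
imgToObject _ x = x

main :: IO ()
main = toJSONFilter imgToObject
```

Compiled the same way as svg.hs, it would be used as `pandoc --filter ./svg-html-only doc.md -o doc.html`; running it with `-o doc.pdf` should leave the image for the LaTeX writer to handle.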

As for the PDF output, pandoc uses \includesvg from the LaTeX svg package. While looking through its documentation, I saw an example of exactly what I was trying to do (see https://ctan.org/pkg/svg?lang=en Section 4. Example). Sure enough, with their example SVG, the text is selectable in the output PDF. The package looks to have been made with Inkscape in mind, but I was also able to make an SVG diagram in Inkscape, export it as a plain SVG (not an Inkscape SVG), and it still had selectable text. I think it might just be a draw.io issue.
