• JackbyDev@programming.dev
    20 hours ago

    Oh boy, I sure am excited to see websites hosting PDFs! I love when the tool that everyone uses for hosting and viewing HTML gets to be blessed with the perfect format that is PDF!

    I LOVE PDFS! I love two column PDFs! I love reading like this!

    1 3
    2 4
    5 7
    6 8

    Instead of like this

    1
    2
    3
    4
    5
    6
    7
    8

    It’s amazing and such a good user experience!

    I love that PDFs are so difficult to transform into HTML, too. I would never want to besmirch the publisher’s perfect, one approved layout by resizing the window!

    • brianary
      8 hours ago

      I’ve always called Word documents and PDFs “dead-end formats” (DEF). Once you export your data to them, there’s no reliable way to retrieve your data from them for further transformation like you can for YAML, JSON, XML, HTML, Markdown, &c.

    • keepthepace@slrpnk.net
      19 hours ago

      I love that PDFs are so difficult to transform into HTML, too

      FYI, if that’s relevant to your field, every new article published on arxiv.org now has an HTML rendering as well.

      And on many older publications, changing “arxiv.org” to “ar5iv.org” in the URL leads to an HTML rendering that comes from a best-effort experiment they ran for a while.
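
      The URL swap can be sketched in a few lines. This is only a minimal sketch: the function name `to_ar5iv` and the example paper ID are placeholders for illustration, and it assumes the ar5iv.org host mirrors the arxiv.org path unchanged, which may not hold for every paper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Swap the arxiv.org host for ar5iv.org, leaving path and query untouched."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Refuse anything that is not an arxiv.org link rather than guessing.
    if host not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arxiv.org URL: {url}")
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


# Example ID chosen only for illustration.
print(to_ar5iv("https://arxiv.org/abs/1706.03762"))
```

      Whether the resulting page exists still depends on whether that paper was part of the conversion experiment.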

      • JackbyDev@programming.dev
        18 hours ago

        That’s really cool! What I really would like is a tool that converts PDFs to semantic HTML files. I took a peek there and it seems easier for them because they have the original LaTeX source.

        I think for arbitrary PDF files the information just isn’t there. I’ve looked into it a bit and it’s sort of all over the place. A tool called pdf2htmlEX is pretty good, but it makes the HTML look exactly like the PDF.
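
        To show what “the information just isn’t there” means in practice: a PDF page mostly stores text at positions, so a converter has to guess the reading order. Here is a rough sketch of one such guess for a two-column page. The `Block` type, the `reading_order` function, and the coordinates are all made up for illustration and do not come from any real PDF library.

```python
from typing import NamedTuple


class Block(NamedTuple):
    x: float   # left edge of the text box
    y: float   # distance from the top of the page
    text: str


def reading_order(blocks: list[Block], page_width: float) -> list[str]:
    """Guess reading order: left column top to bottom, then right column."""
    middle = page_width / 2
    left = sorted((b for b in blocks if b.x < middle), key=lambda b: b.y)
    right = sorted((b for b in blocks if b.x >= middle), key=lambda b: b.y)
    return [b.text for b in left + right]


# Blocks listed row by row, the way a naive extractor might emit them.
page = [
    Block(50, 100, "1"), Block(350, 100, "3"),
    Block(50, 200, "2"), Block(350, 200, "4"),
]
print(reading_order(page, page_width=600))  # ['1', '2', '3', '4']
```

        A split down the middle like this falls apart as soon as the page has a full-width heading, a figure, or a table, which is why semantic HTML is so hard to recover.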