Make Sure Your Technical Documentation’s Always Professional With Proselint by Matthew Setter

February 20th, 2019

Do you want a no-fuss command-line tool to lint the quality of the prose in your technical documentation and other technical writing projects? Do you want something that has an uncomplicated interface, yet provides rich feedback? Then come learn about proselint.

So far in this series, we've covered three excellent tools: MarkdownLint, Broken Link Checker, and write-good. Each of them helps improve the quality of technical documentation, whether by suggesting improvements to what (or how) we've written, or by flagging files that don't lint correctly for the chosen format (okay, Markdown).

In this, the fourth part in the series, I'm going to cover another prose quality tool. No, I don't think you're a poor writer. Writing tight, effective, and meaningful technical documentation is hard work! So, good tools are essential to ensure that we do our best.

Don't believe me? Here's what Dan Allen, lead developer of the Antora and AsciiDoc projects, says:

Writing good, effective prose is equally as involved as writing good code.

So, please indulge me while I beat the drum for yet more tools that both help us write high-quality technical documentation and teach us a bit more about the craft each time we sit down at the keyboard.

The tool that we'll cover here, in part four of the series, is proselint. Gladly, it sports an uncomplicated interface, so there isn't much to learn.

That said, it makes a pretty bold claim as to its abilities. Have a read of the claim below, and tell me in the comments if you think that it can back it up:

proselint places the world's greatest writers and editors by your side, where they whisper suggestions on how to improve your prose. You'll be guided by advice inspired by Bryan Garner, David Foster Wallace, Chuck Palahniuk, Steve Pinker, Mary Norris, Mark Twain, Elmore Leonard, George Orwell, Matthew Butterick, William Strunk, E.B. White, Philip Corbett, Ernest Gowers, and the editorial staff of the world's finest literary magazines and newspapers, among others. Our goal is to aggregate knowledge about best practices in writing and to make that knowledge immediately accessible to all authors in the form of a linter for prose.

I'll be honest; while I'm quite widely read, I've not read works from each one of these authors. And even if I had, it'd be too much work to check if the suggestions that proselint gives are truly indicative of how they'd write.

In addition, I found this interesting statement in the proselint docs:

We aim for a tool so precise that it becomes possible to unquestioningly adopt its recommendations and still come out ahead — with stronger, tighter prose. Better to be quiet and authoritative than loud and unreliable.

That last sentence bears focusing on for a moment: "Better to be quiet and authoritative than loud and unreliable". Consider how many of your tools overwhelm you with details, many of which are only a distraction because they're incorrect.

I'd rather have a tool that says nothing when it's unsure than one that tells me something just on the off-chance that it might be helpful. I can live with that trade-off. What about you?

Let's get into the app.

How to Install Proselint

Like any good command-line tool, proselint is trivial to install. Assuming that you already have Python installed, you install proselint using pip (Python's package manager), as in the following example.

pip install proselint

If you're using a Linux distribution, you should be able to install it using your distribution's package manager. If you're using a Debian derivative, such as Ubuntu, you can use APT, as in the following example:

sudo add-apt-repository universe
sudo apt install python3-proselint

How Does Proselint Work?

Similar to write-good, proselint has a series of in-built checks that it runs on each file it scans. Here are a few of the checks that it runs:

  • Archaic writing styles
  • Cliches
  • Corporate speak
  • Dates & times
  • Jargon
  • Malapropisms
  • Profanity
  • Redundancy
  • Spelling errors
  • Writing consistency

Speaking personally, these cover a lot of the ways in which we weaken the prose that we write. However, if you feel that the checks don't go far enough, or if you need something quite custom, feel free to contribute to the project and add what you need. I’m not going to cover how to do that in this article, though.

How To Use Proselint

Now that it's installed and we understand a little bit about how it works, let's see how to use it. There aren't a lot of options that you can pass to proselint. However, given the richness of the advice that it provides, I'd say that that's acceptable.

At its most basic, you only need to pass one or more paths to it. In the following example, I've passed a directory and a relative path to a file. Using these, proselint will parse all the identified Markdown files under those paths and print out a report.

Just to note, to the best of my knowledge, proselint only checks Markdown files. I tried it unsuccessfully on a number of other file formats.

proselint ./docs README.md

Here's an example of the report that it generated:

docs/checking-broken-links.md:69:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'. Found once elsewhere.
docs/checking-broken-links.md:75:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'.
docs/checking-broken-links.md:98:12: typography.symbols.copyright (c) is a goofy alphabetic approximation, use the symbol ©.
docs/checking-broken-links.md:118:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'.
docs/best-practices.md:38:101: misc.but No paragraph should start with a 'But'.
docs/best-practices.md:44:1: cliches.write_good 'In a nutshell,' is a cliché.
README.md:21:51: garner.redundancy.ras RAS syndrome. Use 'PDF' instead of 'PDF format,'.

Other Output Formats

As you can see, the output is quite verbose, yet very thorough. If you look closely, you'll see that each line has the following format:

<filename>:<line number>:<column number>: <check_name> <message>

As a result, it's relatively trivial to find and fix errors quickly, assuming that you believe the identified issues are worth changing.
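Because every line of the report has the same shape, it's also easy to process with a script. What follows is a minimal sketch in Python, not part of proselint itself, that splits one report line into its parts. The names REPORT_LINE, Finding, and parse_report_line are my own, and the pattern assumes that the output looks exactly like the report shown earlier.

```python
import re
from typing import NamedTuple, Optional

# One report line: <filename>:<line number>:<column number>: <check_name> <message>
# Assumption: the layout matches the sample report in this article.
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): (?P<check>\S+) (?P<message>.*)$"
)


class Finding(NamedTuple):
    """Hypothetical container for one parsed report line."""
    path: str
    line: int
    column: int
    check: str
    message: str


def parse_report_line(text: str) -> Optional[Finding]:
    """Return a Finding for a report line, or None if the line doesn't match."""
    match = REPORT_LINE.match(text.strip())
    if match is None:
        return None
    return Finding(
        path=match["path"],
        line=int(match["line"]),
        column=int(match["column"]),
        check=match["check"],
        message=match["message"],
    )


sample = "README.md:21:51: garner.redundancy.ras RAS syndrome. Use 'PDF' instead of 'PDF format,'."
finding = parse_report_line(sample)
print(finding.path, finding.line, finding.column, finding.check)
```

With the findings in a structured form, you could sort them by file, count them per check, or skip the checks that you've decided not to act on.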

While thoroughness is often a virtue, imagine what your console would look like if you were scanning a significant number of files and trying to work with output formatted in this manner. It would be a little hard to decipher, even on a large, widescreen monitor like mine.

Gladly, there are two options for making proselint's output more meaningful and parseable: --compact and --json. As the name implies, --compact considerably shortens the command's output. Running the command above with the --compact flag would result in the following output:

-:69:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'. Found once elsewhere.
-:75:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'.
-:98:12: typography.symbols.copyright (c) is a goofy alphabetic approximation, use the symbol ©.
-:118:2: typography.symbols.ellipsis '...' is an approximation, use the ellipsis symbol '…'.
-:38:101: misc.but No paragraph should start with a 'But'.
-:44:1: cliches.write_good 'In a nutshell,' is a cliché.
-:21:51: garner.redundancy.ras RAS syndrome. Use 'PDF' instead of 'PDF format,'.

Note that, in this form, the file name's been removed, but the remaining information's still available. Honestly, while it's a little more compact, it's now hard — if not impossible — to know which file the recommendation relates to. So keep that in mind.

Now for the --json option. If we run the earlier command with it, the output will look as follows:

{"data": {"errors": []}, "status": "success"}
{"data": {"errors": []}, "status": "success"}
{"data": {"errors": []}, "status": "success"}
{"data": {"errors": [{"check": "typography.symbols.ellipsis", "column": 2, "end": 2737, "extent": 2, "line": 69, "message": "'...' is an approximation, use the ellipsis symbol '\u2026'. Found once elsewhere.", "replacements": null, "severity": "warning", "start": 2735}, {"check": "typography.symbols.ellipsis", "column": 2, "end": 3076, "extent": 2, "line": 75, "message": "'...' is an approximation, use the ellipsis symbol '\u2026'.", "replacements": null, "severity": "warning", "start": 3074}, {"check": "typography.symbols.copyright", "column": 12, "end": 3709, "extent": 2, "line": 98, "message": "(c) is a goofy alphabetic approximation, use the symbol \u00a9.", "replacements": null, "severity": "warning", "start": 3707}, {"check": "typography.symbols.ellipsis", "column": 2, "end": 4208, "extent": 2, "line": 118, "message": "'...' is an approximation, use the ellipsis symbol '\u2026'.", "replacements": null, "severity": "warning", "start": 4206}]}, "status": "success"}
{"data": {"errors": [{"check": "garner.redundancy.ras", "column": 51, "end": 1037, "extent": 11, "line": 21, "message": "RAS syndrome. Use 'PDF' instead of 'PDF format,'.", "replacements": "PDF", "severity": "warning", "start": 1026}]}, "status": "success"}

That's not too helpful if we're attempting to parse it visually. To make it more readable, pipe the output to jq, which, by default, will pretty print and colourise it, as in the following example:

{
  "data": {
    "errors": [
      {
        "check": "weasel_words.very",
        "column": 12,
        "end": 930,
        "extent": 5,
        "line": 14,
        "message": "Substitute 'damn' every time you're inclined to write 'very'; your editor will delete it and the writing will be just as it should be.",
        "replacements": null,
        "severity": "warning",
        "start": 925
      }
    ]
  },
  "status": "success"
}

In the example above, you can see that it lists:

  • The check that was run
  • The column and line where the error was found
  • A message about the error that helps you to improve
  • A suggested replacement, if one could be determined
  • A severity rating that helps you to prioritise whether to fix it or not, and to know just how bad it might be
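If you'd rather work with these fields in a script than read them by eye, here's a small sketch in Python that turns one JSON document into a readable line per error. The function name summarize is my own, and the code assumes that the JSON has the same keys as the output shown above.

```python
import json
from typing import List


def summarize(json_text: str) -> List[str]:
    """Turn one JSON document from `proselint --json` into readable lines.

    Assumption: the keys match the JSON output shown in this article.
    """
    payload = json.loads(json_text)
    lines = []
    for error in payload["data"]["errors"]:
        summary = (
            f'{error["line"]}:{error["column"]} '
            f'[{error["severity"]}] {error["check"]}: {error["message"]}'
        )
        # "replacements" is null when proselint has no suggestion to offer.
        if error["replacements"]:
            summary += f' (suggested: {error["replacements"]})'
        lines.append(summary)
    return lines


# A sample document built from the fields of the README.md error shown earlier.
sample = json.dumps({
    "data": {"errors": [{
        "check": "garner.redundancy.ras", "column": 51, "line": 21,
        "message": "RAS syndrome. Use 'PDF' instead of 'PDF format,'.",
        "replacements": "PDF", "severity": "warning",
    }]},
    "status": "success",
})
for line in summarize(sample):
    print(line)
```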

As with --compact, the information's more readable. However, the file name is not available. As a result, you need to scan individual files, rather than a collection of files, to know the file to edit.
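One way around the missing file name is to scan the files one at a time from a script and attach the path yourself. The following is a rough sketch in Python, under a few assumptions: the proselint executable is on your PATH, it accepts --json followed by a single file path (as used in this article), and the JSON has the shape shown above. The names tag_errors and lint_files are my own.

```python
import json
import subprocess
from typing import Dict, List


def tag_errors(path: str, json_text: str) -> List[Dict]:
    """Attach the file path to every error in one proselint JSON document."""
    payload = json.loads(json_text)
    return [{**error, "file": path} for error in payload["data"]["errors"]]


def lint_files(paths: List[str]) -> List[Dict]:
    """Run `proselint --json` once per file, so each error knows its file."""
    tagged: List[Dict] = []
    for path in paths:
        # Assumption: `proselint` is installed and available on PATH.
        result = subprocess.run(
            ["proselint", "--json", path], capture_output=True, text=True
        )
        # Skip files for which nothing was printed, rather than fail on them.
        if result.stdout.strip():
            tagged.extend(tag_errors(path, result.stdout))
    return tagged


# Tagging works on any JSON document of the expected shape.
document = '{"data": {"errors": [{"check": "misc.but", "line": 38, "column": 101}]}, "status": "success"}'
print(tag_errors("docs/best-practices.md", document))
```

Calling lint_files(["README.md", "docs/best-practices.md"]) would then return a single list in which every error carries a "file" key. Starting one process per file is slower than a single run, so treat this as a workaround rather than a recommendation.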

That said, the JSON output is very rich and detailed.

Tooling Support

Now that we've explored how to run proselint from the command line, and have a feel for the options that it supports, let's have a look at the editors and other tools that it integrates with. It currently supports the following popular tools:

  • Atom Editor
  • Danger
  • Emacs via Flycheck
  • IntelliJ from JetBrains
  • Phabricator's arc CLI
  • Sublime Text
  • Vim via ALE or Syntastic
  • Visual Studio Code

Overall

On the whole, I've come to really enjoy using proselint when working with the ownCloud documentation, along with several other projects. It does what it says on the proverbial tin. It works well; a little slow at first to be fair, but after the first run, the cache takes over and significantly improves scan performance.

It's easy to install, and the interface is uncomplicated. It'd be great if the compact and JSON outputs included the scanned file names. But otherwise, I'm very happy with it.

What Do You Think?

Do you feel the same way? Is proselint too simple for you? Are you already using it? Please share your thoughts in the comments. I'd love to know what you think.


CC Image Courtesy of Valente on Flickr.



Matthew Setter is a software engineer, ethical hacker, privacy advocate, & technical writer, who loves travelling. He is based in Nuremberg, Germany. When he's not doing all things tech, he's spending time with his family and friends.