Diffing Collaborative Text

A common advice I've read in the past suggests, that we should write text documents (like LaTeX documents or Markdown documentation) in a way, so that each sentence is put on its own line. This gives us an easier time to create diffs of documents which are created in a collaborative fashion.

Emacs offers a function fill-paragraph, which breaks a given line before a specific column width. This might look like this:

A [common
advice](https://github.com/Wookai/paper-tips-and-tricks#one-sentence-per-line)
I've read in the past suggests, that we should write text documents (like LaTeX
documents or Markdown documentation) in a way, so that each sentence is put on
its own line. This gives us an easier time to create diffs of documents which
are created in a collaborative fashion.

fill-paragraph looks decent, but it can create massive diffs which are hard to reason about. There are several questions on StackOverflow (such as this one where people ask for a customized version of fill-paragraph which could format a piece of text, so that each sentence is put onto a new line. I've recently started to read the Emacs Lisp introduction tutorial (which ships with Emacs itself), so I've tried to come up with my own solution to this problem. I took some inspiration from the above StackOverflow post. Here's the code:

(defun fw/unfill-paragraph ()
  "Unfill the paragraph at point."
  (interactive)
  (let ((fill-column (point-max)))
    (fill-paragraph)))

(defun fw/wrap-at-sentences ()
  "Fills the current paragraph, but starts each sentence on a new line."
  (interactive)
  (save-excursion
    (fw/unfill-paragraph)
    (mark-paragraph)
    (while (< (point) (region-end))
      (forward-sentence)
      ;; We don't want the add a new line at the end of the paragraph
      (if (< (+ (point) 1) (region-end))
          (newline-and-indent))))
  ;; The selection will not be cleared if there is only one sentence in a paragraph
  (deactivate-mark))

The above code works for the most part, but there are still two edge cases which annoy me:

Emacs might treat phrases such as "e.g." or "i.e." as the end of a sentence, which means that a single sentence might end up on more than one line. This behavior can change depending on the configuration of your sentence-end-double-space variable, but we can still create examples in which forward-sentence does not behave as it should. Here's an example:

This sentence, which contains the phrase
e.g.
and because of how the new lines are put,
Emacs interprets it as two sentences.

The markdown-mode package has ambiguous behavior regarding lists. Depending on what operations you perform, a list might be formatted in two different ways. This can be visualized using an example:

Here's the initial text on one line:

- Some example sentence which contains no content. Another pointless example sentence. A third sentence.

If we put a new line right after every sentence, the text will end up like this:

- Some example sentence which contains no content.
Another pointless example sentence.
A third sentence.

If we instead indent the second sentence before we put a new line on every following sentence, each following sentence will have the correct indentation:

- Some example sentence which contains no content.
  Another pointless example sentence.
  A third sentence.

Both versions seem to be valid Markdown (well, at least for every interpreter that I've tried), but I'd still prefer the second version.

Published: 2019-07-23