Diffing Collaborative Text
A common piece of advice I’ve read suggests that we should write text documents (like LaTeX documents or Markdown documentation) so that each sentence sits on its own line. This makes it easier to read diffs of documents that are written collaboratively.
Emacs offers a function `fill-paragraph`, which breaks a given line before a specific column width. This might look like this:
A [common
advice](https://github.com/Wookai/paper-tips-and-tricks#one-sentence-per-line)
I've read in the past suggests, that we should write text documents (like LaTeX
documents or Markdown documentation) in a way, so that each sentence is put on
its own line. This gives us an easier time to create diffs of documents which
are created in a collaborative fashion.
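To reproduce this kind of output outside an interactive session, here is a minimal sketch. `my/fill-string` is a hypothetical helper (it is not part of Emacs); it fills a string in a temporary buffer using the built-in `fill-region`, with `fill-column` bound to the requested width.

```elisp
;; Sketch only: `my/fill-string' is a made-up name for illustration.
(defun my/fill-string (text width)
  "Return TEXT re-filled so that lines break before column WIDTH."
  (with-temp-buffer
    (insert text)
    ;; `fill-column' controls where the fill commands break lines.
    (let ((fill-column width))
      (fill-region (point-min) (point-max)))
    (buffer-string)))
```

Calling `(my/fill-string paragraph 80)` on the one-line paragraph should give a result similar to the block above. Since the break positions depend on everything that comes before them, adding a single word early in the paragraph and filling again can move many of the later line breaks.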
`fill-paragraph` looks decent, but it can create massive diffs which are hard to reason about. There are several questions on StackOverflow (such as this one) where people ask for a customized version of `fill-paragraph` which formats a piece of text so that each sentence is put onto a new line. I’ve recently started to read the Emacs Lisp introduction tutorial (which ships with Emacs itself), so I’ve tried to come up with my own solution to this problem. I took some inspiration from the above StackOverflow post. Here’s the code:
(defun fw/unfill-paragraph ()
  "Unfill the paragraph at point."
  (interactive)
  ;; With a huge `fill-column', filling joins the paragraph into one line.
  (let ((fill-column (point-max)))
    (fill-paragraph)))

(defun fw/wrap-at-sentences ()
  "Fill the current paragraph, but start each sentence on a new line."
  (interactive)
  (save-excursion
    (fw/unfill-paragraph)
    (mark-paragraph)
    (while (< (point) (region-end))
      (forward-sentence)
      ;; We don't want to add a new line at the end of the paragraph
      (if (< (+ (point) 1) (region-end))
          (newline-and-indent))))
  ;; The selection will not be cleared if there is only one sentence in a paragraph
  (deactivate-mark))
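To make the command convenient to call, it can be bound to a key. The following is only a sketch: the key `C-c q` is an arbitrary choice, and the snippet assumes that `fw/wrap-at-sentences` from the listing above has already been evaluated.

```elisp
;; Assumption: `fw/wrap-at-sentences' is already defined (see above).
;; "C-c q" is an arbitrary key; pick whatever is free in your setup.
(add-hook 'text-mode-hook
          (lambda ()
            (local-set-key (kbd "C-c q") #'fw/wrap-at-sentences)))
```

Using `text-mode-hook` means the binding is set up in `text-mode` and in modes derived from it.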
`fw/wrap-at-sentences` works for the most part, but there are still two edge cases which annoy me:
Emacs might treat phrases such as “e.g.” or “i.e.” as the end of a sentence, which means that a single sentence might end up on more than one line. This behavior can change depending on the configuration of your `sentence-end-double-space` variable, but we can still create examples in which `forward-sentence` does not behave as it should. Here’s an example:
This sentence, which contains the phrase
e.g.
and because of how the new lines are put,
Emacs interprets it as two sentences.
The `markdown-mode` package has ambiguous behavior regarding lists. Depending on what operations you perform, a list might be formatted in two different ways. This can be visualized using an example:
Here’s the initial text on one line:
- Some example sentence which contains no content. Another pointless example sentence. A third sentence.
If we put a new line right after every sentence, the text will end up like this:
- Some example sentence which contains no content.
Another pointless example sentence.
A third sentence.
If we instead indent the second sentence before we put a new line on every following sentence, each following sentence will have the correct indentation:
- Some example sentence which contains no content.
  Another pointless example sentence.
  A third sentence.
Both versions seem to be valid Markdown (well, at least for every interpreter that I’ve tried), but I’d still prefer the second version.
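For the first edge case, one possible workaround is to keep moving forward whenever `forward-sentence` stops right after a known abbreviation. This is a rough sketch under a few assumptions: the names `my/non-terminal-abbrevs` and `my/forward-sentence-skipping-abbrevs` are hypothetical, the abbreviation list is only an example, and an abbreviation that really does end a sentence would be skipped as well.

```elisp
;; Sketch only: both names below are made up for illustration.
(defvar my/non-terminal-abbrevs '("e.g." "i.e." "cf." "vs.")
  "Example list of abbreviations that should not end a sentence.")

(defun my/forward-sentence-skipping-abbrevs (&optional bound)
  "Like `forward-sentence', but keep going after a known abbreviation.
Do not continue once point has reached BOUND (default: end of buffer)."
  (let ((bound (or bound (point-max)))
        ;; Match one of the abbreviations, starting at a word boundary.
        (abbrev-re (concat "\\<" (regexp-opt my/non-terminal-abbrevs t)))
        (prev nil))
    (forward-sentence)
    (while (and (< (point) bound)
                ;; Guard against looping if point did not move.
                (not (eql (point) prev))
                ;; Does the text right before point end in an abbreviation?
                (looking-back abbrev-re (line-beginning-position)))
      (setq prev (point))
      (forward-sentence))))
```

Inside the loop of `fw/wrap-at-sentences`, the call to `(forward-sentence)` could then be swapped for `(my/forward-sentence-skipping-abbrevs (region-end))`. I have not tried this against `markdown-mode` lists, so the indentation issue described above remains.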