The value of pretty-printing and syntax highlighting


From:     Patrick TJ McPhee
Date: 05 Jul 1997
Donald Knuth's literate programming systems format code fragments using bold face for keywords, italics for variables and a variety of mathematical symbols for language operators. This approach to formatting has been eschewed in many subsequent systems, on the grounds that it makes it more difficult to support a variety of programming languages within a single literate framework. I have been wondering whether you lose anything when you abandon this sort of syntax-highlighting.

I must admit that I prefer to see printed code with all the bolds and italics, and using normal text typefaces rather than typewriter-style typefaces. Part of the reason for this may be that I do a substantial amount of my work in C. Almost all of the keywords in C have to do with either variable declaration or flow control, so bolded keywords tend to be visual anchors to the tops of loops or the start of some other logical section of code.

In a language like COBOL, which uses keywords for everything, it seems the page would be speckled with bold-faced words, and perhaps they would fail to aid comprehension. ML, which I think falls somewhere between C and COBOL in terms of having keywords all over the place, is pretty-printed in Paulson's "ML for the Working Programmer", and the decision taken there was to put keywords and symbols in a normal-weight typewriter font, and everything else in a fixed-pitch italic font. The result is highly readable, but I would say the main aid to comprehension here is good indentation.

There is a long tradition of pretty-printing using indentation only, especially in Lisp. It seems like most of the published programs I have seen don't go beyond this. I think that's too bad, because it ultimately makes them harder to read than they would be if the publisher had made some judicious use of typographic niceties. Lisp listings, for example, could usually be helped out by putting subscripts on all those parentheses, and making nested parentheses different sizes, and using different type weights or ink colors for each set of parentheses, in addition to careful indentation.

So, I think pretty-printing is generally good, although I acknowledge different languages have to be treated differently. I would go so far as to say that replacing certain ASCII symbols with proper math symbols (e.g., \times for *, and, in C, \equiv for ==) can act as an aid to comprehension, which ought to be the whole point. Does anybody have an alternate view? Is there any research on source-code comprehension?
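
The symbol substitution suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator table `SYMBOLS` and the function name `mathify` are hypothetical and do not come from any existing weaver, and the scan is purely textual, so it cannot tell multiplication from pointer dereference and will also rewrite operators inside strings and comments.

```python
import re

# Hypothetical table mapping C operators to TeX math macros.
SYMBOLS = {
    "==": r"\equiv",
    "!=": r"\neq",
    "<=": r"\leq",
    ">=": r"\geq",
    "*": r"\times",
}

# Try longer operators first so that "==" is matched as one token.
_PATTERN = re.compile(
    "|".join(re.escape(op) for op in sorted(SYMBOLS, key=len, reverse=True))
)


def mathify(code: str) -> str:
    """Replace each known operator with its TeX macro in inline math mode."""
    return _PATTERN.sub(lambda m: "$" + SYMBOLS[m.group(0)] + "$", code)


print(mathify("if (a == b * c)"))
```

A real prettyprinter would need at least a tokenizer to avoid the false matches noted above, which is part of why different languages have to be treated differently.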


From:     Norman Ramsey
Date: 07 Jul 1997
Patrick TJ McPhee writes: So, I think pretty-printing is generally good, although I acknowledge different languages have to be treated differently. I would go so far as to say that replacing certain ASCII symbols with proper math symbols (e.g., \times for *, and, in C, \equiv for ==) can act as an aid to comprehension, which ought to be the whole point. Does anybody have an alternate view?

I think prettyprinting has its place, and that place is usually between book covers. I have argued before that programs under development are frequently edited, and given the current state of editing tools for literate programming, it does not make sense for the output of the woven document to look wildly different from the input that is edited. When a document is going to be put between covers and read widely, then it makes sense to fine-tune its appearance, and it is OK if the code fills with unreadable TeX hieroglyphics in the process.

Patrick TJ McPhee writes: Is there any research on source-code comprehension?
@Book{Baecker90,
  author =       "Ronald M. Baecker and Aaron Marcus",
  title =        "Human Factors and Typography for More Readable Programs",
  pages =        "366",
  publisher =    "Addison-Wesley Publishing Co. (ACM Press)",
  address =      "Reading, MA",
  year =         "1990",
  price =        "25.25/27.95",
  ISBN =         "0-201-10745-7",
  note =         "ACM Order number 706890",
}


From:     Lee Wittenberg
Date: 08 Jul 1997
Patrick TJ McPhee writes: There is a long tradition of pretty-printing using indentation only, especially in Lisp. It seems like most of the published programs I have seen don't go beyond this. I think that's too bad, because it ultimately makes them harder to read than they would be if the publisher had made some judicious use of typographic niceties. Lisp listings, for example, could usually be helped out by putting subscripts on all those parentheses, and making nested parentheses different sizes, and using different type weights or ink colors for each set of parentheses, in addition to careful indentation.

Most people (including Lispers) forget that when Lisp was first introduced, it followed the Algol-60 example of providing both a "hardware language" and a "publication language". (Incidentally, Algol-60's publication language is the basis for most typesetting prettyprinters. The brief paragraphs concerning this in the Algol Revised Report are well worth reading for anyone interested in typeset prettyprinting.) Lisp's publication language was known as "M-expressions", to distinguish them from S-expressions. Programs written as M-expressions are much easier for humans to understand (not surprisingly, since that's what they were designed for).

Patrick TJ McPhee writes: So, I think pretty-printing is generally good, although I acknowledge different languages have to be treated differently. I would go so far as to say that replacing certain ASCII symbols with proper math symbols (e.g., \times for *, and, in C, \equiv for ==) can act as an aid to comprehension, which ought to be the whole point.

I have been working on a program recently to teach myself Java. Naturally, I am using literate programming (noweb), and just as naturally (because I wrote it), I am using the Pretzel prettyprinter for Java. I found it much more pleasant to work with the prettyprinted listings than with untypeset code. My experience is in direct contrast to Norman Ramsey's, but this is more personal preference than anything else, and I agree with Norman that the best place for prettyprinting is between the covers of books. I have also found a number of bugs when the prettyprinter got confused and the indentation (or whatever) wasn't what I expected, because what I wrote wasn't what I thought I was writing. I firmly believe that web source should be set up in such a way that it can be prettyprinted or not, depending on the weaver's preference. I am trying to set up my Java webs that way, but Pretzel doesn't currently typeset [[...]] stuff, so it isn't completely possible.


From:     Dan Schmidt
Date: 10 Jul 1997
Lee Wittenberg writes: I firmly believe that web source should be set up in such a way that it can be prettyprinted or not, depending on the weaver's preference. I am trying to set up my Java webs that way, but Pretzel doesn't currently typeset [[...]] stuff, so it isn't completely possible.

You might want to try my dpp prettyprinter. It is written for C/C++, but off the top of my head it should work for Java; you'd just have to change the @keywords list (which doesn't require any knowledge of Perl). dpp does handle [[...]] constructs correctly. The big noticeable difference between dpp and something like Pretzel is that it doesn't change line-breaking or indentation. This makes it possible to catch everything with an ad hoc technique rather than having to construct a grammar. If the change is as trivial as I claim, I will put Java support in myself for the next release.
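
For readers who want to see what a grammar-free, layout-preserving approach of this general kind might look like, here is a minimal sketch in Python. It is not dpp's actual code (dpp is written in Perl): the names `C_KEYWORDS`, `JAVA_KEYWORDS` and `embolden` are hypothetical, the keyword lists are deliberately incomplete, and the use of `\textbf` is an assumption about the output format. Like any purely textual scan, it will also embolden keywords that appear inside strings and comments.

```python
import re

# Hypothetical, deliberately incomplete keyword lists.
C_KEYWORDS = ["if", "else", "while", "for", "return", "int", "char", "void"]
JAVA_KEYWORDS = C_KEYWORDS + ["class", "extends", "new", "public", "static"]


def embolden(code: str, keywords: list[str]) -> str:
    """Wrap whole-word keywords in \\textbf{...}, one line at a time.

    Each line is rewritten independently, so line breaks and leading
    whitespace are passed through exactly as written.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keywords)) + r")\b")
    return "\n".join(
        pattern.sub(lambda m: "\\textbf{" + m.group(1) + "}", line)
        for line in code.split("\n")
    )


source = "while (n) {\n    return n;\n}"
print(embolden(source, C_KEYWORDS))
```

Switching languages in this sketch amounts to passing a different keyword list, for example `embolden(source, JAVA_KEYWORDS)`, and no grammar is involved at any point.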