Announcements for noweb

From:	Norman Ramsey
Date:	18 Mar 1993

Although noweb has been available outside Princeton for more than two years, I am now officially announcing its release. What's "official" about this release is that I now have time to support noweb and I promise to fix bugs. noweb is available via anonymous ftp from bellcore.com in file ~ftp/pub/norman/noweb.shar.Z or from csservices.princeton.edu in file ~ftp/pub/noweb.shar.Z. DOS code with binaries is available in the same locations as dosnoweb.zip. I do not support the DOS code and it may or may not be the same as the supported source code. The rest of this announcement repeats information in the noweb README file; it contains a short sales pitch and a description of what you get and on what terms. I am posting a longer article that contains a more in-depth description and sales pitch.

INTRODUCTION -- WHAT IS noweb?

noweb is designed to meet the needs of literate programmers while remaining as simple as possible. Its primary advantages are simplicity, extensibility, and language-independence. noweb uses 4 control sequences to WEB's 27, and its manual is only two pages. noweb works "out of the box" with any programming language, and its formatter-dependent part is under 50 lines. The primary sacrifice relative to WEB is the loss of the language-dependent features: prettyprinting and an index of identifiers. noweb provides extensibility by using the Unix toolkit philosophy. The "notangle" and "noweave" commands are built from pieces, which are then assembled in pipelines using shell scripts. The pieces are:

  markup     convert noweb file from human syntax to tool syntax
  unmarkup   inverse of markup
  nt         `tangle' the tool form of the noweb file
  noxref     insert cross-reference information for latex

These pieces are combined by the scripts in the shell directory to provide more than just weaving and tangling:

  notangle   analog of TANGLE
  noweave    analog of WEAVE
  nountangle tangle, but keep interleaved documentation in comments
  noroots    print names of all root chunks in a noweb file
  nocount    count number of lines of code and documentation.

noweb has been used for three years both at Princeton and elsewhere. It has been used for tens of thousands of lines of code in such languages as awk, C, C++, Icon, Modula-3, Promela, and Standard ML. If you already know you want to use noweb, you need only install it and read the manual page. If you're just curious about noweb, a sales pitch appears in the technical report in doc/ieee.tex.

WHAT YOU GET IN THIS DISTRIBUTION

This distribution contains the following directories:

  contrib    software contributed by noweb users
  doc        man pages and a technical report
  examples   parts of noweb programs in different languages
  icon       Icon code for nonstandard weave and cross-referencer
  lib        noweave's cross-referencer
  shell      all the shell scripts that make up the actual commands
  src        source code for nt and markup
  tex        supporting tex code for /usr/local/lib/tex/macros

Where appropriate, these directories have README files of their own.

WEAVING

The worst aspect of literate programming is the enormous amount of time wasted wrangling over what prettyprinted output should look like. Although noweb does no prettyprinting, it is not entirely immune: several people have complained about noweave's output or have sent me changes that add more options to noweave. Having been down that road with Spider, I won't be fooled again. noweb doesn't try to be all things to all programmers, but it is very easy to change. If you don't like noweave's formatting, you can easily throw away noweave and make your own. To help you get started, the shell directory in the distribution contains three versions of noweave:

  noweave         the standard (supposed to be latex-proof)
  noweave.nr      what I use (handles 90 columns of code)
  noweave.simple  simple, uses no special TeX hacking

The simple version can't handle code with @ signs. The article in doc/ieee.tex explains the intermediate language that noweb uses to represent literate programs.

noweb comes with two cross-referencers for use with noweave. The standard one is written in awk, because that's what everybody has. There is also a somewhat better cross-referencer written in Icon. Neither cross-referencer has been thoroughly exercised. See the INSTALL file for more details.

noweb is designed to be extended with a language-dependent prettyprinter and indexer. I haven't written one because my experience with Spider taught me that prettyprinting is far more trouble than it's worth. If someone else wants to write one, I will be happy to help and advise.

NOTES

doc/ieee.* contains a paper that has been submitted to IEEE Software. You must `make install' before attempting to format the paper, since it uses the noweb document style option. The paper documents the representation of noweb files that is used by the noweb tools, in case you want to write any tools of your own. Simple tools (e.g. count the number of lines of interleaved documentation) are trivial. If you write any tools, or you want tools written (e.g. prettyprinters, index generators), let me know.

The icon directory contains Icon programs that do most of the job of noweave.sh and noxref. If you want to adapt noweb to work with a text processor other than TeX or latex, they might provide a better starting point. I confess that the whole system should have been written in Icon from the beginning, but I am not going to do it over. Icon is available by anonymous FTP from cs.arizona.edu.

Thanks to Dave Hanson for cpif. Thanks to Joseph Reynolds for prodding me to fix [[...]]. Thanks to Lee Wittenberg for the DOS binaries. Thanks to Gary Leavens and Lee Wittenberg for testing this version, especially the installation process. I am, as always, responsible for errors and awkwardness that remain.

Send comments or questions to norman@bellcore.com. I enjoy hearing from noweb users; if you have enjoyed noweb, why not send me a local postcard for my collection? My address is:

Norman Ramsey
Bellcore
445 South Street
Morristown, New Jersey 07960
USA

COPYRIGHT

From:	Norman Ramsey
Date:	18 Mar 1993

This message contains an ASCII version of the technical report that describes noweb and why you might care to use it. The article may one day be published in IEEE Software (it's been under review for more than a year), but don't hold your breath. If you get noweb from csservices.princeton.edu, you get the TeX source for this article, and you can make a version you might actually be able to read.

Literate-Programming Tools Need Not Be Complex
Norman Ramsey
Department of Computer Science, Princeton University
35 Olden Street, Princeton, New Jersey 08544
August 1992

Abstract

When it was introduced, literate programming meant WEB. Desire to use WEB with languages other than Pascal led to the implementation of many versions. WEB is complex, and the difficulty of using WEB creates an artificial barrier to experimentation with literate programming. noweb provides much of the functionality of WEB, with a fraction of the complexity. noweb is independent of the target programming language, and its formatter-dependent part is less than 40 lines. noweb is extensible, because it uses two representations of programs: one easily edited by authors and one easily manipulated by tools. This paper explains how to use the noweb tools and gives examples of their use. It sketches the implementation of the tools and describes how new tools are added to the set. Because WEB and noweb overlap, but each does some things that the other cannot, this paper enumerates the differences.

Key words: literate programming, readability, programming environments

Introduction

When literate programming was introduced, it was synonymous with WEB, a tool for writing literate Pascal programs [6, Chapter 4]. The idea attracted attention; several examples of literate programs were published, and a special forum was created to discuss literate programming [1, 2, 6, 13]. WEB was adapted to programming languages other than Pascal [3, 7, 8, 10, 12]. With experience, many WEB users became dissatisfied [9]. Some found WEB not worth the trouble, as did one author of the program appearing in Appendix C of Reference 11. Others built their own systems for literate programming. The literate-programming forum was dropped, on the grounds that literate programming had become the province of those who could build their own tools [14].

WEB programmers interleave source code and descriptive text in a single document. When using WEB, a programmer divides the source code into modules. Each module has a documentation part and a code part, and modules may be written in any order. The programmer is encouraged to choose an order that helps explain the program. The code parts are like macro definitions; they have names, and they contain both code and references to other modules. A WEB file represents a single program; TANGLE extracts that program from the WEB source. One special module has a code part with no name, and TANGLE expands the code part of that module to extract the program. WEAVE converts WEB source to TeX input, from which TeX can produce high-quality typeset documentation of the program.

WEB is a complex tool. In addition to enabling programmers to present pieces of a program in any order, it expands three kinds of macros, prettyprints code, evaluates some constant expressions, provides an integer representation for string literals, and implements a simple form of version control. The manual for the original version documents 27 "control sequences" [5]. The versions for languages other than Pascal offer slightly different functions and different sets of control sequences. Significant effort is required to make WEB usable with a new programming language, even when using a tool designed for that purpose [8].

WEB's shortcomings make it difficult to explore the idea of literate programming; too much effort is required to master the tool. I designed a new tool that is both simple and independent of the target programming language. noweb is designed around one idea: writing named chunks of code in any order, with interleaved documentation. Like WEB, and like all literate-programming tools, it can be used to write a program in pieces and to present those pieces in an order that helps explain the program. noweb's value lies in its simplicity, which shows that the idea of literate programming does not require the complexity of WEB.

noweb

A noweb file contains program source code interleaved with documentation. When notangle is given a noweb file, it writes the program on standard output. When noweave is given a noweb file, it reads the noweb source and produces, on standard output, TeX source for typeset documentation. Figure 1 shows how to use notangle and noweave to produce code and documentation for a C program contained in the noweb file foo.nw.

notangle foo.nw > foo.c
noweave foo.nw > foo.tex

Figure 1: Using noweb to build code and documentation

A noweb file is a sequence of chunks, which may appear in any order. A chunk may contain code or documentation. Documentation chunks begin with a line that starts with an at sign (@) followed by a space or newline. They have no names. Code chunks begin with <<chunk name>>= on a line by itself. The double left angle bracket (<<) must be in the first column. Chunks are terminated by the beginning of another chunk, or by end of file. If the first line in the file does not mark the beginning of a chunk, it is assumed to be the first line of a documentation chunk. Documentation chunks contain text that is ignored by notangle and copied verbatim to standard output by noweave (except for quoted code). noweave can work with LaTeX, or it can use a TeX macro package, supplied with noweb, that defines commands like "chapter" and "section".

Code chunks contain program source code and references to other code chunks. Several code chunks may have the same name; notangle concatenates their definitions to produce a single chunk, just as TANGLE does. Code chunk definitions are like macro definitions; notangle extracts a program by expanding one chunk (by default the chunk named <<*>>). The definition of that chunk contains references to other chunks, which are themselves expanded, and so on. notangle's output is readable; it preserves the indentation of expanded chunks with respect to the chunks in which they appear. Code may be quoted within documentation chunks by placing double square brackets around it ([[...]]). These double square brackets are ignored by notangle, but they are used by noweave to give code special typographic treatment. If double left and right angle brackets are not paired, they are treated as literal "<<" and ">>". Users can force any such brackets, even paired brackets, to be treated as literal by preceding the brackets by an at sign (e.g. "@<<").
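
The expansion rule described above can be illustrated with a short Python sketch. This is not notangle's implementation; it is a simplified illustration that assumes every chunk reference sits alone on its line, ignores the "@<<" escape, and does not guard against undefined or circular chunk names. The names parse and expand are hypothetical.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")      # <<name>>= on a line by itself
CHUNK_USE = re.compile(r"^(\s*)<<(.+)>>\s*$")  # assumption: a use is alone on its line


def parse(source):
    """Collect code chunks; definitions with the same name are concatenated."""
    chunks = {}
    current = None  # None while inside a documentation chunk
    for line in source.splitlines():
        m = CHUNK_DEF.match(line)
        if m:
            current = chunks.setdefault(m.group(1), [])
        elif line == "@" or line.startswith("@ "):
            current = None  # a documentation chunk begins
        elif current is not None:
            current.append(line)
    return chunks


def expand(chunks, name="*", indent=""):
    """Expand chunk `name`, keeping the indentation of each reference."""
    out = []
    for line in chunks[name]:
        m = CHUNK_USE.match(line)
        if m:
            out.extend(expand(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line if line else line)
    return out


SOURCE = """\
@ The program prints a greeting.
<<*>>=
def main():
    <<body>>
@ The body is defined later, in two pieces.
<<body>>=
print("hello")
<<body>>=
print("world")
"""

print("\n".join(expand(parse(SOURCE))))
```

The two definitions of <<body>> are concatenated, and both lines inherit the four-space indentation of the reference in <<*>>.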

@ This program has no input, because we want to keep it
simple. The result of the program will be to produce a
list of the first thousand prime numbers, and this list
will appear on the [[output]] file.

Since there is no input, we declare the value [[m = 1000]]
as a compile-time constant. The program itself is capable
of generating the first [[m]] prime numbers for any
positive [[m]], as long as the computer's finite
limitations are not exceeded.

<<program to print the first thousand prime numbers>>=
program print_primes(output);
 const m = 1000;
      <<other constants of the program>>
 var <<variables of the program>>
   begin <<print the first [[m]] prime numbers>>
   end.

Figure 2: Sample noweb input, from prime number program

Figure 2 shows a fragment of a noweb program that computes prime numbers. The program is derived from the example used in Reference 6, Chapter 4, and Figure 2 should be compared with Figure 2b of that paper. Figure 3 shows the program after processing by noweave and LaTeX. Figure 4 shows the beginning of the program as extracted by notangle. A complete example program accompanies this paper.

This program has no input, because we want to keep it simple.
The result of the program will be to produce a list of the first
thousand prime numbers, and this list will appear on the output
file.

Since there is no input, we declare the value m = 1000 as a
compile-time constant. The program itself is capable of gen-
erating the first m prime numbers for any positive m, as long as
the computer's finite limitations are not exceeded.

<program to print the first thousand prime numbers>
program print_primes(output);
const m = 1000;
<other constants of the program>
var <variables of the program>
begin <print the first m prime numbers>
end.

Figure 3: Output produced by noweave and LaTeX from Figure 2

program print_primes(output);
const m = 1000;
rr = 50;
cc = 4;
ww = 10;
ord_max = 30; { p_ord_max squared must exceed p_m }
var p: array [1..m] of integer;
{ the first m prime numbers, in increasing order }
page_number: integer;
...

Figure 4: Part of primes program as written by notangle

Using noweb

Experimenting with noweb is easy. noweb has little syntax: definition and use of code chunks, marking of documentation chunks, quoting of code, and quoting of brackets. noweb can be used with any programming language, and its manual fits on two pages. On a large project, it is essential that compilers and other tools be able to refer to locations in the noweb source, even though they work with notangle's output [9]. Giving notangle the -L option makes it emit pragmas that inform compilers of the placement of lines in the noweb source. It also preserves the columns in which tokens appear. If notangle is not given the -L option, it respects the indentation of its input, making its output easy to read. Large programs may also benefit from cross-reference information. If given the -x option, noweave uses LaTeX to show on what pages each chunk is defined and used.

WEB files map one to one to both programs and documents. The mapping of noweb files to programs is many to many; the mapping of files to documents is many to one. Source files are combined by listing their names on notangle's or noweave's command line. Many programs may be extracted from one source by specifying the names of different root chunks, using notangle's -R command-line option. The simplest example of a one-to-many mapping of programs is that of putting C header and program in a single noweb file. The header comes from the root chunk <header>, and the program from the default root chunk, <*>. The following rules for make automate the process:

foo.c: foo.nw
	notangle -L foo.nw > foo.c
foo.h: foo.nw
	notangle -R header foo.nw > xfoo.h
	-cmp -s xfoo.h foo.h || cp xfoo.h foo.h

Using cmp avoids touching the header file when its contents haven't
changed. This trick is explained on pages 265-266 of Reference 4.
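
The same "copy only if changed" idea can be sketched in Python. This is an illustration of the technique, not part of noweb; the function name copy_if_changed is hypothetical.

```python
import filecmp
import os
import shutil


def copy_if_changed(new_path, target_path):
    """Copy new_path over target_path only when the contents differ.

    Returns True if the target was written, False if it was left alone.
    """
    if os.path.exists(target_path) and filecmp.cmp(
        new_path, target_path, shallow=False
    ):
        return False  # identical contents: keep the target's timestamp
    shutil.copyfile(new_path, target_path)
    return True
```

Because an unchanged header keeps its old modification time, make does not recompile the files that depend on it.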

A more interesting example is using noweb to interleave different languages in one source file. I wrote an awk script that read a machine description and emitted a disassembler for that machine, and I used noweb to combine the script and description in a single file, so I could place each part of the input next to the code that processed that input. The machine description was in the root chunk <opcode table>, and the awk script in the default root chunk. The processing steps were:

notangle opcodes.nw > opcodes.awk
notangle -R 'opcode table' opcodes.nw |
  awk -f opcodes.awk > disassem.sml

Many-to-one mapping of source to program can be used to obtain effects similar to those of Ada or Modula-3 generics. Figure 5 shows generic C code that supports lists. The code can be "instantiated" by combining it with another noweb file. pair_list.nw, shown in Figure 6, specifies lists of integer pairs. The two are combined by applying notangle to them both:

notangle pair_list.nw generic_list.nw > pair_list.c

noweb has no parameter mechanism, so the "generic" code must refer to a fixed set of symbols, and it cannot be checked for errors except by compiling pair_list.c. These restrictions make noweb a poor approximation to real generics, but useful nevertheless.

I have used noweb for small programs written in various languages, including C, Icon, awk, and Modula-3. Larger projects have included a code generator for Standard ML of New Jersey (written in Standard ML) and a multi-architecture debugger, written in Modula-3, C, and assembly language. A colleague used noweb to write an experimental file system in C++.

This list code supports circularly-linked lists represented by a pointer to
the last element. It is intended to be combined with other noweb code that
defines <fields of a list element> (the fields found in an element of a list) and
that uses <list declarations> and <list definitions>.

<list declarations>
 typedef struct list {
   <fields of a list element>
   struct list *_link;
 } *List;

 extern List singleton(void);   /* singleton list, uninitialized fields */
 extern List append(List, List); /* destructively append two lists */
 #define last(l)   (l)
 #define head(l)   ((l) ? (l)->_link : 0)
 #define forlist(p,l) for (p=head(l); p; p=(p==last(l) ? 0 : p->_link))

<list definitions>
 List append (List left, List right) {
    List temp;
    if (left == 0)  return right;
    if (right == 0) return left;
    temp = left->_link; left->_link = right->_link; right->_link = temp;
    return right;
 }
 ...

Figure 5: Generic code for implementing lists in C

<*>
 <list declarations>
 <list definitions>

<fields of a list element>
 int x;
 int y;

Figure 6: Program to instantiate lists of integer pairs

The sizes of these programs are:

    Program             Documentation lines   Total lines

    markup and nt                       400         1,200
    ML code generator                   900         2,600
    Debugger                          1,400        11,000
    File system                       4,400        27,000

Representation of noweb files

The noweb syntax is easy to read, write, and edit, but it is not easily manipulated by programs. To make it easy to extend noweb, I have written markup, which converts noweb source to a representation that is easily manipulated by commonly used Unix tools like sed and awk. In this representation, every line begins with @ and a key word. The possibilities are:

  @begin kind n    Start a chunk
  @end kind n      End a chunk
  @text string     string appeared in a chunk
  @nl              A newline
  @defn name       The code chunk named name is being defined
  @use name        A reference to code chunk named name
  @quote           Start of quoted code in a documentation chunk
  @endquote        End of quoted code in a documentation chunk
  @file filename   Name of the file from which the chunks came
  @literal text    noweave copies text to output

markup numbers each chunk, starting at 0. It also recognizes and undoes the escape sequence for double brackets, e.g. converting "@<<" to "<<". markup's output represents a sequence of files. Each file is represented by a "@file filename" line, followed by a sequence of chunks. The representation of a documentation chunk where docline may be @text, @nl, @quote, or @endquote is:

    @begin docs n where n is the chunk number.
    docline       repeated an arbitrary number of times.
    @end docs n

Every @nl corresponds to a newline in the original file. markup guarantees that quotes are balanced and not nested. The representation of a code chunk where codeline may be @text, @nl, or @use is:

    @begin code n where n is the chunk number.
    @defn name    name of this chunk.
    @nl           The newline following <<name>>= in the original file
    codeline      repeated an arbitrary number of times.
    @end code n
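
To make the representation concrete, here is a minimal Python sketch that converts a small noweb source into the line-oriented form described above. It is not the real markup program: it ignores [[...]] quoting and the "@<<" escape, and it assumes that a bare "@" line contributes no text of its own. The function name markup and the default file name example.nw are illustrative only.

```python
import re

DEFN = re.compile(r"^<<(.+)>>=\s*$")  # <<name>>= on a line by itself
USE = re.compile(r"<<(.+?)>>")        # a reference anywhere in a code line


def markup(source, filename="example.nw"):
    """Return the tool-form lines for one noweb source string."""
    out = ["@file " + filename]
    number = 0   # chunks are numbered from 0
    kind = None  # kind of the chunk currently open, if any

    def begin(new_kind):
        nonlocal number, kind
        if kind is not None:
            out.append(f"@end {kind} {number}")
            number += 1
        kind = new_kind
        out.append(f"@begin {kind} {number}")

    for line in source.splitlines():
        m = DEFN.match(line)
        if m:
            begin("code")
            out.append("@defn " + m.group(1))
            out.append("@nl")  # the newline following <<name>>=
            continue
        if line == "@" or line.startswith("@ "):
            begin("docs")
            line = line[2:]
            if not line:
                continue  # assumption: a bare "@" line carries no text
        elif kind is None:
            begin("docs")  # the file starts with documentation by default
        if kind == "code":
            pos = 0
            for u in USE.finditer(line):
                if u.start() > pos:
                    out.append("@text " + line[pos:u.start()])
                out.append("@use " + u.group(1))
                pos = u.end()
            if pos < len(line):
                out.append("@text " + line[pos:])
        elif line:
            out.append("@text " + line)
        out.append("@nl")
    if kind is not None:
        out.append(f"@end {kind} {number}")
    return out


print("\n".join(markup("@ Say hello.\n<<*>>=\nmain() { <<body>> }\n")))
```

Because every output line starts with a keyword, later stages can process the stream one line at a time with tools such as sed or awk.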

The noweb tools are implemented by piping the output of markup to other programs. notangle is a Unix shell script that builds a pipeline between markup and nt, which reads and expands definitions of code chunks. noweave pipes the output of markup to a 24-line awk script that inserts appropriate TeX or LaTeX formatting commands.

Having a format easily read by programs makes noweb extensible; one can manipulate literate programs using Unix shell scripts and filters. To be able to share programs with colleagues who don't enjoy literate programming, I modified notangle by adding to its pipeline a stage that places each line of documentation in a comment and moves it to the succeeding code chunk. The resulting script, nountangle, transforms a literate program into a traditional commented program, without loss of information and with only a modest penalty in readability. Figure 7 shows the results of applying nountangle to the prime-number program shown in Figure 2.

noweave's cross-reference generation is also implemented as an extension; the output of markup is piped through an awk script that uses @literal to insert LaTeX cross-reference commands. Another simple tool finds all the roots in a noweb file, making it easy to find definitions where chunk names have been misspelled.
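
As an illustration of how small such a tool can be, the following Python sketch finds root chunks from the tool form: a root is any chunk that is defined with @defn but never referenced with @use. It is a sketch of the idea only, not the distributed script, and the function name roots is hypothetical.

```python
def roots(tool_lines):
    """Return names that appear in @defn lines but in no @use line."""
    defined = []  # kept as a list to preserve first-definition order
    used = set()
    for line in tool_lines:
        key, _, rest = line.partition(" ")
        if key == "@defn" and rest not in defined:
            defined.append(rest)
        elif key == "@use":
            used.add(rest)
    return [name for name in defined if name not in used]


EXAMPLE = [
    "@file foo.nw",
    "@begin code 0", "@defn *", "@nl", "@use body", "@nl", "@end code 0",
    "@begin code 1", "@defn body", "@nl", "@text x := 1", "@nl", "@end code 1",
    "@begin code 2", "@defn bdoy", "@nl", "@text y := 2", "@nl", "@end code 2",
]

print(roots(EXAMPLE))
```

Here the misspelled chunk name "bdoy" is reported as a root alongside "*", which is how a misspelling shows up in practice.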

Comparing WEB and noweb

Unlike WEB, noweb is independent of the target programming language. WEB tools can be generated for many programming languages, but those languages must be lexically similar to C. For example, WEB can't handle the awk regular-expression notation "/:::/"; every such expression must be quoted using WEB's "verbatim" control sequence. The effort required to generate WEB tools is significant; the prospective user must write a specification of several hundred lines.

- This program has no input, because we want to keep it
- simple. The result of the program will be to produce a
- list of the first thousand prime numbers, and this list
- will appear on the [[output]] file.
  ..
  .

- <program to print the first thousand prime numbers>=
program print_primes(output);
  const m = 1000;
      - "section-The output phase-
      -
      - <other constants of the program>=
      rr = 50;
      cc = 4;
      ww = 10;
      - <other constants of the program>=
      ord_max = 30;  - p_ord_max squared must exceed p_m
  var - How should table [[p]] be represented? Two possibilities
     - suggest themselves: We could construct a sufficiently
  ..
  .

Figure 7: Output produced by nountangle from Figure 2

Being independent of the target programming language makes noweb simpler, but it also means that noweb can do less. Most of the differences between WEB and noweb arise because WEB has language-dependent features that are not present in noweb. These features include prettyprinting, typesetting comments using TeX, generating an index of identifiers, expanding macros, evaluating constant expressions, and converting string literals to indices into a "string pool." Among these features, noweb users are most likely to miss prettyprinting and the index of identifiers.

Some differences arise because WEB and noweb implement similar features differently. WEB's original TANGLE removed white space and folded lines to fill each line with tokens, making its output unreadable [6, Chapter 4, Figure 3]. Later adaptations preserved line breaks but removed other white space. By default, notangle preserves whitespace and maintains indentation when expanding chunks. It can therefore be used with languages like Miranda and Haskell, in which indentation is significant. TANGLE cannot.

WEB's WEAVE assigns a number to each chunk, and its cross-reference information refers to chunk numbers, not page numbers. noweb uses LaTeX to emit cross-reference information that refers to page numbers. Anyone who has read a large literate program will appreciate the difference.

WEB works poorly with LaTeX; LaTeX constructs cannot be used in WEB source, and getting WEAVE output to work in LaTeX documents requires tedious adjustments by hand. noweb works with both plain TeX and LaTeX. Both WEAVE and noweave depend on the text formatter in two ways: the source of the program itself, and the supporting macros. WEAVE's source (written using WEB for C) is several thousand lines long, and the formatting code is not isolated. noweave's source is a 57-line shell script, and only 31 of those lines have to do with formatting. Both WEAVE and noweave use about 200 lines of supporting macros for plain TeX. noweb uses another 80 lines to support LaTeX, most of which is used to eliminate duplicate page numbers in cross-reference lists.

noweb has two features that weren't in the original WEB, but that appeared in some of WEB's later adaptations. They are the ability to inform the compiler of the original locations of source lines and the ability to extract more than one program from a single source file. Reviewers have had many expectations of literate-programming tools [13, 14]. The most important is verisimilitude: a single input should produce both compilable program and publishable document, warranting the correctness of the document. Others include flexible order of elaboration, ability to develop program and documentation concurrently in one place, cross-references, and indexing. WEB satisfies all these expectations, and noweb satisfies all but one (it does not provide automatic indexing).

Discussion

WEB takes the monolithic approach to literate programming: it does everything. noweb's approach is to compose simple tools that manipulate files in the noweb format. Existing Unix tools provide some of the WEB features that aren't found in noweb. Unix supplies two macro processors: the C preprocessor and the m4 macro processor. xstr extracts string literals. patch provides a form of version control similar to WEB's change files. Few of WEB's remaining features will be missed; for example, many compilers evaluate constant expressions at compile time. Experience with WEB has suggested that prettyprinting may be more trouble than it is worth, and that the index of identifiers, while useful, is not a necessity [9].

Three things distinguish noweb from previous work. noweb takes as simple as possible a view of literate programming and the tools needed to implement it. Instead of relying on a generator or re-implementation to support different programming languages, noweb is independent of the target programming language. noweave's dependence on its typesetter is small and isolated, instead of being distributed throughout a large implementation.

Experimenting with noweb is easy because the tools are simple and they work with any language. If the experiment is unsatisfying, it is easy to abandon, because notangle's output, unlike TANGLE's, is readable. noweb is simpler than WEB and is easier to use and understand, but it does less. I argue, however, that the benefit of WEB's extra features is outweighed by the cost of the extra complexity, making noweb better for writing literate programs.

noweb can be obtained by anonymous FTP from csservices.princeton.edu, in file pub/noweb.shar.Z.

Acknowledgements

Mark Weiser's invaluable encouragement provided the impetus for me to write this paper, which I did while visiting the Computer Science Laboratory of the Xerox Palo Alto Research Center. Comments from David Hanson and from the anonymous referees stimulated me to improve the paper. The development of noweb was supported by a Fannie and John Hertz Foundation Fellowship.

References

[1] P. J. Denning. Announcing literate programming. Communications of the ACM, 30(7):593, July 1987.
[2] D. Gries and J. Bentley. Programming pearls: Abstract data types. Communications of the ACM, 30(4):284-290, April 1987.
[3] K. Guntermann and J. Schrod. WEB adapted to C. TUGboat, 7(3):134-137, October 1986.
[4] B. W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice-Hall, 1984.
[5] D. E. Knuth. The WEB system of structured documentation. Technical Report 980, Stanford Computer Science, Stanford, California, September 1983.
[6] D. E. Knuth. Literate Programming, volume 27 of Center for the Study of Language and Information Lecture Notes. Leland Stanford Junior University, Stanford, California, 1992.
[7] S. Levy. WEB adapted to C, another approach. TUGboat, 8(1):12-13, 1987.
[8] N. Ramsey. Literate programming: Weaving a language-independent WEB. Communications of the ACM, 32(9):1051-1055, September 1989.
[9] N. Ramsey and C. Marceau. Literate programming on a team project. Software Practice & Experience, 21(7):677-683, July 1991.
[10] W. Sewell. How to MANGLE your software: the WEB system for Modula-2. TUGboat, 8(2):118-128, July 1987.
[11] W. Sewell. Weaving a Program: Literate Programming in WEB. Van Nostrand Reinhold, New York, 1989.
[12] H. Thimbleby. Experiences of `literate programming' using CWEB (a variant of Knuth's WEB). Computer Journal, 29(3):201-211, 1986.
[13] H. Thimbleby. A review of Donald C. Lindsay's text file difference utility, diff. Communications of the ACM, 32(6):752-755, June 1989.
[14] C. J. Van Wyk. Literate programming: An assessment. Communications of the ACM, 33(3):361-365, March 1990.

From:	Norman Ramsey
Date:	28 Oct 1993

(In case you didn't know, noweb is a language-independent literate-programming tool whose watchwords are simplicity and extensibility. If you're not already familiar with noweb, the full README file will follow in a separate message. It has complete information about noweb, including how to get sources by anonymous ftp. If you know something about noweb, read on.)

I am very pleased to be able to announce a major new release of noweb. The major change visible to users is support for local identifier cross-reference and an index of identifiers. Those of us who write Icon, TeX, or yacc code can enjoy the dubious benefits of automatic discovery of definitions and uses; others will have to fall back on a scheme by which definitions are marked manually and uses are discovered automatically.

There are serious changes under the hood, which should be of profound interest to the small cadre of noweb hackers out there. A -filter option in the notangle and noweave scripts makes it easy to attach tools that manipulate noweb information in language-dependent or other customized ways. For example, the automatic indexing codes for Icon, TeX, and yacc are about 30 lines each. I hope that one of you will write a tool that automatically recognizes definitions of interesting identifiers in C or C++ programs. The same hooks have been used in two different contributed prettyprinters.

Finally, the LaTeX support is drastically revised, and you have far too many options and hooks to use in fiddling with the output. Many of these revisions are in response to complaints by users that they wanted things formatted differently.

From:	Norman Ramsey
Date:	28 Oct 1993

This is version 2.5 of ``noweb'', a low-tech literate programming tool. noweb is available via anonymous FTP from the Comprehensive TeX Archive Network, in directory web/noweb. CTAN includes hosts ftp.shsu.edu, ftp.tex.ac.uk, and ftp.uni-stuttgart.de. These sites mirror the master directory bellcore.com:pub/norman/noweb. You can also get the master shar file bellcore.com:pub/norman/noweb.shar.Z. The file INSTALL tells how to build noweb.

Changes to this version are so extensive that they are detailed in a separate CHANGES file. They include:

major enhancements of LaTeX support & incompatible changes to noweave
language-independent support for an index of identifiers and for local identifier cross-reference.
a `noweb' command that ``extracts everything.''
contributed prettyprinters for Icon, Object-Oriented Turing, and a variant of Dijkstra's language of guarded commands.
restructured shell scripts to make things easier for hackers (especially -filter).
bug fixes.

Introduction

noweb is designed to meet the needs of literate programmers while remaining as simple as possible. Its primary advantages are simplicity, extensibility, and language-independence. noweb uses 5 control sequences to WEB's 27. The simple noweb manual is only 2 pages; documenting the full power of noweave and notangle requires another 3 pages. noweb works ``out of the box'' with any programming language, and its formatter-dependent part is a 60-line nawk program. The primary sacrifice relative to WEB is that code is not prettyprinted.

noweb provides extensibility by using the Unix toolkit philosophy. The ``noweb,'' ``notangle,'' and ``noweave'' commands are built from pieces, which are then assembled in pipelines using shell scripts. The pieces include:

  markup    convert noweb file from human syntax to tool syntax
  unmarkup  inverse of markup
  totex     convert from tool syntax to TeX/LaTeX markup
  nt        `tangle' the tool form of the noweb file
  mnt       discover roots, then act like nt
  noidx     insert indexing and cross-reference information for latex
  finduses  finds uses of identifiers

These pieces are combined by the scripts in the shell directory to provide more than just weaving and tangling:

  noweb      analog of nuweb
  notangle   analog of TANGLE
  noweave    analog of WEAVE
  nountangle tangle, but keep interleaved documentation in comments
  noroots    print names of all root chunks in a noweb file
  nocount    count number of lines of code and documentation.
  nodefs     extract defined identifiers for noweave -indexfrom
  noindex    build an external index for multi-file documents
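
As a rough illustration of the toolkit idea, the following Python sketch assembles the kind of shell pipeline the scripts build: markup first, then any filters, then a back end such as nt or totex. The function name `pipeline` is hypothetical, and the sketch assumes the pieces are on the search path and take no options; the real scripts locate the pieces themselves and pass options that are not modeled here.

```python
import shlex


def pipeline(source_files, backend, filters=()):
    """Return a shell command line of the form: markup files | filters... | backend.

    Hypothetical helper for illustration; it only builds the string and runs nothing.
    """
    # Quote file names so that names containing spaces survive the shell.
    stages = [shlex.join(["markup", *source_files])]
    # Each filter is assumed to be a complete shell command already.
    stages.extend(filters)
    stages.append(backend)
    return " | ".join(stages)


print(pipeline(["prog.nw"], "nt"))
print(pipeline(["prog.nw"], "totex", ["autodefs.pascal"]))
```

Under these assumptions the first call prints `markup prog.nw | nt`, a tangle with no filters, and the second shows a filter slotted in between markup and the back end, which is the position the -filter option gives a tool.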

noweb has been used for four years both at Princeton and elsewhere. It has been used for tens of thousands of lines of code in such languages as awk, C, C++, Icon, Modula-3, PAL, Perl, Promela, and Standard ML. If you already know you want to use noweb, you need only install it and read the manual page. If you're just curious about noweb, a sales pitch appears in the technical report in xdoc/ieee.tex. This paper describes version 2.3, so it's somewhat out of date.

What you get in this distribution

This distribution contains the following directories:

  contrib   software contributed by noweb users
  examples  parts of noweb programs in different languages
  icon      Icon code for nonstandard weave and cross-referencer
  lib       noweave's cross-referencer
  shell     the shell scripts that make up the actual commands
  src       source code for nt and markup
  tex       supporting tex code for /usr/local/lib/tex/macros
  xdoc      man pages and a technical report (named to be unpacked last)

Where appropriate, these directories have README files of their own. Distributions available by FTP also have DOS binaries, which are always out of date:

  DOS       zip file containing old MS-DOS binaries

Weaving --- a tar pit

The worst aspect of literate programming is the enormous amount of time wasted wrangling over what ``woven'' output should look like. Although noweb does no prettyprinting, it is not entirely immune---several people have complained about noweave's output or have sent me changes that add more options to noweave. I resisted for years, but with version 2.5 I finally succumbed. I let the number of options to noweave double, and I have provided far too many options and hooks for customizing the latex output. I won't let it happen again.

noweb doesn't try to be all things to all programmers, but it is very easy to change. If you don't like noweave's formatting, you can read tex/support.nw to learn how to customize it; look for the words ``style hook.'' (Reading noweb.sty directly is not recommended.) For simple formatting, it might be easier to throw away noweave and make your own. To help you get started, the shell directory contains noweave.simple, a simplified version of noweave that Dave Hanson created for use with C programs (it can't handle code with @ signs). The article in xdoc/ieee.tex explains the intermediate language that noweb uses to represent literate programs.

The intermediate language makes it possible to extend noweave with a language-dependent prettyprinter, as shown by contributions of an Icon prettyprinter by Kostas Oikonomou and a guarded-command prettyprinter by Conrado Martinez-Parra. (I haven't written a prettyprinter myself because my experience with Spider taught me that prettyprinting is far more trouble than it's worth.) Further contributions of prettyprinters are welcome.

noweb comes with two cross-referencers for use with noweave. The standard one is written in awk, because that's what everybody has. There is also one written in Icon, which is slightly better because it ignores case when alphabetizing chunk names. See the INSTALL file for more details.

Cross-referencing makes formatting even more of a tar pit; the cross-referencer itself takes about 300 lines, and extensive LaTeX support is also required. I haven't made the attempt to write cross-reference code for plain TeX. Anyone who has ideas for reducing the number of options or for other ways to restore sanity to the situation is urged to write to norman@bellcore.com.

Index and identifier cross-reference

To noweb, any string of nonwhite characters can be an identifier. A human being or a language-dependent tool must mark definitions of identifiers; noweb finds the uses using a language-independent algorithm. The algorithm relies on an idea taken from the lexical conventions of Standard ML. Characters are divided into three classes: alphanumerics, symbols, and delimiters. If an identifier begins with an alphanumeric, it must be delimited on the left by a symbol or a delimiter. If it begins with a symbol, it must be delimited on the left by an alphanumeric or a delimiter. If it begins with a delimiter, there are no restrictions on the character immediately to the left. Similar rules apply on the right-hand side. The default classifications are chosen to make sense for commonly used programming languages, so that noweb will not recognize `zip' when it sees `zippy', or `++' when it sees `++:='. This trick works surprisingly well, but it does not prevent noweb from spotting identifiers in comments or string literals.
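
The delimiting rule above can be sketched in a few lines of Python. This is only an illustration, not the code in finduses: the character classes below are an assumed classification (letters, digits, and underscore as alphanumerics; a handful of punctuation marks as symbols; everything else as delimiters), and the names `char_class`, `boundary_ok`, and `find_uses` are hypothetical. The start or end of a line is treated as a delimiter.

```python
import string

# Assumed classification, for illustration only; noweb's actual defaults may differ.
ALPHANUMERICS = set(string.ascii_letters + string.digits + "_")
SYMBOLS = set("!%^&*-+:=|~,./?<>")
# Every other character (whitespace, brackets, quotes, ...) counts as a delimiter.


def char_class(ch):
    """Classify one character as 'alnum', 'symbol', or 'delim'."""
    if ch in ALPHANUMERICS:
        return "alnum"
    if ch in SYMBOLS:
        return "symbol"
    return "delim"


def boundary_ok(edge, neighbor):
    """Check one side of a candidate use.

    edge is the first (or last) character of the identifier; neighbor is the
    character just outside it, or None at the start or end of the line.
    """
    if neighbor is None:
        return True
    edge_class = char_class(edge)
    if edge_class == "delim":
        return True  # no restriction on the neighbor
    # An alphanumeric edge may not touch an alphanumeric; a symbol may not touch a symbol.
    return char_class(neighbor) != edge_class


def find_uses(ident, line):
    """Return the offsets in line at which ident appears as a properly delimited use."""
    if not ident:
        return []
    uses = []
    start = line.find(ident)
    while start != -1:
        end = start + len(ident)
        left = line[start - 1] if start > 0 else None
        right = line[end] if end < len(line) else None
        if boundary_ok(ident[0], left) and boundary_ok(ident[-1], right):
            uses.append(start)
        start = line.find(ident, start + 1)
    return uses
```

With this classification, `find_uses("zip", "zippy = zip(a)")` returns `[8]`, skipping the occurrence inside `zippy`, and `find_uses("++", "x ++:= 1")` returns `[]`. As the text warns, nothing here knows about comments: `find_uses("zip", "/* zip it */")` returns `[3]`.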

The basic assumption in noweb is that a human being will identify definitions using the @ %def mumble foo quux construct. I have, however, found it very useful to write simple filters that attempt to identify global definitions automatically. Filters for Icon, TeX, and yacc all take about 30 lines of Icon code and are included in the noweb distribution. Contributions for other languages are encouraged. If you write a filter of your own, you can put it in the $LIB directory with a name like `autodefs.pascal'.
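
To give a feel for what such a filter does, here is a minimal sketch in Python rather than Icon. It assumes the pipeline representation documented with noweb, in which chunks are bracketed by `@begin code` and `@end code` lines, code text arrives on `@text` lines, and a definition is recorded with an `@index defn` line. The Pascal-style pattern, the function name `add_definitions`, and the choice to emit the index line right after the matching `@text` line are all assumptions made for illustration; check them against the documentation before relying on them.

```python
import re

# Hypothetical recognizer for Pascal-style definitions; a real one would need more care.
DEFINITION = re.compile(
    r"^\s*(?:procedure|function)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)


def add_definitions(lines):
    """Copy pipeline lines through unchanged, adding an '@index defn' line after
    each '@text' line inside a code chunk that looks like a definition."""
    in_code = False
    for line in lines:
        yield line
        if line.startswith("@begin code"):
            in_code = True
        elif line.startswith("@end code"):
            in_code = False
        elif in_code and line.startswith("@text "):
            match = DEFINITION.match(line[len("@text "):])
            if match:
                yield "@index defn " + match.group(1)


sample = [
    "@begin code 1",
    "@defn swap",
    "@nl",
    "@text procedure Swap(var a, b: integer);",
    "@nl",
    "@end code 1",
]
for out in add_definitions(sample):
    print(out)
```

A script built around this generator would read the pipeline from standard input and write the result to standard output, which is what the -filter option of notangle and noweave expects of a filter. Documentation chunks pass through untouched, so prose that happens to mention `procedure` is not indexed.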

noweave -index works well for short programs, but nodefs, noindex, and noweave -indexfrom are there for large multi-file programs. See the noindex man page for details.

Notes

xdoc/ieee.* contains a paper that has been submitted to IEEE Software. You must `make install' before attempting to format the paper, since it uses the noweb document style option. This paper doesn't discuss features that are new in version 2.5.

The paper documents the representation of noweb files that is used by the noweb tools, in case you want to write any tools of your own. Simple tools (e.g. count the number of lines of interleaved documentation) are trivial. If you write any tools, or you want tools written (e.g. prettyprinters, index generators), let me know.
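
As a sketch of how small such a tool can be, the following Python function counts lines of documentation and code. It assumes the representation brackets chunks with `@begin docs`/`@end docs` and `@begin code`/`@end code` lines and marks each newline with an `@nl` line; the function name `count_lines` is hypothetical, and this is not the implementation of nocount.

```python
def count_lines(pipeline_lines):
    """Count newlines seen inside documentation chunks and inside code chunks."""
    counts = {"docs": 0, "code": 0}
    current = None  # kind of the chunk being read, or None between chunks
    for line in pipeline_lines:
        if line.startswith("@begin "):
            kind = line.split()[1]
            current = kind if kind in counts else None
        elif line.startswith("@end "):
            current = None
        elif line == "@nl" and current is not None:
            counts[current] += 1
    return counts


sample = [
    "@begin docs 0",
    "@text Swap two values.",
    "@nl",
    "@text Uses a temporary.",
    "@nl",
    "@end docs 0",
    "@begin code 1",
    "@defn swap",
    "@nl",
    "@text t := a; a := b; b := t",
    "@nl",
    "@end code 1",
]
print(count_lines(sample))
```

For the sample this prints `{'docs': 2, 'code': 2}`. Note that the line naming the chunk counts as a line of code in this sketch, because it is followed by its own `@nl`.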

The icon directory contains Icon versions of many of the pipeline stages. If you want to adapt noweb to work with a text processor other than TeX or latex, they might provide a better starting point. I confess that the whole system should have been written in Icon from the beginning, but I am not going to do it over. Icon is available by anonymous ftp from cs.arizona.edu.

I have a standing offer open to troff users: I will adapt noweb to troff if you will tell me what the output should look like and you will try to use the results.

Thanks to Preston Briggs for the Aho-Corasick recognizer, and for helpful discussions. Thanks to Dave Hanson for cpif. Thanks to Dave Love for LaTeX wizardry. Thanks to Joseph Reynolds for prodding me to fix [[...]]. Thanks to Lee Wittenberg for the DOS binaries.

Send comments or questions to norman@bellcore.com. I enjoy hearing from noweb users; if you have enjoyed noweb, why not send me a local postcard for my collection?

Copyright

Noweb is copyright 1989-1993 by Norman Ramsey. All rights reserved. You may use and distribute noweb for any purpose, for free. You may modify noweb and create derived works, provided you retain the copyright notice, but the result may not be called noweb without my written consent. You may not sell noweb itself, but you may do anything you like with programs created with noweb. Noweb is not a Bellcore product. Bellcore makes no warranty and accepts no liability for any software in this distribution. If you find something useful, we're all surprised. Major thanks are due to Preston Briggs and Dave Love, without whom there wouldn't be a noweb 2.5. Thanks also to Lee Wittenberg for finding innumerable bugs and to George Greenwade for helping set up the distribution.

From:	Lee Wittenberg
Date:	28 Feb 1994

The (long awaited?) DOS port of noweb 2.5a is now officially available. The package dosnoweb.zip is available for anonymous FTP in the pub/leew directory of bart.kean.edu. With luck, the various noweb sites (bellcore.com, etc.) will pick up copies to reduce the strain on poor li'l bart, but for now, this is the only place it's available. I will be actively supporting this version (unlike my previous hack of an earlier noweb), so please feel free to report any bugs to me (not that anyone on this list has any problems complaining).

Unlike earlier DOS noweb's, this version is "load and go" -- it contains everything you need (except TeX) to get started using noweb under DOS, and has been (fairly) thoroughly tested. Executable versions of notangle and noweave, compiled versions of the Icon source, and an Icon interpreter are included (as well as executable versions of noindex and nodefs, provided by Brian Danilko). [Other tools will be provided when I get around to it, or when a significant number of users complain enough about the missing stuff, whichever comes first]. Printable copies of the documentation (IEEE.DVI and the .TXT manual pages) are included to help beginners.

From:	Norman Ramsey
Date:	19 Jun 1994

noweb version 2.6 is now available on CTAN in web/noweb. This version has an HTML back end so you can create hypertext literate programs to be browsed with Mosaic. There's also a -latex+html option which makes it easy to use latex2html to create hypertext versions of programs already marked up in latex style. There is a bug fix in this version, plus I have changed the support code to work around a bug in latex2e. I have now fixed all the problems I know of involving using noweb with latex2e's compatibility mode.

From:	Norman Ramsey
Date:	18 Aug 1994

Version 2.6b of noweb is now available from the CTAN sites in directory web/noweb. This version is primarily a bug-fix version, but it does include a rudimentary recognizer for definitions in C programs, so noweb can now provide reasonable automatic cross-reference for C. (Note that this feature, as with the other auto-recognizers, is available only in the Icon version.) I append the list of changes---if you don't care about any of these bugs, you needn't get the new version.

CHANGES FOR VERSION 2.6b

Added -autodefs c
Changed installation procedure so that source is no longer distributed with contributed software --- you now must install noweb, then build contributed software.

Bug fixes:

Makefile didn't create .../man/man1
noweave botched -x option, emitted index info anyway
markup complained, incorrectly, about [[<<]] in documentation.
noidx died if it tripped over an identifier used only in quoted code
tohtml wasn't inserting doc anchor into above_defns, so some xrefs that should have read `above' were coming out `below'
when one identifier was a prefix of another, as in Class and Class::member, finduses duplicated the prefix.

From:	Norman Ramsey
Date:	31 Aug 1994

Noweb version 2.6c is now available from bellcore.com and should soon propagate to the CTAN sites. The purpose of this release is to make good on my promise to write the ``Noweb Hacker's Guide.'' The earlier technical report ``Literate-Programming Tools can be Simple and Extensible'' is now officially obsolete. The best source for an overview of noweb is now: ``Literate Programming Simplified,'' IEEE Software, September 1994, pp. 97-105. The proper place for technical details is in the manual pages and in the Noweb Hacker's Guide, both of which are distributed with noweb.

From:	Norman Ramsey
Date:	06 Jan 1995

Phil Bewig writes: Are there any literate programming tools for troff users?

Thimbleby's cweb supported C and troff, but I don't know if it is still available. I have a standing offer to noweb users that I will make noweb work with troff if they will (a) tell me what the output should look like, and (b) use the results. I might need some help with troff, too, although we have some pretty good experts here.

From:	Norman Ramsey
Date:	27 Feb 1995

My brain slipped a gear and I announced a nonexistent release of noweb. The current version (1 week old) is version 2.7, not version 2.8. I have cancelled the original announcement; here's a replacement... I am pleased to announce the release of version 2.7 of noweb. I have bumped the version number because of the addition of two features:

Noweb now sports an efficient, one-pass LaTeX to HTML converter. The converter is distributed in two forms: `l2h' acts as a noweb filter, converting only documentation chunks, and `sl2h' can be used on complete latex documents. You can think of `l2h' as an `anti-latex2html'; it is fast, extensible, has no options, and uses no bitmaps.
Noweb can now deal with nuweb programs; from the nuweb source code I have hacked up a parser that reads nuweb format and emits noweb pipeline format. When combined with l2h, this parser makes it possible to create HTML documents from existing nuweb files that contain LaTeX markup. You get to click on an identifier and jump to the chunk (scrap?) containing its definition. For example, on the noweb home page you can find a pointer to the file created by the following command:

  noweave -markup numarkup -filter l2h -html -index nuweb.w > nuweb.html

The other notable change in version 2.7 is that I have re-arranged the directory structure as follows:

  binaries    Pre-built distributions containing binaries and documentation
  contrib     software contributed by noweb users
  examples    sample noweb filters and programs in different languages
  src         Source code and documentation (including FAQ) for noweb

Unless you want everything, you will probably want to fetch src.tar.gz (and possibly contrib and examples) rather than the whole thing. So far I am distributing Linux and (old) DOS binaries. The advantage of these distributions is that you can get the benefits of the Icon versions of the noweb filters without having to install Icon. Finally, version 2.7 contains some bug fixes and a number of small improvements in the HTML support.

From:	Norman Ramsey
Date:	25 Jun 1995

Stephen writes: I would like to get noweb to automagically index my identifiers for nice (not so) simple ANSI C. It seems the method is through the use of autodefs.{lang} files but I have failed to find one for C.

If you use LIBSRC=icon during the installation, you get noweave -autodefs c automatically. There's no need to go grubbing around in the implementation. If you don't have Icon, the noweb home page points to it, including binaries for popular targets. On the other hand, if you don't have Icon, and if the thought of installing one more programming language makes you gag, you could grub around in the implementation and write the definitions stuff in awk or Perl. (Hint: it will be faster and easier to install Icon.) If I ever get funding for noweb 3, Icon will go away (at least as far as noweb is concerned). Of course, it will be replaced by something even more weird and wonderful...

From:	Norman Ramsey
Date:	19 Dec 1997

A while ago I threatened to add a new keyword to the noweb pipeline representation the better to support language-dependent tools. I have now updated the Hackers' Guide to describe the proper use of the new keyword. Go forth now and write tools that support it!