Noweb file inclusion

From:	Loris Bennett
Date:	13 Aug 1996

Having used nuweb2noweb on a nuweb file which uses @i to include other files I find that the resulting noweb file is enormous because the included files are written directly into the noweb file. How do I emulate nuweb's @i with noweb? If anyone can point me to a nice multi-file noweb example, this would enable me to answer (slightly embarrassing) questions like the above myself.

From:	Norman Ramsey
Date:	14 Aug 1996

Loris Bennett writes: Having used nuweb2noweb on a nuweb file which uses @i to include other files I find that the resulting noweb file is enormous because the included files are written directly into the noweb file. How do I emulate nuweb's @i with noweb?
The short answer is it can't be done. The long answer is that you can usually use LaTeX's \include or \input commands and keep the noweb files separate. If you actually need to tangle all those files together (as opposed to just weaving them), you can usually mention all the names on the command line:

   notangle foo.nw bar.nw quux.nw > big.out

If anyone can point me to a nice multi-file noweb example, this would enable me to answer (slightly embarrassing) questions like the above myself.

You can find a directory with several noweb files used to create a single document on the noweb homepage.

From:	Cameron Smith
Date:	16 Aug 1996

I would say rather that it doesn't need to be done, because notangle and noweave can accept multiple files on the command line.

Norman goes on to say: The long answer is that you can usually use LaTeX's \include or \input commands and keep the noweb files separate. If you actually need to tangle all those files together (as opposed to just weaving them), you can usually mention all the names on the command line:
   notangle foo.nw bar.nw quux.nw > big.out

Here's an example that shows this.
Consider two noweb source files a.nw and b.nw:
::::: a.nw ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The beginning.

<<abc>>=
   Line 1
   <<def>>
   Line 2
   <<def>>
   Line 3
   <<def>>
   Line 4
@

The end.
::::: b.nw ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Start of the middle.

<<def>>=
   Line A
   Line B
   Line C
@

End of the middle.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The command "notangle -Rabc a.nw b.nw" produces this output:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
   Line 1
      Line A
      Line B
      Line C
   Line 2
      Line A
      Line B
      Line C
   Line 3
      Line A
      Line B
      Line C
   Line 4
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The files a.nw and b.nw are treated by notangle as a single input stream. It accumulates the definitions of the chunks <<abc>> and <<def>> from all the input in that stream, and then writes out the expanded text. It doesn't matter which file each chunk comes from, so there's no explicit "include" statement (and no need for one).

Be warned: bad things can happen if a single chunk name (like <<def>>) has parts of its definition in several files. Recall that if there are several code chunks beginning "<<def>>=", then notangle concatenates all of them *in the order it sees them* and uses the result as the definition of <<def>>. Since notangle reads the files on its command line in the order you give them, the command "notangle -Rdef x.nw y.nw" can yield a different result than "notangle -Rdef y.nw x.nw". In general, it is safest to place the entire definition of a chunk name in a single file (the definition may still be broken into separate chunks, as long as all the chunks for a given name stay in that one file). Then the order in which files are given to notangle won't matter.
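The order dependence described above can be seen without running noweb at all. The following is a minimal shell sketch under stated assumptions: chunk_body is a hypothetical toy function (not a noweb command) that prints every piece of a named chunk in the order the files are listed, which is the concatenation rule described above. It does not expand nested chunk references the way notangle does, and the two input files are made up for the illustration.

```shell
#!/bin/sh
# Toy stand-in for one aspect of notangle: collect every piece of a chunk
# definition, in the order the input files are listed.
# chunk_body is a hypothetical helper, not a noweb command, and it does not
# expand <<references>> inside the chunk.
chunk_body() {
    name=$1
    shift
    awk -v hdr="<<$name>>=" '
        $0 == hdr    { inchunk = 1; next }  # a piece of the wanted chunk starts
        /^<<.*>>=$/  { inchunk = 0; next }  # some other code chunk starts
        /^@( |$)/    { inchunk = 0; next }  # a documentation chunk starts
        inchunk      { print }
    ' "$@"
}

# Two illustrative files that each define part of <<def>>.
demo=$(mktemp -d)
printf '%s\n' '<<def>>=' 'from x' '@' > "$demo/x.nw"
printf '%s\n' '<<def>>=' 'from y' '@' > "$demo/y.nw"

chunk_body def "$demo/x.nw" "$demo/y.nw"   # prints "from x", then "from y"
chunk_body def "$demo/y.nw" "$demo/x.nw"   # prints "from y", then "from x"
rm -rf "$demo"
```

Keeping all pieces of <<def>> in one file makes both invocations print the same text, which is the point of the advice above.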

I have written a program called "nodepend" that uses "noroots" to find the root chunks in a noweb source file (or group of source files) and generate Makefile rules to extract the roots into separate files. When nodepend sees (for example) that <<abc>> is a root chunk made up of pieces from a.nw and b.nw, it writes a rule that is essentially the same as

   abc: a.nw b.nw
          notangle -Rabc a.nw b.nw > abc

so that the file "abc" will be re-generated if either a.nw or b.nw changes. This isn't perfect, since changes to a.nw or b.nw that don't affect abc will also trigger this rule. The "cpif" script comes in handy in this case, and prevents non-changes to abc from triggering even more needless remakes, but the fact remains that notangle has done some unnecessary work, and continues to redo it until you give in and allow abc to be rebuilt. So it makes sense to lay out your noweb files carefully, and not to build too many independent files from a single noweb source.
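The role cpif plays here can be illustrated with a small shell function. This is a sketch of the idea only, not the cpif script distributed with noweb: update_if_changed is a hypothetical name, and the function simply copies its standard input to the target when the content differs, so that an unchanged target keeps its old timestamp and make sees nothing to rebuild downstream.

```shell
#!/bin/sh
# Sketch of the "copy only if different" idea behind cpif.
# update_if_changed is a hypothetical helper, not part of noweb.
update_if_changed() {
    target=$1
    new=$(mktemp) || return 1
    cat > "$new"                      # capture the freshly generated text
    if [ -f "$target" ] && cmp -s "$new" "$target"; then
        status=unchanged              # same bytes: leave target and its mtime alone
    else
        cat "$new" > "$target"        # new or different: rewrite the target
        status=updated
    fi
    rm -f "$new"
    echo "$status"
}

# Illustrative use: the second call finds identical content and does not
# touch the file.
out=$(mktemp)
printf 'Line 1\n' | update_if_changed "$out"   # prints "updated"
printf 'Line 1\n' | update_if_changed "$out"   # prints "unchanged"
rm -f "$out"
```

In the rule shown earlier, the real cpif would sit in the same position: "notangle -Rabc a.nw b.nw | cpif abc". Note that notangle still runs every time; only the rewrite of abc is avoided.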

Like notangle, noweave can also accept multiple input files.
The output of "noweave a.nw b.nw" is (almost):
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\documentstyle[noweb]{article}\pagestyle{noweb}\noweboptions{}%
\begin{document}\nwfilename{a.nw}\nwbegindocs{0}The beginning.

\nwenddocs{}\nwbegincode{1}\moddef{abc}\endmoddef
   Line 1
   \LA{}def\RA{}
   Line 2
   \LA{}def\RA{}
   Line 3
   \LA{}def\RA{}
   Line 4
\nwendcode{}\nwbegindocs{2}\nwdocspar

The end.
\nwenddocs{}\nwfilename{b.nw}\nwbegindocs{0}Start of the middle.

\nwenddocs{}\nwbegincode{1}\moddef{def}\endmoddef
   Line A
   Line B
   Line C
\nwendcode{}\nwbegindocs{2}\nwdocspar

End of the middle.
\nwenddocs{}\end{document}
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

I said "(almost)" because for readability in this posting I manually broke the first line, which was quite long. Of course, you wouldn't do that in ordinary use, since you aren't normally supposed to be reading the noweave-generated TeX source anyway. In any case, multi-file noweave output passes through LaTeX just fine. As Norman suggested, you could also run noweave on each source file separately and use a "driver" file to control the typesetting. Suppose your driver file looks like this:

::::: driver.tex ::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\documentstyle[noweb]{article}
\pagestyle{noweb}
\begin{document}
\input{a.tex}
\input{b.tex}
\end{document}
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Then the commands

   noweave -n a.nw > a.tex
   noweave -n b.nw > b.tex
   latex driver.tex

will produce the same DVI file as running LaTeX on the output of "noweave a.nw b.nw" given above. The separate-noweave approach may be easier to code into a Makefile, however, using (say) a ".nw.tex" suffix rule. Also, using a driver file (and the "-n" option of noweave) gives you liberty to exercise more control over the typesetting by using a custom style file or defining custom typesetting commands in the driver. (You can use the "-delay" option to similar effect for a project that is contained in a single noweb source file, but I have found that "-delay" isn't as useful for a multi-file project.)

Placing the noweave output for each noweb file in a separate TeX file has another advantage: line numbers reported in TeX error messages will then correspond to the line numbers in the original noweb documents, because noweave's LaTeX formatter goes to extraordinary lengths to insert no new line breaks. This is an admirable feature that I wish I could occasionally defeat since TeX has a fixed-size input buffer. Some TeX implementations choke on lines of only 512 characters, and when noweave strings out an enormous list of indexing information or other administrative crud on a single TeX source line, I get nervous.

Actually, almost all of the noweb development I have done has used only a single noweb source file (granted, sometimes a very large source file) for each separate project or configuration unit, but recently I have been working on some much larger projects that cannot possibly be handled this way, so I have been giving thought to how to best use noweb to develop and document such code. The multi-file capabilities of notangle, and Makefile tricks like nodepend, are essential for this.

From:	Alexandre Valente Sousa
Date:	19 Aug 1996

Cameron Smith writes: I have written a program called "nodepend" that uses "noroots" to find the root chunks in a noweb source file (or group of source files) and generate Makefile rules to extract the roots into separate files. When nodepend sees (for example) that <<abc>> is a root chunk made up of pieces from a.nw and b.nw, it writes a rule that is essentially the same as

   abc: a.nw b.nw
          notangle -Rabc a.nw b.nw > abc

so that the file "abc" will be re-generated if either a.nw or b.nw changes. This isn't perfect, since changes to a.nw or b.nw that don't affect abc will also trigger this rule. The "cpif" script comes in handy in this case, and prevents non-changes to abc from triggering even more needless remakes, but the fact remains that notangle has done some unnecessary work, and continues to redo it until you give in and allow abc to be rebuilt. So it makes sense to lay out your noweb files carefully, and not to build too many independent files from a single noweb source.

I also noticed this problem with cpif and updates to a noweb file that do not change the tangled code. The problem is serious when you have (as I do) around 40 noweb files and complex recursive makefiles: they tend to waste CPU cycles without producing any useful work.

My solution was to change the noweb script (I use the noweb script more often than notangle and noweave; only the combined document of around 600 pages uses noweave), so that whenever "noweb path/foo.nw" is run the empty file "path/noweb.tid" is created (tid stands for Time ID). Now noweb checks the timestamp of the .tid, and if it is newer than the .nw file then it does nothing (except update the timestamp of the .tid to the current time). Obviously, if some output file is manually deleted, the .tid will wrongly prevent that file from being regenerated. Thus there is a -f (force) option that retangles everything regardless of the .tid. Notice that the .tid is not a replacement for cpif; in fact we need both working together. If anyone wants the modified noweb script I can post it; however, it relies on a hacked mnt (that creates missing intermediate dirs and uses paths relative to the .nw file, not the current dir) and it has built-in support for RCS.
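The stamp-file test described above can be sketched in a few lines of shell. This is an illustration under assumptions, not the modified noweb script (which is not shown in the thread): needs_retangle is a hypothetical helper, its third argument stands in for the -f option, and the file names in the demonstration are made up.

```shell
#!/bin/sh
# Sketch of the stamp-file test. needs_retangle is a hypothetical helper.
# Returns 0 (true) when tangling should run, 1 when it can be skipped.
needs_retangle() {
    src=$1
    stamp=$2
    force=${3:-no}
    [ "$force" = yes ] && return 0     # like the -f option: ignore the stamp
    [ -f "$stamp" ] || return 0        # no stamp yet: must tangle
    # find prints the stamp only when it is newer than the source;
    # in that case nothing has changed since the last run.
    if [ -n "$(find "$stamp" -newer "$src")" ]; then
        return 1
    fi
    return 0
}

# Illustrative use with explicit timestamps (file names are made up).
work=$(mktemp -d)
touch -t 199608190900 "$work/foo.nw"
touch -t 199608191000 "$work/noweb.tid"
if needs_retangle "$work/foo.nw" "$work/noweb.tid"; then
    echo "tangle"
else
    echo "skip"                        # stamp is newer, so this branch runs
fi
touch "$work/noweb.tid"                # the stamp is refreshed either way
rm -rf "$work"
```

As noted above, a test like this cannot see that an output file was deleted by hand, which is why a force switch is needed alongside it.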

A related problem (with wasted CPU cycles) concerns building the .dvi file. Here several latex runs are needed (usually 4 or 5 for complex documents with lots of indexes), and because one latex run on my 600-page document takes almost 5 minutes, I would like to make as few latex runs as possible. The solution was to create a Bourne shell script that knows about noweb, latex, bibtex, makeindex, and David Jones's latex multiple index package. Here is the usage screen:

--------
Usage: /users/avs/phd/ross/literate/noweb/current/builddvi [OPTIONS] file[.tex]
Options:
 -h          Help
 -b          Use BibTeX
 -x          Use noindex (external noweb index)
 -H          Use noindex hack, i.e. trailing '(' replaced with '()'
 -n          Noweb code index support (check \nwixadds entries in .aux)
 -i ind idx  Use 'makeindex -o file.ind file.idx' (repeat for each index)
Update file.dvi from file.tex using latex (& maybe bibtex, noindex, makeindex)
Minimizes number of latex runs by protecting old .bbl .toc .nwi and .ind files
Supports David Jones multiple index package
Using noindex assumes .tex files generated with defs info, see man noindex(1)
E.g. add the following rules to a noweb makefile:
 %.tex: %.nw
        noweb -o $(*)
 %.dvi: %.tex
        builddvi -n -b -i ind idx -i cnd ctx $(*)
--------

The trick to reduce the number of latex runs is to back up the .bbl, .ind, etc. files and compare each new file with the old one to see if it changed; if it did not, then we know that the last latex run used the updated info. The noweb code index entries are extracted from the .aux file and compared with the ones that were previously in the .aux file. Using this, the number of latex runs is reduced to the minimum (1 in some cases; it can never be reduced to zero because there is not enough information around).
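The compare-with-the-previous-copy idea can be sketched as a small shell loop. This is a deliberately simplified illustration, not builddvi: rerun_until_stable is a hypothetical helper that reruns a command until a given set of auxiliary files stops changing, and it knows nothing about bibtex, makeindex or the noweb index entries in the .aux file.

```shell
#!/bin/sh
# Sketch of "rerun only while the auxiliary files keep changing".
# rerun_until_stable is a hypothetical helper, much simpler than builddvi.
# Usage: rerun_until_stable MAXRUNS "FILE1 FILE2 ..." COMMAND [ARGS...]
# Prints the number of runs that were made.
rerun_until_stable() {
    max=$1
    files=$2                 # space-separated; names must not contain spaces
    shift 2
    runs=0
    while [ "$runs" -lt "$max" ]; do
        for f in $files; do  # keep a copy of each file as it was before the run
            if [ -f "$f" ]; then cp "$f" "$f.old"; else : > "$f.old"; fi
        done
        "$@" || return 1     # give up if the command itself fails
        runs=$((runs + 1))
        changed=no
        for f in $files; do
            cmp -s "$f" "$f.old" || changed=yes
        done
        if [ "$changed" = no ]; then break; fi
    done
    echo "$runs"
}
```

For example, "rerun_until_stable 5 "doc.aux doc.toc" latex doc.tex" would stop as soon as one run leaves doc.aux and doc.toc unchanged (doc.tex is an illustrative name). The ".old" copies are left behind so that the next invocation could reuse them.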

The builddvi script is general and can be used independently of noweb, even for normal tex files. The nice thing about it is that it always makes as many latex runs as needed, and no more. This means that at the shell prompt I now always use builddvi instead of latex (there is no problem with concurrent runs, as the internal tmp files depend on the PID of the builddvi). If anyone wants it, just let me know (it is written as a noweb program, so it should be easy to understand and validate).

I also once hit the tex buffer limit (and my buffer was 3000, not 512!) with a chunk that had too many definitions and uses. After processing 500 noweb pages latex ran out of main memory (latex packs several things per word, thus it is not straightforward to increase this memory beyond 262141). I tracked the problem to the noweb index; using noindex reduced the memory consumption by half (so I assume I am safe until I hit 1000 pages). Finally I hit the 67003 string characters limit of my tex binary. I fixed that by recompiling tex, which was surprisingly easy. Here is my HOWTO for the Infomagic Nov 95 Linux distribution (Slackware); I assume this also applies to other Unix systems:

\begin{itemize}
\item Infomagic Nov 95 CDROM Disc1, [[slackwar/source/t/ntex-source]], unpack
  [[nts-w2c1.tgz]], [[nts-w2c2.tgz]], [[nts-w2c3.tgz]], [[nts-kpat.tgz]] from
  [[/]]
\item Go to [[usr/src/tex/tex]] and run [[configure]] as documented in
  [[web2c-6.1/INSTALL]] (if [[kpathsea/paths.h.in]] does not have the correct
  paths, the \TeX\ program may start very, very slowly)
\item Edit [[web2c-6.1/tex/tex.ch]] and make the following changes (notice that
  it is a kind of diff, do not change the first group of lines that have the
  given pattern, change the second group):
\begin{enumerate}
\item Change the version printed in the banner so that this special version is
  easily recognizable
\item Change [[buf_size]] from 3000 to 16000, this avoids overflows with noweb
  lines too long (caused by too many definitions and uses in a single code
  chunk)
\item Change [[error_line]] from 79 to 379 and [[max_print_line]] from 79 to
  379, this simplifies making scripts to show the overflows or underflows in
  the log file.
\item Change [[max_strings]] from 15000 to 16000 (this gives 11935
  ``\emph{strings}'')
\item Change [[string_vacancies]] from 100000 to 165004 and [[pool_size]] from
  124000 to 189002, these changes are related, together they provide 132005
  ``\emph{string characters}''
\item Change [[save_size]] from 4000 to 4008 (this is the `\emph{s}' type of
  ``\emph{stack positions}''). This was just a test with the future in mind.
\end{enumerate}
\item The [[hash_size]] is the number of ``\emph{multiletter control
  sequences}''.
\item Go to [[web2c-6.1]] and do [[make programs]], the new binary will be in
  [[tex/virtex]]
\item Copy [[tex/virtex]] to [[/usr/bin/virtex]] (the [[tex]] command is a
  symbolic link to it)
\end{itemize}

I also have been thinking how to use noweb in large projects. In short here is what I do:

a) the program is divided into more or less autonomous components, each component in one dir (all component dirs are at the same level; this is essential to allow using \input{../bar/foo} instead of \input{foo}, so that the same \input statement works from sibling dirs)
b) each component consists of one or more .nw files. If component foo has two subcomponents named alpha and beta then we have the files foo/alpha.nw, foo/beta.nw, and most pathnames of the code in foo/alpha.nw are of the form "foo/alpha/version/filename.x".
b) noweb was modified to integrate RCS (revision control) support. Most tangled file pathnames are of the form "alpha/version/bar.c" where alpha is the subcomponent name. The hacked noweb script replaces the fixed string "version" (if it occurs in output filenames) with the RCS $State$ that it extracts from the first lines of the .nw file (from the $Id$ entry). This allows having several versions of the code in the same file system. E.g., if the chunk is <<alpha/version/bar.c>> and the $State$ is "current", then "alpha/current/bar.c" is created, and if we are on a Linux machine then the object file goes to "alpha/current/linux/bar.o" (or to "alpha/current/sun4s/bar.o" if we now compile on a Solaris machine). Binaries also go to the same architecture-dependent dir (e.g. alpha/current/linux/bar), and alpha/current/bar is a shell script that uses uname to detect the architecture and then execs the correct binary.
c) The makefiles are standardized and small and always include a rules makefile that knows about this and uses $(VERSION) to refer to the $State$ and $(BASENAME) to refer to the subcomponent name. One of the subcomponents is the master, thus it also creates a master makefile in the component dir. This makefile has e.g. "BASENAME=alpha beta", meaning that there are two subcomponents; the makefile then expands that list and recursively calls the subcomponent makefiles. The latex code includes version-dependent .eps pictures using "alpha/\version/bar.eps", and noweb makes the definition of \version stand for the $State$.

d) there are 3 special dirs at the same depth as the component dirs. The first has the bibliography databases in BibTeX format. The second has the file dots.nw with the source code for all the diagrams (pictures) written using dot (dot is a free directed graph layout tool available from AT&T; the use of dot allows applying literate programming techniques to the pictures). The third has a file that collects all the code together to produce a single huge file. The GNU makefile entry is more or less like this:

SECTIONS=intro/intro ... faultt/rmsg faultt/rprocess ... ... .(lots of names)
textsource=everything.tex boot.cfg \
    $(foreach sectionName, $(SECTIONS), ../$(sectionName).nw) \
    $(shell echo ../dots/dots/$(VERSION)/*.dot) \
    $(shell echo ../biblio/*.bib)

everything.dvi: $(textsource)
    -mv $(*).dvi old-$(*).dvi
    for k in $(SECTIONS); do \
       if older ../$$k.nw TMP-`basename $$k`.defs; then : ; else \
          nodefs ../$$k.nw > TMP-`basename $$k`.defs; fi; done
    sort -u *.defs | cpif defs.all
    for k in $(SECTIONS); do \
       if older defs.all TMP-`basename $$k`.tex; then \
          if older ../$$k.nw TMP-`basename $$k`.tex; then doit=0; \
          else doit=1; fi; \
       else doit=1; fi; \
       if [ $$doit -eq 1 ]; then \
          noweave -n -indexfrom defs.all ../$$k.nw | \
           sed '\#^\\nw#s#(}#()}#g' > TMP-`basename $$k`.tex; fi; \
    done
    (cd ../dots; $(MAKE))
    ../literate/noweb/$(VERSION)/builddvi -x -H -b -i ind idx -i cnd ctx $(*)

The boot.cfg is for placing \includeonly{foo} directives (using "\includeonly{}" (i.e. no filenames) causes only the TOC and the indexes to be generated). The older program is the following:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
 struct stat stat1, stat2;
 if (argc != 3) {
    fprintf(stderr, "Usage: %s file1 file2\n\
-- Exits 0: file1 older than file2, 1: newer, 2: file1 not found,\n", argv[0]);
    fprintf(stderr, "--       3: file2 not found, 4: incorrect usage\n");
    fprintf(stderr, "-- Correct sh usage: 'if older foo.c foo.o; then : ; else cc -c foo.c; fi'\n");
    return 4;
 }
 if (stat(argv[1], &stat1))
    return 2;
 if (stat(argv[2], &stat2))
    return 3;
 if (stat1.st_mtime > stat2.st_mtime)
    return 1;
 return 0;
}

e) each noweb file can be processed standalone or as a part of the "everything.tex". This is done by having each file start with \input{setup.cfg}, and then using the latex ifthen package (\ifthenelse) to conditionally include or exclude some code (such as the latex preamble, etc.) depending on the value of the "standalone" boolean. Actually each component's setup.cfg is a symbolic link to a single file that provides the standalone setup. On the other hand, in the everything dir, when "everything.tex" uses \include{../everything/TMP-rmsg} this gets to the \input{setup.cfg} in the first line of that file; however, latex reads setup.cfg from the current working dir, thus an empty setup.cfg is included.
f) HTML code is generated locally in each component dir, and the "everything" dir has code that replaces the TOC entries of "everything.html" with links to the components. There is also a master dot file with the program architecture: using tkdot (an extension of dot for Tcl/Tk) and a simple script, clicking on the nodes or edges of archit.dot causes Netscape or Mosaic to be fired up (if not already running) and display the corresponding HTML code.
g) I was unlucky in that I use the Beta programming language (an OO language), which uses the tokens <<foo:bar>> and @<<foo:bar>> for separate compilation modules, thus I have to remember to use @<<...>> and @@<<...>>. Otherwise I have no complaints about noweb (except maybe that it is so easy to adapt to your special needs that you tend to overdo it).
g) special circumstances use some special code, e.g. a noweb file with triple configuration: standalone, embedded and paper (a paper using the special style files required by the journal). But the above copes with most of my needs and it is very seldom that I resort to non-standard code.
h) I am also very careful with names and follow strict naming and style conventions. An excellent book on this subject is: @book ( kn:Lakos96, author={John Lakos}, title={Large-Scale C++ Software Design}, publisher={Addison Wesley}, year=1996, pages={xxxii,846}, ISBN={0-201-63362-0}, annote={exceptional, published June 96} )
i) From the shell prompt (and in makefiles) I usually use "noweb -t" instead of just "noweb" (that is I don't generate the .tex). For a 4000 line noweb file with few comments and a lot of code this reduces the noweb processing time from 20s to 4s.