Noweb file inclusion


From:     Loris Bennett
Date: 13 Aug 1996
Having used nuweb2noweb on a nuweb file which uses @i to include other files I find that the resulting noweb file is enormous because the included files are written directly into the noweb file. How do I emulate nuweb's @i with noweb? If anyone can point me to a nice multi-file noweb example, this would enable me to answer (slightly embarrassing) questions like the above myself.


From:     Norman Ramsey
Date: 14 Aug 1996
Loris Bennett writes: Having used nuweb2noweb on a nuweb file which uses @i to include other files I find that the resulting noweb file is enormous because the included files are written directly into the noweb file. How do I emulate nuweb's @i with noweb?
The short answer is it can't be done. The long answer is that you can usually use LaTeX's \include or \input commands and keep the noweb files separate. If you actually need to tangle all those files together (as opposed to just weaving them), you can usually mention all the names on the command line:
   notangle foo.nw bar.nw quux.nw > big.out

If anyone can point me to a nice multi-file noweb example, this would enable me to answer (slightly embarrassing) questions like the above myself.

You can find a directory with several noweb files used to create a single document on the noweb homepage.


From:     Cameron Smith
Date: 16 Aug 1996
Loris Bennett writes: Having used nuweb2noweb on a nuweb file which uses @i to include other files I find that the resulting noweb file is enormous because the included files are written directly into the noweb file. How do I emulate nuweb's @i with noweb?

Norman Ramsey replied: The short answer is it can't be done.

I would say rather that it doesn't need to be done, because notangle and noweave can accept multiple files on the command line.

Norman goes on to say: The long answer is that you can usually use LaTeX's \include or \input commands and keep the noweb files separate. If you actually need to tangle all those files together (as opposed to just weaving them), you can usually mention all the names on the command line:
    notangle foo.nw bar.nw quux.nw > big.out
Here's an example that shows this.
Consider two noweb source files a.nw and b.nw:
::::: a.nw ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The beginning.

<<abc>>=
   Line 1
   <<def>>
   Line 2
   <<def>>
   Line 3
   <<def>>
   Line 4
@

The end.
::::: b.nw ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Start of the middle.

<<def>>=
   Line A
   Line B
   Line C
@

End of the middle.
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The command "notangle -Rabc a.nw b.nw" produces this output:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
   Line 1
      Line A
      Line B
      Line C
   Line 2
      Line A
      Line B
      Line C
   Line 3
      Line A
      Line B
      Line C
   Line 4
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The files a.nw and b.nw are treated by notangle as a single input stream. It accumulates the definitions of the chunks <<abc>> and <<def>> from all the input in that stream, and then writes out the expanded text. It doesn't matter which file each chunk comes from, so there's no explicit "include" statement (and no need for one).

Be warned: bad things can happen if a single chunk name (like <<def>>) has parts of its definition in several files. Recall that if there are several code chunks beginning "<<def>>=", then notangle concatenates all of them *in the order it sees them* and uses the result as the definition of <<def>>. Since notangle reads the files on its command line in the order you give them, the command "notangle -Rdef x.nw y.nw" can yield a different result than "notangle -Rdef y.nw x.nw". In general, it is safest to place the entire definition of a chunk name in a single file (still allowing the definition to be broken into separate chunks, but keeping all the chunks for a given name in a single file). Then the order in which files are given to notangle won't matter.
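The order dependence described above can be reproduced with a toy model of chunk accumulation. The sketch below is not notangle's implementation; it is a minimal Python stand-in that assumes chunk references sit alone on a line and that a chunk ends at a line that is "@" or starts with "@ ". The names collect_chunks, expand, x_nw and y_nw are made up for illustration.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_USE = re.compile(r"^(\s*)<<(.+)>>\s*$")


def collect_chunks(sources):
    """Gather chunk bodies from several noweb texts, in the order given.

    Repeated definitions of one name are concatenated in the order seen,
    which is the behaviour the posting warns about.
    """
    chunks = {}
    for text in sources:
        current = None
        for line in text.splitlines():
            m = CHUNK_DEF.match(line)
            if m:
                current = m.group(1)
                chunks.setdefault(current, [])
            elif line == "@" or line.startswith("@ "):
                current = None  # back in documentation
            elif current is not None:
                chunks[current].append(line)
    return chunks


def expand(chunks, name, indent=""):
    """Expand a chunk, carrying the indentation of each reference along."""
    out = []
    for line in chunks[name]:
        m = CHUNK_USE.match(line)
        if m:
            out.extend(expand(chunks, m.group(2), indent + m.group(1)))
        else:
            out.append(indent + line)
    return out


# Hypothetical files that each hold part of the definition of <<def>>.
x_nw = "<<def>>=\nfrom x\n@\n"
y_nw = "<<def>>=\nfrom y\n@\n"

print(expand(collect_chunks([x_nw, y_nw]), "def"))  # ['from x', 'from y']
print(expand(collect_chunks([y_nw, x_nw]), "def"))  # ['from y', 'from x']
```

Swapping the two inputs swaps the pieces of <<def>>, which is why keeping every chunk for a given name in one file makes the command-line order irrelevant.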

I have written a program called "nodepend" that uses "noroots" to find the root chunks in a noweb source file (or group of source files) and generate Makefile rules to extract the roots into separate files. When nodepend sees (for example) that <<abc>> is a root chunk made up of pieces from a.nw and b.nw, it writes a rule that is essentially the same as

   abc: a.nw b.nw
          notangle -Rabc a.nw b.nw > abc
so that the file "abc" will be re-generated if either a.nw or b.nw changes. This isn't perfect, since changes to a.nw or b.nw that don't affect abc will also trigger this rule. The "cpif" script comes in handy in this case, and prevents non-changes to abc from triggering even more needless remakes, but the fact remains that notangle has done some unnecessary work, and continues to redo it until you give in and allow abc to be rebuilt. So it makes sense to lay out your noweb files carefully, and not to build too many independent files from a single noweb source.
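The nodepend program itself is not shown in this thread, so the following is only a guess at the general idea, written in Python. It finds root chunks with a simple regular-expression scan (standing in for noroots), works out which files contribute to each root, and prints a rule of the shape given above. The function names and the parsing rules are assumptions; chunk names containing spaces would need quoting that this sketch does not attempt.

```python
import re

DEF = re.compile(r"^<<(.+)>>=\s*$")
USE = re.compile(r"^\s*<<(.+)>>\s*$")


def scan(files):
    """Map each chunk to the files defining it and to the chunks it references."""
    defined_in, refs = {}, {}
    for fname, text in files.items():
        current = None
        for line in text.splitlines():
            d = DEF.match(line)
            if d:
                current = d.group(1)
                if fname not in defined_in.setdefault(current, []):
                    defined_in[current].append(fname)
                refs.setdefault(current, set())
            elif line == "@" or line.startswith("@ "):
                current = None
            elif current is not None:
                u = USE.match(line)
                if u:
                    refs[current].add(u.group(1))
    return defined_in, refs


def files_for(root, defined_in, refs):
    """Collect every file that defines the root or a chunk reachable from it."""
    seen, needed, stack = set(), set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        needed.update(defined_in.get(name, []))
        stack.extend(refs.get(name, ()))
    return needed


def make_rules(files):
    """Return Makefile rules, one per root chunk (defined but never referenced)."""
    defined_in, refs = scan(files)
    used = set().union(*refs.values())
    rules = []
    for root in defined_in:
        if root in used:
            continue
        needed = files_for(root, defined_in, refs)
        # Keep the caller's file order, since notangle is order-sensitive.
        deps = " ".join(f for f in files if f in needed)
        rules.append(f"{root}: {deps}\n\tnotangle -R{root} {deps} > {root}\n")
    return "".join(rules)


a_nw = "<<abc>>=\n   Line 1\n   <<def>>\n@\n"
b_nw = "<<def>>=\n   Line A\n@\n"
print(make_rules({"a.nw": a_nw, "b.nw": b_nw}), end="")
```

For the two small inputs here, the only root is abc and both files appear as its prerequisites. Replacing "> abc" in the generated command with "| cpif abc" would apply the cpif idea mentioned above.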
Like notangle, noweave can also accept multiple input files.
The output of "noweave a.nw b.nw" is (almost):
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\documentstyle[noweb]{article}\pagestyle{noweb}\noweboptions{}%
\begin{document}\nwfilename{a.nw}\nwbegindocs{0}The beginning.

\nwenddocs{}\nwbegincode{1}\moddef{abc}\endmoddef
   Line 1
   \LA{}def\RA{}
   Line 2
   \LA{}def\RA{}
   Line 3
   \LA{}def\RA{}
   Line 4
\nwendcode{}\nwbegindocs{2}\nwdocspar

The end.
\nwenddocs{}\nwfilename{b.nw}\nwbegindocs{0}Start of the middle.

\nwenddocs{}\nwbegincode{1}\moddef{def}\endmoddef
   Line A
   Line B
   Line C
\nwendcode{}\nwbegindocs{2}\nwdocspar

End of the middle.
\nwenddocs{}\end{document}
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
I said "(almost)" because for readability in this posting I manually broke the first line, which was quite long. Of course, you wouldn't do that in ordinary use, since you aren't normally supposed to be reading the noweave-generated TeX source anyway. In any case, multi-file noweave output passes through LaTeX just fine. As Norman suggested, you could also run noweave on each source file separately and use a "driver" file to control the typesetting. Suppose your driver file looks like this:
::::: driver.tex ::::::::::::::::::::::::::::::::::::::::::::::::::::::::
\documentstyle[noweb]{article}
\pagestyle{noweb}
\begin{document}
\input{a.tex}
\input{b.tex}
\end{document}
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Then the commands

   noweave -n a.nw > a.tex
   noweave -n b.nw > b.tex
   latex driver.tex
will produce the same DVI file as running LaTeX on the output of "noweave a.nw b.nw" given above. The separate-noweave approach may be easier to code into a Makefile, however, using (say) a ".nw.tex" suffix rule. Also, using a driver file (and the "-n" option of noweave) gives you liberty to exercise more control over the typesetting by using a custom style file or defining custom typesetting commands in the driver. (You can use the "-delay" option to similar effect for a project that is contained in a single noweb source file, but I have found that "-delay" isn't as useful for a multi-file project.)

Placing the noweave output for each noweb file in a separate TeX file has another advantage: line numbers reported in TeX error messages will then correspond to the line numbers in the original noweb documents, because noweave's LaTeX formatter goes to extraordinary lengths to insert no new line breaks. This is an admirable feature that I wish I could occasionally defeat since TeX has a fixed-size input buffer. Some TeX implementations choke on lines of only 512 characters, and when noweave strings out an enormous list of indexing information or other administrative crud on a single TeX source line, I get nervous.
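One rough way to find out in advance whether noweave output might trouble a small input buffer is to scan the generated file for long lines. The Python sketch below is only an approximation: the default limit of 512 is the figure quoted above, the function name long_lines is made up, and TeX's real buffer accounting may differ from a plain character count.

```python
def long_lines(tex_source, limit=512):
    """Return (line_number, length) for every line longer than limit characters."""
    return [
        (number, len(line))
        for number, line in enumerate(tex_source.splitlines(), start=1)
        if len(line) > limit
    ]


# A made-up sample: one 600-character line between two short ones.
sample = "short line\n" + "x" * 600 + "\nanother short line\n"
print(long_lines(sample))  # [(2, 600)]
```

In practice one would read a.tex or b.tex into a string and pass it in, with the limit set to the buf_size of the local TeX binary.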

Actually, almost all of the noweb development I have done has used only a single noweb source file (granted, sometimes a very large source file) for each separate project or configuration unit, but recently I have been working on some much larger projects that cannot possibly be handled this way, so I have been giving thought to how to best use noweb to develop and document such code. The multi-file capabilities of notangle, and Makefile tricks like nodepend, are essential for this.


From:     Alexandre Valente Sousa
Date: 19 Aug 1996
Cameron Smith writes: I have written a program called "nodepend" that uses "noroots" to find the root chunks in a noweb source file (or group of source files) and generate Makefile rules to extract the roots into separate files. When nodepend sees (for example) that <<abc>> is a root chunk made up of pieces from a.nw and b.nw, it writes a rule that is essentially the same as
  abc: a.nw b.nw
         notangle -Rabc a.nw b.nw > abc
so that the file "abc" will be re-generated if either a.nw or b.nw changes. This isn't perfect, since changes to a.nw or b.nw that don't affect abc will also trigger this rule. The "cpif" script comes in handy in this case, and prevents non-changes to abc from triggering even more needless remakes, but the fact remains that notangle has done some unnecessary work, and continues to redo it until you give in and allow abc to be rebuilt. So it makes sense to lay out your noweb files carefully, and not to build too many independent files from a single noweb source.

I also noticed this problem with the use of cpif and updates to a noweb file that do not change the tangled code. The problem is serious when you have (as I do) around 40 noweb files and complex recursive makefiles: they just tend to waste CPU cycles without producing any useful work.

My solution was to change the noweb script (I use the noweb script more often than notangle and noweave; only the combined document, with around 600 pages, uses noweave), so that whenever "noweb path/foo.nw" is run the empty file "path/noweb.tid" is created (tid stands for Time ID). Now noweb checks the timestamp of the .tid, and if it is newer than the .nw file then it does nothing (except update the timestamp of the .tid to the current time). Obviously, if some output file is manually deleted, the .tid will erroneously prevent that file from being regenerated. Thus there is a -f (force) option that retangles everything in spite of the .tid. Notice that the .tid is not a replacement for cpif; in fact we need both working together. If anyone wants the modified noweb script I can post it; however, it relies on a hacked mnt (one that creates missing intermediate dirs and uses paths relative to the .nw file, not the current dir) and it has built-in support for RCS.
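The modified noweb script is not included in this thread, so the following Python sketch only illustrates the timestamp test as described: skip the work when the stamp file is newer than the source, always refresh the stamp, and let a force flag override the check. The function names, the tangle callback and the way the stamp path is passed in are all assumptions.

```python
import os
import tempfile
import time


def is_up_to_date(nw_path, tid_path):
    """True when the stamp file exists and is newer than the noweb source."""
    return (
        os.path.exists(tid_path)
        and os.path.getmtime(tid_path) > os.path.getmtime(nw_path)
    )


def touch(path):
    """Create the file if needed and set its timestamp to the current time."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def run_noweb(nw_path, tid_path, tangle, force=False):
    """Call tangle(nw_path) unless the stamp says nothing changed.

    Returns True when tangle was called. The stamp is refreshed either way,
    matching the description in the posting.
    """
    did_work = force or not is_up_to_date(nw_path, tid_path)
    if did_work:
        tangle(nw_path)
    touch(tid_path)
    return did_work


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        nw = os.path.join(d, "foo.nw")
        tid = os.path.join(d, "noweb.tid")
        with open(nw, "w") as f:
            f.write("<<foo>>=\nhello\n@\n")
        an_hour_ago = time.time() - 3600
        os.utime(nw, (an_hour_ago, an_hour_ago))

        def fake_tangle(path):
            print("tangling", path)

        print(run_noweb(nw, tid, fake_tangle))              # no stamp yet
        print(run_noweb(nw, tid, fake_tangle))              # stamp is newer
        print(run_noweb(nw, tid, fake_tangle, force=True))  # forced
```

As the posting notes, a check like this cannot notice a deleted output file, which is the reason for the force option.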

A related problem (with wasted CPU cycles) concerns building the .dvi file. Here several latex runs are needed (usually 4 or 5 for complex documents with lots of indexes), and because in the case of my 600-page document one latex run takes almost 5 minutes, I would like to make as few latex runs as possible. The solution was to create a Bourne shell script that knows about noweb, latex, bibtex, makeindex, and David Jones's latex multiple index package. Here is the usage screen:

--------
Usage: /users/avs/phd/ross/literate/noweb/current/builddvi [OPTIONS] file[.tex]
Options:
 -h          Help
 -b          Use BibTeX
 -x          Use noindex (external noweb index)
 -H          Use noindex hack, i.e. trailing '(' replaced with '()'
 -n          Noweb code index support (check \nwixadds entries in .aux)
 -i ind idx  Use 'makeindex -o file.ind file.idx' (repeat for each index)
Update file.dvi from file.tex using latex (& maybe bibtex, noindex, makeindex)
Minimizes number of latex runs by protecting old .bbl .toc .nwi and .ind files
Supports David Jones multiple index package
Using noindex assumes .tex files generated with defs info, see man noindex(1)
E.g. add the following rules to a noweb makefile:
 %.tex: %.nw
        noweb -o $(*)
 %.dvi: %.tex
        builddvi -n -b -i ind idx -i cnd ctx $(*)
--------
The trick to reduce the number of latex runs is to back up the .bbl, .ind, etc. files and compare each new file with the old one to see whether it changed; if it did not, then we know that the last latex run used the updated info. The noweb code index entries are extracted from the .aux file and compared with the ones that were previously in the .aux file. Using this, the number of latex runs is reduced to the minimum (1 in some cases; it can never be reduced to zero because there is not enough information around).
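The builddvi script is not reproduced here, so the following is only a sketch of the compare-and-rerun idea in Python, not the script's actual logic. It snapshots a set of auxiliary files, runs a caller-supplied command, and stops as soon as a run leaves them unchanged. The names snapshot and build, the max_runs cap and the fake LaTeX command in the demo are assumptions; the bibtex, makeindex and .aux filtering steps are left out.

```python
import os
import tempfile


def snapshot(paths):
    """Read the current bytes of each auxiliary file (None if it is missing)."""
    result = {}
    for path in paths:
        if os.path.exists(path):
            with open(path, "rb") as f:
                result[path] = f.read()
        else:
            result[path] = None
    return result


def build(run_latex, aux_paths, max_runs=6):
    """Run LaTeX until the auxiliary files stop changing; return the run count.

    At least one run always happens, as the posting says it must.
    """
    for runs in range(1, max_runs + 1):
        before = snapshot(aux_paths)
        run_latex()
        if snapshot(aux_paths) == before:
            return runs
    raise RuntimeError("auxiliary files still changing after %d runs" % max_runs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        toc = os.path.join(d, "doc.toc")
        versions = iter([b"draft toc", b"final toc", b"final toc"])

        def fake_latex():
            # Stand-in for a real latex run: rewrites the .toc file.
            with open(toc, "wb") as f:
                f.write(next(versions))

        print(build(fake_latex, [toc]), "runs")
```

A real driver would pass a function that invokes latex on the document and list the .toc, .bbl, .ind and similar files as aux_paths.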

The builddvi script is general and can be used independently of noweb, even for normal tex files. The nice thing about it is that it always makes as many latex runs as needed, and no more. This means that now at the shell prompt I always use builddvi instead of latex (no problem with concurrent runs, as the internal tmp files depend on the PID of the builddvi). If anyone wants it, just let me know (it is written as a noweb program, so it should be easy to understand and validate).

Placing the noweave output for each noweb file in a separate TeX file has another advantage: line numbers reported in TeX error messages will then correspond to the line numbers in the original noweb documents, because noweave's LaTeX formatter goes to extraordinary lengths to insert no new line breaks. This is an admirable feature that I wish I could occasionally defeat since TeX has a fixed-size input buffer. Some TeX implementations choke on lines of only 512 characters, and when noweave strings out an enormous list of indexing information or other administrative crud on a single TeX source line, I get nervous.

I also once hit the tex buffer limit (and my buffer was 3000, not 512!) with a chunk that had too many definitions and uses. After processing 500 noweb pages latex ran out of main memory (latex packs several things per word, thus it is not straightforward to increase this memory beyond 262141). I tracked the problem to the noweb index; using noindex reduced the memory consumption to half (so I assume I am safe until I hit 1000 pages). Finally I hit the 67003 string characters limit of my tex binary. I fixed that by recompiling tex, which was surprisingly easy. Here is my HOWTO for the Infomagic Nov 95 Linux distribution (Slackware); I assume this also applies to other Unix systems:

\begin{itemize}
\item Infomagic Nov 95 CDROM Disc1, [[slackwar/source/t/ntex-source]], unpack
  [[nts-w2c1.tgz]], [[nts-w2c2.tgz]], [[nts-w2c3.tgz]], [[nts-kpat.tgz]] from
  [[/]]
\item Go to [[usr/src/tex/tex]] and run [[configure]] as documented in
  [[web2c-6.1/INSTALL]] (if [[kpathsea/paths.h.in]] does not have the correct
  paths, the \TeX\ program may start very, very slowly)
\item Edit [[web2c-6.1/tex/tex.ch]] and make the following changes (notice that
  it is a kind of diff, do not change the first group of lines that have the
  given pattern, change the second group):
\begin{enumerate}
\item Change the version printed in the banner so that this special version is
  easily recognizable
\item Change [[buf_size]] from 3000 to 16000, this avoids overflows with noweb
  lines too long (caused by too many definitions and uses in a single code
  chunk)
\item Change [[error_line]] from 79 to 379 and [[max_print_line]] from 79 to
  379, this simplifies making scripts to show the overflows or underflows in
  the log file.
\item Change [[max_strings]] from 15000 to 16000 (this gives 11935
  ``\emph{strings}'')
\item Change [[string_vacancies]] from 100000 to 165004 and [[pool_size]] from
  124000 to 189002, these changes are related, together they provide 132005
  ``\emph{string characters}''
\item Change [[save_size]] from 4000 to 4008 (this is the `\emph{s}' type of
  ``\emph{stack positions}''). This was just a test, with the future in mind.
\end{enumerate}
\item The [[hash_size]] is the number of ``\emph{multiletter control
  sequences}''.
\item Go to [[web2c-6.1]] and do [[make programs]], the new binary will be in
  [[tex/virtex]]
\item Copy [[tex/virtex]] to [[/usr/bin/virtex]] (the [[tex]] command is a
  symbolic link to it)
\end{itemize}

Actually, almost all of the noweb development I have done has used only a single noweb source file (granted, sometimes a very large source file) for each separate project or configuration unit, but recently I have been working on some much larger projects that cannot possibly be handled this way, so I have been giving thought to how to best use noweb to develop and document such code. The multi-file capabilities of notangle, and Makefile tricks like nodepend, are essential for this.

I also have been thinking how to use noweb in large projects. In short here is what I do: