Noweb: multiple output files


From:     Ken Snyder
Date: 11 Oct 1996
I apologize if this has been discussed previously and I missed it. (Note: I am using noweb version 2.7.) Is there any way to generate multiple output files from a single noweb file? I am writing a program that incorporates C, yacc, and lex code, and I would like to use one noweb source file. I would like noweave to generate a single LaTeX file, and notangle to generate individual C, yacc, and lex files. If this is not considered good practice, please inform me, as I do not consider myself someone who practices good programming techniques. I realize that a multiple-noweb-source approach is facilitated by the noweave -n option, and that the lack of multiple outputs may be intentional.


From:     David Fox
Date: 11 Oct 1996
I do this by making the web output a shell script, e.g.
<<*>>=
../update Proxy.h << 'delim'
#ifndef Proxy_h
#define Proxy_h
<<header file of Proxy class>>
#endif
delim

../update Proxy.C << 'delim'
<<main file of Proxy class>>
delim
@
Then I just pipe the output of notangle to sh. The update script compares the existing file with the new one (read from standard input) and updates it only if there is a difference. By default it filters out "#line" directives because they change so often, but eventually this causes confusion in the debugger. I am mildly curious to hear whether this is literate-programming heresy.
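For instance, the whole tangle step becomes a one-liner (a sketch; the web file name here is my assumption):

    notangle proxy.nw | sh

since <<*>> is notangle's default root chunk. The update script itself follows: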
#!/bin/sh

# Update a file with the text read from standard input, but only if
# differences are found. Each file is sent through $FILTER before the
# comparison is made, though the result written is the unfiltered file.
# By default the filter removes lines that are all whitespace or lines
# that begin with '#line'.

# There seems to be a bug in GNU diff which breaks beginning-of-line
# anchoring; otherwise I could just use some diff options to do
# this.

NEW=/tmp/repldiff1$$
FLTD=/tmp/repldiff2$$
DOIT="yes"
FILTER="grep -v ^\(#line.*\|[[:space:]]*\)$"
#FILTER="cat"

# Analyze command line arguments

while [ "$#" != "0" ]; do
  case $1 in
    -filter) FILTER=$2; shift; shift;;
    -n) DOIT="no"; shift;;
    *) if [ "$ORIG" != "" ]; then
         echo "Ignoring extra file argument: $1";
       else
         ORIG=$1;
       fi
       shift;;
  esac
done

# Run the comparisons

if [ ! -f $ORIG ]; then
  echo "Creating $ORIG" 1>&2
  cat > $ORIG			# File doesn't exist, do the update
  #chmod 444 $ORIG
else
  #chmod 644 $ORIG
  cat > $NEW
  cat $ORIG | sed 's/^[ 	]*//' | $FILTER > $FLTD
  if ! cat $NEW |  sed 's/^[ 	]*//' | $FILTER | cmp -s $FLTD; then
    echo "Updating $ORIG" 1>&2
    if [ "$DOIT" = "yes" ]; then
      if [ -f $ORIG ]; then mv $ORIG $ORIG~; fi
      mv $NEW $ORIG
    fi
  else
    if [ "$DOIT" != "yes" ]; then echo "Not updating $ORIG"; fi
    rm -f $NEW
  fi
  rm -f $FLTD
  #chmod 444 $ORIG
fi


From:     Jacob Nielsen
Date: 13 Oct 1996
Ken Snyder writes: Is there any way to generate multiple output files from a single noweb file? I am writing a program that incorporates C, yacc, and lex code, and I would like to use one noweb source file. I would like noweave to generate a single LaTeX file, and notangle to generate individual C, yacc, and lex files.

You have four root chunks, say [[<<*>>]] (the C code), [[<<Header>>]] (the C header file), [[<<Yacc>>]] (the yacc grammar) and [[<<Lex>>]] (the lex specification), and you extract them using notangle:

  notangle | cpif code.c
  notangle -RHeader | cpif code.h
  notangle -RYacc | cpif code.y
  notangle -RLex | cpif code.l

You should of course remember to include the file name, so it's:

  notangle -RLex code.nw | cpif code.l
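(cpif, which comes with noweb, overwrites its argument only when the contents actually differ, so unchanged output files keep their timestamps.) Collected into a small driver script, the whole extraction might look like this sketch, using the chunk and file names assumed above:

    #!/bin/sh
    # tangle every output file from the single web
    notangle code.nw          | cpif code.c
    notangle -RHeader code.nw | cpif code.h
    notangle -RYacc code.nw   | cpif code.y
    notangle -RLex code.nw    | cpif code.l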
If this is not considered good practice, please inform me, as I do not consider myself someone who practices good programming techniques. I realize that a multiple-noweb-source approach is facilitated by the noweave -n option, and that the lack of multiple outputs may be intentional.

I don't know if the lack of multiple outputs is intentional. Putting the C code and the header file stuff in the same noweb file is a good thing. Putting multiple programming languages in the same noweb file is not a good idea -- or rather: it's not supported in a good way. The problem is with the indexing: you get all the identifiers mixed together as if they belonged to the same code; and if you're not careful you also get pointers going from identifiers in the C code to the Yacc/Lex code where they may be related but, in fact, are not the same!

I.e., it's quite easy to get:

  List of Identifiers
      c_ptr  Used in chunks 1, 2 and 10

where chunks 1 and 2 are C code and chunk 10 is Lex code.
If someone has a way to do the cross-referencing on a per-programming-language basis, I would love to hear about it.


From:     Michel Plugge
Date: 14 Oct 1996
Ken Snyder writes: Is there any way to generate multiple output files from a single noweb file? I am writing a program that incorporates C, yacc, and lex code, and I would like to use one noweb source file. I would like noweave to generate a single LaTeX file, and notangle to generate individual C, yacc, and lex files.

Perhaps I have a utility that you could use for the job. It is a kind of preprocessor; I wrote it because I am writing some converter utilities that share a lot of code (in a large LEX file), but often one needs to switch from one converter to another (sometimes within a single line, or just for a few lines), or the code applies to only two of the converters. If you add {@@cw} at the beginning of the C code, {@@lw} at the beginning of LEX code and {@@yw} at the beginning of yacc code, running lx2l on the file gives you the different files: called with command-line parameter c you get the C code, with l the LEX code, with y the yacc code, and with w the complete weave file. Conditionals, abbreviations (like C preprocessor macros) and a (single) C header file for conditionals shared with C source files are also supported. The documentation is currently very poor, but the tool is well tested, because I use it frequently. If you are interested, drop me a line.


From:     Harry George
Date: 15 Oct 1996
I wrote a noweb-workalike which writes a separate file for each root chunk. It is written in Modula-3 and is itself done as literate programming (that is, it bootstraps). See: http://www.eskimo.com/~hgeorge Follow link to m3noweb. The documentation is in LaTeX2e, with dvi files visible on the web site. The whole package is free. I have used this package for mixed Modula-3, Java, Perl, and Prolog, with multiple output files from a single .nw file.


From:     Paolo Amoroso
Date: 16 Oct 1996
Ken Snyder writes: Is there any way to generate multiple output files from a single noweb file? I am writing a program that incorporates C, yacc, and lex code, and I would like to use one noweb source file.

Assuming you have C code in the root chunk, yacc code in chunk `yacc' and lex code in chunk `lex', you can get the corresponding output source files by issuing the following commands:

    notangle source-and-doc.nw > c-code.c
    notangle -Ryacc source-and-doc.nw > yacc-code.y
    notangle -Rlex source-and-doc.nw > lex-code.l
If this is not considered good practice, please inform me, as I do not consider myself someone who practices good programming techniques.

This practice is explicitly encouraged by Norman Ramsey, noweb's author, in his paper "Literate-Programming Tools Can Be Simple and Extensible". It is mentioned in section "Using noweb", where you can find examples of use. For more information check Ramsey's home page.


From:     David Kastrup
Date: 16 Oct 1996
Harry George writes: I wrote a noweb-workalike which writes a separate file for each root chunk.

I just want to point out that with a for loop over the output of `noroots xxx.nw` it is very easy to extract all root chunks automatically via notangle, as in the sketch below.
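A sketch of such a loop, assuming each root chunk is named after its output file and the names contain no spaces (noroots prints each root wrapped in << >> brackets):

    for root in `noroots prog.nw`; do
      file=`echo "$root" | sed 's/^<<//;s/>>$//'`   # strip the << >> brackets
      notangle -R"$file" prog.nw | cpif "$file"
    done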


From:     David Fox
Date: 17 Oct 1996
Jacob Nielsen writes: You have 3 root chunks, say [[<<*>>]] (the C code), [[<<Header>>]] (the C header file), [[<<Yacc>>]] and [[<<Lex>>]] and extract them using notangle:
   notangle | cpif code.c
   notangle -RHeader | cpif code.h
   notangle -RYacc | cpif code.y
   notangle -RLex | cpif code.l

Ah, this makes more sense than what I was doing. I should have read the docs more carefully. Actually, I spoke too soon: if you have more than a few targets it takes too long to re-scan the file for each one (mine is 900K). Also, cpif doesn't filter out #line directives, so basically most files change every time you run it. The problem with my way is that the resulting noweave output blows TeX memory.


From:     David Kastrup
Date: 18 Oct 1996
David Fox writes: Actually, I spoke too soon. If you have more than a few targets it takes too long to re-scan the file for each one (mine is 900K). Also, cpif doesn't filter out #line directives, so basically most files change every time you run it.

Well, make up your mind! Either you don't want to do source debugging, in which case you don't need to tell notangle to generate #line directives, or you do want to do source debugging, in which case you'd certainly prefer your object files to have correct references to the changed line numbers, and thus need to recompile anyway.


From:     Jacob Nielsen
Date: 18 Oct 1996
Perhaps I should read the docs more carefully too. The man pages for noweb (noweb-2.7a) suggest an even easier way:
     noweb looks for chunks that are defined but not used in the
     source file. 
[ i.e., the root chunks ]
     If the name of such a chunk contains no
     spaces, the chunk is an ``output file;'' noweb expands it
     and writes the result onto the file of the same name.
And I believe running 'noweb' is like running 'nuweb' (another literate programming tool): one pass over the web file gives multiple output files. The same can hardly be said of running 'notangle'.
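For example, given a web whose root chunks are named after the output files (a sketch):

    <<code.c>>=
    /* C code */
    @
    <<code.y>>=
    /* yacc grammar */
    @

a single run of

    noweb foo.nw

should write code.c and code.y (and the woven foo.tex) in one pass.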


From:     David Fox
Date: 20 Oct 1996
David Kastrup writes: Well, make up your mind! Either you don't want to do source debugging and then you don't need to tell notangle to generate #line directives, or you do want to do source debugging in which case you'd certainly prefer your object files to have correct references to changed line numbers, thus need to recompile.

Now that I have seen Jacob Nielsen's message and switched from notangle to noweb, things are much quicker. I have decided to forgo source debugging for now (by omitting the "*" from the end of the filenames) to speed up my compilations. It still seems unfortunate that debugging means compile times that are an order of magnitude slower, but that is a topic for a compiler newsgroup.


From:     Alexandre Valente Sousa
Date: 18 Oct 1996
This is a comment on the noweb thread: how to extract multiple output files from a single noweb source, and whether it is a good or an evil thing to mix multiple programming languages in the same noweb source file. I got worried that there were so many answers and so many different solutions, and that no one seemed to hit the simplest solution, namely to use the tool "noweb" (yes, the noweb tool is part of the noweb distribution; try "man noweb"). This is a long mail, and the reason I collected all this together is that I disagree with most of what was said. Here is a summary of the problem and the suggested solutions, together with my comments:

PS: what are my credentials? None ... although I have been hacking noweb for the past two years or so, trying to make it do the weird things I needed, so at least I know something about the implementation.

Ken Snyder writes: Is there any way to generate multiple output files from a single noweb file? I am writing a program that incorporates C, yacc, and lex code, and I would like to use one noweb source file. I would like noweave to generate a single LaTeX file, and notangle to generate individual C, yacc, and lex files.

Yes, just use the noweb tool; try "man noweb". And yes, the C, yacc and lex files should be in the same noweb source file.

David Fox writes: I do this by making the web output a shell script, e.g. [ shell script ] Then I just pipe the output of tangle to the sh. The update script compares the existing file with the new one (read from standard input) and updates it if there is a difference. By default it filters out "#line" directives because they change so often, but eventually this causes confusion in the debugger. I am mildly curious to hear if this is literate programming heresy.

No, this is not the way to do it, even if it works (why reinvent the wheel?).

a) Use noweb to generate the multiple output files.
b) noweb uses the cpif tool to compare file contents, and the timestamp is updated only if something changed, so there is no need for that shell script. See "man cpif".
c) Unless you tell noweb to do so, it will not generate #line directives, so there is no need to filter them out.

Jacob Nielsen writes: I don't know if the lack of multiple outputs is intentional. Putting the C code and the header file stuff in the same noweb file is a good thing.

I agree that the .c and the .h should be in the same file; I disagree that noweb does not support multiple output files (it does, through the noweb tool, which is no more than a shell script that uses src/c/mnt to generate multiple roots).

Putting multiple programming languages in the same noweb file is not a good idea -- or rather: it's not supported in a good way. The problem is with the indexing: you get all the identifiers mixed together as if they belonged to the same code; and if you're not careful you also get pointers going from identifiers in the C code to the Yacc/Lex code where they may be related but, in fact, are not the same! I.e., it's quite easy to get:
 List of Identifiers
     c_ptr  Used in chunks 1, 2 and 10

where chunks 1 and 2 are C code and chunk 10 is Lex code.
If someone has a way to do the cross-referencing on a per ``prog. language'' basis, I would love to hear about it.

Again I disagree. I think that noweb should be language independent and as such must recognize all identifiers as being the same, even if they stem from different languages or are variables at different scopes. And I will never put output files in a different source file just to avoid this problem; whenever it annoys me I just rename the identifiers so that they do not conflict. And if I am not willing to rename the identifiers, then I just accept the fact that, e.g., I get multiple definitions of a variable that happens to be two or more variables, and so on. This of course is debatable ;), however I have 50,000 lines of noweb code, and although I do have unintended name overloading, it just has not turned out to be a problem.

Michel Plugge writes: Perhaps I have a utility that you could use for the job. It is a kind of preprocessor; I wrote it because I am writing some converter utilities that share a lot of code (in a large LEX file), but often one needs to switch from one converter to another (sometimes within a single line, or just for a few lines), or the code applies to only two of the converters. If you add {@@cw} at the beginning of the C code, {@@lw} at the beginning of LEX code and {@@yw} at the beginning of yacc code, running lx2l on the file gives you the different files: called with command-line parameter c you get the C code, with l the LEX code, with y the yacc code, and with w the complete weave file. Conditionals, abbreviations (like C preprocessor macros) and a (single) C header file for conditionals shared with C source files are also supported. The documentation is currently very poor, but the tool is well tested, because I use it frequently. If you are interested, drop me a line.

As discussed above, there is no need to use this (limited? very specific?) utility. However, I assume that Michel Plugge is not a noweb user, so this answer is OK.

Harry George writes: I wrote a noweb-workalike which writes a separate file for each root chunk. It is written in Modula-3 and is itself done as literate programming (that is, it bootstraps). See: http://www.eskimo.com/~hgeorge Follow link to m3noweb. The documentation is in LaTeX2e, with dvi files visible on the web site. The whole package is free. I have used this package for mixed Modula-3, Java, Perl, and Prolog, with multiple output files from a single .nw file.

Again, no need to use this. The name m3noweb makes me think this is a noweb user, so you should have known about the noweb tool; didn't you read the man page before writing your own tool that duplicates existing functionality?

Paolo Amoroso writes:
    notangle source-and-doc.nw > c-code.c
    notangle -Ryacc source-and-doc.nw > yacc-code.y
    notangle -Rlex source-and-doc.nw > lex-code.l
This practice is explicitly encouraged by Norman Ramsey, noweb's author, in his paper "Literate-Programming Tools Can Be Simple and Extensible". It is mentioned in section "Using noweb", where you can find examples of use. For more information check Ramsey's home page.

I don't know if Norman Ramsey encourages this, but I disagree that it is the best solution. Again, I am biased: the way I use noweb, a noweb file is a component (or a subcomponent, if the component is too large to fit in a single file), and I expect the default makefile rules (specified elsewhere, in a project-wide makefile include file) to take care of all the housekeeping chores. For that to work, if I am in component "foo" I just use:

    @ this is the yacc code
    <<version/foo/bar.y>>=
    ...
    @ this is the lex code
    <<version/foo/bar.l>>=
    ...
    @ this is the main program
    <<version/foo/bar.c>>=
    ...
    @ this Makefile runs yacc, lex, cc to build the tool bar
    <<version/foo/Makefile>>=
    ...
    @
And now I just run "make", and the implicit global makefile rules take care of running "noweb foo.nw" (thus creating version/foo.tex, version/foo/bar.y, version/foo/bar.l and version/foo/bar.c) and then executing "cd version/foo; $(MAKE)", thus building the tool bar. The implicit rules also take care of building version/foo.dvi, version/foo.ps and version/foo.html, but that is beside the point. [Just ignore the "version" stuff; I use a modified noweb front-end so that I can get RCS versions et al., which means that the directory "version" will not actually exist; rather, a version-dependent name such as "frozen", "current" or "unstable" will be used.]
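The implicit rules I rely on might be sketched roughly like this (my own simplification: it ignores the version machinery and the per-component $(MAKE) step, and recipe lines must of course begin with a tab):

    # project-wide makefile include (sketch)
    .SUFFIXES: .nw .tex .dvi
    .nw.tex:
            noweb $<        # also writes the root-chunk output files
    .tex.dvi:
            latex $<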

David Fox writes: Actually, I spoke too soon. If you more than a few targets it takes too long to re-scan the file for each one (its 900K). Also, cpif doesn't filter out #line directives, so basically most files change every time you run it.

Yes, rescanning the file takes too long; however, there is no rescanning if you use the noweb tool instead of the notangle tool (the hard work is done by src/c/mnt; see the noweb source code). Again, unless you tell noweb to do so, it will not generate #line directives (nor, for that matter, will notangle unless you use the -L option). My largest file is 250 KB (110 pages) and it is too large; I intend to split it soon. I definitely think you should consider breaking your 900K source file into several files.
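For the record, the two behaviours look like this (a sketch; the file and chunk names are my assumptions):

    notangle -Rmain big.nw > main.c      # plain text, no #line directives
    notangle -L -Rmain big.nw > main.c   # with #line directives, for source debugging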

The problem with my way is that the resulting noweave output blows TeX memory.

I also managed to get into trouble with LaTeX running out of main memory. That happened when I combined several independent components and subcomponents into a single large run to make a kind of project reference manual (500 pages). I solved that by:

And to conclude, I do think that a noweb source file should be seen as a component and should group all files related to the component/module. This means the yacc, the lex, the C, the header file, the Makefile, the shell script, and so on: if they are related to the component, then they belong in the component's source file.


From:     Jacob Nielsen
Date: 19 Oct 1996
Alexandre Valente Sousa writes: Again I disagree. I think that noweb should be language independent and as such must recognize all identifiers as being the same, even if they stem from different languages or are variables at different scopes. And I will never put output files in a different source file just to avoid this problem; whenever it annoys me I just rename the identifiers so that they do not conflict. And if I am not willing to rename the identifiers, then I just accept the fact that, e.g., I get multiple definitions of a variable that happens to be two or more variables, and so on. This of course is debatable ;), however I have 50,000 lines of noweb code, and although I do have unintended name overloading, it just has not turned out to be a problem.

Just to clarify: I do not want to turn noweb into a language-dependent tool! If the name overloading is a problem and manual markup is out of the question, here's a proposal that doesn't make noweb language dependent while still producing separate indices:

1. Add a new keyword -- @ %proglang language
2. Let noweb collect the defines and uses of each language.
3. Add some trickery to get the separated indices.

<<code.c>>=
   some C code
@ %def c_ptr
@ %proglang C

<<code.l>>=
  some lex
@ %def c_ptr
@ %proglang lex
I admit I am not a noweb/LaTeX hacker, so: can it be done? It can't be that difficult to extend the various noweb filters (the prettyprinters and the "find defs for a particular programming language" filters) to use this information. PS: I would be quite happy doing the use/def markup manually and using some LaTeX trickery to get the thing working; it just shows that I don't mix programming languages in the same web file very often.


From:     Alexandre Valente Sousa
Date: 22 Oct 1996
Jacob Nielsen writes: Just to clarify: I do not want to turn noweb into a language-dependent tool! If the name overloading is a problem and manual markup is out of the question, here's a proposal that doesn't make noweb language dependent while still producing separate indices:
1. Add a new keyword -- @ %proglang language
2. Let noweb collect the defines and uses of each language.
3. Add some trickery to get the separated indices.

<<code.c>>=
   some C code
@ %def c_ptr
@ %proglang C

<<code.l>>=
  some lex
@ %def c_ptr
@ %proglang lex
I admit I am not a noweb/LaTeX hacker, so: can it be done? It can't be that difficult to extend the various noweb filters (the prettyprinters and the "find defs for a particular programming language" filters) to use this information. PS: I would be quite happy doing the use/def markup manually and using some LaTeX trickery to get the thing working; it just shows that I don't mix programming languages in the same web file very often.

Yes, I am pretty sure it can be done. I will try to work out the implications; it will affect the Icon code, and I do not know Icon very well. However, just by looking at sample Icon code and by trial and error, I managed to write Icon code that implements HTML support for \index and \index* and \ref (both local and cross-file), in spite of the fact that Norman Ramsey said it could not be done (or at least was very hard), so that of course boosted my confidence to try more. Right now I have a hard deadline, but as soon as I get some time I will try to implement something similar to your idea, and I will let you know.