Usage of CWEB


From:     Yuval Peduel
Date: 30 Nov 1994
I have long been an advocate of literate programming, but until recently had no choice but to do it with nothing more than commenting conventions using an editor. Then I got the chance to look into the WEB family of tools and grabbed it. Unfortunately, the beautiful bubble seems to have burst. My attempts to use CWEB have resulted in completely incomprehensible output:

1. When I write code, I do not always go top-down. When creating a loop, I will often design the loop invariant, the inner code to implement it, and then the loop primitive and the initialization code. But when I tried to set up a scrap (?) with: @<process one unit@>= before writing the loop which refers to @<process one unit@>, the cweave output came out all wrong: the curly brackets for the if statements ended up adjacent, on the line with the "if" and the Boolean expression, rather than bracketing the conditional code.

2. Under certain situations that I do not yet understand, line breaks that are clearly needed are not put in. In one case, five separate input lines, each a separate statement, were output by cweave on one line. More than once, several statements from inside a compound statement were put on one output line while the remaining statements from the same compound statement were not. In other cases, the "if", the Boolean expression, and part of the conditional code were merged with a preceding assignment statement while the else clause was handled perfectly.

I have also had minor but still annoying problems:

1. I like to organize my code along the LISP COND style, which translates into a C else-if chain. Cweave seems to insist on indenting further at each link of the chain.

2. I take full advantage of the Unix file system to give readable names to my files, using underscores to separate words. When cweave takes the file name for use in the header, it can't handle this.

3. I think indentation is a great aid to readability, but two spaces per level is too small. I would prefer 3 or 4, yet do not know how to do this.

While I can't say much about these minor problems, I do know that in some sense, the major problems are "my fault". To bring up CWEB, I FTPed the TeX, Mfont, CWEB, etc. sources and did a full install starting with config at every step along the way. When I was done, I ran all the examples in the cweb/examples directory and the results looked good. Thus the installation and the software seem O.K.

On the other hand, I patterned my code after what I saw in the examples, with what seemed to be only minor changes to accommodate my style. Surely I have made mistakes, but they can't be major and I still have no idea where or what they are. I chose CWEB rather than one of the language independent tools because I knew I would be programming in C and I thought that a language specific tool would provide more robust error recognition and handling.

I suppose my questions are:

1. Is my experience unique or do others experience this kind of tool fragility? Are these problems unique to CWEB or would I have similar problems with the other tools? (I.e. is the problem rooted in the C syntax, the macro processing limitations of TeX, a by-product of the CWEB implementation, or inherent in any attempt at a powerful text-based tool of this nature?)

2. Is there documentation to help newcomers? I have read the literate-programming FAQ, but seen none specific to CWEB. I have a copy of Knuth and Levy, but its usage information is minimal. Reading the code to find out how to use it is exactly what I am trying to get away from.

3. Are there any additional tools that would make it easier to satisfy the finicky requirements of cweave? For example, since the C compiler tends to be finicky, I use indent to make sure that I haven't left out closing braces, etc., but indent isn't really applicable to CWEB files. Are there any useful Emacs modes or the like?

4. Where do I go from here?

I am looking forward to getting both specific answers and to reading whatever general discussion these questions might generate.


From:     Wheeler Ruml
Date: 01 Dec 1994
I tried to use CWEB two years ago, and was also frustrated by similar problems to the ones you describe. I am in the middle of trying again, this time for C++ instead of C, and I am just about to give up. As far as I can tell, the technology just isn't "out-of-the-box" yet. I am having a terrible time getting my Makefile to work properly (GNU Make), and I miss the highly-developed C and C++ modes in Emacs.

Some of the formatting problems can be fixed using @; and such - read the CWEB docs carefully! Hacking the actual grammar is really hairy - check out some of the alternate CWEB variants; they have a cleaner layout. If anyone can offer some help with these sorts of configuration issues, I would be grateful. I am not willing to spend this much time fiddling with my tools and down-grading my expectations.


From:     Jacob Nielsen
Date: 01 Dec 1994
Yuval Peduel writes: When I write code, I do not always go top-down. When creating a loop, I will often design the loop invariant, the inner code to implement it, and then the loop primitive and the initialization code. But when I tried to set up a scrap (?) with: @<process one unit@>= before writing the loop which refers to @<process one unit@>, the cweave output came out all wrong: the curly brackets for the if statements ended up adjacent, on the line with the "if" and the Boolean expression, rather than bracketing the conditional code.
If I understand correctly, cweave produces:

    if (<Condition>) {
      <Conditional code>
    }

and you want:

    if (<Condition>) 
    {
      <Conditional code>
    }

Welcome to the world of pretty-printing, style a la Knuth and Levy.

1. Is my experience unique or do others experience this kind of tool fragility? Are these problems unique to CWEB or would I have similar problems with the other tools? (I.e. is the problem rooted in the C syntax, the macro processing limitations of TeX, a by-product of the CWEB implementation, or inherent in any attempt at a powerful text-based tool of this nature?)

The rearranging of statements (if clauses etc.) is almost inevitable if you use any literate programming tool that does serious source code formatting. The problem arises when the programmer uses one convention for how the code should look and the tool uses another. If everyone adhered to the Knuth/Levy style of formatting code, there would be no problems. If you want "poor man's pretty-printing" you should take a look at noweb. noweb is independent of the programming language, but "poor man's pretty-printing" has been added for C. My definition of "poor man's pretty-printing": it typesets keywords in bold etc. but respects newlines, indentation, spaces and such.

Are there any useful Emacs modes or the like?

I think there is a CWEB-mode (web-mode?)

PS: I think that pretty-printed code looks good, but it seems to be too much trouble.


From:     Joachim Schrod
Date: 01 Dec 1994
Are there any useful Emacs modes or the like?

Jacob Nielsen writes: I think there is a CWEB-mode (web-mode?)

Forget it. If you're used to cc-mode and AUCTeX, you'll throw it out immediately. In addition, it globally rebinds standard keys, just as web-mode does. For me, that's always a sign that the authors did not understand Emacs concepts. There is no really good Emacs support for literate programming so far, and that's a bad sign, actually. (I have to admit that I don't like nuweb-mode, either. Editing the source in a separate window defeats the whole purpose of literate programming: handling documentation and source as one unit.)

PS: I think that pretty-printed code looks good, but it seems to be too much trouble.

Me too, and the success of cc-mode/font-lock/hilit (or Borland IDE-style editors, for that matter) shows that people like it. The problem is more the current fixed formatting engine of CWEB than the process of pretty-printing as a whole.


From:     Marc van Leeuwen
Date: 01 Dec 1994
Yuval Peduel writes: I have long been an advocate of literate programming, but until recently had no choice but to do it with nothing more than commenting conventions using an editor. Then I got the chance to look into the WEB family of tools and grabbed it. Unfortunately, the beautiful bubble seems to have burst.

From what follows I gather that your problems are partly that you are dissatisfied with the conventions (and a bug) built into Levy/Knuth CWEB, and partly that you are experiencing parsing problems that CWEAVE fails to diagnose, but which completely screw up your output. I would urge you to try my CWEBx system, which was designed to be more flexible, informative, comprehensible, and correct than CWEB, yet basically works just like it. In fact, when using compatibility mode (command line option `+c') you should be able to process Levy/Knuth CWEB sources just as they are, but hopefully with more pleasant output. Limitations: the current version is a beta version; only fairly basic C++ is supported; the manual is in the process of being rewritten and therefore not complete (but the recent additions are listed in a separate file). Below I will indicate how the possibilities of CWEBx relate to your problems.

My attempts to use CWEB have resulted in completely incomprehensible output: 1. When I write code, I do not always go top-down. When creating a loop, I will often design the loop invariant, the inner code to implement it, and then the loop primitive and the initialization code. But when I tried to set up a scrap (?) with: @<process one unit@>= before writing the loop which refers to @<process one unit@>, the CWEAVE output came out all wrong: the curly brackets for the if statements ended up adjacent, on the line with the "if" and the Boolean expression, rather than bracketing the conditional code.

So you dislike the ugly brace style promoted by K&R, which Levy/Knuth CWEB has implemented. CWEBx gives you the choice between three brace styles, two of which (including the default) align opening and closing braces vertically.

2. Under certain situations that I do not yet understand, line breaks that are clearly needed are not put in. In one case, five separate input lines, each a separate statement, were output by CWEAVE on one line. More than once, several statements from inside a compound statement were put on one output line while the remaining statements from the same compound statement were not. In other cases, the "if", the Boolean expression, and part of the conditional code were merged with a preceding assignment statement while the else clause was handled perfectly.

You have definitely run into syntax problems here, which may have a number of causes. Common ones are macro invocations that are used in a way other than as an expression (which is what they look like), such as a complete statement (no semicolon after it), and typedef identifiers that CWEB does not know about; the former problem can be solved using `@;', the latter using `@f' or `@s' (or in CWEBx outside compatibility mode, even better by using the `@h' command to specify that included header files should be scanned for typedef declarations). To diagnose your problem, you may like to view any irreducible scrap sequences (which is a technical term for what remains from input that could not be completely digested by the parser). To obtain this, place `@1' in your first section, or for CWEBx specify a `+d' command option to CWEAVE.
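As an illustration of the two remedies just mentioned, here is a minimal CWEB sketch. The macro |CHECK_INDEX|, the type |word_t|, and the header "word.h" are hypothetical names invented for this example; only the control codes `@;' and `@f' come from the discussion above.

```
@ The macro |CHECK_INDEX| expands to a complete statement, so each use
is followed by an invisible semicolon. The type |word_t| is assumed to
be defined in a header that CWEAVE never reads, so it is declared to
behave like |int| for formatting purposes.

@d CHECK_INDEX if (i < 0 || i >= n) return 0;
@f word_t int

@c
#include "word.h" /* hypothetical header that defines |word_t| */

word_t fetch(word_t *v, int i, int n)
{
  CHECK_INDEX@;
  return v[i];
}
```

Without the `@;', CWEAVE would see an identifier directly followed by |return| and would likely fail to recognize a statement boundary there; without the `@f' line it would parse |word_t| as an ordinary identifier rather than a type.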

I have also had minor but still annoying problems: 1. I like to organize my code along the LISP COND style, which translates into a C else-if chain. CWEAVE seems to insist on indenting further at each link of the chain.

This is strange; CWEB certainly does process `if ... else if ... else if' chains without increasing indentation. Maybe you have the same problems here as under 2. above?

2. I take full advantage of the Unix file system to give readable names to my files, using underscores to separate words. When CWEAVE takes the file name for use in the header, it can't handle this.

This is a bug in Levy/Knuth CWEB. CWEBx handles special characters in file names correctly.

3. I think indentation is a great aid to readability, but two spaces per level is too small. I would prefer 3 or 4, yet do not know how to do this.

In Levy/Knuth CWEB indentation is fixed to one `em' (the width of a \quad). In CWEBx you could say \indentation{2em} in limbo, or \indentation{1cm}, or whatever unit of indentation you like. This is really not a property of the CWEB programs, but of the TeX macro format used.

I chose CWEB rather than one of the language independent tools because I knew I would be programming in C and I thought that a language specific tool would provide more robust error recognition and handling.

It surely should.

I suppose my questions are: 1. Is my experience unique or do others experience this kind of tool fragility? Are these problems unique to CWEB or would I have similar problems with the other tools? (I.e. is the problem rooted in the C syntax, the macro processing limitations of TeX, a by-product of the CWEB implementation, or inherent in any attempt at a powerful text-based tool of this nature?)

I would say all your problems can be solved within the CWEB context, and most are solved in CWEBx. There are a few fundamental problems, but you are not very likely to run into them. (One is for instance typedef declarations that are local to a block; CWEAVE has no idea of lexical ranges (which might be quite disconnected in the CWEB source) and simply assumes all typedef declarations to be global. This could be a problem in C++, particularly when using templates, but for C I have never seen a local typedef.)

2. Is there documentation to help newcomers? I have read the literate-programming FAQ, but seen none specific to CWEB. I have a copy of Knuth and Levy, but its usage information is minimal. Reading the code to find out how to use it is exactly what I am trying to get away from.

CWEBx comes with a manual that tries to explain all relevant issues in a much more elaborate way than the Levy/Knuth manual.

3. Are there any additional tools that would make it easier to satisfy the finicky requirements of CWEAVE? For example, since the C compiler tends to be finicky, I use indent to make sure that I haven't left out closing braces, etc., but indent isn't really applicable to CWEB files. Are there any useful Emacs modes or the like?

CWEBx's CTANGLE counts braces and parentheses in every macro or module body, and reports and "corrects" any that are unbalanced. I think this is easier and more reliable than any brace matching done by an editor (they tend to get confused by the mixture of different lexical conventions that is used in CWEB source code).


From:     Marc van Leeuwen
Date: 01 Dec 1994
Wheeler Ruml writes: I tried to use CWEB two years ago, and was also frustrated by similar problems to the ones you describe. I am in the middle of trying again, this time for C++ instead of C, and I am just about to give up. As far as I can tell, the technology just isn't "out-of-the-box" yet. I am having a terrible time getting my Makefile to work properly (GNU Make), and I miss the highly-developed C and C++ modes in Emacs.

What's the problem with make files? The only one I know of is that if CTANGLE writes multiple files (e.g., a program file and a header file) then these will always be updated when any change is made to that source file. You could solve this by moving the old files into a subdirectory before running CTANGLE, and afterwards comparing the new files with the old ones, moving the old files back to replace the new ones if they are equal, or removing them if not. All this could be specified in the make file. This is like noweb's cpif script, except that it does not require files to be written on stdout. The main problem I foresee is that files may change due to changing #line directives, while their actual contents are unchanged. For program files I would prefer to recompile the file if any #line directives have changed, lest my debugger get confused, but for header files I might prefer to keep the old (incorrect) #line directives in order to preserve the older timestamp on the file; it is not difficult to write a comparison program that ignores lines starting with #line.
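The comparison program mentioned at the end of the paragraph above could look roughly like the following C sketch. The function names `same_ignoring_line_directives` and `next_chunk` and the 4096-byte chunk size are assumptions made for this illustration, not part of CWEB, CWEBx, or noweb.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fetch the next chunk of text that is not part of a line starting
   with "#line".  *at_start records whether the next read begins a new
   line, so that over-long lines split by fgets are handled correctly.
   Returns 1 if a chunk was stored in buf, 0 at end of file. */
static int next_chunk(FILE *f, char *buf, size_t size, int *at_start)
{
    int skipping = 0;

    while (fgets(buf, (int)size, f) != NULL) {
        size_t len = strlen(buf);
        int starts_line = *at_start;

        *at_start = (len > 0 && buf[len - 1] == '\n');
        if (starts_line)
            skipping = (strncmp(buf, "#line", 5) == 0);
        if (!skipping)
            return 1;
    }
    return 0;
}

/* Return 1 if the two files are equal apart from #line directives,
   0 if they differ, and -1 if either file cannot be opened. */
int same_ignoring_line_directives(const char *path_a, const char *path_b)
{
    char a[4096], b[4096];
    int start_a = 1, start_b = 1, same = 1;
    FILE *fa, *fb;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "r");
    fb = fopen(path_b, "r");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }
    for (;;) {
        int got_a = next_chunk(fa, a, sizeof a, &start_a);
        int got_b = next_chunk(fb, b, sizeof b, &start_b);

        if (!got_a || !got_b) {        /* at least one file is exhausted */
            same = (got_a == got_b);   /* equal only if both are */
            break;
        }
        if (strcmp(a, b) != 0) {
            same = 0;
            break;
        }
    }
    fclose(fa);
    fclose(fb);
    return same;
}
```

A make rule could call a small wrapper around this function after running CTANGLE and move the old header back into place whenever the result is 1, which preserves the older timestamp as described above.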


From:     Balasubramanian Narasimhan
Date: 01 Dec 1994
Yuval Peduel writes: I have long been an advocate of literate programming, but until recently had no choice but to do it with nothing more than commenting conventions using an editor. Then I got the chance to look into the WEB family of tools and grabbed it. Unfortunately, the beautiful bubble seems to have burst. My attempts to use CWEB have resulted in completely incomprehensible output:

My attempts at literate programming in CWEB have left me disheartened too. When I started work on the Diehard tests for Random Number Generators, I thought here was a project that would really benefit from a literate programming style. However, a few attempts mangled both the typesetting and code and I had to abandon the effort. I have one example ready at hand so users can duplicate one problem. (This example below was a first attempt some time ago.) Run the file below thru' cweave and TeX. Things should be nice. Now change every instance of the word "class" into "CLASS" and run it thru' cweave and TeX and see how it is typeset.

***************Begin example.w*********************
% Rngs: A package of Random Number Generators by George~Marsaglia,
%       B.~Narasimhan, and Arif~Zaman.

%\nocon % omit table of contents
\datethis % print date on listing
\def\DH{Diehard}
\def\TCL{Tcl/Tk}
\def\RNG{RNG}
\def\RNGS{RNGs}

@** Introduction. This document forms part of the \DH{} package,
written by George~Marsaglia, B.~Narasimhan, and Arif~Zaman. It
describes the components of the header file which contains
implementation limits as well as function prototypes. Anyone who
wishes to add/or modify \RNGS{} should include this file. 

@ Here is an overview of the organization. The entire header file is
enclosed within a conditional macro to prevent the definitions being
invoked repeatedly---the definitions obtain only when the variable
|_RNG_H| is undefined, which is the case during the first include.
This is so standard a technique that we shall not discuss
such little tricks henceforth.

@c
#ifndef _RNG_H
@<Constant definitions@>@/
@<Variable type definitions@>@/
@<Header files to include@>@/
@<Function prototypes@>@/
@<Other useful definitions@>@/
#define _RNG_H
#endif

@ We need some constants that define the limits of our programs. These
constants can be changed if necessary. The |MAX_RNGS| constant
indicates the maximum number of random number generators that can be
added. In addition, the |CLASS| allows us to use a single header file
with external functions correctly visible.

@<Constant definitions@>=
#ifdef _DH_MAIN
#define CLASS 
#else
#define CLASS extern
#endif
#define MAX_RNGS 50

@ We need three new variable types: one for keeping track of the
values returned by a random number generator, another for storing the
value of the generator, and finally, a structure that can be used to
store the state of a {\sl generic\/} random number generator.

@<Variable type...@>=
@<Typedef for value returned by rng@>@/
@<Typedef for mixed value@>@/
@<Structure for storing state of an rng@>

@ Some random number generators return sequences of reals while others
return sequences of integers. (Throughout this document, when we refer
to integers or unsigned integers, or doubles, we mean long integers,
unsigned long integers, and doubles as defined in the language C.) The
|rng_type| definition declares three kinds of values that might be
returned by \RNGS: |RNG_DOUBLE| for a double real value, |RNG_ULONG|
for an unsigned long integer, and |RNG_ILONG| denotes a long
integer. It is up to us to keep track of the type of the value
returned by any generator.

@<Typedef for value returned by rng@>=
typedef enum {RNG_DOUBLE, RNG_ULONG, RNG_ILONG} rng_type;

@ We need a variable to store the value returned by any \RNG. The
following type definition is natural.

@<Typedef for mixed value@>=
union mixed_value {
    unsigned long uival;        
    long int ival;      
    double dval;                
  } mixed_value;

@ Next, we need a structure for storing the state of {\sl any\/} \RNG.
More complicated \RNGS{} usually use a table of values for generating
the next number in a sequence. The tables might be made up of real
numbers or signed integers or unsigned integers. Typically, the table
tends to be homogeneous, i.e., it is either composed of reals, or
integers but not a mixture of both. We need some more constants.
@<Constant...@>=
#define MAX_TBL_SIZE 1024
#define MAX_INDICES 8
#define MAX_NAME_LEN 25

@ Now on to a structure for storing the state and related information
pertaining to an \RNG. Some fields suggest themselves. Every \RNG{}
will be uniquely identified by an index stored in |index|.
The string |name| of maximum length |MAX_NAME_LEN| will hold a
descriptive name of an \RNG, |type|, the type of value the generator
returns, |bits| the number of valid bits in the generator, |start| and
|end|, the starting and ending bits, if applicable. The last two
fields are expected to be zero when not applicable. The table itself
is |table| and |table_type| reveals what type of values are stored in
the table. Indices into |table| are almost surely necessary and
|table_inds| provides up to a maximum of |MAX_INDICES|. We shall be
generous and allow for another similar table |x_table| analogous to
|table| for other unseen possibilities---it is left to the programmer
to use them as he sees fit.

@<Structure...@>=
typedef struct rng_info
{
  int index;               /* The index of the generator. */
  char name[MAX_NAME_LEN]; /* Name of the Rng. */
  rng_type type;           /* Type value Rng returns. */
  int bits;                /* No. of valid bits in the rng. */
  int start;               /* Start of kosher bits. */
  int end;                 /* End of kosher bits. */
  rng_type table_type;     /* Type of value in table. */
  mixed_value table[MAX_TBL_SIZE]; /* The table itself. */
  int table_inds[MAX_INDICES]; /* Indices into table. */
  rng_type x_table_type;     /* Type of value in the extra table. */
  mixed_value x_table[MAX_TBL_SIZE]; /* The extra table. */
  int x_table_inds[MAX_INDICES]; /* Indices into extra table. */
} rng_info;

@ We must include the standard \TCL, math and string header files
since we will be using \TCL, math and string functions. 

@<Header files...@>=
#include <tcl.h>
#include <math.h>
#include <string.h>

@ We now define our function prototypes. These can be roughly divided
as follows.

@<Function prototypes@>=
@<Tcl oriented function prototypes@>@/
@<Diehard function prototypes@>@/
@<Random Number Generator function prototypes@>@/

@ The \DH{} package uses \TCL, and therefore some TCLish conventions
need to be addressed. \TCL{} passes a handle to a Tcl interpreter that can
be used for communication. It seems to be a waste to pass the
interpreter handle every time and so we shall use two functions for
accessing and storing interpreter handles. Note that this arrangement
imposes limitations on this package---most notably, support for
multiple interpreters is lost. We might address this deficiency at a
later point.

@<Tcl oriented...@>= 
extern void set_interpreter(Tcl_Interp *a);
extern Tcl_Interp *get_interpreter(void);

@ The following functions are defined elsewhere in respective
files. However, they are so crucial to our design that we provide a
brief description of each right here. The functions |get_rng_index|
and |set_rng_index| are the accessor and modifier functions for a
global variable that holds the index of the currently-chosen \RNG. 

@<Diehard function...@>=
CLASS int get_rng_index(void);
CLASS void set_rng_index(int a);

@ The function |set_current_rng| allows us to set the current \RNG{}
to any one of the available \RNGS. Note that it takes a pointer to
an \RNG{} function as its argument. The function |current_rng| 
returns a value from the currently chosen \RNG. The value returned is
a pointer to a variable of type |mixed_value|. 

@<Diehard function...@>=
CLASS void set_current_rng(mixed_value* (*a)());
CLASS mixed_value *(*current_rng)(void);

@ The function |get_rng_info| returns a pointer to a structure of type
|rng_info|. This structure should always be current, and anytime an
\RNG{} is chosen, one must ensure that this structure contains
pertinent information as well as a snapshot of the state of the
\RNG. The additional functions |get_rng_type|, |set_rng_type|,
|get_rng_bits|, |set_rng_bits|, |get_rng_name|, |set_rng_name| are
provided for conveniently accessing and modifying commonly used fields
of the |rng_info| structure.

@<Diehard function...@>=
CLASS rng_info *get_rng_info(void);
CLASS rng_type get_rng_type(void);
CLASS void set_rng_type(rng_type a);
CLASS int get_rng_bits(void);
CLASS void set_rng_bits(int a);
CLASS char *get_rng_name(void);
CLASS void set_rng_name(char *a);

@ The boolean function |rng_properly_chosen| can be used to avoid
errors---it returns |true| when all is well. 

@<Diehard function...@>=
CLASS int rng_properly_chosen(void);

@ This section lists all the \RNGS{} implemented. We provide three
random number generators at present: Lagged-Fibonacci, KISS, and
Super~Duper.

@<Random...@>=
#ifdef _DH_MAIN
@<Super Duper@>@/
@<Lagged Fibonacci@>@/
@<KISS@>@/
#endif

@ This section explains how \RNGS{} should be designed for use with
\DH. For every \RNG{}, there must be at least four
routines: (a)~an initializing routine, that seeds the state of the
\RNG{} possibly based on user chosen seed values, (b)~a routine that
computes the next random number in the sequence and returns a pointer
to a |mixed_value| type, (c)~a routine that stores the state of the
\RNG{} in a structure of type |rng_info|, a pointer to which is passed
to the routine, and (d)~a routine that sets the state of the \RNG{}
from a structure of type |rng_info|, a pointer to which is again
passed to the routine. These requirements are best illustrated by a
detailed example, say Marsaglia's Super~Duper. 

@<Super Duper@>=
@<Super-Duper initializer prototype@>@/
@<Super-Duper generator prototype@>@/
@<Super-Duper state saver prototype@>@/
@<Super-Duper state setter prototype@>@/

@ The initialization routine for Super Duper is |supdup_init| and it takes
a pointer to |rng_info| as argument and returns an integer code. The
code will be either |TCL_OK|, indicating that all was well, or
|TCL_ERROR|, indicating something was amiss.

@<Super-Duper initializer prototype@>=
extern int supdup_init(rng_info *a);

@ The Super duper generator itself is |supdup| and it returns a
pointer to |mixed_value|. 

@<Super-Duper generator prototype@>=
extern mixed_value *supdup(void);

@ The routine that will save the state of Super Duper in an |rng_info|
structure is |supdup_save_state|. 

@<Super-Duper state saver prototype@>=
extern void supdup_save_state(rng_info *a);

@ And finally, |supdup_set_state| will set the state of Super Duper from
an |rng_info| structure.

@<Super-Duper state setter prototype@>=
extern void supdup_set_state(rng_info *a);

@ All other generators are similar with some minor differences. The
Lagged~Fibonacci generator, for example, has many specialized
sub-generators for efficiency. They are described in the file
|fibo.c|. 

@<Lagged Fibonacci@>=
extern mixed_value *fibomulmod(void);
extern mixed_value *fiboplusmod(void);
extern mixed_value *fiboxormod(void);
extern mixed_value *fibosubmod(void);
extern mixed_value *fibomul32(void);
extern mixed_value *fiboplus32(void);
extern mixed_value *fiboxor32(void);
extern mixed_value *fibosub32(void);
extern int fibo_init(rng_info *a);
extern void fibo_save_state(rng_info *a);
extern void fibo_set_state(rng_info *a);

@ This section pertains to the KISS generator of Marsaglia and
Zaman. KISS stands for ``Keep It Simple, Stupid.''

@<KISS@>=
extern int kiss_init(rng_info *a);
extern mixed_value *kiss(void);
extern void kiss_save_state(rng_info *a);
extern void kiss_set_state(rng_info *a);

@ The following definitions are for convenience.

@<Other...@>=
#define ulong_rng_value (*(unsigned long *)(*current_rng)())
#define int_rng_value (*(int *)(*current_rng)())
#define double_rng_value (*(double *)(*current_rng)())
#define ulong_rng_value_ptr (unsigned long *)(*current_rng)()
#define int_rng_value_ptr (int *)(*current_rng)()
#define double_rng_value_ptr (double *)(*current_rng)()
#define max(x,y) ((x) > (y) ? (x) : (y))
#define lg(x) (log(x)/log(2.0))

@* Index.
Here is a list of the identifiers used, and where they appear. Underlined
entries indicate the place of definition. Error messages are also shown.

*************End example.w**************************


From:     Andrew John Mauer
Date: 01 Dec 1994
Yuval Peduel writes: 2. I take full advantage of the Unix file system to give readable names to my files, using underscores to separate words. When cweave takes the file name for use in the header, it can't handle this.

This is a minor issue that I have also run across in noweb. I found the simplest solution was to bend a little and use hyphens (`-') rather than underscores for separators in filenames. It makes life easier.


From:     Lee Wittenberg
Date: 01 Dec 1994
Balasubramanian Narasimhan writes: My attempts at literate programming in CWEB have left me disheartened too. When I started work on the Diehard tests for Random Number Generators, I thought here was a project that would really benefit from a literate programming style. However, a few attempts mangled both the typesetting and code and I had to abandon the effort. I have one example ready at hand so users can duplicate one problem. (This example below was a first attempt some time ago.)

Try noweb. It's much simpler.

Run the file below thru' cweave and TeX. Things should be nice. Now change every instance of the word "class" into "CLASS" and run it thru' cweave and TeX and see how it is typeset.

I think you mean it the other way around. The program you submitted used `CLASS' rather than `class'. Since the latter is a reserved word in C++ (which is accepted by CWEB), it's not surprising that the typesetting is different. Given your definition of CLASS:

#ifdef _DH_MAIN
#define CLASS 
#else
#define CLASS extern
#endif

You should probably have an "@f CLASS int" in the definitions part of the chunk (@s will do as well). This will work even when you change all the `CLASS's to `class'.


From:     Marc van Leeuwen
Date: 02 Dec 1994
Balasubramanian Narasimhan writes: My attempts at literate programming in CWEB have left me disheartened too. When I started work on the Diehard tests for Random Number Generators, I thought here was a project that would really benefit from a literate programming style. However, a few attempts mangled both the typesetting and code and I had to abandon the effort. I have one example ready at hand so users can duplicate one problem. (This example below was a first attempt some time ago.) Run the file below thru' cweave and TeX. Things should be nice. Now change every instance of the word "class" into "CLASS" and run it thru' cweave and TeX and see how it is typeset.

Well, actually the example contained |CLASS| rather than |class|, but I got the point: with |class| it works, with |CLASS| it doesn't. Now you might be alerted by the fact that |class| is a keyword in C++, and (Levy/Knuth) CWEB handles C++ (no way to shut this off). Apart from this your program had a few weak spots:

#ifdef _DH_MAIN
#define CLASS 
#else
#define CLASS extern
#endif

This is of course relevant to |CLASS|: it stands for the keyword |extern| (or for nothing). If you want to get proper formatting, you must inform CWEAVE that |CLASS| is not an ordinary identifier, but is used as a keyword. The proper way to say this is to specify "@f CLASS extern" in some section (e.g., the one containing the lines above) so that CWEAVE will treat |CLASS| just like it would treat |extern|. You cannot expect formatting to be proper if you are doing subtle things behind CWEAVE's back (and no, CWEAVE does not attempt to expand macros; imagine the mess that would give in more complicated cases).

@<Typedef for mixed value@>=
union mixed_value {
     unsigned long uival;        
     long int ival;      
     double dval;                
} mixed_value;

You said `typedef' in the module name, but didn't you forget to say it in the module itself? Better insert it there, or your C compiler will complain.

#include <tcl.h>

It appears that a typedef identifier |Tcl_Interp| is being defined in <tcl.h>. Then you should also tell this to CWEAVE: "@f Tcl_Interp int" so that CWEAVE will know to treat |Tcl_Interp| as a type. With these three small changes, your program comes out beautifully, whether using |class| or |CLASS| or something else.

So the really puzzling question is: why did it work well without the changes, provided you use |class|? Well, in your case you were in fact lucky that |class| is a C++ keyword, since this happened to make your program come through the parser, even though it is not proper C++ code (your use of |class| does not match the way it is used in C++). The grammar used by CWEAVE has some rules that are a bit too general, and they matched your use of |class| although they were really meant for different purposes. And what about |mixed_value|? Again there was C++ to your rescue, since for C++ a declaration of the form |union x { ... }| is treated as if it were |typedef union x { ... } x| in C; so your code actually was interpreted as if it were

typedef union mixed_value { ... } mixed_value;
mixed_value mixed_value;

in other words, |mixed_value| is both declared a typedef identifier, and a variable of that type (compilers don't like this in one and the same scope, but CWEAVE is not that picky). So despite the fact that you forgot |typedef|, CWEAVE treated |mixed_value| as a typedef identifier, which is in fact what you had intended. The declarations involving |Tcl_Interp| did come out wrong, but the effect was not too dramatic, and you probably overlooked it.
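
For comparison, here is a sketch of the declaration as it was presumably intended in plain C, with the missing |typedef| restored so that |mixed_value| names a type rather than a variable. The helper `store_double` is a hypothetical addition, included only to show the type in use.

```c
/* With typedef in place, mixed_value is a type name in C as well as in C++. */
typedef union mixed_value {
    unsigned long uival;
    long int ival;
    double dval;
} mixed_value;

/* Hypothetical helper: store a double and read it back
   through the same union member. */
double store_double(mixed_value *v, double d)
{
    v->dval = d;
    return v->dval;
}
```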

Conclusion: if you want nice output, you should inform CWEAVE about anything that it needs to know about identifiers being used in unusual ways, and `@f' (or `@s') is often the way to do this. For types defined in header files, my version of CWEB, called CWEBx, will be able to do without `@f' lines, provided it can locate the header files in question.


From:     Balasubramanian Narasimhan
Date: 02 Dec 1994
Marc van Leeuwen writes: <Lots of useful information deleted..>

I wish to thank Marc for his follow-up. Just a minor point. The missing typedef that Marc refers to was a result of my bungled editing. In any case, his analysis was an eye-opener.


From:     Yuval Peduel
Date: 03 Dec 1994
First, my thanks to all who responded. Even the discouraging messages were helpful. On some of the specific points:

the cweave output came out all wrong: the curly brackets for the if statements ended up adjacent, on the line with the "if" and the Boolean expression, rather than bracketing the conditional code.
Jacob Nielsen wrote: If I understand correctly, cweave produces:
    if (<Condition>) {
      <Conditional code>
    }

and you want:

    if (<Condition>) 
    {
      <Conditional code>
    }

My apologies for not making myself clear. What I actually got from cweave was something like:

     if (<Condition>) {  }
       <Conditional code>

and:

     for (;;) {  }
       <For body>

This is a somewhat more severe problem than one on pretty-printing style.

Welcome to the world of pretty-printing, style a la Knuth and Levy.

I actually prefer having the open brace on the same line as the if, for, or while, so I can't complain about their choice of default. But I gather that this and such other parameters as the indentation level are not select-able by the user. This strikes me as reasonable in a package written for one person's use, but not for a general release.

The rearranging of statements (if clauses etc.) is almost inevitable if you use any literate programming tool that does serious source code formatting. The problem arises when the programmer uses one convention for how the code should look and the tool uses another. If all adhered to the Knuth/Levy style of formatting code, there would be no problems.

I understand and appreciate the need to do some code moving for full pretty-printing. I have also read about the advantages in comprehensibility that full formatting can provide. Nonetheless, I still see the need for some control over the process. After all, some of us have weaker eyes and need stronger clues, larger fonts, etc.

If you want "poor man's pretty-printing" you should take a look at noweb. noweb is independent of the programming language, but "poor man's pretty-printing" has been added for C. My definition of "poor man's pretty-printing": It typesets keywords in bold etc. but respects newlines, indentation, spaces and such.

If I get everything else working and the Knuth/Levy style becomes my primary problem, I will consider this. In the meantime, I have bigger problems.

Marc van Leeuwen wrote: You have definitely run into syntax problems here, which may have a number of causes. Common ones are macro invocations that are used in a way other than as an expression (which is what they look like), such as a complete statement (no semicolon after it), and typedef identifiers that CWEB does not know about; the former problem can be solved using `@;', the latter using `@f' or `@s' (or in CWEBx outside compatibility mode, even better by using the `@h' command to specify that included header files should be scanned for typedef declarations). To diagnose your problem, you may like to view any irreducible scrap sequences (which is a technical term for what remains from input that could not be completely digested by the parser). To obtain this, place `@1' in your first section, or for CWEBx specify a `+d' command option to CWEAVE.
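
A minimal sketch of the first cause mentioned above, a macro invocation that stands for a complete statement and so carries no semicolon. The macro `BUMP` and the function `count_to_three` are hypothetical. The code is valid C as it stands; the comment marks where `@;` would go in a CWEB source so that CWEAVE parses the invocation as a statement.

```c
/* Hypothetical macro that expands to a complete (compound) statement,
   so it is invoked without a trailing semicolon. */
#define BUMP(counter) { (counter)++; }

int count_to_three(void)
{
    int n = 0;
    BUMP(n)   /* in the CWEB source this line would read "BUMP(n)@;" */
    BUMP(n)
    BUMP(n)
    return n;
}
```

To CWEAVE, `BUMP(n)` looks like an expression, and an expression with no semicolon after it is not a statement; `@;` supplies the missing piece for typesetting without changing the C code that CTANGLE emits.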

This is both good news and depressing. Good in that it gives me hope that there is a path out. Depressing in that one has to appeal to all of you out there to get this info.

I would say all your problems can be solved within the CWEB context, and most are solved in CWEBx. There are a few fundamental problems, but you are not very likely to run into them. (One is for instance typedef declarations that are local to a block; CWEAVE has no idea of lexical ranges (which might be quite disconnected in the CWEB source) and simply assumes all typedef declarations to be global. This could be a problem in C++, particularly when using templates, but for C I have never seen a local typedef.)

CWEBx comes with a manual that tries to explain all relevant issues in a much more elaborate way than the Levy/Knuth manual.

I will be looking at CWEBx in general and this manual in particular. Thanks. So far, after reading the messages my original post brought out, I would have to agree with Wheeler Ruml when he says:

the technology just isn't "out-of-the-box" yet

While I am willing to pursue it for a while longer, there is no way I can introduce CWEB as it stands for general use in my current environment. On the other hand, I still don't understand why this is the case. The individual pieces should all be well-understood by now and while putting them together is far from trivial, we are still dealing with a limited domain, so it should not be impossible. What am I missing here? (If the response is, "look at the poor error handling of the C compilers out there, why should CWEB be any better?", I would have to say "ouch" and then riposte with, "but we are concerned with the human interface; they aren't!")


From:     Tommy Marcus McGuire
Date: 06 Dec 1994
Yuval Peduel wrote: My apologies for not making myself clear. What I actually got from cweave was something like:
     if (<Condition>) {  }
       <Conditional code>

and:

     for (;;) {  }
       <For body>

This is a somewhat more severe problem than one on pretty-printing style.

You aren't kidding. Could you post some of the code scraps (sections, whatever) that produce this kind of result? It has been quite a while since I used CWEB, but I never saw anything like that unless you had the <Conditional code> line outside the braces.


From:     Yuval Peduel
Date: 09 Dec 1994
Marc van Leeuwen writes: You have definitely run into syntax problems here, which may have a number of causes. Common ones are macro invocations that are used in a way other than as an expression (which is what they look like), such as a complete statement (no semicolon after it), and typedef identifiers that CWEB does not know about; the former problem can be solved using `@;', the latter using `@f' or `@s' (or in CWEBx outside compatibility mode, even better by using the `@h' command to specify that included header files should be scanned for typedef declarations). To diagnose your problem, you may like to view any irreducible scrap sequences (which is a technical term for what remains from input that could not be completely digested by the parser). To obtain this, place `@1' in your first section, or for CWEBx specify a `+d' command option to CWEAVE.

I have taken this advice, gone through my code, and fixed numerous problems. Some I identified just by reading the code, some by looking at the output of indent applied to individual fragments, and some by running CTANGLE. However, there are problems that I just cannot see. Here is an example of a short CWEB file that results in inappropriate code formatting. I am sure the error is mine, but where is it?

@*Test.
@1
This is an attempt to figure out what goes wrong with my output.

@c
void
foo(int bar)
{
   @<try reading the data@>;
}

@ This is the section that hasn't come out right.

@<try reading the data@>=
no_data_reads = 0;
for (;;) {
   bytes_read = (*port->foo.spd->spd_mbuf_read)(port->foo.handle,
                                                &mbuf_chain,
                                                FLAG,
                                                &error_byte,
                                                &error);
   if (error) {
      break;      
   }
   else if (error_byte != SPD_RCURG) {
      break;
   }
   else break;
}
@ A termination section.

Any assistance welcomed.

In response to my complaint about the handling of braces for conditional code (after if's, for's, etc.) Tommy McGuire writes:

You aren't kidding. Could you post some of the code scraps (sections, whatever) that produce this kind of result? It has been quite a while since I used CWEB, but I never saw anything like that unless you had the <Conditional code> line outside the braces.

Fortunately or unfortunately, I can't. After I went through the code fixing all the syntax errors I could find, this particular problem vanished. I still have no idea of which syntax errors caused which output problems.

I did, however, get the impression, which may be wrong, that both CWEB and CWEBx are very perturbed by a section in limbo: a section whose name is not referenced. (Just going through the code to make sure that every named fragment was referenced before it was defined seemed to significantly improve the output.) I can understand why this might be a problem for CTANGLE, but why should CWEAVE care?


From:     Marc van Leeuwen
Date: 12 Dec 1994
Yuval Peduel writes: However, there are problems that I just cannot see. Here is an example of a short CWEB file that results in inappropriate code formatting. I am sure the error is mine, but where is it? [...]
    bytes_read = (*port->foo.spd->spd_mbuf_read)(port->foo.handle,
                                                 &mbuf_chain,
                                                 FLAG,
                                                 &error_byte,
                                                 &error);
    if (error) {
       break;      
    }

The "error" is the identifier |error|, which is indeed yours. The problem is that CWEAVE treats |error| as a reserved word, mainly so that it will come out in boldface when you write an `#error' preprocessor directive. Although the identifiers that can follow `#' in a preprocessor line are not all reserved words in C or C++, CWEAVE will still treat them like that. A solution to this problem in your case is to insert a line `@f error x', to demote |error| to an ordinary identifier. Note that there are other identifiers of this kind that are likely to cause trouble, for instance |line| and |undef|. I have always found this behavior of CWEAVE irritating, and have recently changed CWEBx so that it will treat identifiers after `#' specially only in that context, so your example causes no problem in CWEBx. A complete list of all reserved words appears in section 28 of the Levy/Knuth CWEAVE listing (note that all C++ keywords are there; to use them as identifiers in C requires similar precautions as for |error|). For CWEBx they are in section 117, but you needn't look it up; the only anomaly is |va_dcl| which is there for historic reasons (types that are defined in certain ANSI header files, like |FILE|, are also predefined in CWEAVE, whether or not you include that header file).
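
To make the point concrete, here is a small sketch showing that |error| is a perfectly ordinary variable name as far as the C compiler is concerned; the trouble is purely in how CWEAVE typesets it. The function `read_once` and its parameter `simulate_failure` are hypothetical stand-ins for the port-reading code in the posted example.

```c
/* In the CWEB source, a line "@f error x" would demote error to an
   ordinary identifier for CWEAVE; the C compiler never needed it. */

/* Hypothetical stand-in for the read loop: returns -1 on error,
   otherwise the number of successful reads (here always 1). */
int read_once(int simulate_failure)
{
    int error = 0;            /* legal C: error is not a C keyword */
    int no_data_reads = 0;

    for (;;) {
        error = simulate_failure;
        if (error) {
            break;
        }
        no_data_reads++;
        break;
    }
    return error ? -1 : no_data_reads;
}
```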

I did, however, get the impression, which may be wrong, that both CWEB and CWEBx are very perturbed by a section in limbo: a section whose name is not referenced. (Just going through the code to make sure that every named fragment was referenced before it was defined seemed to significantly improve the output.) I can understand why this might be a problem for CTANGLE, but why should CWEAVE care?

I don't understand this. First of all a "section in limbo" is a contradiction in terms, since limbo is the TeX text before the first section. I assume you meant a section that is defined (or cited) but never used. This causes a warning message by CWEAVE both in CWEB and CWEBx, since it could indicate an oversight or typing error on the part of the programmer, but it should not otherwise affect the output of CWEAVE. And CTANGLE doesn't care about unreferenced modules at all, although it will complain about undefined ones. Certainly it should make no difference whether a module is defined before it is used or the other way around: both are perfectly valid (although the former is a bit less customary), and should lead to well-formatted output.


From:     Yuval Peduel
Date: 12 Dec 1994
Marc van Leeuwen writes: The "error" is the identifier |error|, which is indeed yours. The problem is that CWEAVE treats |error| as a reserved word, mainly so that it will come out in boldface when you write an `#error' preprocessor directive. Although the identifiers that can follow `#' in a preprocessor line are not all reserved words in C or C++, CWEAVE will still treat them like that. A solution to this problem in your case is to insert a line `@f error x', to demote |error| to an ordinary identifier.

Thank you. Now that I understand this, it seems crystal clear, though a bit perverted. I would never have found this on my own.

I have always found this behavior of CWEAVE irritating, and have recently changed CWEBx so that it will treat identifiers after `#' specially only in that context, so your example causes no problem in CWEBx.

True. I did download CWEBx, read the manual, and try it on my program. The results were not identical to CWEB, but they seemed to show similar problems (and gave the same error messages). When I started pruning my big program to an excerpt I could post, I used CWEB rather than CWEBx just because it seems more people have had experience with the former. I just tried CWEBx on both the excerpt and on the original program. CWEBx does handle the excerpt properly, but it still doesn't handle the full program.

A complete list of all reserved words appears in section 28 of the Levy/Knuth CWEAVE listing (note that all C++ keywords are there; to use them as identifiers in C requires similar precautions as for |error|). For CWEBx they are in section 117, but you needn't look it up; the only anomaly is |va_dcl| which is there for historic reasons (types that are defined in certain ANSI header files, like |FILE|, are also predefined in CWEAVE, whether or not you include that header file).

Argh. Why isn't this part of the user documentation?

I don't understand this. First of all a ``section in limbo'' is a contradiction in terms, since limbo is the TeX text before the first section.

Apologies for a misuse of a technical term.

I assume you meant a section that is defined (or cited) but never used. This causes a warning message by CWEAVE both in CWEB and CWEBx, since it could indicate an oversight or typing error on the part of the programmer, but it should not otherwise affect the output of CWEAVE. And CTANGLE doesn't care about unreferenced modules at all, although it will complain about undefined ones. Certainly it should make no difference whether a module is defined before it is used or the other way around: both are perfectly valid (although the former is a bit less customary), and should lead to well-formatted output.

Sounds good, if it is just a matter of warning messages. I came by my impression after seeing error messages such as:

   This is CWEAVE (Version x2+1.2a)
   *1
   ! Never used: <should we be trying to read?>
   Writing the output file...*1
   ! You need an = sign after the module name. (l. 16)
   @<should we be trying to read?@>;

Since my last post I discovered that this was due to my using @p to introduce the main program rather than @c. (The documentation says they are equivalent; experience says otherwise.) Thanks again for your help. I will continue trying to put my program into a form that produces the kind of output I want and documenting my problems along the way. Perhaps, in the end, I will have enough confidence in the tools to try to get others to use them.