Agile Documentation

Ron Jeffries. Essential XP: Documentation

Extreme Programming (XP) places a very high focus on incremental development. The development cycle is to begin with Simple Design, communicated through Pair Programming, supported by extensive Unit Tests, and evolved through Refactoring. For this to work, it must be possible to refactor the code: the code must be very clean and very clear. There are rules (see elsewhere a discussion of the rules of simplicity relating to reusability) that help programmers produce clear code. In essence, if there is an idea in our head when we write the code, we require ourselves to express that idea directly in the code. We treat the need for comments as a "code smell" suggesting that the code needs improvement, because it isn't clear enough to stand alone. Our first action is to improve the code to make it clearer. Whatever we cannot make clear in code, we then comment.
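
As a minimal sketch of this rule, the hypothetical Python functions below compute the same price twice. The first version needs a comment to carry the idea; the second moves the idea into names, so the comment is no longer needed. The discount policy, the function names, and the amounts are illustrative assumptions, not part of Jeffries's text.

```python
# Version 1: the idea (a loyalty discount) lives only in the comment.
def price_v1(total_cents, years):
    # customers with 5+ years get 10% off orders above 100.00
    if years >= 5 and total_cents > 10000:
        return total_cents * 90 // 100
    return total_cents


# Version 2: the same idea expressed directly in the code.
# All names and values here are hypothetical.
LOYALTY_YEARS = 5
MINIMUM_DISCOUNTED_ORDER_CENTS = 10_000
LOYALTY_DISCOUNT_PERCENT = 10


def is_loyal(customer_years):
    return customer_years >= LOYALTY_YEARS


def qualifies_for_loyalty_discount(order_total_cents, customer_years):
    return (is_loyal(customer_years)
            and order_total_cents > MINIMUM_DISCOUNTED_ORDER_CENTS)


def apply_percent_discount(amount_cents, percent):
    # Integer arithmetic in cents avoids floating-point rounding.
    return amount_cents * (100 - percent) // 100


def price_v2(order_total_cents, customer_years):
    if qualifies_for_loyalty_discount(order_total_cents, customer_years):
        return apply_percent_discount(order_total_cents,
                                      LOYALTY_DISCOUNT_PERCENT)
    return order_total_cents
```

Both versions behave identically; only the second can be read without the comment.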

Unit tests are also documentation. The unit tests show how to create the objects, how to exercise the objects, and what the objects will do. This documentation, like the acceptance tests belonging to the customer, has the advantage that it is executable. The tests don't say what we think the code does: they show what the code actually does.
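
A small sketch may make the point concrete. The Stack class and the three tests below are hypothetical and written in Python only for illustration. Each test shows one of the things named above: how to create the object, how to exercise it, and what it will do.

```python
class Stack:
    """A tiny last-in, first-out container (hypothetical example class)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items


def test_new_stack_is_empty():
    # How to create the object, and the state it starts in.
    assert Stack().is_empty()


def test_pop_returns_most_recent_push():
    # How to exercise the object, and what it will do.
    stack = Stack()
    stack.push("a")
    stack.push("b")
    assert stack.pop() == "b"
    assert stack.pop() == "a"
    assert stack.is_empty()


def test_pop_on_empty_stack_raises():
    # The error case is documented by a test as well.
    try:
        Stack().pop()
    except IndexError:
        return
    raise AssertionError("expected IndexError")
```

The tests are plain functions, so they can be called directly or collected by a test runner such as pytest. If the behavior of Stack changes, the tests fail, which a prose description of the class cannot do.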

Scott Ambler. "All the Right Questions". Software Development, December 2002, pg. 44.

Extreme programming doesn't prevent you from writing documentation -- it insists only that you justify why you need it. Documentation should be treated like any other requirement -- estimated, prioritized, and planned for accordingly. In other words, don't blindly create documentation simply because it makes you feel comfortable; instead do it because it adds value. Visit agile documentation for more information.

Andreas Rüping. Agile Documentation, John Wiley, 2003, pg. 197-204.

  • Focused Information: A clear and identifiable focus on a particular topic makes a document concise and straightforward.
  • Individual Documentation Requirements: The most effective approach towards documentation is for each project to define its documentation requirements individually.
  • Focus on Long-Term Relevance: There is much value in documentation that focuses on issues with a long-term relevance.
  • Design Rationale: Design documents become a valuable source of information if they aren't restricted to describing the actual design, but also focus on the rationale behind the design and explain why the particular design was chosen.
  • The Big Picture: A good feel for a project is best conveyed through a description of the 'big picture' of the architecture that underlies the system under construction.
  • Separation of Description and Evaluation: Authors gain credibility if, in their documents, they clearly separate description from evaluation.
  • Realistic Examples: Project documents are much more convincing if they include realistic examples from the project's context.
  • Structured Information: Most project documents are best organized as sequential yet well-structured text. This begins with well-chosen chapters and sections, but may well extend to using textual building blocks consistently throughout a document.
  • Judicious Diagrams: Diagrams can provide excellent overviews, while an accompanying text explains details to the extent that is necessary.
  • Code-Comment Proximity: Documentation of the code, to the extent that a project team considers it necessary, is best done through source code comments. Separate documents should be reserved for higher-level issues such as overviews, requirements, design, and architecture.
  • A Distinct Activity: When documentation is considered a distinct project activity, and not just the by-product of coding, it can be assigned its own budget, priority, and schedule. Documentation can then be weighed against other project activities.
  • Writing and Reflection: To get the best out of documentation, team members have to spend time on the actual writing, as well as in reflection on what they have written, preferably in an undisturbed environment.
  • Review Culture: Documentation can profit from reviews, provided a review culture has been established in which both authors and reviewers feel comfortable.
  • Information Marketplace: Documents gain more attention if the intended readers are actively invited to read them.

Kent Beck. "A Theory of Programming". Dr. Dobb's Journal, November 2007

Code communicates well when a reader can understand it, modify it, or use it. While programming it's tempting to think only of the computer. However, good things happen when I think of others while I program. I get cleaner code that is easier to read, it is more cost-effective, my thinking is clearer, I give myself a fresh perspective, my stress level drops, and I meet some of my social needs. Part of what drew me to programming in the first place was the opportunity to commune with something outside myself. However, I didn't want to deal with sticky, inexplicable, annoying human beings. Programming as if people didn't really exist paled after only a couple of decades. Building ever-more-elaborate sugar castles in my mind became colorless and stale.

One of the early experiences that led me to focus on communication was discovering Knuth's Literate Programming: a program should read like a book. It should have plot, rhythm, and delightful little turns of phrase. When Ward Cunningham and I first read about literate programs, we decided to try it. We sat down with one of the cleanest pieces of code in the Smalltalk image, the ScrollController, and tried to make it into a story. Hours later we had completely rewritten the code on our way to a reasonable paper. Every time a bit of logic was a little hard to explain, it was easier to rewrite the code than explain why the code was hard to understand. The demands of communication changed our perspective on coding.

There is a sound economic basis for focusing on communication while programming. The majority of the cost of software is incurred after the software has been first deployed. Thinking about my experience of modifying code, I see that I spend much more time reading the existing code than I do writing new code. If I want to make my code cheap, therefore, I should make it easy to read.

Scott Ambler. "What's Normal? Models to Documentation", Software Development's Agile Modeling Newsletter, December 2004.

In Agile Modeling (AM), you want to travel as light as possible, so you need to choose the best artifact to record information. (I use the term "artifact" to refer to any model, document, source code, plan or other item created during a software development project.) Furthermore, you want to record information as few times as possible; ideally only once. For example, if you describe a business rule in a use case, then describe it in detail in a business rule specification, then implement it in code, you have three versions of the same business rule to maintain. It would be far better to record the business rule once, ideally as human-readable but implementable code, and then reference it from any other artifact as appropriate.

Why do you want to record a concept once? First, the more representations that you maintain, the greater the maintenance burden, because you'll want to keep each version in sync. Second, you'll have greater traceability needs with multiple copies because you'll have to relate each version to its alternate representations to keep them synchronized when a change does occur. AM does advise you to update only when it hurts, but having multiple copies of something means you're likely to feel the pain earlier and more often. Finally, the more copies you have, the greater the chance you'll end up with inconsistent information, as you probably won't be able to keep the versions synchronized.

It's interesting that traditional processes typically promote multiple recordings of technical information, such as representing business rules three different ways, while also prescribing design concepts such as normalization and cohesion that promote developing designs that implement concepts once. For example, the rules of data normalization motivate you to store data in one place and one place only. Similarly, in object-oriented and component-based design, you want to build cohesive items (components, classes, methods and so on) that fulfill only one goal. If this is O.K. for your system design, shouldn't it also be O.K. for your software development artifacts? We clearly need to rethink our approach.

It's clear that you should store system information in one place and one place only, ideally in the most effective place. In software development, this concept is called "artifact normalization"; in the technical documentation world, it's called "single sourcing". With single sourcing, you aim to record technical information only once and then generate various documentation views as needed from that information. A business rule, for example, would be recorded using some sort of business-rule definition language. A human-readable view would be generated for your requirements documentation (easier said than done, but for the sake of argument, let's assume it's possible) and an implementation view generated that would be run by your business rule engine or compiled as application source code.
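
The idea can be sketched in a few lines of Python. The BusinessRule format, the two view functions, and the free-shipping rule below are all hypothetical; they stand in for a real business-rule definition language and rule engine, which the text does not specify. The rule is recorded once, and both the human-readable view and the implementation view are generated from that single record.

```python
from dataclasses import dataclass
import operator

# Comparisons supported by this hypothetical rule format.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


@dataclass(frozen=True)
class BusinessRule:
    """The single source: one record per business rule."""
    name: str
    field: str
    comparison: str
    threshold: int
    outcome: str


def requirements_view(rule):
    """Generate the human-readable view for requirements documentation."""
    return (f"{rule.name}: if {rule.field} {rule.comparison} "
            f"{rule.threshold}, then {rule.outcome}.")


def implementation_view(rule):
    """Generate the executable view: a predicate over a record (a dict)."""
    compare = OPERATORS[rule.comparison]

    def applies(record):
        return compare(record[rule.field], rule.threshold)

    return applies


# The rule is recorded exactly once; both views derive from it.
FREE_SHIPPING = BusinessRule(
    name="Free shipping",
    field="order_total",
    comparison=">=",
    threshold=50,
    outcome="shipping is free",
)
```

Calling requirements_view(FREE_SHIPPING) produces the sentence for the requirements document, and implementation_view(FREE_SHIPPING) returns the function the application would run. Changing the threshold in the one record changes both views together, so there is nothing to keep in sync.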

To make the single-sourcing vision work, you need a common way to record information. Darwin Information Typing Architecture (DITA) is an XML-based format that's promoted for single-sourced technical documentation, although nothing can stop you from creating your own storage strategy. Single sourcing is often approached in a top-down manner, with the data structure for the documentation typically defined early in a project. However, you can take a more agile approach in which the structure emerges over time.

The primary challenge with traditional single sourcing? It requires a fairly sophisticated approach to technical documentation. This is perfectly fine, but unfortunately, many organizations aren't yet able to achieve this vision and eventually back away from the approach. This doesn't mean that you need to throw out the baby with the bath water; you should still strive to normalize all of your software development artifacts.

Just as it's extremely rare to find a perfectly normalized relational database, I suspect that you'll never truly be able to fully normalize all of your software artifacts. In the case of databases, performance considerations (and to be fair, design mistakes made by project teams) result in less-than-normal schemas. Similarly, not everyone will be able to work with all types of artifacts -- it isn't realistic to expect business stakeholders to be able to read program source code, and the über-tools required to support this vision continue to elude us.

Charles Connell. It's Not About Lines of Code.

Everyone wants programmers to be productive. Managers of programmers want maximum productivity -- it gets the work done faster and makes the managers look good. All other things being equal, programmers like being productive. They can get home earlier, reduce stress during the workday, and feel better about their finished products.

  • Lines of code per day: This is the classic definition of software productivity for individual programmers. Unfortunately, as other authors have noted as well, the definition makes little sense. Assume Fred's code is of such poor quality that, for each day of work he does, someone else must spend five days debugging the code. Is Fred highly productive?
  • Lines of correct code per day: This definition adjusts for the problem of a programmer producing lousy code. Suppose Fred cleans up his act. Imagine his code is completely bug-free, but contains no comments at all. The next programmer to take over Fred's code will find it impenetrable, and possibly will be forced to rewrite the code in order to add any new features. Is Fred productive now?
  • Lines of correct, well-documented code per day: This definition gets closer to what we want, but something still is missing. Imagine both Fred and another programmer, Danny Designer, are given similar assignments. Fred completes his in five days with a far larger body of code. Danny also completes his assignment in five days, but he writes only 500 lines of code (all correct and well-documented). Who was more productive? Probably Danny. His code is shorter and simpler, and simplicity is almost always better in engineering.
  • Lines of clean, simple, correct, well-documented code per day: This is a pretty good definition of productivity, and one many experienced, savvy technical managers would accept. There is still something about this definition, though, that misses what software engineers ultimately are trying to do. Imagine Fred, Danny, and a third programmer, Ingrid Insightful, are given similar assignments. Something about the assignment bothers Ingrid, however, so she decides to go outside for a walk. Ingrid wakes up with a start and realizes what bothers her about the programming assignment: this new feature is suspiciously like an existing feature. Ingrid opens up the source code for the existing feature and begins deleting large sections of it. Before long, she has generalized the existing feature so it is simpler, more intuitive, and includes the new capabilities she was asked to add. In the process, she has reduced the code base by 2000 lines. Who was more productive on this day? Certainly Ingrid. Getting away from a problem sometimes is a good way to solve it. And programmers who understand the big picture make smarter decisions, because they are able to reuse code and combine features effectively.
  • Ability to solve customer problems quickly: This is the true definition of programmer productivity, and is what Ingrid accomplished in the example. What software engineering really is about is solving problems for the people who will use the software. Any other definition of programmer productivity misses the mark. This definition raises a difficult question, though: If a programmer can be highly productive by writing a negative amount of code, how do we measure productivity for software engineers? There is no easy answer to this question, but the answer surely is not a rigid formula related to lines of code, bug counts, or face time in the office. Each of these measures has some value for some purposes, but managers should not lose sight of what software engineers are doing. They are creating machines to solve human problems.