Software Maintenance: The Open Source Perspective

Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece

Course Introduction


Advanced Topics in Software Engineering

Course Goals


Course Notes

Course Grade

Schedule of Assignments


Recommended Reading

Course Overview

Further Reading

Exercises and Discussion Topics

Code as Part of the Software Development Process

Code as a Scientific Communication Vehicle

Social processes that contributed to the success of mathematical theorems:

Gains from High-quality Code

We learn:

Recognize Low-quality Code

Requirements Can Lead to Strange Code

How to Familiarize Yourself with a Body of Code

Don't panic! Don't Panic! (typically you don't have to understand all the code)

Four steps:

  1. Build
  2. Browse
  3. Improve
  4. Contribute
What you gain from OSS is a loan that you have to repay.

Code Reading Reasons

Code as Exemplar



Example: tail -f (NetBSD)

        while ((ch = getopt(argc, argv, "b:c:fn:r")) != -1)
                switch(ch) {
                case 'f':
                        fflag = 1;


        for (;;) {
                while ((ch = getc(fp)) != EOF)
                        if (putchar(ch) == EOF)
                if (!fflag)
                 * We pause for one second after displaying any data that has
                 * accumulated since we read the file.  Since sleep(3) takes
                 * eight system calls, use select() instead.
                second.tv_sec = 1;
                second.tv_usec = 0;
                if (select(0NULLNULLNULL, &second) == -1)
                        err(1"select: %s", strerror(errno));

Example: tail -f (FreeBSD)

        if (fflag) {
                kq = kqueue();
                if (kq < 0)
                action = ADD_EVENTS;
        for (;;) {
                while ((ch = getc(fp)) != EOF)
                        if (putchar(ch) == EOF)
                if (! fflag)

                switch (action) {
                case ADD_EVENTS:
                        n = 0;
                        ts.tv_sec = 0;
                        ts.tv_nsec = 0;

                        if (Fflag && fileno(fp) != STDIN_FILENO) {
                                EV_SET(&ev[n], fileno(fp), EVFILT_VNODE,
                                    EV_ADD | EV_ENABLE | EV_CLEAR,
                                    NOTE_DELETE | NOTE_RENAME, 00);
                        EV_SET(&ev[n], fileno(fp), EVFILT_READ,
                            EV_ADD | EV_ENABLE | EV_CLEAR, 000);

                        if (kevent(kq, ev, n, NULL0, &ts) < 0) {
                                action = USE_SLEEP;
                        } else {
                                action = USE_KQUEUE;
kqueue, kevent - kernel event notification mechanism

Maintenance and its Tools


Maintenance activities often take 80% of the effort over a complete life cycle.

Maintenance Activities


Example: locating the authentication code in the ftp command.


If the two environments differ a lot: focus on interfaces.




The only valid measurement of code quality: WTFs/minute

Further Reading

Exercises and Discussion Topics

  1. Are there examples of "open-source" communication in other engineering disciplines?
  2. Discuss possible business models behind an open-source distribution.
  3. Download, build from source, and install a small open-source program. (easy)
  4. Install an open-source operating system on your PC (e.g. FreeBSD (, Debian (, NetBSD ( (moderate)
  5. Modify your system's kernel to log all accesses to files in the /etc directory. (difficult)

The Open Source Landscape


Free and "Free" Software

The Open Source Definition

Software Categories (FreeBSD Ports)

Top-40 package categories (by population) in the FreeBSD Ports distribution.
CategoryNumber of Packages
cd /usr/ports
find * -prune -type d |
while read i
        echo "$i `ls $i | wc -l`"
done |
sort -rn +1 |
head -40

System Software

Operating Systems



Language Processors


Applications and Libraries


Development Tools

Text Processing

Web and Application Servers

Desktop Applications

Open Source Software Forges

Influence on Product Development



Development Process Advantages

Development Process Problems

License Distribution (Sourceforge)

GNU General Public License (GPL) 35807
GNU Library or Lesser General Public License (LGPL) 5447
BSD License 3587
Artistic License 1119
Apache Software License 905
MIT License 881
Mozilla Public License 1.1 (MPL 1.1) 539
Common Public License 265
Mozilla Public License 1.0 (MPL) 260
zlib/libpng License 245
Qt Public License (QPL) 205
Open Software License 184
Python License (CNRI Python License) 159
Academic Free License (AFL) 131
Python Software Foundation License 80
IBM Public License 70
PHP License 49
Apple Public Source License 44
Sun Industry Standards Source License (SISSL) 43
Sun Public License 38
Jabber Open Source License 36
wxWindows Library Licence 33
University of Illinois/NCSA Open Source License 29
Zope Public License 25
Nethack General Public License 24
W3C License 21
Intel Open Source License 19
Open Group Test Suite License 14
Sleepycat License 13
Apache License V2.0 12
Eiffel Forum License 10
Eiffel Forum License V2.0 10
Attribution Assurance License 9
Reciprocal Public License 7
Ricoh Source Code Public License 6
Historical Permission Notice and Disclaimer 6

Legal Exposure

Web sites

Further Reading

Exercises and Discussion Topics

  1. Examine which software categories are over or underepresented in open source software repositories. Discuss why this might be the case.
  2. What criteria will you use for determining the project to contribute?
  3. Describe the business model behind a packaging company. Is a similar business model used in another, non-software, area? Why (not)?
  4. Consider adopting some of the programs we described to improve your productivity.
  5. Learn a scripting language, like Ruby, Python, or Perl.
  6. Compile a Swiss-army-knife CD with all the open source software you would want to have with you on a desert island.

The Full Open Source Definition


Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria:

1. Free Redistribution

The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.

2. Source Code

The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost preferably, downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

3. Derived Works

The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software.

4. Integrity of The Author's Source Code

The license may restrict source-code from being distributed in modified form only if the license allows the distribution of "patch files" with the source code for the purpose of modifying the program at build time. The license must explicitly permit distribution of software built from modified source code. The license may require derived works to carry a different name or version number from the original software.

5. No Discrimination Against Persons or Groups

The license must not discriminate against any person or group of persons.

6. No Discrimination Against Fields of Endeavor

The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.

7. Distribution of License

The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.

8. License Must Not Be Specific to a Product

The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution.

9. License Must Not Restrict Other Software

The license must not place restrictions on other software that is distributed along with the licensed software. For example, the license must not insist that all other programs distributed on the same medium must be open-source software.

*10. License Must Be Technology-Neutral

No provision of the license may be predicated on any individual technology or style of interface.

Origins: Bruce Perens wrote the first draft of this document as "The Debian Free Software Guidelines", and refined it using the comments of the Debian developers in a month-long e-mail conference in June, 1997. He removed the Debian-specific references from the document to create the "Open Source Definition."

Copyright © 2004 by the Open Source Initiative (

Tackling Large Projects

Design and Implementation Techniques

FreeBSD 5.2 Release Policy

Subject: 5.2 is coming!!
Date: Wed, 7 Jan 2004 23:57:22 -0700 (MST)
From: Scott Long <>


We've delayed quite bit while install problems and various show-stopper
bugs got fixed.  I think that it's safe to say that we are getting
pretty close to closing the door on 5.2 now.  There are still a few MFC
requests trickling in, so I wanted to publicize the policy that needs
to be followed for the remainder of the 5.2 cycle.

First, there can be no 'instant MFCs'.  All 5.2 MFC candidates must spend
at least 36 hours in HEAD before going into RELENG_5_2.  It's fine to
contact re@ after committing to HEAD to state your desire to also commit
to RELENG_5_2, but don't expect instant approval.  I encourage the 36
hours to be used for as much review, testing, and verification as

Unless another show-stopper arises, I intend to tag the tree late
Friday night or early Saturday morning (MST -0700).  In accordance
with the previous paragraph, this means that there are approximately
12 more hours for bug fixes to go in and be 5.2 candidates.  This is
_not_ intended to be a signal for people to start rushing stuff in,
but rather a reminder that we cannot hold the release up for forever.
MFC approval is still at the discretion of the RE team.  After this
cut-off, only show-stoppers will be considered for approval.

While 5.2 will certainly have bugs, I'm happy to hold the release up for
true show-stoppers.  A show-stopper generally is a bug that causes a panic
[crash] during normal and reasonable operation of the OS or prevents the
OS from being installed in a way that is documented, and has no reasonable
work-around.  These bugs must be confirmed by a third party to ensure
that it is a problem that is likely to be reproduced in the user base. It
also includes security vulnerablilities that are published by mainstream
security outlets and have not already been addressed in some fashion.

Thanks a lot!


Directory Structure

Project Organization

Two approaches: Keep in mind: large structures are often very easy to navigate.
The FreeBSD directory structure

Source Code is Not Only Code

Key Processes

Project-Specific Tools

Building the IBM 3270 terminal emulator
Building the IBM 3270 terminal emulator

Debug Messages


Logging offers a number of advantages over using a debugger:


Example (from regular expression engine)
    if (dp != NULL)
    /* uh-oh... we couldn't find a subexpression-level match */
    assert(g->backrefs);    /* must be back references doing it */
    assert(g->nplus == 0 || m->lastpos != NULL);

Assertions in Java


 * Run With java -ea FindMax

class FindMax {

        /** Return the maximum number in non-empty array v */
        public static int findMax(int v[]) {
                int max = Integer.MIN_VALUE;

                // Precondition: v[] is not empty
                assert v.length > 0 : "v[] is empty";
                // Precondition: max <= v[i] for every i
                for (int i = 0; i < v.length; i++)
                        assert max <= v[i] : "Found value < MIN_VALUE";

                // Locate the real maximum value
                for (int i = 0; i < v.length; i++)
                        if (v[i] > max)
                                max = v[i];

                // Postcondition: max >= v[i] for every i
                for (int i = 0; i < v.length; i++)
                        assert max >= v[i] : "Found value > MIN_VALUE";
                return max;

        // Test harness
        public static void main(String argv[]) {
                int t[] = new int[5];

                t[0] = 4;
                t[1] = -4;
                t[2] = 145;
                t[3] = 0;
                t[4] = Integer.MIN_VALUE;
                System.out.println("Max value is " + findMax(t));

Further Reading

Exercises and Discussion Topics

  1. Propose ways to quickly determine whether a given project follows one of the design or implementation techniques we described. Test your proposal against one of the major projects available in the course's reference source code .
  2. Locate recommended design and implementation practices in a software engineering book. Explain how these are reflected in a project's source code.
  3. How can a standardized directory structure be utilized for automating aspects of the software development process?
  4. Examine and describe the directory structure of an installed version of Microsoft Windows.
  5. Configure a program from the course's reference source code to compile and run in a supported environment. Examine the configuration process output files and manually verify four different configuration options.
  6. Read the source code of a configuration script, explaining how two different configuration options are determined.
  7. How do you track revision histories in your environment? If you are not using a revision control system, consider adopting one.
  8. How is the build process managed in your favorite integrated development environment? Discuss the advantages and shortcomings of the employed approach.
  9. Locate two different test cases in the course's reference source code and explain how their execution is, or could be, automated.
  10. The Perl test cases suggest that

    ``A good test case has most of these attributes: fewest possible number of lines; few dependencies on external commands, modules, or libraries; runs on most platforms unimpeded; and is self-documenting.''

    Locate test cases in the course's reference source code and judge them against the above standard.

Version Control

Version Control

The Great Debate

Version Control Operations


Revision Tree Example

cat.c revision tree
cat.c revision tree

Life Under a VCS

The Goodies

Example of a Change Log

RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
Working file: cat.c
head: 1.28
locks: strict
access list:
symbolic names:
    netbsd-1-5-PATCH002: 1.23
    netbsd-1-2-BETA: 1.11
    netbsd-1-2-base: 1.11
keyword substitution: kv
total revisions: 33;    selected revisions: 33
revision 1.28
date: 2001/09/16 12:12:13;  author: wiz;  state: Exp;  lines: +6 -4
Some KNF, via patch by Petri Koistinen in private mail.
revision 1.27
date: 2001/07/29 22:40:57;  author: wiz;  state: Exp;  lines: +6 -6
Some style improvements. [Nearly] #13592 by Petri Koistinen.
date: 1998/02/08 21:45:36;  author: mellon;  state: Exp;  lines: +18 -9
Pull up 1.16 and 1.17 (kleink)

Example of a File Difference List

$ cvs diff -c -r1.12 -r1.13 basesrc/bin/cat/cat.c
Index: basesrc/bin/cat/cat.c
RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -c -r1.12 -r1.13
*** 136,141 ****
--- 136,142 ----
                                fp = stdin;
                        else if ((fp = fopen(*argv, "r")) == NULL) {
                                warn("%s", *argv);
+                               rval = 1;

Example of an Annotated Listing

1.166        (jhb      16-Oct-02):      fdp = p->p_fd;
1.114        (alfred   13-Jan-02):      FILEDESC_LOCK(fdp);
1.157        (iedowse  02-Sep-02):      if ((unsigned)fd >= fdp->fd_nfiles ||
1.157        (iedowse  02-Sep-02):          (fp = fdp->fd_ofiles[fd]) == NULL) {
1.114        (alfred   13-Jan-02):              FILEDESC_UNLOCK(fdp);
1.106        (dillon   01-Sep-01):              error = EBADF;
1.106        (dillon   01-Sep-01):              goto done2;
1.106        (dillon   01-Sep-01):      }
1.157        (iedowse  02-Sep-02):      pop = &fdp->fd_ofileflags[fd];
1.94         (dillon   18-Nov-00):
1.157        (iedowse  02-Sep-02):      switch (cmd) {
1.1          (rgrimes  24-May-94):      case F_DUPFD:
1.241        (rwatson  07-Aug-04):              /* mtx_assert(&Giant, MA_NOTOWNED); */
1.158        (jhb      03-Sep-02):              FILEDESC_UNLOCK(fdp);

Extracting Metrics

Tracking Example

Maintainability index over time in the FreeBSD kernel.
Maintainability index over time in the FreeBSD kernel.

Best Practices

Subversion Cheat Sheet

(Results appear after the # sign.)

Create Repository

# /home/dds
svnadmin create repo
chmod -R go+rwX repo/

Create Project Structure

mkdir template
cd template
mkdir myproject
cd myproject/
mkdir trunk tags branches
cd ..
# /home/dds/template
# myproject

Initial Addition of a Project to the Repository

# /home/dds/template
svn import . file:///home/dds/repo/ --message 'Initial repository layout'
# Adding         myproject
# Adding         myproject/trunk
# Adding         myproject/branches
# Adding         myproject/tags

# Committed revision 1.
cd ..

Initial Checkout from the Repository

# /home/dds/
mkdir work
cd work
svn co file:///home/dds/repo/myproject
# A    myproject/trunk
# A    myproject/branches
# A    myproject/tags
# Checked out revision 1.
# myproject

Add a File to the Project

# /home/dds/work
cd myproject/trunk
echo hi >file.txt
svn add file.txt
# A         file.txt
svn commit --message 'Added file.txt'
# Adding         trunk/file.txt
# Transmitting file data .
# Committed revision 2.

Tag a Release and Create a Branch

svn copy file:///home/dds/repo/myproject/trunk  file:///home/dds/repo/myproject/tags/version-1.0 --message 'Tag version 1.0'
# Committed revision 3.
svn copy file:///home/dds/repo/myproject/trunk  file:///home/dds/repo/myproject/branches/bug-fix-r1.0 --message 'Branch version 1.0 for bug fixes'
# Committed revision 4.

Update with Changes

cd ..
# /home/dds/work/myproject
svn up
#A    branches/bug-fix-r1.0
#A    tags/version-1.0
#Updated to revision 4.

Commit a Change

echo hello >file.txt
svn up
# At revision 5.
svn commit -m 'Made greeting more formal'
#Sending        trunk/file.txt
#Transmitting file data .
#Committed revision 5.

Git Cheat Sheet

(Results appear after the # sign.)

Create Repository

# /home/dds
git init repo
chmod -R go+rwX repo/

Create Project

cd repo
mkdir myproject
cd myproject
# /home/dds/repo/myproject

Add a File to the Project

# /home/dds/repo
cd myproject
echo "My project repo." >README
git add README (or git add . to include every file present.)
git status
# On branch master
# Initial commit
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
# new file:   README
git commit -m 'README file for repo.'
#[master (root-commit) 138f654] README file for repo.
#1 files changed, 1 insertions(+), 0 deletions(-)
#create mode 100644 README
echo "TODO: configure the repo in $GIT_DIR/config." >>README
git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in
#   working directory)
# modified:   README
# no changes added to commit (use "git add" and/or "git commit -a")
git commit -a -m 'Updated README file.'
#[master 1376204] Updated README file.
#1 files changed, 1 insertions(+), 0 deletions(-)

Create a Branch

git branch
#* master
git branch bug-fixing
git branch -a
#  bug-fixing
#* master

Initial Checkout from the Repository

# /home/dds/
mkdir work
cd work
git clone file:///home/dds/repo
#Cloning into repo...
#remote: Counting objects: 306, done.
#remote: Compressing objects: 100% (92/92), done.
#remote: Total 306 (delta 214), reused 306 (delta 214)
#Receiving objects: 100% (306/306), 213.91 KiB, done.
#Resolving deltas: 100% (214/214), done.
# repo

Update with Changes

git pull
#Already up-to-date.

Work with Branches

git checkout bug-fixing
#Branch bug-fixing set up to track remote branch bug-fixing from origin.
#Switched to a new branch 'bug-fixing'.

Commit a Change

git branch
#* bug-fixing
echo "TODO: Improve layout in README." >>README
git commit -a -m 'New TODO in README.'
#[bug-fixing d758c7f] New TODO in README.
# 1 files changed, 1 insertions(+), 0 deletions(-)
git push
#Counting objects: 5, done.
#Delta compression using up to 2 threads.
#Compressing objects: 100% (2/2), done.
#Writing objects: 100% (3/3), 324 bytes, done.
#Total 3 (delta 0), reused 0 (delta 0)
#Unpacking objects: 100% (3/3), done.
#To file:///home/dds/repo/
#c6439c5..d758c7f  bug-fixing -> bug-fixing

Tag a Release

git tag "v1.1"
git push --tags
#Total 0 (delta 0), reused 0 (delta 0)
#To file:///home/dds/repo/
# * [new tag]         v1.1 -> v1.1

More Resources

Further Reading

Build Management

The Build Process

Diagram of the build process

Typical Project Dependencies

Typical project dependencies

Example: Apache Project Dependencies

Apache dependencies

Example: Mozilla Dependencies

Mozilla dependencies

Example: Internet Explorer Dependencies

Interner Explorer dependencies

Tool Types


Part of an Apache Makefile

  modules.o \
  $(MODULES) \
  main/libmain.a \
  $(OSDIR)/libos.a \

        $(CC) -c $(INCLUDES) $(CFLAGS) $<

        emximp -o $@ $<


lib$(TARGET).$(SHLIB_SUFFIX_NAME): subdirs modules.o
        $(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c

target_static: subdirs modules.o
        $(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c
              -o $(TARGET) buildmark.o $(OBJS) $(REGLIB) $(EXPATLIB) $(LIBS)

        -rm -f $(TARGET) lib$(TARGET).* *.o
        @for i in $(SUBDIRS); do \
                echo "===> $(SDP)$$i"\
                ( cd $$i && $(MAKE) $(MFLAGS_STATIC) SDP='$(SDP)' $@ ) || exit 1; \
                echo "<=== $(SDP)$$i"\


$(OBJS): Makefile subdirs

buildmark.o: buildmark.c include/ap_config.h include/ap_mmn.h \
 include/ap_config_auto.h os/unix/os.h include/ap_ctype.h \
 include/hsregex.h include/httpd.h include/ap_alloc.h include/buff.h \
 include/ap.h include/util_uri.h
modules.o: modules.c include/httpd.h include/ap_config.h \
 include/ap_mmn.h include/ap_config_auto.h os/unix/os.h \
 include/ap_ctype.h include/hsregex.h include/ap_alloc.h include/buff.h \
 include/ap.h include/util_uri.h include/http_config.h

Make Problems and Solutions


The Ant Approach

Some ant Tasks

Examples of ant built-in tasks:

Example Ant Build File

<project name="Jasper" default="deploy" basedir=".">
  <!-- ===================== Initialize Property Values =================== -->

  <!-- See "" in the top level directory for all     -->
  <!-- property values you must customize for successful building!!!        -->
  <property file=""/>
  <property file="../"/>
  <property file="${user.home}/"/>

  <!-- Build Defaults -->
  <property name="build.compiler"    value="classic"/>
  <property name="copy.crimson.jar"  value="../lib/crimson.jar"/>

  <!-- =================== BUILD: Create Directories ====================== -->
  <target name="build-prepare">

    <available classname="junit.framework.TestCase" property="junit.present" />

    <mkdir dir="${}"/>
    <mkdir dir="${}/jasper"/>
    <mkdir dir="${}/lib"/>


  <!-- =================== BUILD: Copy Static Files ======================= -->
  <target name="build-static" depends="build-prepare">

    <!-- Executable Commands -->
    <copy todir="${}/bin">
      <fileset dir="src/bin" />
    <fixcrlf srcdir="${}/bin" includes="*.sh" eol="lf"/>
    <fixcrlf srcdir="${}/bin" includes="*.bat" eol="crlf"/>
    <chmod perm="+x" file="${}/bin/"/>
    <chmod perm="+x" file="${}/bin/"/>

    <!-- Common Extensions -->
    <copy todir="${}/jasper" file="${copy.crimson.jar}"/>
    <copy todir="${}/jasper" file="${copy.jaxp.jar}"/>


  <!-- ================= BUILD: Compile Server Components ================= -->
  <target name="build-main" depends="build-static">

    <!-- Compile internal server components -->
    <javac srcdir="src/share" destdir="${}/classes"
           debug="${compile.debug}" deprecation="${compile.deprecation}"
      <classpath refid="jasper.classpath" />

The Maven Approach

Maven's Lifecycle

(Source: Apache Maven Lifecycle Reference (

Automating System Configuration

Best Practices

Best Practices (Variables)

Use variables for varying elements:

Further Reading

Code-Reading Tools

Typical Tasks

Regular Expressions

Regular Expression Symbols

^Beginning of a line
$End of a line
.Any character
Expression?The expression zero or one times
Expression*The expression zero or more times
Expression+The expression one or more times
Expression{n}The expression n times
Expression{n,}The expression at least n times
Expression{n,m}The expression at least n but no more than m times
Expression1|Expression2 The expression1 or the expression2
(Expression) The expression within the brackets
\1 \2 ... \n The content of the nth bracket

Character Classes

[abc]One of a, b, or c
[a-z]A letter from a to z
[^abc]Any letter appart from a, b, and c.
\t The character tab
\n The character newline
\r The character carriage return
\a The character alert
\f The character form feed
\e The character escape
\cx The character control-x (a-z)
\d Digit
\D Non-digit
\s Space
\S Non-space

The Editor as a Code Browser

Code Searching with grep

Locating File Differences

Roll Your Own Tool: Implementation Options

Choosing an Implementation

A Simple Grep Program in Java ...

 * Globally match regular expression and print
 * Modelled after the Unix command with the same name
 * D. Spinellis

import java.util.regex.*;

class Grep {
    public static void main(String args[]) {
        if (args.length != 2) {
            System.err.println("Usage: Grep pattern file");

        Pattern cre = null;        // Compiled RE
        try {
            cre = Pattern.compile(args[0]);
        } catch (PatternSyntaxException e) {
            System.err.println("Invalid RE syntax: " + e.getDescription());

        BufferedReader in = null;
        try {
            in = new BufferedReader(new InputStreamReader(
                 new FileInputStream(args[1])));
        } catch (FileNotFoundException e) {
            System.err.println("Unable to open file " +
                args[1] + ": " + e.getMessage());

        try {
            String s;
            while ((s = in.readLine()) != null) {
                Matcher m = cre.matcher(s);
                if (m.find())
        } catch (Exception e) {
            System.err.println("Error reading line: " + e.getMessage());

... And its Equivalent in Perl

#!/usr/bin/perl -n
BEGIN {$pat = shift;}
print if (/$pat/);

Tool Building Advice

Example: Signature Survey

Example of a signature survey

Using the Compiler

Compiler Warning Messages

(Depending on the language, some of the above may be errors)

The Compiler as a Code-Reading Tool

Code Browsers

Code browsers typicall offer the following facilities:

Code Browsers in OO

In OO languages given a class you can find:


Other related tools:

Runtime Tools

Drawing Diagrams

Lab Tasks (Java)

Lab Tasks (Unix)

Further Reading

Exercises and Discussion Topics

  1. If you are using Windows try installing a Unix-type operating system on a spare (old) computer, or disk drive, or disk partition. Linux (e.g. Ubunutu), FreeBSD, and Solaris are some possible choices for a free Unix-like operating system. Alternatively, install on your Windows machine the Cygwin environment, which offers comparable functionality.
  2. Learn the regular expression syntax and the related find commands provided by the editor you are using.
  3. Write regular expressions to locate integers, floating point numbers, and a given word within a character string. Where would you use such expressions?
  4. Learn about and experiment with the tag facility of the editor you are using.
  5. Often programmers identify code parts that need further attention using a special marker such as XXX or FIXME. Search for, count, and examine such instances in the source code tree.
  6. Propose a simple way to calculate a "measure of similarity" metric for two different files. Apply this metric to files in the source collection that share the same filename and create a list of files that could benefit from a structured approach towards code reuse.
  7. Identify code-reading tasks that can benefit by a custom-built tool. Briefly describe how each such tool could be built. Implement one of the tools you described.
  8. Provide (or, better yet, locate) code examples for some of the compiler warnings discussed in this lecture. Can your compiler detect them? Are there legitimate reasons for using such code?
  9. Many warnings generated by C compilers would be treated as errors in other strongly typed languages. Find a list of warnings that your C compiler generates, and mark those that identify code with legitimate uses. See how many of them are detected in other languages.
  10. Run some of the winning IOCCC ( entries through the C preprocessor and try to understand how the program works and how the original obfuscation was achieved.
  11. Compile a program (with compiler optimizations disabled) and read the generated symbolic code. Repeat this process with all optimizations enabled. Describe how arguments are passed between functions in each case.
  12. Format one of the course's example files using a pretty-printer.
  13. See if your editor supports user-defined syntax coloring. Define syntax-coloring for a language that you use and the editor does not support.

General Purpose Tools

The Unix Toolchest

Available under Unix, GNU/Linux, *BSD.

For Windows:

Working with Unix Tools

Data Fetching and Generation





Example: Analyze Java Files

Examine all Java files located in the directory src, and print the ten files with the highest number of occurrences of a method call to substring.
find src -name ’*.java’ -print |
xargs fgrep -c .substring |
sort -t-rn -k2 |
head -10

Example: Determine Commit Times

find . -type f |
grep -v CVS |
xargs cvs -d /home/ncvs log -SN 2>/dev/null |
awk '/^date/{
    hour = substr($3, 0, 2)
    lines[$5 " " hour] += $9
 END {
    for (i in lines) print i, lines[i]
  }' |
sed 's/;//g' |
sort >dev-times-lines.dat
join dev-times-lines.dat devlong >dev-times-lines-long.dat

Example Result:Plotted Commit Times

Round the clock development

Program Development

(More on the above later.)

Outwit Tools

Outwit Examples

Create an list of fax recipients ordered by the number of faxes they have received.
readlog Application |
awk -F: "/Outbound: Information: Fax Sent/{print $12}" |
sort |
uniq -c |
sort -rn
Extracts the email address from all records from the table users which is part of the database userDB and sends them the file message by email.
fmt message |
mail $(odbc DSN=userDB "select email from users")

An Editor Checklist

Two candidates: Emacs, vim.

Further Reading

Exercises and Discussion Topics

  1. Acquaint yourself with the command-line of the computer you're using. If it doesn't have ready access to the Unix tools, install them.
  2. Evaluate the editor you're using, and adopt a new one if needed.

Performance Measurement and Management

Program States


Measuring Workload

Workload and Tools

Profile r >>u+s s >u u ≈ r
Characterization I/O-bound kernel-bound CPU-bound
Tools Disk, net, VM stats, packet dumps System call tracing Function profiling, basic block counting

System Monitoring Tools

iostat (under *BSD), vmstat (Linux)
vmstat, free (Linux)
Example: virtual memory statistics while executing the find command
$ vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0  52296   3476  50092   0   0   340     0  189   190   0   6  94
 0  1  0      0  51736   3588  50428   0   0   448     0  214   243   0   5  95
 1  0  0      0  51072   3716  50724   0   0   424     0  210   218   0   5  95
 0  1  0      0  50596   3848  50968   0   0   376     0  196   204   0   4  96
 1  0  0      0  49952   3976  51304   0   0   464     0  220   247   0   6  94
 0  1  0      0  49556   4068  51512   0   0   300     0  177   172   2   2  96
 1  0  0      0  49024   4132  51816   0   0   368     0  196   195   0   7  93
 1  0  0      0  48228   4200  52416   0   0   668     0  270   307   0   9  91
 1  0  0      0  47668   4352  52712   0   0   448     0  215   227   0   5  95
 1  0  0      0  47048   4412  53092   0   0   440     0  213   244   1   5  94
 0  1  0      0  44912   4668  54080   0   0   432     0  211   225   0   4  96

Profiling System Calls

$ strace -c find . -name test
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 66.06    0.367503         584       629           getdents64
 26.31    0.146389          80      1826           lstat64
  2.27    0.012621          40       313           fcntl64
  1.83    0.010168          31       326        10 open
  1.78    0.009894          16       625           chdir
  1.23    0.006846          22       316           fstat64

Function Profiling

Profiling C/C++ Code with Gprof

A Flat Profile

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  6.35      0.33     0.33                             _Unwind_SjLj_Register
  5.77      0.63     0.30 10192020     0.00     0.00  __gnu_norm::__deque_buf_size(unsigned int)
  1.54      1.53     0.08       10     8.00   118.33  parse_parse()
  1.35      1.60     0.07  5096343     0.00     0.00  operator==(Tokid, Tokid)

A Call Graph Profile

index % time    self  children    called     name
                                  10             Pdtoken::process_pragma() <cycle 1> [360]
[4]     22.8    0.08    1.10      10         parse_parse() <cycle 1> [4]
                0.00    0.42    8348/17593       Token::unify(Token const&, Token const&) [5]
                0.00    0.26    4328/4328        Type::declare() [28]
                0.00    0.22    3511/3511        completed_typedef(Type) [33]
                0.00    0.03    3529/3540        Block::enter() [220]
                0.00    0.02    7402/27543       obj_lookup(std::string const&) [94]
                0.00    0.02  812361/834736      YYSTYPE::operator=(YYSTYPE const&) [324]
                0.01    0.01   85359/295237      Type::~Type() <cycle 3> [429]
                0.00    0.02    1690/1690        Call::register_call(Token const&, Id const*) [331]
                0.00    0.02   11362/11362       Type::set_abstract(Type) <cycle 4> [581]

Profiling Java Code with EJP

A Java Program Profile

An EJP profile of a Java program

Basic Block Counting in C/C++ Code

To investigate algorithms count the number of times each line is executed. Example:
           main(int argc, char *argv[])
      1            int i, j, k;
      1            int a = 0;

     11            for (i = 0; i < 10; i++)
    110                    for (j = 0; j < 10; j++)
   1100                            for (k = 0; k < 10; k++)
   1000                                    a += i + j + k;
      1    }

Architectural Inefficiencies

Architectural Profiling Tools

Processor event monitoring
Oprofile ( (Linux)
Locality optimizations
SLO: Suggestions for Locality Optimizations (
Events that Oprofile can analyze on a Core 2 Intel CPU.

Memory Performance

Memory as important as CPU cycles Again, we need to profile

Memory Profiling C/C++ Code with Valgrind

Example: sed under valgrind

echo ':abcdefgh: : :' |
valgrind  --tool=massif ./sed -f TEST/hanoi.sed
Valgrind's output

Inspection and Testing

Example: Processor Testing

Intel 8088 processor register test routine from IBM's 1981 PC BIOS. The carry flag is used as the loop index.
8088 Processor Test

Static Verification

Examples: Characteristics:

Identified Issues

Some FindBugs Issues

(FindBugs reports more than 200 issues.)

A FindBugs Run

A FindBugs run

GCC Best Practices

C Compilation Example

main(char *argv[])
        int a, b, c;

        a = b;
        printf("%g\n", a);
Plain compile:
$ gcc t.c
Compile with warnings:
$ gcc -O4 -Wall -Werror t.c
cc1: warnings being treated as errors
t.c:9: warning: return-type defaults to `int'
t.c:9: warning: first argument of `main' should be `int'
t.c:9: warning: `main' takes only zero or two arguments
t.c: In function `main':
t.c:13: warning: implicit declaration of function `printf'
t.c:13: warning: double format, different type arg (arg 2)
t.c:10: warning: unused variable `c'
t.c:14: warning: control reaches end of non-void function
t.c:10: warning: `b' might be used uninitialized in this function

Lint Best Practices

Tracing Tools

Operating Systemstrace
Resource Snapshotlsof

Using Tracing Tools

[sl]trace Tips and Tricks

Example: System Call Tracing

Which shared libraries is dot loading?
$ strace dot </dev/null 2>&1 | fgrep .so | fgrep -v ENOENT
open("/etc/", O_RDONLY)      = 3
open("/usr/lib/", O_RDONLY) = 3
open("/usr/lib/", O_RDONLY)  = 3
open("/usr/lib/", O_RDONLY) = 3
open("/usr/lib/", O_RDONLY)    = 3
open("/lib/", O_RDONLY)        = 3
open("/lib/", O_RDONLY)        = 3

Example: Library Call Tracing

How does the whoami command obtain its data?
$ ltrace whoami
geteuid()                                         = 1000
getpwuid(1000, 0x40013678, 0xbffffcac, 0x08048cff, 0x4012ee48) = 0x401310f0
)                                       = 4
exit(0 <unfinished ...>

Unit Testing

JUnit Example

Test Class and Fields

public class RationalTest {
    private Rational r12;
    private Rational r13;
    private Rational r56;

Initialization Code

    @Before public void setUp() {
        r12 = new Rational(12);
        r13 = new Rational(13);
        r56 = new Rational(56);

A Test Method

    @Test public void simpleAdd() {
        Rational result = r12.add(r13);

Run the Class's Test Methods

    public static void main (String... args) {;

Textual Regression Testing

Example: Testing the Sort Program

FILES=`cd input; echo *`

# Prime test data
if [ "$1" = "-p" ] ; then
        for in $FILES ; do
                sort input/$i >old/$i

# Compare old with new result
for in $FILES ; do
        sort input/$i >new/$i
        if diff old/$i new/$i ; then
                echo "OK $i"
                echo "FAIL $i"
                exit 1

Test Coverage in C/C++ Code

To investigate test coverage, you can use basic block profiling. Example:
 83.33% of 650 source lines executed in file hilbert.c

Example: Test coverage of Perl's source code versus the number of executed test cases

Test coverage of Perl's source code versus the number of executed test cases

Further Reading

Coding Standards and Conventions

File Names and Organization

Common File Names

File Name Contents
README Project overview
MANIFEST List of project files with brief explanations
INSTALL Installation instructions
Copying Licensing information
TODO Wish list for future extensions
NEWS Documentation on user-visible changes
Changes Code change summary
configure Platform configuration script
Makefile Build specification
Makefile.SH Shell script producing the above
config.h Platform configuration definitions
config_h.SH Shell script producing the above
patchlevel.h Defines the project release version

Common File Extensions

Extention Contents
.man Unix manual page source
.encoding Text in a particular encoding (e.g. .utf8)
.language-code Text in a particular language (e.g. .de for German)
.lib Library---collection of object files
.s Assembly language source (Microsoft DOS / Windows, Unix)
.psp Web server-executed source
.awk Awk (language) source
.frm Microsoft Visual Basic source
.com MS-DOS/NT, OS/2 Rexx , VMS commands
.png Microsoft Windows, X-Windows, portable bitmap file
.c C source
.cpp, .cxx C++ source
.cs C# source
.class Java compiled file
.sh Unix csh, sh (Bourne) shell source
.def Microsoft Windows or OS/2 executable or shared library definition
.so Shared object library (Microsoft Windows, Unix)
.vbp Microsoft Developer Studio project and workspace file
.dvi Documentation in troff or TeX device-independent output
.el Emacs lisp source
.tbl Equations, pictures, tables to be typeset by eqn, pic, tbl
.Z File compressed with gzip or compress
.h C or C++ header file
.hpp C++ header file
.icon Microsoft Windows, X-Windows icon
.idl Interface definition file
.info Documentation generated by GNU Texinfo
.tar File collection in Java, Unix shell, tape archive format
.java Java source code
.jcl IBM JCL instructions
.l Lex (lexical analyzer generator) source
.m4 M4 (macro processor) code
.mk Makefile (also often without any extension)
.mif Documentation exported by FrameMaker
.me Documentation using troff mm, me macros
.roff Documentation in plain troff format
.obj Object file
.ok Local additions to the spell-check dictionary
.pod Perl, module source, documentation
.ps Postscript source, or formatted documentation
.py Python source
.rb Ruby source
.res Microsoft Windows resource, compiled resource script
.sed Sed (stream editor) source
.test Test file
.tcl Tcl/Tk source
.texi Documentation in TeX or LaTeX, GNU Texinfo format
.y Yacc (parser generator) source




Comments are not only read by humans:

Naming Convention Styles

Three naming conventions:
  1. Capitalization, or CamelCase. Examples:
  2. Underscore separation (exponent_is_negative) (GNU)
  3. Initials (splbio : set processor level for buffered I/O); also removal of vowels and word endings (strcmp)

Java Naming Conventions

We can thus differentiate between static method calls and message dispatches:

Hungarian Notation

Primitive Types

b Flag (boolean) primitive type
dw Double word (32 bit wide) with arbitrary contents
w Word with arbitrary contents (typically unsigned)
b Byte with arbitrary contents
ch Character
sz Pointer to first character of a null-terminated string
h Handle of a heap-allocated object
fn Function
i Integer
l Long integer (32-bit wide)
n Short integer (16-bit wide)

Type Constructions

pX Pointer to X
dX Difference between two instances of X
cX Count of instances of type X
mpX Y Array (map) of Ys indexed by X
rgX Array (range) of Xs
iX Index to array rg X
cbX Size of instances of X in bytes
cwX Size of instances of X in words

Name Qualifiers

XFirst First element of an ordered set of type X values
XLast Last element of an ordered set of type X values.
XNext Next element of an ordered set of type X values
XPrev Previous element of an ordered set of type X values
XLim The upper limit of a range of X values.
XMac The current upper limit
XNil A special Nil value of type X
XT A temporary value of type X

VB Primitive Types

cur Currency
time Time
date Date
dt Date and time combined
qry Database query
tbl Database table

VB Common Controls

frm Form
mnu Form menu
cmd Command button
chk Check button
opt Radio button
lbl Text label
txt Text edit box
pb Picture box
pic Picture
lst List box
cbo Combo box
tmr Timer

Programming Practices

Established practices help you read standard code patterns, and identify deviations.


Process Standards

Further Reading

Exercises and Discussion Topics

  1. Identify the coding standards that apply in your organization. In the form of a table, compare their coverage against two other commonly used standards.
  2. Discuss the advantages and disadvantages of defining and declaring code elements in a strictly prescribed order. How do modern integrated development environments affect your view?
  3. Devise a guideline that prescribes under what circumstances the formatting of an imported piece of code should be changed to conform to the local coding standards. Take into account code quality, readability, maintenance, revision control, and future integration issues.
  4. Some argue that the Hungarian naming conventions duplicate work that is performed by a type checking compiler. Discuss.
  5. Create an itemized list of programming practices that can help code readability. Try, where possible, to reference existing examples.
  6. Identify the process standards that apply in your organization. Explain what software elements (files, directories) have to be examined in order to verify the conformance of a system against those standards.


Documentation Types

Shortcut for Code Understanding


    line = gobble = 0;
    for (prev = '\n'; (ch = getc(fp)) != EOF; prev = ch) {
        if (prev == '\n') {
            if (ch == '\n') {
                if (sflag) {
                    if (!gobble && putchar(ch) == EOF)
                    gobble = 1;
        gobble = 0;


-s  Squeeze multiple adjacent empty lines, causing the output to be
             single spaced.

Specifications for Code Inspection


    switch (*method) {
        case 'H':
           if (strcmp(method, "HEAD") == 0)
               return M_GET; /* see header_only in request_rec */
        case 'G':
           if (strcmp(method, "GET") == 0)
               return M_GET;
        case 'P':
           if (strcmp(method, "POST") == 0)
               return M_POST;
           if (strcmp(method, "PUT") == 0)
               return M_PUT;
           if (strcmp(method, "PATCH") == 0)
               return M_PATCH;


The Method token indicates the method to be performed on the
resource identified by the Request-URI. The method is

       Method         = "OPTIONS"                ; Section 9.2
                      | "GET"                    ; Section 9.3
                      | "HEAD"                   ; Section 9.4
                      | "POST"                   ; Section 9.5
                      | "PUT"                    ; Section 9.6
                      | "DELETE"                 ; Section 9.7
                      | "TRACE"                  ; Section 9.8
                      | extension-method

Obtain System Structure

Sendmail Files

arpadate.c, clock.c, collect.c, conf.c, convtime.c, daemon.c, deliver.c,
domain.c, envelope.c, err.c, headers.c, macro.c, main.c, map.c, mci.c,
mime.c, parseaddr.c, queue.c, readcf.c, recipient.c, safefile.c,
savemail.c, srvrsmtp.c, stab.c, stats.c, sysexits.c, trace.c, udb.c,
usersmtp.c, util.c, version.c,

Sendmail Documentation Headings

2.5. Configuration file readcf.c
3.3.1. Aliasing alias.c
3.4. Message collection collect.c
3.5. Message delivery deliver.c
3.6. Queued messages queue.c
3.7. Configuration conf.c
3.7.1. Macros macro.c
3.7.2. Header declarations headers.c, envelope.c
3.7.4. Address rewriting rules parseaddr.c

Understand complicated algorithms


for (arcp = memp->parents ; arcp ; arcp = arcp->arc_parentlist) {
    if ( headp -> npropcall ) {
        headp -> propfraction += parentp -> propfraction
                * ( ( (double) arcp -> arc_count )
                  / ( (double) headp -> npropcall ) );


Algorithm Documentation

Obtain the Meaning of Source Code Identifiers


#define TCPS_ESTABLISHED 4 /* established */
(Notice useless comment.)



ESTABLISHED - represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.

                                   +---------+ ---------\      active OPEN  
                                   |  CLOSED |            \    -----------  
                                   +---------+<---------\   \   create TCB  
                                     |     ^              \   \  snd SYN    
                        passive OPEN |     |   CLOSE        \   \           
                        ------------ |     | ----------       \   \         
                         create TCB  |     | delete TCB         \   \       
                                     V     |                      \   \     
                                   +---------+            CLOSE    |    \   
                                   |  LISTEN |          ---------- |     |  
                                   +---------+          delete TCB |     |  
                        rcv SYN      |     |     SEND              |     |  
                       -----------   |     |    -------            |     V  
      +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
      |         |<-----------------           ------------------>|         |
      |   SYN   |                    rcv SYN                     |   SYN   |
      |   RCVD  |<-----------------------------------------------|   SENT  |
      |         |                    snd ACK                     |         |
      |         |------------------           -------------------|         |
      +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
        |           --------------   |     |   -----------                  
        |                  x         |     |     snd ACK                    
        |                            V     V                                
        |  CLOSE                   +---------+                              
        | -------                  |  ESTAB  |                              
        | snd FIN                  +---------+                              
        |                   CLOSE    |     |    rcv FIN                     
        V                  -------   |     |    -------                     
      +---------+          snd FIN  /       \   snd ACK          +---------+
      |  FIN    |<-----------------           ------------------>|  CLOSE  |
      | WAIT-1  |------------------                              |   WAIT  |
      +---------+          rcv FIN  \                            +---------+
        | rcv ACK of FIN   -------   |                            CLOSE  |  
        | --------------   snd ACK   |                           ------- |  
        V        x                   V                           snd FIN V  
      +---------+                  +---------+                   +---------+
      |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
      +---------+                  +---------+                   +---------+
        |                rcv ACK of FIN |                 rcv ACK of FIN |  
        |  rcv FIN       -------------- |    Timeout=2MSL -------------- |  
        |  -------              x       V    ------------        x       V  
         \ snd ACK                 +---------+delete TCB         +---------+
          ------------------------>|TIME WAIT|------------------>| CLOSED  |
                                   +---------+                   +---------+

Rationale Behind Nonfunctional Requirements


if (newdp->d_cred > dp->d_cred) {
   /* better credibility.
    * remove the old datum.
   goto delete;


(P. Vixie's BIND Security Paper)
5.1. Cache Tagging
BIND now maintains for each cached RR a "credibility" level showing whether the data came from a zone, an authoritative answer, an authority section, or additional data section. When a more credible RRset comes in, the old one is completely wiped out. Older BINDs blindly aggregated data from all sources, paying no attention to the maxim that some sources are better than others.

Design Intelligence


Pike and Thompson on adopting UTF over 16-bit Unicode representation in Plan 9:

Unicode defines an adequate character set but an unreasonable representation. The Unicode standard states that all characters are 16 bits wide and are communicated in 16-bit units.... To adopt Unicode, we would have had to convert all text going into and out of Plan 9 between ASCII and Unicode, which cannot be done. Within a single program, in command of all its input and output, it is possible to define characters as 16-bit quantities; in the context of a networked system with hundreds of applications on diverse machines by different manufacturers, it is impossible.


The UTF encoding has several good properties. By far the most important is that a byte in the ASCII range 0-127 represents itself in UTF. Thus UTF is backward compatible with ASCII.

Internal Programming Interfaces


Test Cases and Examples of Actual Use

Examples from the tcpdump documentation:

To print all ftp traffic through internet gateway snup: (note that the expression is quoted to prevent the shell from (mis-)interpreting the parentheses):

tcpdump 'gateway snup and (port ftp or ftp-data)'

To print the start and end packets (the SYN and FIN packets) of each TCP conversation that involves a non-local host.

tcpdump 'tcp[13] & 3 != 0 and not src and dst net localnet'

Implementation Problems and Bugs

at: limitations

At and batch as presently implemented are not suitable when users are competing for resources. If this is the case for your site, you might want to consider another batch system, such as nqs.

cat: caveats

Because of the shell language mechanism used to perform output redirection, the command
"cat file1 file2 > file1"
will cause the original data in file1 to be destroyed! This is performed by the shell before cat is run.

strftime: humor

There is no conversion specification for the phase of the moon.

ctags: bugs

Recognition of functions, subroutines and procedures for FORTRAN and Pascal is done in a very simpleminded way. No attempt is made to deal with block structure; if you have two Pascal procedures in different blocks with the same name you lose.

Development and Execution Environment Problems

// The following function is not inline, to avoid build (template
// instantiation) problems with Sun C++ 4.2 patch 104631-07/SunOS 5.6.
(Often comments are harsher)

Trouble Spots

2001-09-17 Urban [...]
  * proc.c: Go back to the interruptible sleep as reconnects
    seem to handle it now.
2001-07-09 Jochen [...]
  * proc.c, ioctl.c: Allow smbmount to signal failure to reconnect
    with a NULL argument to SMB-IOC-NEWCONN (speeds up error
2001-04-21 Urban [...]
  * dir.c, proc.c: replace tests on conn-pid with tests on state
    to fix smbmount reconnect on smb_retry timeout and up the
    timeout to 30s.
2000-08-14 Urban [...]
  * proc.c: don't do interruptable_sleep in smb_retry to avoid
    signal problem/race.
1999-11-16 Andrew [...]
  * proc.c: don't sleep every time with win95 on a FINDNEXT

Undocumented Features


Additional Documentation Sources

Common Open-Source Documentation Formats

Important: properly typeset the documentation for printing.

Further Reading

Exercises and Discussion Topics

  1. Select three large projects from the course's reference source code and classify the available documentation.
  2. Comment on the applicability of the documentation types we described in open-source development efforts.
  3. Present an overview of the source organization of apache Web server by examining the provided documentation.
  4. Locate one instance of a published algorithm reference in the course's reference source code. Map the published version of the algorithm against its implementation.
  5. Categorize and tabulate the types of problems described in the Bugs section of the Unix manual pages and sort them according to their frequency. Discuss the results you obtained.
  6. The course's reference source code contains over 40 references to undocumented behavior. Locate them and discuss the most common observed cause of documentation discrepancies.
  7. Compare the documentation formats we described on usability, readability, features provided, and amenability to automated processing by ad-hoc tools.
  8. Locate and typeset on a high quality output device each of the different documentation formats available in the course's reference source code in your local environment. Discuss the difficulties you encountered.




The Maintainability Index

Maintainability Index definition
Parameter Name Measures
aveV Average Halstead complexity Computational density
aveV(g') Average extended cyclomatic complexity Logical complexity
aveLOC Average count of lines of code Code size
PerCM Average percent of lines of comments Human insight
Maintainability of the user code
Maintainability index over time in the FreeBSD user-space code.

Metrics for Object-oriented Programs

Dependency Metrics on Packages

An unstable package in Tomcat
An unstable package in Tomcat

A stable package in the Eclipse distribution
A stable package in the Eclipse distribution


Analyzability: Dependencies and Coupling




Effects of the Development Environment

Further Reading

Exercises and Discussion Topics

Course Projects


March 3rd, 2013
Team forms due
March 16th, 2013 23:59
Project forms due
March 20th, 2013
Existing project presentation (must demonstrate ability to compile and run the project)
April 3rd, 2013
Design presentation
June 5th, 2013
Implementation presentation
Unless otherwise noted, all deadlines refer to the time the course lecture begins on the corresponding day (e.g. 15:00) or the end of the day (23:59) in Athens local time.

Late entries will affect the project's grade.

Team Form

Please complete the form here (

Project Form

Please complete the form here (

2013 teams

A Planet of the teams' blogs can be found here (

Hall of Fame

The Class of 2004

Nikos Korfiatis

Viky Skordali

Panagiotis Toumasis

The Class of 2005

Achilleas Anagnostopoulos

Alexandros Charos

Dalia Martisiute

Zbyněk Mužík

Athina Spiliopoulou

Polina Tzavala

The Class of 2006

Jon Igual Brun and Guillermo Barbero Maiz

John Georgiou and George Paraskevopoulos

Argyro Kazaki

Ioannis Sermetziadis

Giorgos Veiogloannis

Costas Alexandropoulos

(Honourable mention.)

Kelly Mantagou and Ioanna Mantzouratou

(Honourable mention.)

The Class of 2007

Panagiotis Adamopoulos and Georgia-Bilma Todri

Appendix: Basic Programming Elements

A Complete Program

/*      $NetBSD: echo.c,v 1.7 1997/07/20 06:07:03 thorpej Exp $ */

 * Copyright (c) 1989, 1993
 *      The Regents of the University of California.  All rights reserved.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 *    must display the following acknowledgement:
 *      This product includes software developed by the University of
 *      California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.

#include <sys/cdefs.h>
#ifndef lint
"@(#) Copyright (c) 1989, 1993\n\
        The Regents of the University of California.  All rights reserved.\n");
#endif /* not lint */

#ifndef lint
#if 0
static char sccsid[] = "@(#)echo.c      8.1 (Berkeley) 5/31/93";
__RCSID("$NetBSD: echo.c,v 1.7 1997/07/20 06:07:03 thorpej Exp $");
#endif /* not lint */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int     __Pmain ((intchar *[]));

main(argc, argv)
        int argc;
        char *argv[];
        int nflag;

        /* This utility may NOT do getopt(3) option parsing. */
        if (*++argv && !strcmp(*argv, "-n")) {
                nflag = 1;
                nflag = 0;

        while (*argv) {
                (void)printf("%s", *argv);
                if (*++argv)
                        putchar(' ');
        if (!nflag)
Important issues:

Expand: A Larger Example

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>

 * expand - expand tabs to equivalent spaces
int     nstops;
int     tabstops[100];

static  void    getstops __P((char *));
        int     main __P((intchar **));
static  void    usage __P((void));

main(argc, argv)
        int argc;
        char *argv[];
        int c, column;
        int n;

        /* handle obsolete syntax */
        while (argc > 1 && argv[1][0] && isdigit(argv[1][1])) {
                argc--; argv++;

        while ((c = getopt (argc, argv, "t:")) != -1) {
                switch (c) {
                case 't':
                case '?':
                        /* NOTREACHED */
        argc -= optind;
        argv += optind;

        do {
                if (argc > 0) {
                        if (freopen(argv[0], "r"stdin) == NULL) {
                        argc--, argv++;
                column = 0;
                while ((c = getchar()) != EOF) {
                        switch (c) {
                        case '\t':
                                if (nstops == 0) {
                                        do {
                                                putchar(' ');
                                        } while (column & 07);
                                if (nstops == 1) {
                                        do {
                                                putchar(' ');
                                        } while (((column - 1) % tabstops[0]) != (tabstops[0] - 1));
                                for (n = 0; n < nstops; n++)
                                        if (tabstops[n] > column)
                                if (n == nstops) {
                                        putchar(' ');
                                while (column < tabstops[n]) {
                                        putchar(' ');

                        case '\b':
                                if (column)


                        case '\n':
                                column = 0;
        } while (argc > 0);

static void
        char *cp;
        int i;

        nstops = 0;
        for (;;) {
                i = 0;
                while (*cp >= '0' && *cp <= '9')
                        i = i * 10 + *cp++ - '0';
                if (i <= 0 || i > 256) {
                        fprintf(stderr"Bad tab stop spec\n");
                if (nstops > 0 && i <= tabstops[nstops-1])
                        goto bad;
                tabstops[nstops++] = i;
                if (*cp == 0)
                if (*cp != ',' && *cp != ' ')
                        goto bad;

static void
        (void)fprintf (stderr"usage: expand [-t tablist] [file ...]\n");

Declarations and visibility

int     nstops;
int     tabstops[100];

static  void    getstops (char *);
        int     main (intchar **);
static  void    usage(void);

Functions and Global Variables

Functions can help you identify a program's major parts.

To find what a function does:

while Loops, Conditions, and Blocks

        while ((c = getopt (argc, argv, "t:")) != -1) {
                switch (c) {
                case 't':
                case '?':
                        /* NOTREACHED */

switch Statements

for Loops

Loop Control Statements

Character Expressions

Boolean Expressions

goto Statements

Refactoring in the Small

Illustrative Example

        if (*cp == 0)
        if (*cp != ',' && *cp != ' ')
                goto bad;
Changed into:
        if (*cp == 0)
        else if (*cp != ',' && *cp != ' ')
                goto bad;

Refactoring Example

Original Code

(worms game)
op = &(!x ? (!y ? upleft : (y == bottom ? lowleft : left)) :
(x == last ? (!y ? upright : (y == bottom ? lowright : right)) :
(!y ? upper : (y == bottom ? lower : normal))))[w->orientation];
What is going one here?

More Readable Representation

(Formatted as cascading if-else)
op = &(
!x ? (
    !y ? 
    : ( y == bottom ? 
) : ( x == last ? (
    !y ? 
    : ( y == bottom ? 
) : (
    !y ?
    : ( y == bottom ?


struct options *locations[3][3] = {
        {upleft,  upper,  upright},
        {left,    normal, right},
        {lowleft, lower, lowright},
int xlocation, ylocation;

if (x == 0)
        xlocation = 0;
else if (x == last)
        xlocation = 2;
        xlocation = 1;
if (y == 0)
        ylocation = 0;
else if (y == bottom)
        ylocation = 2;
        ylocation = 1;
op = &(locations[ylocation][xlocation])[w];

Other Implementation

(Suggested by Guy Steele)
op =
  &(        !y ? (!x ?  upleft : x!=last ?  upper :  upright ) :
     y!=bottom ? (!x ?    left : x!=last ? normal :    right ) :
                 (!x ? lowleft : x!=last ?  lower : lowright )

Other Refactoring Changes

Assignment Expressions

Bit Manipulation

Reading Control Structures

In structured programs:

Further Reading

Exercises and Discussion Topics

  1. Experiment to find-out how your C, C++, and Java compilers deal with uninitialized variables. Outline your results and propose an inspection procedure for locating uninitialized variables.
  2. Look in the course's source code repository for programs that do not verify the result of library calls. Propose practical fixes.
  3. Examine the visibility of functions and variables in programs in your environment. Can it be improved (made more conservative)?
  4. Discover how the editor you are using can identify matching braces and parentheses. If it can not, consider to start using another editor.
  5. Provide five examples from the course's source code collection where the code structure can be improved to make it more readable.
  6. You can find tens of intentionally unreadable C programs at the International Obfuscated C Code Contest web site ( Most of them use several layers of obfuscation to hide their algorithms. See how gradual code changes can help you untangle their code. If you are not familiar with the C preprocessor try to avoid programs with a large number of #define lines.
  7. The Perl language mandates the use of braces for all its control structures. Comment on how this affects the readability of Perl programs.
  8. The code body of switch statements in the course's source code collection is formatted differently from the other statements. Express the formatting rule used, and explain its rationale.
  9. The for statement in the C language family is very flexible. Examine the source code provided to create a list of ten different uses.
  10. Locate ten occurrences of break and continue in the source code provided with the course. For each case indicate the point where execution will transfer after the corresponding statement is executed, and explain why the statement is used. Do not try to understand in full the logic of the code; simply provide an explanation based on the statement's use pattern.
  11. Find, simplify, and reason about five non-trivial boolean expressions in the source-code base. Do not spend time on understanding what the expression elements mean, concentrate on the conditions that will make the expression become true or false. Where possible, identify and utilize the properties of short-circuit evaluation.
  12. Study Section 2.11 from the book. Provide a proof about the correctness of a sort algorithm, found in the course's soruce repository.

Appendix: Advanced C Data Types







Dynamic Memory Allocation

typedef Declarations

Further Reading

Exercises and Discussion Topics

  1. If you are familiar with C++ or Java explain how it is possible to minimize (in C++) or avoid (in Java) the use of pointers.
  2. How does a C pointer differ from a memory address? How does that affect your understanding of code? Which tools take advantage of the difference?
  3. How can a program using structures to specify a memory layout ensure that they are correctly allocated?
  4. How would you translate a program using unions into Java?

Appendix: Data Structures in C


Asymmetric Bounds

#define MAX_ADS 5
struct dr {            /* accumulated advertisements */
    n_long  dr_pref;   /* preference adjusted by metric */
} *cur_drp, drs[MAX_ADS];
    struct dr *drp;
    for (drp = drs; drp < &drs[MAX_ADS]; drp++) {
        drp->dr_recv_pref = 0;
        drp->dr_life = 0;
In the code above drs (the lower bound) is the first element of the array; drs[MAX_ADS] (the upper bound) the first element outside the array.

When using asymmetric bounds:






Hash Tables


Singly Linked Lists

Singly linked list

Doubly Linked Lists

Doubly linked list

Circular Linked Lists

Circular linked list


Binary tree

Further Reading

Exercises and Discussion Topics

  1. Select a large software system, locate where arrays are used, and categorize their uses.
  2. Fixed-length arrays notoriously impose arbitrary limits to program elements. Locate 20 instances of such limits in existing code and check whether these are documented in the respective system's documentation.
  3. For the RDBMS of your choice locate documentation regarding cursors. Compare the functionality offered by cursors to the functionality offered by pointers to structures.
  4. A number of linear algebra and matrix libraries are available on the Web in open-source form. Download two such packages and identify how matrices are stored.
  5. Examine the stack operations performed on implementations based on explicit code. Identify those operations that do not fit into the model we described. Propose function names and code for implementing them.
  6. Locate code where explicit control structures can be substituted by an array lookup mechanism. Explain how the code would be implemented.
  7. Do you envisage any difficulties in implementing a hash-based map as an abstract data type? Suggest a way to overcome them.
  8. Locate instances in the course's reference source code where sets are being used. Suggest other implementation strategies for each particular problem. Comment on whether a set data-type is used partly, because it is easy to create an efficient implementation for it.
  9. Locate five instances of singly and doubly-linked lists in the course's reference source code and explain the function they perform.
  10. Draw a binary tree containing the host names in your email address book. Add the names in a random order; not alphabetically. Systematically walk through the tree to derive a sorted listing of the host names. Construct a new tree adding the host names in the order they appear in the listing you constructed. Comment on the efficiency of searching data in the two trees.

Appendix: Collaboration

Mailing Lists

VCS Notifications (CVS)

Example Notification

Date: Sat, 19 Aug 2006 08:24:01 +0000 (UTC)
Subject: cvs commit: src/usr.bin/pkill Makefile
X-FreeBSD-CVS-Branch: HEAD

yar         2006-08-19 08:24:01 UTC

  FreeBSD src repository

  Modified files:
    usr.bin/pkill        Makefile
  Install pkill(1), aka pgrep(1), to /bin so that rc scripts
  can use this small and nifty utility.  Create compatibility
  symlinks from /usr/bin for the time being to avoid breaking
  custom scripts relying on the hardcoded path to the utility.

  Idea by:        des
  Discussed with: brooks (in cvs-src and cvs-all)

  Revision  Changes    Path
  1.6       +5 -0      src/usr.bin/pkill/Makefile

Setting up CVS Notifications (CVS)

        id -P |
        awk -F: '{
                print $1 " (" gensub(",.*", "", "", $8) ")\t" strftime("%F %R:%S %z (%Z)")
        END {
                print "\nRepository: '"$2"'\n"
        echo "Directory: $1"
        egrep -v '^In directory|^Update of'
) | mail -s "[$2] cvs commit $1" $3

Setting up CVS Notifications (git)

Documentation is Ubiquitous


Advantages of Wikis for storing documentation. The MediaWiki web page

Things to put on a wiki

Wiki Best Practices

Issue Tracking

Issue Reports

Track bugs according to:

Example: Bugzilla

Bugzilla screen dump

Severity of a Bug

Blocker Blocks development or testing work (e.g. a build)
Critical Crashes, loss of data, severe memory leak
Major major functionality problem
Minor Minor functionality problem, or easy to work around
Trivial Cosmetic problem
Enhancement Request for enhancement

An Issue's Lifecycle

A bug's lifecycle

Resolution Options

Example: Releasing

Before a release.

Bug Tracking Best Practices

Further Reading