Building a text preprocessor

Let's say we want to process a bunch of files, each in the same way. An easy example is a text preprocessor: we'll look through the text and replace variables of the form %VARNAME% with real content. In our example, we'll use two variables: %VERSION%, representing a version number, and %DATE%, containing today's date.
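Before redo enters the picture, the substitution step itself can be sketched with plain sed. The sample sentence and the hardcoded values here are made up for illustration.

```shell
#!/bin/sh
# Illustrative values only; later sections read these from files.
VERSION=1.0
DATE=1970-01-01

# Replace each %VARNAME% placeholder with its value.
result=$(printf 'Version %%VERSION%%, built on %%DATE%%\n' |
    sed -e "s/%VERSION%/$VERSION/g" -e "s/%DATE%/$DATE/g")

echo "$result"
# prints: Version 1.0, built on 1970-01-01
```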

To play with this code on your own machine, get the redo source code and look in the docs/cookbook/defaults/ directory.

Input files

Let's create some input files we can use. The input to our preprocessor will have file extension .in. The output can be in various formats.
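As a sketch, the script below creates three such input files in the current directory (best run from an empty scratch directory). The file contents are assumptions, chosen to be consistent with the generated outputs shown later in this document.

```shell
#!/bin/sh
# Create sample .in files. Contents are assumed, inferred from the
# generated outputs shown elsewhere in this document.
set -e
mkdir -p include

cat >test.txt.in <<'EOF'
This is the documentation for MyProgram version
%VERSION%.  It was generated on %DATE%.
EOF

cat >version.py.in <<'EOF'
# Python module identifying the current version
version = '%VERSION%'
date = '%DATE%'
EOF

cat >include/version.h.in <<'EOF'
// C/C++ header file identifying the current version
#ifndef __VERSION_H
#define __VERSION_H

#define VERSION "%VERSION%"
#define DATE "%DATE%"

#endif // __VERSION_H
EOF
```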

For fun, let's also make a Python test program, test.py, that imports the newly generated version.py and prints the values it contains.
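A test program along these lines would produce the output shown later in this document. The attribute names version and date are assumptions about what version.py defines.

```shell
#!/bin/sh
# Write a small Python test program. It assumes version.py defines
# module-level strings named "version" and "date".
cat >test.py <<'EOF'
# Import the generated module and report what it contains.
import version
print('Version %r has build date %r' % (version.version, version.date))
EOF
```

Because test.py is written by hand rather than generated, it is a source file as far as redo is concerned.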

Finally, we need to provide values for the variables. Let's put each one in its own file, named version and date, respectively.

The file version contains:
1.0

The file date contains:
1970-01-01

default.do files

Now we want to teach redo how to do our substitutions: how to, in general, generate file X from file X.in.

We could write a separate .do file for every output file X. For example, we might make test.txt.do and version.py.do. But that gets tedious. To make it easier, if there is no specific X.do for a target named X, redo will try using default.do instead. Let's write a default.do for our text preprocessor.

If default.do is asked to build X, and there exists a file named X.in, we use sed to do our variable substitutions. In that case, default.do uses redo-ifchange to depend on X.in, version, and date. If a file named X.in does not exist, then we don't know what to do, so we give an error.

On the other hand, if we try to generate a file that already exists, like test.py, redo does not call default.do at all. redo only tries to create files that don't exist, or that were previously generated by redo. This stops redo from accidentally overwriting your work.
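The default.do logic described above might be sketched as follows. The script writes default.do into the current directory. The body is an approximation of the rules in the text, not necessarily the exact file shipped with redo, and it assumes redo's usual arguments: $1 is the target name and $3 is a temporary output file that redo renames to the target on success.

```shell
#!/bin/sh
# Write a default.do that builds X from X.in (a sketch, see assumptions above).
cat >default.do <<'EOF'
# $1 is the target name (e.g. test.txt).
# $3 is a temporary file that redo renames to $1 if we exit successfully.
if [ -e "$1.in" ]; then
    # Declare the dependencies, then substitute the variables.
    redo-ifchange "$1.in" version date
    read -r VERSION <version
    read -r DATE <date
    sed -e "s/%VERSION%/$VERSION/g" \
        -e "s/%DATE%/$DATE/g" \
        <"$1.in" >"$3"
else
    # No X.in exists, so we have no rule for this target.
    echo "$0: Fatal: don't know how to build '$1'" >&2
    exit 99
fi
EOF
```

Note that the sed expressions would misbehave if a value contained a slash or an ampersand; that is acceptable for simple version strings and dates, but a more general preprocessor would need to escape them.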

$ redo test.txt
redo  test.txt

$ cat test.txt
This is the documentation for MyProgram version
1.0.  It was generated on 1970-01-01.

$ redo chicken
redo  chicken
default.do: Fatal: don't know how to build 'chicken'
redo  chicken (exit 99)

$ redo version.py
redo  version.py

# test.py was created by us, so it's a "source" file.
# redo does *not* call default.do to replace it.
$ redo test.py
redo: test.py: exists and not marked as generated; not redoing.

$ python test.py
Version '1.0' has build date '1970-01-01'

Nice!

While we're here, let's make an all.do so that we don't have to tell redo exactly which files to rebuild, every single time.
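A minimal all.do could simply depend on every output file. The target list below is taken from the build output in this section.

```shell
#!/bin/sh
# Write an all.do that depends on each preprocessor output.
cat >all.do <<'EOF'
# Build (or rebuild, if their inputs changed) all generated files.
redo-ifchange test.txt version.py include/version.h
EOF
```

Running redo with no arguments builds the target named all, which is why a bare redo is enough from now on.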

Results:

$ redo
redo  all
redo    test.txt
redo    version.py
redo    include/version.h

# input files didn't change, so nothing to rebuild
$ redo
redo  all

$ touch test.txt.in

$ redo
redo  all
redo    test.txt

Auto-generating the version and date (redo-always and redo-stamp)

Of course, in a real project, we won't want to hardcode the version number and date into a file. Ideally, we can get the version number from a version control system, like git, and we can use today's date.

To make that happen, we can replace the static version and date files with version.do and date.do. default.do already uses redo-ifchange to depend on version and date, so redo will create them as needed, and if they change, redo will rebuild all the targets that depend on them.

However, the version and date files are special: they depend on the environment outside redo itself. That is, there's no way to declare a dependency on the current date. We might generate the date file once, but tomorrow, there's no way for redo to know that its value should change.

To handle this situation, redo has the redo-always command. If a .do file runs redo-always, redo treats that target as out of date every time something depends on it, and rebuilds it. The result looks like this:

$ redo 
redo  all
redo    test.txt
redo      version
redo      date
redo    version.py
redo    include/version.h

# version.do and date.do are redo-always, so
# everything depending on them needs to rebuild
# every time.
$ redo
redo  all
redo    test.txt
redo      version
redo      date
redo    version.py
redo    include/version.h
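A version.do and date.do producing that behaviour might look like the sketch below. The use of git describe and the UNKNOWN fallback are assumptions; any command that prints a version string would work in its place.

```shell
#!/bin/sh
# Write version.do and date.do, replacing the static version and date files.
cat >version.do <<'EOF'
# Ask git for a version string; fall back to a placeholder otherwise.
if ! git describe >"$3" 2>/dev/null; then
    echo "$0: falling back to a static version" >&2
    echo UNKNOWN >"$3"
fi
# Consider this target out of date on every build.
redo-always
EOF

cat >date.do <<'EOF'
# Today's date, in the same format as the static file it replaces.
date +'%Y-%m-%d' >"$3"
# Consider this target out of date on every build.
redo-always
EOF
```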

Of course, for many uses, that's overcompensating: the version number and date don't change that often, so we might end up doing a lot of unnecessary work on every build. To solve that, there's redo-stamp. redo-stamp does the opposite of redo-always: while redo-always makes things build more often, redo-stamp makes things build less often. Specifically, it lets a .do file provide a "stamp value" for its output; if that stamp value is the same as before, then the target should be considered unchanged after all.

The most common stamp value is just the content itself. Since in redo we write the content to $3, we can also read it back from $3, by ending the .do file with redo-stamp <$3.
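As a sketch, assuming date.do is a short script that writes today's date to $3, the stamped variant could look like this. The same final line would go at the end of version.do.

```shell
#!/bin/sh
# Write a date.do that is always re-run but only counts as changed
# when its content actually differs from the previous build.
cat >date.do <<'EOF'
date +'%Y-%m-%d' >"$3"
# Re-run this script on every build...
redo-always
# ...but use the freshly written content as the stamp value.
redo-stamp <"$3"
EOF
```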

And the final result is what we want. Although version and date are generated every time, the targets which depend on them are not:

$ redo clean
redo  clean

# version and date are generated just once per run,
# the first time they are used.
$ redo
redo  all
redo    test.txt
redo      version
redo      date
redo    version.py
redo    include/version.h

# Here, (test.txt) means redo is considering building
# test.txt, but can't decide yet. In order to decide,
# it needs to first build date and version.  After
# that, it decides not to build test.txt after all.
$ redo
redo  all
redo    (test.txt)
redo    date
redo    version

Temporary overrides

Sometimes you want to override a file even if it is a target (i.e. it has previously been built by redo and has a valid .do file associated with it). In our example, maybe you want to hardcode the version number because you're building a release. This is easy: redo notices whenever you overwrite a file from outside redo, and will avoid replacing that file until you subsequently delete it:

$ echo "1.0" >version

$ redo
redo  all
redo    (test.txt)
redo    date
redo: version - you modified it; skipping
redo    test.txt
redo    version.py
redo    include/version.h

$ redo
redo  all
redo    (test.txt)
redo    date
redo: version - you modified it; skipping

$ rm version

$ redo
redo  all
redo    (test.txt)
redo    date
redo    version
redo    test.txt
redo    version.py
redo    include/version.h

default.do, subdirectories, and redo-whichdo

There's one more thing we should mention, which is the interaction of default.do with files in subdirectories. Notice that we are building include/version.h in our example:

$ redo include/version.h
redo  include/version.h
redo    version
redo    date

$ cat include/version.h
// C/C++ header file identifying the current version
#ifndef __VERSION_H
#define __VERSION_H

#define VERSION "redo-0.31-3-g974eb9f"
#define DATE "2018-11-26"

#endif // __VERSION_H

redo works differently from the make command when you ask it to build files in subdirectories. make always looks for a Makefile in the current directory and uses that for all build instructions. So make include/version.h and cd include && make version.h are two different things: the first uses Makefile, and the second uses include/Makefile (or fails if the latter does not exist).

redo, on the other hand, always uses the same formula to find a .do file for a particular target. For a file named X, that formula is as follows:

  • first, try X.do
  • then try default.do
  • then try ../default.do
  • then try ../../default.do
  • ...and so on...

(Note: for targets with an extension, like X.o, redo actually tries even more .do files, like default.o.do and ../default.o.do. For precise details, read the redo man page.)

You can see which .do files redo considers for a given target by using the redo-whichdo command. If redo-whichdo returns successfully, the last name in the list is the .do file it finally decided to use.

$ redo-whichdo include/version.h
include/version.h.do
include/default.h.do
include/default.do
default.h.do
default.do
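The search order can be approximated in a few lines of shell. This is a simplified illustration, not redo's actual implementation: the candidates function is a hypothetical helper, it only handles relative target paths, and it stops at the directory it was started from, whereas redo keeps looking in parent directories.

```shell
#!/bin/sh
# candidates TARGET: print the .do files that would be considered for a
# relative TARGET path, most specific first (simplified illustration).
candidates() {
    dir=$(dirname "$1")
    base=$(basename "$1")
    {
        echo "$dir/$base.do"              # exact match: X.do
        while :; do
            ext=$base
            # default.h.do for version.h; default.b.c.do, default.c.do for a.b.c
            while case $ext in *.*) true ;; *) false ;; esac; do
                ext=${ext#*.}
                echo "$dir/default.$ext.do"
            done
            echo "$dir/default.do"
            case $dir in .|/) break ;; esac   # stop at the starting directory
            dir=$(dirname "$dir")
        done
    } | sed 's|^\./||'                    # drop the leading "./"
}

candidates include/version.h
# prints:
#   include/version.h.do
#   include/default.h.do
#   include/default.do
#   default.h.do
#   default.do
```

Unlike redo-whichdo, this sketch never checks which of the files exist, so it always prints the full candidate list.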

Redo always runs in the .do file's directory

To ensure consistency, redo always changes the current directory to the directory containing the selected .do file (not the directory containing the target, if they are different). As a result, redo include/version.h and cd include && redo version.h always have exactly the same effect:

$ redo include/version.h
redo  include/version.h
redo    version
redo    date

$ (cd include && redo version.h)
redo  version.h
redo    ../version
redo    ../date

(redo's display is slightly different between the two: it always shows the files it's building relative to the $PWD at the time you started redo.)

This feature is critical to redo's recursive nature; it's the reason that essays like Recursive Make Considered Harmful don't apply to redo. Any redo target, anywhere in your source tree, can use redo-ifchange to depend on any of your other targets, and the dependency will work right.

Why does redo change to the directory containing the .do file, instead of the directory containing the target? Because usually, the .do file needs to refer to other dependencies, and it's easier to always express those dependencies without adjusting any paths. In our text preprocessor example, default.do does redo-ifchange version date; this wouldn't work properly if it were running from the include/ directory, because there are no files named version and date in there.

Similarly, when compiling C programs, there are usually compiler options like -I../mylib/include. If we're compiling foo.o and mydir/bar.o, we would like -I../mylib/include to have the same meaning in both cases.