2.  The CVS Program

      cvs (Concurrent Versions System) is a front end to the RCS revision control system which extends the notion of revision control from a collection of files in a single directory to a hierarchical collection of directories each containing revision controlled files. Directories and files in the cvs system can be combined together in many ways to form a software release. cvs provides the functions necessary to manage these software releases and to control the concurrent editing of source files among multiple software developers.

      The six major features of cvs are listed below, and will be described in more detail in the following sections:

Concurrent access and conflict-resolution algorithms to guarantee that source changes are not ``lost.''
Support for tracking third-party vendor source distributions while maintaining the local modifications made to those sources.
A flexible module database that provides a symbolic mapping of names to components of a larger software distribution. This symbolic mapping provides for location independence within the software release and, for example, allows one to check out a copy of the ``diff'' program without ever knowing that the sources to ``diff'' actually reside in the ``bin/diff'' directory.
Configurable logging support that allows all ``committed'' source file changes to be logged by an arbitrary program, which saves the log messages in a file, notesfile, or news database.
A software release can be symbolically tagged and checked out at any time based on that tag. An exact copy of a previous software release can be checked out at any time, regardless of whether files or directories have been added to or removed from the ``current'' software release. A ``date'' can also be used to check out the exact version of the software release as of the specified date.
A ``patch'' format file [Wall] can be produced between two software releases, even if the releases span multiple directories.

      The sources maintained by cvs are kept within a single directory hierarchy known as the ``source repository.'' This ``source repository'' holds the actual RCS ``,v'' files directly, as well as a special per-repository directory (CVSROOT.adm) which contains a small number of administrative files that describe the repository and how it can be accessed. See Figure 1 for a picture of the cvs tree.


Figure 1.
cvs Source Repository

2.1.  Software Conflict Resolution[note 4]

      cvs allows several software developers to edit personal copies of a revision controlled file concurrently. The revision number of each checked out file is maintained independently for each user, and cvs forces the checked out file to be current with the ``head'' revision before it can be ``committed'' as a permanent change. A checked out file is brought up-to-date with the ``head'' revision using the ``update'' command of cvs. This command compares the ``head'' revision number with that of the user's file and performs an RCS merge operation if they are not the same. The result of the merge is a file that contains the user's modifications and those modifications that were ``committed'' after the user checked out that version of the file (as well as a backup copy of the user's original file). cvs points out any conflicts during the merge. It is the user's responsibility to resolve these conflicts and to ``commit'' the changes when ready.
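
      The rule above can be sketched in a few lines of Python. This is only an illustration of the decision described in the text, not the cvs implementation; the function names and the string results are hypothetical.

```python
# Hypothetical sketch of the update/commit rule described in the text.
# It models only the revision comparison, not the RCS merge itself.


def update_action(working_rev: str, head_rev: str) -> str:
    """Return what ``update'' would do for one checked out file."""
    # The text says a merge is performed whenever the two revision
    # numbers are not the same; otherwise nothing needs to be done.
    return "up-to-date" if working_rev == head_rev else "merge"


def can_commit(working_rev: str, head_rev: str) -> bool:
    """A file may be committed only when it is current with the head."""
    return working_rev == head_rev


if __name__ == "__main__":
    print(update_action("1.3", "1.3"))
    print(update_action("1.3", "1.5"))
    print(can_commit("1.3", "1.5"))
```

      In this model a developer whose copy is based on revision 1.3 cannot commit while the head is at 1.5; the copy must first be merged by ``update''.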

      Although the cvs conflict-resolution algorithm was defined in 1986, it is remarkably similar to the ``Copy-Modify-Merge'' scenario included with NSE[note 5] and described in [Honda] and [Courington]. The following explanation from [Honda] also applies to cvs:

Simply stated, a developer copies an object without locking it, modifies the copy, and then merges the modified copy with the original. This paradigm allows developers to work in isolation from one another since changes are made to copies of objects. Because locks are not used, development is not serialized and can proceed in parallel. Developers, however, must merge objects after the changes have been made. In particular, a developer must resolve conflicts when the same object has been modified by someone else.

      In practice, Prisma has found that conflicts that occur when the same object has been modified by someone else are quite rare. When they do happen, the changes made by the other developer are usually easy to merge. This practical use has shown that the ``Copy-Modify-Merge'' paradigm is a correct and useful one.

2.2.  Tracking Third-Party Source Distributions

      Currently, a large amount of software is based on source distributions from a third-party distributor. It is often the case that local modifications are to be made to this distribution, and that the vendor's future releases should be tracked. Rolling your local modifications forward into the new vendor release is a time-consuming task, but cvs can ease this burden somewhat. The checkin program of cvs initially sets up a source repository by integrating the source modules directly from the vendor's release, preserving the directory hierarchy of the vendor's distribution. The branch support of RCS is used to build this vendor release as a branch of the main RCS trunk. Figure 2 shows how the ``head'' tracks a sample vendor branch when no local modifications have been made to the file.


Figure 2.
cvs Vendor Branch Example

Once this is done, developers can check out files and make local changes to the vendor's source distribution. These local changes form a new branch to the tree which is then used as the source for future check outs. Figure 3 shows how the ``head'' moves to the main RCS trunk when a local modification is made.


Figure 3.
cvs Local Modification to Vendor Branch

      When a new version of the vendor's source distribution arrives, the checkin program adds the vendor's new and changed files to the already existing source repository. For files that have not been changed locally, the new file from the vendor becomes the current ``head'' revision. For files that have been modified locally, checkin warns that the file must be merged with the new vendor release. The cvs ``join'' command is a useful tool that aids this process by performing the necessary RCS merge, as is done above when performing an ``update.''
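
      The per-file decision made when a new vendor release arrives can be modeled as follows. This is a minimal sketch under stated assumptions: the function names, the result strings, and the idea of passing in a set of locally modified paths are all hypothetical and are not taken from the checkin program.

```python
# Hypothetical model of the per-file decision described in the text.
from typing import Dict, Iterable, Set


def import_vendor_file(path: str, locally_modified: Set[str]) -> str:
    """Classify one file from a new vendor release."""
    if path in locally_modified:
        # Local changes exist, so the new vendor revision must be merged.
        return "merge-required"
    # No local changes: the vendor's file becomes the current head.
    return "vendor-becomes-head"


def import_release(vendor_files: Iterable[str],
                   locally_modified: Set[str]) -> Dict[str, str]:
    """Classify every file in a vendor release."""
    return {path: import_vendor_file(path, locally_modified)
            for path in vendor_files}


if __name__ == "__main__":
    report = import_release(["bin/diff/diff.c", "bin/ps/ps.c"],
                            {"bin/ps/ps.c"})
    for path, action in sorted(report.items()):
        print(path, action)
```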

      There is also limited support for ``dual'' derivations for source files. See Figure 4 for a sample dual-derived file.


Figure 4.
cvs Support For ``Dual'' Derivations

This example tracks the SunOS distribution but includes major changes from Berkeley. These BSD files are saved directly in the RCS file off a new branch.

2.3.  Location Independent Module Database

      cvs contains support for a simple, yet powerful, ``module'' database. For reasons of efficiency, this database is stored in ndbm(3) format. The module database is used to apply names to collections of directories and files as a matter of convenience for checking out pieces of a large software distribution. The database records the physical location of the sources as a form of information hiding, allowing one to check out whole directory hierarchies or individual files without regard for their actual location within the global source distribution.

      Consider the following small sample of a module database, which must be tailored manually to each specific source repository environment:

		#key      [-option argument] directory [files...]
		diff      bin/diff
		libc      lib/libc
		sys       -o sys/tools/make_links sys
		modules   -i mkmodules CVSROOT.adm modules
		kernel    -a sys lang/adb
		ps        bin Makefile ps.c

      The ``diff'' and ``libc'' modules refer to whole directory hierarchies that are extracted on check out. The ``sys'' module extracts the ``sys'' hierarchy, and runs the ``make_links'' program at the end of the check out process (the -o option specifies a program to run on checkout). The ``modules'' module allows one to edit the module database file and runs the ``mkmodules'' program on checkin to regenerate the ndbm database that cvs uses. The ``kernel'' module is an alias (as the -a option specifies) which causes the remaining arguments after the -a to be interpreted exactly as if they had been specified on the command line. This is useful for objects that require shared pieces of code from far away places to be compiled (as is the case with the kernel debugger, kadb, which shares code with the standard adb debugger). The ``ps'' module shows that the source for ``ps'' lives in the ``bin'' directory, but only Makefile and ps.c are required to build the object.
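
      To make the line format concrete, the following Python sketch parses entries like those in the sample above. It is not the cvs parser. It assumes that -a consumes all remaining arguments and that every other option takes exactly one argument, as the header comment in the sample suggests; the Module class and its field names are hypothetical.

```python
# Hypothetical parser for module database lines of the form
#   key [-option argument] directory [files...]
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    key: str
    options: Dict[str, str] = field(default_factory=dict)
    alias: List[str] = field(default_factory=list)
    directory: Optional[str] = None
    files: List[str] = field(default_factory=list)


def parse_module_line(line: str) -> Optional[Module]:
    """Parse one line; return None for blank lines and comments."""
    tokens = line.split()
    if not tokens or tokens[0].startswith("#"):
        return None
    module = Module(key=tokens[0])
    rest = tokens[1:]
    if rest and rest[0] == "-a":
        # Alias: everything after -a is treated as command-line arguments.
        module.alias = rest[1:]
        return module
    while len(rest) >= 2 and rest[0].startswith("-"):
        # Assumption: each remaining option takes exactly one argument.
        module.options[rest[0]] = rest[1]
        rest = rest[2:]
    if rest:
        module.directory = rest[0]
        module.files = rest[1:]
    return module


if __name__ == "__main__":
    for text in ("sys -o sys/tools/make_links sys",
                 "kernel -a sys lang/adb",
                 "ps bin Makefile ps.c"):
        print(parse_module_line(text))
```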

      The module database at Prisma is now populated for the entire UNIX distribution and thereby allows us to issue the following convenient commands to check out components of the UNIX distribution without regard for their actual location within the master source repository:

		example% cvs checkout diff
		example% cvs checkout libc ps
		example% cd diff; make

      In building the module database file, it is quite possible to have name conflicts within a global software distribution. For example, SunOS provides two cat programs: one for the standard environment, /bin/cat, and one for the System V environment, /usr/5bin/cat. We resolved this conflict by naming the standard cat module ``cat'', and the System V cat module ``5cat''. Similar name modifications must be applied to other conflicting names, as might be found between a utility program and a library function, though Prisma chose not to include individual library functions within the module database at this time.

2.4.  Configurable Logging Support

      The cvs ``commit'' command is used to make a permanent change to the master source repository (where the RCS ``,v'' files live). Whenever a ``commit'' is done, the log message for the change is passed to an arbitrary program, which records it in a file, notesfile, or news database, or sends it by mail. For example, a collection of these updates can be used to produce release notices. cvs can be configured to send log updates through one or more filter programs, based on a regular expression match on the directory that is being changed. This allows multiple related or unrelated projects to exist within a single cvs source repository tree, with each different project sending its ``commit'' reports to a unique log device.

      A sample logging configuration file might look as follows:

	#regex      filter-program
	DEFAULT     /usr/local/bin/nfpipe -t %s utils.updates
	^diag       /usr/local/bin/nfpipe -t %s diag.updates
	^local      /usr/local/bin/nfpipe -t %s local.updates
	^perf       /usr/local/bin/nfpipe -t %s perf.updates
	^sys        /usr/local/bin/nfpipe -t %s kernel.updates

      This sample allows the diagnostics and performance groups to share the same source repository with the kernel and utilities groups. Changes that they make are sent directly to their own notesfile [Essick] through the ``nfpipe'' program. A sufficiently simple title is substituted for the ``%s'' argument before the filter program is executed. This logging configuration file is tailored manually to each specific source repository environment.
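
      The selection of a filter program can be sketched in Python as follows. This is an illustration only, not the cvs implementation. It assumes that the DEFAULT entry is used only when no regular expression matches the directory, and that every matching entry is run; the function names are hypothetical.

```python
# Hypothetical sketch of filter selection for a logging configuration file.
import re
from typing import List, Optional, Pattern, Tuple

Rules = List[Tuple[Pattern[str], str]]


def parse_log_config(text: str) -> Tuple[Rules, Optional[str]]:
    """Split the file into (regex, command) rules and a default command."""
    rules: Rules = []
    default: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, command = line.split(None, 1)
        if pattern == "DEFAULT":
            default = command
        else:
            rules.append((re.compile(pattern), command))
    return rules, default


def select_filters(rules: Rules, default: Optional[str],
                   directory: str, title: str) -> List[str]:
    """Return the filter commands to run, with %s replaced by the title."""
    matched = [cmd for regex, cmd in rules if regex.search(directory)]
    if not matched and default is not None:
        # Assumption: DEFAULT applies only when nothing else matched.
        matched = [default]
    return [cmd.replace("%s", title) for cmd in matched]


if __name__ == "__main__":
    config = """
    #regex      filter-program
    DEFAULT     /usr/local/bin/nfpipe -t %s utils.updates
    ^sys        /usr/local/bin/nfpipe -t %s kernel.updates
    """
    rules, default = parse_log_config(config)
    print(select_filters(rules, default, "sys/kern", "title"))
    print(select_filters(rules, default, "bin/diff", "title"))
```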

2.5.  Tagged Releases and Dates

      Any release can be given a symbolic tag name that is stored directly in the RCS files. This tag can be used at any time to get an exact copy of any previous release. With equal ease, one can extract an exact copy of the source files as of any arbitrary date in the past. Thus, all that's required to tag the current kernel, and to tag the kernel as of the Fourth of July is:

	example% cvs tag TEST_KERNEL kernel
	example% cvs tag -D 'July 4' PATRIOTIC_KERNEL kernel
The following command would retrieve an exact copy of the test kernel at some later date:
	example% cvs checkout -fp -rTEST_KERNEL kernel
The -f option causes only files that match the specified tag to be extracted, while the -p option automatically prunes empty directories. Consequently, directories added to the kernel after the test kernel was tagged are not included in the newly extracted copy of the test kernel.
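
      The combined effect of the two options can be modeled with a small Python sketch. It is an illustration of the behavior described above, not of how cvs implements it; the mapping from file paths to tag sets and the function names are hypothetical.

```python
# Hypothetical model of "extract only tagged files, prune empty directories".
import posixpath
from typing import Dict, Iterable, List, Set


def files_for_tag(tags_by_file: Dict[str, Set[str]], tag: str) -> List[str]:
    """Keep only the files that carry the requested tag."""
    return sorted(path for path, tags in tags_by_file.items() if tag in tags)


def directories_kept(paths: Iterable[str]) -> List[str]:
    """Directories that hold at least one extracted file, with ancestors."""
    kept: Set[str] = set()
    for path in paths:
        directory = posixpath.dirname(path)
        while directory:
            kept.add(directory)
            directory = posixpath.dirname(directory)
    return sorted(kept)


if __name__ == "__main__":
    repository = {
        "sys/kern/main.c": {"TEST_KERNEL"},
        "sys/new/driver.c": set(),  # added after the tag was applied
    }
    extracted = files_for_tag(repository, "TEST_KERNEL")
    print(extracted)
    print(directories_kept(extracted))
```

      In this model the directory sys/new never appears in the extracted tree, because none of its files match the tag.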

      The cvs date support has exactly the same interface as that provided with RCS; however, cvs must process the ``,v'' files directly due to the special handling required by the vendor branch support. The standard RCS date handling only processes one branch (or the main trunk) when checking out based on a date specification. cvs must instead process the current ``head'' branch and, if a match is not found, proceed to look for a match on the vendor branch. This, combined with reasons of performance, is why cvs processes revision (symbolic and numeric) and date specifications directly from the ``,v'' files.
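
      The two-step search can be sketched in Python as follows. This is a simplified model, not the cvs code: each branch is represented as a list of (revision, date) pairs, a revision ``matches'' a date when it is the latest one made on or before that date, and all names are hypothetical.

```python
# Hypothetical model of a date lookup that falls back to the vendor branch.
from datetime import datetime
from typing import List, Optional, Tuple

Branch = List[Tuple[str, datetime]]  # (revision number, commit date)


def latest_on_or_before(branch: Branch, when: datetime) -> Optional[str]:
    """Return the newest revision on one branch dated on or before `when`."""
    candidates = [(date, rev) for rev, date in branch if date <= when]
    return max(candidates)[1] if candidates else None


def revision_as_of(head_branch: Branch, vendor_branch: Branch,
                   when: datetime) -> Optional[str]:
    """Search the head branch first, then the vendor branch."""
    found = latest_on_or_before(head_branch, when)
    if found is not None:
        return found
    return latest_on_or_before(vendor_branch, when)


if __name__ == "__main__":
    head = [("1.2", datetime(1989, 1, 10))]
    vendor = [("1.1.1.1", datetime(1988, 6, 1)),
              ("1.1.1.2", datetime(1988, 11, 1))]
    print(revision_as_of(head, vendor, datetime(1988, 12, 1)))
    print(revision_as_of(head, vendor, datetime(1989, 2, 1)))
```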

2.6.  Building ``patch'' Source Distributions

      cvs can produce a ``patch'' format [Wall] output file which can be used to bring a previously released software distribution current with the newest release. This patch file supports an entire directory hierarchy within a single patch, and can add whole new files to the previous release. One can combine symbolic revisions and dates together to display changes in a very generic way:

	example% cvs patch -D 'December 1, 1988' \
	                   -D 'January 1, 1989' sys
This example displays the kernel changes made in the month of December, 1988. Releasing a patch file that takes, for example, the cvs distribution from version 1.0 to version 1.4 might be done as follows:
	example% cvs patch -rCVS_1_0 -rCVS_1_4 cvs