• An Introduction to Version Control

    Product:
    Subversion, CEE, CollabNet SourceForge Enterprise

    Component:

    Summary:

    A short overview of Version Control, based on a Wikipedia article but edited for Subversion users.



     

    Version control (also known as revision control) is the management of multiple versions of the same unit of information. It is commonly used in engineering and software development teams to manage ongoing evolution of digital documents, such as source code, blueprints, or electronic models and other critical materials. Changes to these documents are identified by changing an associated number or letter code (called the version number, version level, or simply version) and associated historically with the person making the change.

    For example, here’s a simple form of version control:

    • You create a drawing.
    • You designate this first attempt as version number 1.
    • When you make the next change to your drawing, you call this version number 2.
    • Subsequent changes to your drawing are designated versions 3, 4, and 5, and so on, until you finish your drawing.
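
    The numbering scheme above can be sketched in a few lines of Python. This is only an illustration: the class name `DrawingHistory`, its methods, and the user name are hypothetical and do not correspond to any real tool. It shows the two things a version number gives you, namely a record of who made each change and a way to get any earlier version back.

```python
class DrawingHistory:
    """Keeps every saved version of a drawing, numbered from 1 (hypothetical)."""

    def __init__(self):
        self._versions = []   # list of (content, author) tuples

    def save(self, content, author):
        """Store a new version and return its version number."""
        self._versions.append((content, author))
        return len(self._versions)

    def _entry(self, number):
        # Version numbers start at 1, so reject anything outside 1..latest.
        if not 1 <= number <= len(self._versions):
            raise IndexError(f"no such version: {number}")
        return self._versions[number - 1]

    def get(self, number):
        """Return the content saved as the given version number."""
        return self._entry(number)[0]

    def author_of(self, number):
        """Return who saved the given version number."""
        return self._entry(number)[1]


history = DrawingHistory()
first = history.save("rough outline", "harry")
second = history.save("outline with shading", "harry")
earlier = history.get(first)   # any earlier version remains retrievable
```

    Keeping a full copy of every version is the simplest possible design; real tools store versions far more compactly.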

    Software tools for version control are increasingly recognized as necessary for most software development projects.

    Overview

    Engineering version control developed from formalized processes for tracking versions of early blueprints. Implicit in this control was the ability to return to any earlier state of the design when iterating on a particular design reached an engineering dead end.

    Likewise, in computer software engineering, version control is any practice that tracks and provides control over changes to source code. Software developers sometimes use version control software to maintain documentation and configuration files as well as source code. In theory, version control can be applied to any type of information record. In practice, however, the more sophisticated techniques and tools for version control have rarely been used outside software development circles, though they could be of benefit in many other areas.

    As software is developed and deployed, it is extremely common for multiple versions of the same software to be deployed in different sites, and for the software's developers to work individually on updates. Bugs and other issues with software are often only present in certain versions (for example, as a program evolves, the developer might fix one set of problems, but inadvertently introduce others). Therefore, for the purposes of locating and fixing bugs, it is vitally important for developers to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently (for instance, one version might have bugs fixed, but no new features, while the other version contains the new features).

    At the simplest level, developers can retain multiple copies of the different versions of the program and number them appropriately. This approach has been used on many large software projects. While it can work, it is inefficient (many near-identical copies of the program are kept around), requires a lot of self-discipline from developers to save appropriate copies of their work, and often leads to mistakes. Consequently, systems to automate some or all of the version control process have been developed.

    Traditionally, version control systems have used a centralized model, where all the version control functions are performed on a shared server. More recent systems use a model where developers work directly with their own local working copies and check in code only when needed. Two mechanisms ensure that developers do not overwrite each other's work when checking in code.

    The Lock-Modify-Unlock Solution

    In most software development projects, multiple developers work on the program at the same time. If two developers try to change the same file at the same time, without some method of managing access, the developers may overwrite each other's work. Most version control systems solve this in one of two ways.

    Many version control systems use a lock-modify-unlock model to address the problem of many authors clobbering each other's work. In this model, the repository allows only one person to change a file at a time. This exclusivity policy is managed using locks. For example:

    • Harry must lock a file before he can begin making changes to it.
    • If Harry has locked a file, then Sally cannot also lock it, and therefore cannot make changes to that file. All she can do is read the file and wait for Harry to finish his changes and release his lock.
    • After Harry unlocks the file, Sally can take her turn by locking and editing the file.
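
    The steps above can be sketched as a toy, in-memory repository. This is a minimal illustration of the locking policy only, not how Subversion or any other system implements it; the names `LockingRepository`, `LockError`, and `file_a.txt` are hypothetical.

```python
class LockError(Exception):
    """Raised when a user tries to lock or change a file held by someone else."""


class LockingRepository:
    """Toy repository that allows only one person to change a file at a time."""

    def __init__(self):
        self.files = {}   # path -> current contents
        self.locks = {}   # path -> name of the user holding the lock

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def read(self, path):
        # Reading never requires a lock.
        return self.files.get(path, "")

    def write(self, path, user, contents):
        if self.locks.get(path) != user:
            raise LockError(f"{user} must lock {path} before changing it")
        self.files[path] = contents

    def unlock(self, path, user):
        if self.locks.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
        del self.locks[path]


repo = LockingRepository()
repo.lock("file_a.txt", "harry")
try:
    repo.lock("file_a.txt", "sally")   # refused while Harry holds the lock
    sally_blocked = False
except LockError:
    sally_blocked = True

repo.write("file_a.txt", "harry", "Harry's changes\n")
repo.unlock("file_a.txt", "harry")
repo.lock("file_a.txt", "sally")       # now Sally can take her turn
repo.write("file_a.txt", "sally", "Harry's changes\nSally's changes\n")
```

    Note that Sally can still call `read` at any time; only locking and writing are exclusive.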


    The Copy-Modify-Merge Solution

    Subversion and other version control systems can also use a copy-modify-merge model as an alternative to locking. In this model, each user's client contacts the project repository and creates a personal working copy, which is a local reflection of the repository's files and directories. Users then work in parallel, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately a human being is responsible for making it happen correctly.

    For example:

    • Say that Harry and Sally each create working copies of the same project, copied from the repository. They work concurrently, and make changes to the same File A within their copies.
    • Sally saves her changes to the repository first.
    • When Harry attempts to save his changes later, the repository informs him that his File A is out-of-date. In other words, File A in the repository has changed since he last copied it.
    • So Harry asks his client to merge any new changes from the repository into his working copy of File A. Chances are that Sally's changes don't overlap with his own, so once he has both sets of changes integrated, he saves his working copy back to the repository.
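
    The sequence above can be sketched as follows. This is a simplified model under stated assumptions: File A is a fixed-length list of lines, the two users edit different lines, and the names `Repository`, `WorkingCopy`, and `OutOfDateError` are hypothetical rather than part of any real client.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision than the repository's."""


class Repository:
    def __init__(self, lines):
        self.revision = 1
        self.lines = list(lines)

    def checkout(self):
        return WorkingCopy(self.revision, self.lines)

    def commit(self, wc):
        # Refuse the commit if someone else has committed since this copy was made.
        if wc.base_revision != self.revision:
            raise OutOfDateError("working copy is out of date; update first")
        self.lines = list(wc.lines)
        self.revision += 1
        wc.base_revision, wc.base_lines = self.revision, list(wc.lines)


class WorkingCopy:
    def __init__(self, revision, lines):
        self.base_revision = revision
        self.base_lines = list(lines)   # pristine copy as checked out
        self.lines = list(lines)        # the user's editable copy

    def update(self, repo):
        """Merge repository changes in; assumes they do not overlap local edits."""
        merged = []
        for base, mine, theirs in zip(self.base_lines, self.lines, repo.lines):
            # Keep the local line if it was edited, otherwise take the repository's.
            merged.append(mine if mine != base else theirs)
        self.lines = merged
        self.base_revision, self.base_lines = repo.revision, list(repo.lines)


repo = Repository(["title", "intro", "body"])
harry, sally = repo.checkout(), repo.checkout()
sally.lines[0] = "Title (Sally)"
harry.lines[2] = "body (Harry)"

repo.commit(sally)       # Sally saves first
try:
    repo.commit(harry)   # Harry's copy is now out of date
    rejected = False
except OutOfDateError:
    rejected = True

harry.update(repo)       # merge Sally's change into Harry's working copy
repo.commit(harry)       # now the commit succeeds
```

    The key design point is that the repository never merges on its own: it only rejects stale commits, and the merge happens in Harry's working copy where he can inspect the result.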


    But what if Sally's changes do overlap with Harry's changes? What then? This situation is called a conflict, and it's usually not much of a problem:

    • When Harry asks his client to merge the latest repository changes into his working copy, his copy of File A is somehow flagged as being in a state of conflict.
    • Harry can see both sets of conflicting changes, and manually choose between them.
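
    A conflict can be detected by comparing each side against the common starting point. The sketch below is a deliberately naive line-by-line three-way merge, assuming all three versions have the same number of lines; real merge tools handle inserted and deleted lines as well. The function names and sample text are hypothetical.

```python
def merge_line(base, mine, theirs):
    """Three-way merge of a single line; returns (text, is_conflict)."""
    if mine == theirs or theirs == base:
        return mine, False      # same edit on both sides, or only I changed it
    if mine == base:
        return theirs, False    # only the other side changed it
    return None, True           # both sides changed it differently


def merge_file(base, mine, theirs):
    """Merge line by line; conflicting lines are recorded for manual resolution."""
    merged, conflicts = [], {}
    for number, (b, m, t) in enumerate(zip(base, mine, theirs)):
        text, is_conflict = merge_line(b, m, t)
        if is_conflict:
            conflicts[number] = (m, t)   # keep both versions for the user to see
            merged.append(m)             # placeholder until resolved
        else:
            merged.append(text)
    return merged, conflicts


base = ["title", "intro", "body"]
harry = ["title", "intro by Harry", "body"]
sally = ["Title", "intro by Sally", "body"]

merged, conflicts = merge_file(base, harry, sally)

# Harry reviews each conflict and, in this example, chooses Sally's version.
for number, (mine, theirs) in conflicts.items():
    merged[number] = theirs
```

    Only the second line is flagged, because it is the only line both users changed; Sally's change to the first line merges without any intervention.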

    The copy-modify-merge model may sound a bit chaotic, but in practice, it runs extremely smoothly. Users can work in parallel, never waiting for one another. When they work on the same files, it turns out that most of their concurrent changes don't overlap at all; conflicts are infrequent. And the time it takes to resolve conflicts is usually far less than the time lost waiting for locks in a locking system.

    Reviewers

    Some systems attempt to manage who is allowed to make changes to different aspects of the program, for instance, allowing changes to a file to be checked by a designated reviewer before being added.

    Delta Compression

    Most version control software uses delta compression, which retains only the differences between successive versions of files. This allows more efficient storage of many different versions of files. Subversion has this capability.
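
    The idea can be illustrated with Python's standard `difflib` module. This is a sketch of the general technique, not Subversion's actual storage format: the first version is kept in full, and the second is stored as a list of instructions that either copy a range of lines from the first version or insert new lines. The function names and the delta layout are assumptions made for this example.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn `old` into `new`, storing only the lines that are new."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse old[i1:i2]
        elif j2 > j1:
            delta.append(("insert", new[j1:j2]))    # replaced or added lines
        # A pure deletion stores nothing: the old lines are simply not copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus its delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


v1 = ["line one", "line two", "line three"]
v2 = ["line one", "line two, edited", "line three", "line four"]

delta = make_delta(v1, v2)
rebuilt = apply_delta(v1, delta)
stored_lines = sum(len(op[1]) for op in delta if op[0] == "insert")
```

    Here `rebuilt` equals `v2`, while `stored_lines` counts only the lines that had to be saved for the second version. The saving grows with file size, since a small edit to a large file still produces a small delta.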

    Integration with other tools

    Some of the more advanced version control tools offer many other facilities, allowing deeper integration with other tools and software engineering processes. Plugins are often available for IDEs such as Eclipse, the NetBeans IDE, and Visual Studio. Version control systems are also often at the heart of Application Lifecycle Management Solutions such as CollabNet Enterprise Edition.

    Vocabulary

    These are some common terms used in version control:

    Atomic Commit: A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.
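
    One way to picture an atomic commit is to apply all changes to a staged copy and swap it in only if every change succeeds. The sketch below is a hypothetical in-memory model (the class name, the `changes` mapping, and the use of `None` to mean "delete" are assumptions for this example), not a description of how any particular system implements atomicity.

```python
class AtomicRepository:
    def __init__(self):
        self.revision = 0
        self.files = {}   # path -> contents

    def commit(self, changes):
        """Apply every change or none. `changes` maps path -> contents (None deletes)."""
        staged = dict(self.files)   # work on a copy, never on the live data
        for path, contents in changes.items():
            if contents is None:
                if path not in staged:
                    raise KeyError(f"cannot delete missing file: {path}")
                del staged[path]
            else:
                staged[path] = contents
        # Reached only if every change applied cleanly.
        self.files = staged
        self.revision += 1
        return self.revision


repo = AtomicRepository()
repo.commit({"a.txt": "one", "b.txt": "two"})

try:
    # The second change is invalid, so the first must not be applied either.
    repo.commit({"a.txt": "changed", "missing.txt": None})
    failed = False
except KeyError:
    failed = True
```

    After the failed commit, `a.txt` still holds its original contents and the revision number is unchanged.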

    Baseline: An approved version of a document or source file from which subsequent changes can be made.

    Change: A change (also known as a diff or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.

    Change List: On many version control systems with atomic multi-change commits, a change list (also written changelist, or called a change set) identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing the source to be examined as of any particular change list ID.

    Check-Out: A check-out (or checkout or co) creates a local working copy from the repository. Either a version is specified or the latest is used.

    Commit: A commit occurs when a copy of the changes made to the working copy is placed into the repository.

    Conflict: A conflict occurs when two changes are made by different parties to the same document or place within a document. Because the software may not be intelligent enough to decide which change is correct, a user is required to resolve the conflict.

    Directory Versioning: The ability of modern version control systems to version not only individual files but also whole directory trees, tracking changes to both files and directories over time.

    Export: An export is similar to a check-out, except that it creates a clean directory tree without the version control metadata used in a working copy. An export is often used prior to publishing the contents.

    Import: An import is the action of copying a local directory tree (not a working copy) into the repository.

    Merge / Integration: A merge or integration brings together (merges) concurrent changes into a unified version.

    Repository: The repository is where the file data is stored, often on a server.

    Resolve: The act of user intervention to address a conflict between different changes to the same document.

    Update: An update (or sync) copies the changes that were made to the repository (by other people) into the local working directory.

    Versioned metadata: The ability to attach arbitrary key/value pairs to files and directories, and to track changes to those values over time.

    Working copy: The working copy is the local copy of files from a repository, at a specific time or version. All work done to the files in a repository is done on a working copy, hence the name.

    Most content from this article was derived from the Wikipedia article "Version Control," licensed under the GNU Free Documentation License. Additional content was derived from "Version Control with Subversion," licensed under the Creative Commons Attribution License.