Enhancing legacy software system analysis by combining behavioural and semantic information sources

Cutting, David (2016) Enhancing legacy software system analysis by combining behavioural and semantic information sources. Doctoral thesis, University of East Anglia.



    Computer software is, by its very nature, highly complex and invisible, yet subject
    to a near-continual pressure to change. Over time the development process has
    become more mature and less risky. This is in large part due to the concept
    of software traceability: the ability to relate software components back to their
    initial requirements and to one another. Such traceability aids tasks such
    as maintenance by facilitating the prediction of “ripple effects” that may result,
    and aiding comprehension of software structures in general. Many organisations,
    however, have large amounts of software for which little or no documentation
    exists; the original developers are no longer available and yet this software still
    underpins critical functions. Such “legacy software” can therefore represent a high
    risk when changes are required.
    Consequently, a large amount of effort goes into attempting to comprehend
    legacy software. The most common way to accomplish this, given
    that reading the code directly is hugely time-consuming and near-impossible, is
    to reverse engineer the code, usually to a form of representative projection such
    as a UML class diagram. Although a wide range of tools and approaches exists,
    there is no empirical way to compare them or validate new developments. A need
    was therefore identified to define and create the Reverse Engineering
    to Design Benchmark (RED-BM). This was then applied to a number of industrial
    tools. The measured performance of these tools varies from 8.8% to 100%,
    demonstrating both the effectiveness of the benchmark and the questionable performance
    of several tools.
    In addition to the structural relationships detectable through static reverse
    engineering, other sources of information are available with the potential to reveal
    other types of relationships, such as semantic links. One such source is the mining
    of source code repositories, which can be analysed to find components within a
    software system that have historically often been changed together during
    the evolution of the system; from the strength of this co-change, a semantic link
    can be inferred. An
    approach was implemented to mine such semantic relationships from repositories
    and relationships were found beyond those expressed by static reverse engineering.
    These included groups of relationships potentially suitable for clustering.
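    The repository-mining idea described above can be illustrated with a minimal
    sketch. It is not the thesis's implementation: the function name
    co_change_strength, the sample history, and the use of a Jaccard ratio as the
    strength measure are all assumptions made for illustration, since the abstract
    does not state how strength is calculated.

```python
from collections import Counter
from itertools import combinations


def co_change_strength(commits):
    """Return {(a, b): strength} for each pair of components changed together.

    ASSUMPTION: strength is the Jaccard ratio, i.e. commits touching both
    components divided by commits touching either. The thesis may use a
    different measure.
    """
    changes = Counter()       # commits touching each component
    pair_changes = Counter()  # commits touching both components of a pair
    for changed in commits:
        components = sorted(set(changed))  # sort so pair keys are canonical
        changes.update(components)
        pair_changes.update(combinations(components, 2))
    return {
        (a, b): together / (changes[a] + changes[b] - together)
        for (a, b), together in pair_changes.items()
    }


# Hypothetical commit history: each entry is the set of files in one commit.
history = [
    {"Order.java", "Invoice.java"},
    {"Order.java", "Invoice.java", "Customer.java"},
    {"Customer.java"},
    {"Order.java"},
]
strengths = co_change_strength(history)
```

    Pairs with a high ratio are candidates for a semantic link even where static
    reverse engineering finds no structural relationship between them.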
    To allow for the general use of multiple information sources to build traceability
    links between software components, a uniform approach was defined and
    illustrated. This includes rules and formulas to allow combination of sources.
    The uniform approach was implemented in the field of predictive change impact
    analysis using reverse engineering and repository mining as information sources.
    This implementation, the Java Code Relationship Analysis (jcRA) package, was
    then evaluated against an industry-standard tool, JRipples. Depending on the
    target, the combined approach is able to outperform JRipples in detecting potential
    impacts, though with a risk of over-matching (a high number of false positives
    and overall class coverage on some targets).

    Item Type: Thesis (Doctoral)
    Faculty \ School: Faculty of Science > School of Computing Sciences
    Depositing User: Jackie Webb
    Date Deposited: 30 Nov 2016 12:41
    Last Modified: 30 Nov 2016 12:41
    URI: https://ueaeprints.uea.ac.uk/id/eprint/61542
