Reverse engineering conjures an image of the "magic slot." We feed an unstructured, undocumented source listing into the slot and...
Reverse engineering conjures an image of the "magic slot." We feed an unstructured, undocumented source listing into the slot and out the other end comes full documentation for the computer program. Unfortunately, the magic slot doesn't exist. Reverse engineering can extract design information from source code, but the abstraction level, the completeness of the documentation, the degree to which tools and a human analyst work together, and the directionality of the process are highly variable.
The abstraction level of a reverse engineering process and the tools used to effect it refers to the sophistication of the design information that can be extracted from source code. Ideally, the abstraction level should be as high as possible. That is, the reverse engineering process should be capable of deriving procedural design representations (a low-level abstraction), program and data structure information (a somewhat higher level of abstraction), data and control flow models (a relatively high level of abstraction), and entity relationship models (a high level of abstraction). As the abstraction level increases, the software engineer is provided with information that will allow easier understanding of the program.
The completeness of a reverse engineering process refers to the level of detail that is provided at an abstraction level. In most cases, the completeness decreases as the abstraction level increases. For example, given a source code listing, it is relatively easy to develop a complete procedural design representation. Simple data flow representations may also be derived, but it is far more difficult to develop a complete set of data flow diagrams or entity-relationship models.
Completeness improves in direct proportion to the amount of analysis performed by the person doing reverse engineering. Interactivity refers to the degree to which the human is "integrated" with automated tools to create an effective reverse engineering process. In most cases, as the abstraction level increases, interactivity must increase or completeness will suffer.
If the directionality of the reverse engineering process is one way, all information extracted from the source code is provided to the software engineer who can then use it during any maintenance activity. If directionality is two way, the information is fed to a reengineering tool that attempts to restructure or regenerate the old program.
The reverse engineering process is represented in figure. Before reverse engineering activities can commence, unstructured (“dirty”) source code is restructured so that it contains only the structured programming constructs.This makes the source code easier to read and provides the basis for all the subsequent reverse engineering activities.
The core of reverse engineering is an activity called extract abstractions. The engineer must evaluate the old program and from the (often undocumented) source code, extract a meaningful specification of the processing that is performed, the user interface that is applied, and the program data structures or database that is used.
Reverse Engineering to Understand Processing
The first real reverse engineering activity begins with an attempt to understand and then extract procedural abstractions represented by the source code. To understand procedural abstractions, the code is analyzed at varying levels of abstraction: system, program, component, pattern, and statement.
The overall functionality of the entire application system must be understood before more detailed reverse engineering work occurs. This establishes a context for further analysis and provides insight into interoperability issues among applications within the system. Each of the programs that make up the application system represents a functional abstraction at a high level of detail. A block diagram, representing the interaction between these functional abstractions, is created. Each component performs some subfunction and represents a defined procedural abstraction. A processing narrative for each component is created. In some situations, system, program and component specifications already exist. When this is the case, the specifications are reviewed for conformance to existing code.
Things become more complex when the code inside a component is considered. The engineer looks for sections of code that represent generic procedural patterns. In almost every component, a section of code prepares data for processing (within the module), a different section of code does the processing, and another section of code prepares the results of processing for export from the component. Within each of these sections, we can encounter smaller patterns; for example, data validation and bounds checking often occur within the section of code that prepares data for processing.
For large systems, reverse engineering is generally accomplished using a semiautomated approach. CASE tools are used to “parse” the semantics of existing code. The output of this process is then passed to restructuring and forward engineering tools to complete the reengineering process.
Reverse Engineering to Understand Data
Reverse engineering of data occurs at different levels of abstraction. At the program level, internal program data structures must often be reverse engineered as part of an overall reengineering effort. At the system level, global data structures (e.g., files, databases) are often reengineered to accommodate new database management paradigms (e.g., the move from flat file to relational or object-oriented database systems). Reverse engineering of the current global data structures sets the stage for the introduction of a new systemwide database.
Internal data structures. Reverse engineering techniques for internal program data focus on the definition of classes of objects. This is accomplished by examining the program code with the intent of grouping related program variables. In many cases, the data organization within the code identifies abstract data types. For example, record structures, files, lists, and other data structures often provide an initial indicator of classes.
Breuer and Lano suggest the following approach for reverse engineering of classes:
1. Identify flags and local data structures within the program that record important information about global data structures (e.g., a file or database).
2. Define the relationship between flags and local data structures and the global data structures. For example, a flag may be set when a file is empty; a local data structure may serve as a buffer that contains the last 100 records acquired from a central database.
3. For every variable (within the program) that represents an array or file, list all other variables that have a logical connection to it.
These steps enable a software engineer to identify classes within the program that interact with the global data structures.
Database structure. Regardless of its logical organization and physical structure, a database allows the definition of data objects and supports some method for establishing relationships among the objects. Therefore, reengineering one database schema into another requires an understanding of existing objects and their relationships.
The following steps may be used to define the existing data model as a precursor to reengineering a new database model:
1. Build an initial object model. The classes defined as part of the model may be acquired by reviewing records in a flat file database or tables in a relational schema. The items contained in records or tables become attributes of a class.
2. Determine candidate keys. The attributes are examined to determine whether they are used to point to another record or table. Those that serve as pointers become candidate keys.
3. Refine the tentative classes. Determine whether similar classes can be combined into a single class.
4. Define generalizations. Examine classes that have many similar attributes to determine whether a class hierarchy should be constructed with a generalization class at its head.
5. Discover associations. Use techniques that are analogous to the CRC approach to establish associations among classes.
Once information defined in the preceding steps is known, a series of transformations can be applied to map the old database structure into a new database structure.
Reverse Engineering User Interfaces
Sophisticated GUIs have become de rigueur for computer-based products and systems of every type. Therefore, the redevelopment of user interfaces has become one of the most common types of reengineering activity. But before a user interface can be rebuilt, reverse engineering should occur.
To fully understand an existing user interface (UI), the structure and behavior of the interface must be specified. Merlo and his colleagues suggest three basic questions that must be answered as reverse engineering of the UI commences:
• What are the basic actions (e.g., keystrokes and mouse clicks) that the interface must process?
• What is a compact description of the behavioral response of the system to these actions?
• What is meant by a “replacement,” or more precisely, what concept of equivalence of interfaces is relevant here?
Behavioral modeling notation can provide a means for developing answers to the first two questions. Much of the information necessary to create a behavioral model can be obtained by observing the external manifestation of the existing interface. But additional information necessary to create the behavioral model must extracted from the code.
It is important to note that a replacement GUI may not mirror the old interface exactly (in fact, it may be radically different). It is often worthwhile to develop new interaction metaphors. For example, an old UI requests that a user provide a scale factor (ranging from 1 to 10) to shrink or magnify a graphical image. A reengineered GUI might use a slide-bar and mouse to accomplish the same function.