One of the goals of the project was to develop a tool that would detect sites in existing code where a short-lived dynamic allocation could be replaced with a stack allocation. Ideally, it would also generate a patch that could be applied at or just before compile time, so as to not clutter the code.
In order to simplify the development of this tool, a static analysis tool to extend needed to be chosen. The Frama-C platform was chosen for its design which emphasises modularity and extensibility.
Frama-C and its plugins are written in OCaml, a statically typed language, with type inference, mixing features of imperative, impure functional, and object-oriented programming; however most of the advanced language features are left unused by Forgetful.
Frama-C places an emphasis on correctness of the analysis, allowing plugins to annotate the constructed AST with information such as: proof burdens for other plugins to fulfil; metadata about values and expressions; and completed proofs (using Hoare-style weakest precondition computations and standard proof assistants), called properties.
This system of communication allows plugins to take advantage of information computed by other plugins, and hence the emphasis placed on modularity.
Frama-C unfortunately suffers from a lack of documentation for the usage of its core plugins. Plugin development itself, however, is reasonably well documented.
The Frama-C kernel doesn't provide functionality outside of the scope of simple parsing and simplification of the program AST, opting instead to leave implementation of functionality such as escape analysis and value analysis to plug-ins. As such functionality is essential to many use-cases, a plug-in is provided alongside the kernel to provide it, and is called Evolved Value Analysis (EVA).
The EVA plugin is of particular interest as it provides ranges or sets of possible values for variables. This can be used to track the result of an allocation through a function and help to determine its lifetime. It also works well in conjunction with the semantic constant folding to capture allocations performed in loops.
EVA can also be used to determine potential sizes of allocations, though the effectiveness of this analysis is unlikely to be high as code is unlikely to have constraints defined in the source to assist the analysis. Additionally, EVA cannot determine the relative frequency of different allocation sizes as well, meaning that a site could be suggested despite it only performing small allocations a small minority of the time.
By joining lifetime detection with allocation size prediction, sites of small, transient allocations can be found and highlighted to the user who can then determine whether they're worth changing.
Unfortunately EVA's API documentation is quite poor which, combined with lack of experience with Frama-C and OCaml, means that a large amount of time was lost and the plugin isn't as complete as intended.
The core functionality of the plugin is provided by a combination of one of Frama-C's Visitor base classes and the EVA plugin. The visitor classes are (intuitively) classes that use the visitor pattern to callback to user defined code during a traversal of the constructed AST (annotated with anything previous plugins may have added).
In visiting the AST, we're interested only in statements, and in particular in statements containing malloc
and free
(in theory, we're interested in any function that leads to dynamic allocation, which could be achieved by tracking any functions that eventually lead to a call to brk
, sbrk
, or mmap
, or even just adding on calloc/realloc, but ultimately tracking these other functions has its own limitations and complications).
Visiting is performed on a normalised AST, which simplifies the processing code, as only certain limited types of statement need to be examined. This allows code containing unusual and/or complex structures allowed in C to be processed as easily as more straightforward code.
Through Frama-C's documentation, and experimentation with a perfectly reasonable looking C testing file (shown at the bottom of the page), it was determined that malloc
only occurred in statements which Frama-C classes as Instr
, and in particular only in Call
(a reassignment of the form blah = malloc(1)
and Local_init
(a declaration and initialisation of the form int *blah = malloc(sizeof(int))
). free
only occurs in Call
(in the form free(blah)
).
Next, some way of tracking an allocated block of memory through the code needed to be found. Frama-C has a concept of bases, which represent a pointer to the heap (it has no concept of the heap itself). In order to retrieve values from malloc to track through the code, EVA is used. However, this is the point where EVA's lack of documentation caused delays. The only API functions referred to are value
and eval_expr
/eval_lval
, but these return values before execution of a statement, not after (which is required to determine the return value of malloc).
The documentation suggested reading the existing GUI code in order to learn about EVA's API. This led to a large amount of time being spent trying to read complex and in some cases heavily duplicated code, which used an outdated method to access EVA's API calls, discourged by the plugin development guide in new code.
Eventually, a solution was found on StackOverflow (as usual!), which allowed retrieval of values after the execution of a statement, encapsulated (as these hard to track down problems often are) in a simple function using EVA's API.
At this point, the Forgetful plugin can track what bases a particular free
could possibly be freeing, and what bases are created as a result of a malloc
. To keep scope simple, checking is only performed intraprocedurally, by storing the location and size of a base's allocation at a malloc
if the allocation is smaller than or equal to in size to a configurable maximum size. Each time a new function is encountered, the storage is cleared out.
Since variables can be associated with multiple bases, individual allocations can be reported on even if the variable itself isn't always eligible for the optimisation. For example, if the varied
variable is randomly assigned either a 4 byte allocation or a 68 byte allocation and the max size is set to 64, the 4 byte allocation will be reported.
Due to limitations in EVA itself, the Forgetful plugin can't work on recursive functions.
Now for the live demo on the demo file!