Literature Review/State of the Art

There are a number of aspects to consider when determining the appropriate literature/state of the art for this project.

Plugin and its Platform

Three options are examined - Frama-C itself, Infer, and clang-analyzer.

Frama-C

While Frama-C was a decent choice for its extensibility and the availability of introductory guides to plugin development, finding assistance and documentation for it was relatively difficult, even for the basic functionality of EVA, its value analysis plugin.
Perhaps a better choice would have been to modify or extend a static analysis tool with a more active and modern community, which would likely have made it easier to dig into the codebase even if that tool wasn't built as intentionally for extensibility. (Frama-C's releases are irregular, each consisting of a single dump of source code, with the first beta released in March 2008.)
Frama-C's open source community is not hugely active: only a few external plugins are available, and not many have been created between its initial release a decade ago and now.

Infer

To the end of finding a large and active community, Facebook's Infer might have been a better choice. By its own description, "Infer checks for null pointer dereferences, memory leaks, coding conventions and unavailable API’s"; its emphasis on memory issues and tracking memory may also have made it well suited to the problem.
Facebook first open-sourced Infer in June of 2015, at which point it already supported C, Objective-C, C++, and Java (≤ 1.7) and contained ~100k LoC. Since then, it has averaged 4-6 commits per day (depending on whether weekends are counted), now has ~300k LoC, and supports Java 8 as well as the original languages.
While it wasn't explicitly designed for modularity like Frama-C, its internal structure for checkers is consistent and logical, which should make the addition of new checkers not prohibitively difficult.

clang-analyzer

clang-analyzer, on which Infer is built, is the only analyser in this list not written in OCaml, being instead in C++ to match the rest of the clang codebase. Among the features listed in its documentation are several memory analysis checkers (null dereference, double free, stack reference escape detection) that could prove a useful base to build on. My familiarity with C++ (at least compared to my lack of familiarity with OCaml) could also have proven useful in development, reducing delays.
clang-analyzer's development is much less active than Infer's, but still more active (or at least more frequently updated) than Frama-C's. It also has mailing lists and other community communication channels which could have been useful. It's the oldest of the three, started in September 2007, but has had a huge amount of development since then (a caveat here: its full development lifecycle is public, from inception to its current state, unlike the other two, which can make it seem more active).
One unique mark against clang-analyzer is its heavy emphasis on OS X related features, with tutorials and setup instructions that assume the use of Apple hardware and software; this could cause minor delays in development.

Alternative Approaches

Other than static analysis/compile-time checking, run-time analysis can also be used to surface issues in code. This profiling has the added benefit that, used properly, it can better reflect performance under real workloads. There are three main ways of achieving this, which amount to the same technique applied at three different levels of integration.

Custom Built Wrapper Functions

The first method involves a custom allocator wrapper function, which is the approach taken by cURL. In this method, calls to memory management functions are intercepted by redefining them to use the custom allocators instead, using something like #define malloc(size) curl_domalloc(size, __LINE__, __FILE__) to intercept calls and add debugging information.
Usage of this system can usually be enabled or disabled at compile time by defining or undefining certain symbols, and a detailed implementation can be seen in cURL itself.
This method has the obvious downside that the system must be maintained by its users, who usually have goals orthogonal to it. However, it also allows the most fine-grained control of the three methods, and can easily be extended to also track other functions.
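As an illustration, here is a minimal sketch of the pattern, not cURL's actual implementation - the names dbg_malloc, dbg_free, and MEM_DEBUG are invented for the example, as is the logging format:

    #include <stdio.h>
    #include <stdlib.h>

    /* Wrapper functions recording the call site alongside each operation. */
    void *dbg_malloc(size_t size, int line, const char *file)
    {
        void *ptr = malloc(size);
        fprintf(stderr, "MEM %s:%d malloc(%zu) = %p\n", file, line, size, ptr);
        return ptr;
    }

    void dbg_free(void *ptr, int line, const char *file)
    {
        fprintf(stderr, "MEM %s:%d free(%p)\n", file, line, ptr);
        free(ptr);
    }

    /* Normally placed in a header included by every translation unit;
     * defining MEM_DEBUG at compile time switches interception on. */
    #ifdef MEM_DEBUG
    #define malloc(size) dbg_malloc(size, __LINE__, __FILE__)
    #define free(ptr)    dbg_free(ptr, __LINE__, __FILE__)
    #endif

Note that the macros only affect code compiled after they are defined, so the wrapper definitions themselves can safely call the real malloc and free.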

Provided Allocator Wrapper Functions

When the fine-grained control of intercepting any given function, or of adding more things to profile, is not needed, pre-written wrapper functions are available online and can be compiled along with existing code.
Similar to the above, these intercept calls to memory management functions and output profiling data to a specified log file. One such example is malloc_count, which can also track stack usage. Since these projects are focused on this specific task, they also tend to conform to established output formats, which allows their profiling data to be used with existing graphing tools for memory profiling dumps.
While this method lacks the fine-tuned control over exactly what data is dumped and which functions are intercepted, it also doesn't suffer the disadvantages of maintaining and developing the system. It also benefits over the next method by being faster.
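To sketch the general mechanism such drop-in wrappers rely on (this illustrates the approach rather than reproducing malloc_count's actual source): the wrapper defines malloc and free itself, finding the real implementations at run time via dlsym(RTLD_NEXT, ...):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stddef.h>

    static void *(*real_malloc)(size_t);
    static void (*real_free)(void *);
    static int in_hook; /* guard: fprintf may itself allocate */

    /* Defining malloc here overrides the libc version; RTLD_NEXT locates
     * the next (real) definition along the linker's search order. */
    void *malloc(size_t size)
    {
        if (!real_malloc)
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        void *ptr = real_malloc(size);
        if (!in_hook) {
            in_hook = 1;
            fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
            in_hook = 0;
        }
        return ptr;
    }

    void free(void *ptr)
    {
        if (!real_free)
            real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
        if (!in_hook) {
            in_hook = 1;
            fprintf(stderr, "free(%p)\n", ptr);
            in_hook = 0;
        }
        real_free(ptr);
    }

Compiled and linked into an existing program (plus -ldl on older glibc systems), this logs every allocation without any changes to the application source.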

Run-time Profiling/Interception

Without requiring function interception to be included in code at compile time, hooks can be used to intercept calls to any given function. Programs such as Valgrind use this functionality, for example, to track memory leaks at run-time by tracking all allocated and freed memory, or to determine when uninitialised data is used as if it were initialised (as a branch condition, for example); a toy example of both defects appears at the end of this subsection.
Similarly, programs such as Heaptrack use hooks in order to profile memory usage. As such, it requires no changes to the actual source or files included for compilation. However, in order to get more detailed information about allocations (such as the line in the source where it occurs), debug symbols must be included in the binary, which involves a minor change to the compilation procedure.
As has been the trend in this section, the increased convenience comes with a decrease in tunability. However, it again comes with an improvement in ease of consumption of the data - Heaptrack is a two-part tool, one part producing a detailed dump while the other consumes the dump to produce a large number of statistics and graphics.
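To illustrate the kinds of defects these tools surface, here is a toy program exhibiting both of the errors mentioned above; built with debug symbols and without optimisation (e.g. gcc -g), a run under Valgrind reports both the leak and the branch on uninitialised data:

    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* Allocated but never freed: reported as a leak. */
        int *leaked = malloc(16 * sizeof *leaked);

        /* Branching on uninitialised data: reported as a conditional
         * jump depending on an uninitialised value. */
        int uninitialised;
        if (uninitialised > 0)
            puts("positive");

        (void)leaked;
        return 0;
    }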

Allocator Methods and their Performance

There are knock-on effects on performance to consider as side effects of the optimisation, as well as direct effects that can moderate its effectiveness.
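For concreteness, here is a toy illustration of the optimisation in question (the function names are invented for the example): a short-lived heap allocation replaced with stack allocation.

    #include <stdlib.h>
    #include <string.h>

    /* Heap version: a temporary buffer allocated and freed within one
     * call (-1 is a toy error sentinel for allocation failure). */
    int sum_of_copy_heap(const int *src, size_t n)
    {
        int sum = 0;
        int *tmp = malloc(n * sizeof *tmp);
        if (!tmp)
            return -1;
        memcpy(tmp, src, n * sizeof *tmp);
        for (size_t i = 0; i < n; i++)
            sum += tmp[i];
        free(tmp);
        return sum;
    }

    /* Stack version: the same buffer as a variable-length array. No
     * allocator calls and fixed cost, but the buffer now competes for
     * the (bounded) stack. */
    int sum_of_copy_stack(const int *src, size_t n)
    {
        int sum = 0;
        int tmp[n];
        memcpy(tmp, src, n * sizeof *tmp);
        for (size_t i = 0; i < n; i++)
            sum += tmp[i];
        return sum;
    }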

Locality

One of the knock-on effects is on the locality of data. Replacing dynamic allocations with stack allocation could affect cache-hit frequency, as the stack is likely to be in cache, whereas the next section of the heap may not be. However, in An Empirical and Analytic Study of Stack vs. Heap Cost for Languages with Closures, Appel and Shao, 1996, the authors found the effect of using a stack or a heap for frame allocation to be too small to matter in terms of the cache miss rate.
They do note that the cache write-miss rate is very high for heap-allocated frames, but this can be mitigated with an appropriate write-miss strategy.
In short, it seems unlikely that the cache hit/miss rate will have a significant impact on the results of the optimisation, though of course architectures and timings have changed over the past two decades - what was trivial then may not be any more, as cache grows faster relative to main memory.

Real-Time Considerations

Another consideration for the utility of the optimisation is the range of potential target users. For systems with real-time considerations, using stack allocation instead of heap allocation can be beneficial even without any improvement in average-case performance, by capping the worst case.
In Real-Time Performance of Dynamic Memory Allocation Algorithms, Puaut, 2002, she finds that the ratio of worst-case (obtained analytically) to average performance of memory allocation algorithms varies from about 10 to 10,000 (the algorithms at the lower bound have average cases about 10x those at the higher bound), while stack allocation has fixed performance, even if bounds checking is added. Of course, similar benefits could be obtained with correct use of an allocator designed for that purpose.
However, Puaut also finds that the actual observed ratio of worst-case to average performance ranges from about 1 to 35, so for real workloads the effect may, again, not be great.

Validation of Project Space

In order to validate the project space, other attempts to target the same inefficiencies were sought. Large numbers of small, short-lived allocations are the purview of the youngest generation in generational garbage collection, while other approaches to the same issue instead go directly to the allocator, explicitly splitting out these allocations ahead of time.

Generational Garbage Collection

Generational (or ephemeral) garbage collection relies on a hypothesis, supported by empirical measurements, that the most recently created objects are also those most likely to become unreachable quickly.
Supporting this hypothesis, Wilson claims in Uniprocessor Garbage Collection Techniques, 1992, that, while figures vary depending on the source language and program, 80-98% of all newly-allocated objects "die within a few million instructions, or before another megabyte has been allocated; the majority of objects die even more quickly, within tens of kilobytes of allocation".

In terms of efficiency, Garbage Collection Can Be Faster Than Stack Allocation, Appel, 1987 makes the counter-intuitive claim its title suggests. The claim relies on the use of a copying garbage collector and a sufficiently large amount of physical memory. Key to the assertion is that not every allocation needs to be handled once it becomes unreachable: only objects that survive until the next garbage collection are touched, by copying them to the new heap, while garbage is reclaimed for free.
Exact formulae are provided in the paper, parametrisable in terms of: the memory available; instructions to copy an object; average size of an object; instructions to explicitly free an item; number of allocated items; instructions to traverse the object graph. However, the conclusion is that even arbitrarily efficient explicit freeing always eventually loses to larger amounts of available memory.
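A simplified sketch of the arithmetic (an illustrative reconstruction, not the paper's exact parametrisation): with R words reachable at collection time, a heap of H words split into two semispaces, and c instructions to copy a word, each collection does roughly cR work and is followed by about H/2 - R words of allocation before the next one, giving an amortised cost per allocated word of

    \frac{cR}{H/2 - R}

which tends to zero as H grows, whereas explicit freeing costs some fixed f > 0 instructions per object regardless of how much memory is available.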

Direct to Allocator

There are cases where garbage collection can't be used for one reason or another (insufficient timing guarantees for real-time systems, for example); in these cases, alternate systems can be used to take advantage of the large proportion of short-lived allocations.

Using Lifetime Predictors to Improve Memory Allocation Performance, Barrett, Zorn, 1993 discusses the use of profiling to determine which allocations are short-lived. In particular, they describe an algorithm for lifetime prediction using a combination of profiling data, allocation site and allocation size, which correctly predicted the lifetimes of 42-99% of allocations, depending on the benchmark program. These results vary too widely (and only four programs were benchmarked) to be a definitive indicator of whether the method is worthwhile.
However, in the same paper, simulated results using lifetime prediction to segregate allocations into specialised areas in the heap (with shorter-lived allocations/deallocations being cheaper) showed that there was significant potential for reduction of memory overhead, improvement of reference locality (by having recent allocations in a small section of the heap likely to be in cache) and occasionally improvement of performance.
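As an entirely hypothetical sketch of such site-and-size prediction (every name here is invented): a table built from a previous profiling run maps an (allocation site, size) pair to a predicted lifetime, and predicted-short-lived allocations are served from a separate arena.

    #include <stdlib.h>

    typedef enum { LIFE_SHORT, LIFE_LONG } lifetime_t;

    /* Toy stand-in for the profiling table; a real system would load
     * recorded (site, size) lifetimes from a previous run. */
    static lifetime_t profile_lookup(void *site, size_t size)
    {
        (void)site;
        return size <= 64 ? LIFE_SHORT : LIFE_LONG;
    }

    /* Bump allocator for the short-lived region: allocation is just a
     * pointer increment, and the region sits in one small, likely-cached
     * part of the address space. */
    static _Alignas(16) char arena[1 << 20];
    static size_t arena_used;

    static void *arena_alloc(size_t size)
    {
        size = (size + 15) & ~(size_t)15;   /* keep 16-byte alignment */
        if (arena_used + size > sizeof arena)
            return malloc(size);            /* fall back when full */
        void *p = arena + arena_used;
        arena_used += size;
        return p;
    }

    void *predicting_malloc(size_t size)
    {
        /* The caller's return address identifies the allocation site
         * (GCC/Clang builtin). */
        void *site = __builtin_return_address(0);

        if (profile_lookup(site, size) == LIFE_SHORT)
            return arena_alloc(size);
        return malloc(size);
    }

A real implementation would also need to distinguish arena pointers from malloc'd ones when freeing; that bookkeeping is omitted here.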

Results

There are two sets of results to discuss (one for each of the case studies). While the real-world case study was less than impressive, there are a few points that moderate that result.

Specialised Benchmarks

The specialised benchmarks, being constructed as an ideal case, clearly validate that the optimisation can be worthwhile in specific circumstances.

That being said, they also serve as a reminder that even simple code can hide issues and complexity - in particular, alloca's performance under -O0 in both the parallel and sort benchmarks, and the unexpected drop in the dynamic method's performance under -O3 in the sort benchmark.

Lastly, they also highlight that in certain cases the optimisation makes no performance difference (a wash by 16 items in the sort benchmark, minimal by 64 items in the parallel benchmark) while still carrying the negative effects (a very large stack, increased risk of stack overflow, and the possibility of programmer error leading to escaping pointers to stack-allocated items).

cURL Benchmarks

Given Stenberg's claims about performance increases and the removal of a huge number of mallocs, the results found are quite disappointing, being indistinguishable from variance in the test environment itself.

However, this doesn't mean the optimisation isn't worth performing. Individual compiler optimisations tend to produce minimal performance benefits in isolation, but build up to more significant gains over time and in conjunction with one another.

This was somewhat formalised as Proebsting's Law (after Todd A. Proebsting of the University of Arizona), which claims that "compiler optimization advances double computing power every 18 years", a figure chosen as a deliberate counterpoint to the better known Moore's Law. This is equivalent to roughly 4% improvement per year, which makes even vanishingly small improvements in any given case seem better by comparison.
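The per-year figure follows directly from the 18-year doubling:

    2^{1/18} \approx 1.0393

i.e. an improvement of roughly 3.9-4% per year.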

Proebsting's law is further examined in On Proebsting’s law, Scott, 2001, in which Scott finds it most likely to be true, producing a range of possible figures for yearly improvement of 2.8-4.9%, depending on a few factors.
However, this doesn't mean optimisations aren't worthwhile. Scott also refers to a lecture by Bill Pugh (University of Maryland) titled Is Code Optimization (Research) Relevant?, created in response to Proebsting's Law, in which Pugh argues that no one will turn down a free performance improvement from compiler optimisations, but suggests that focusing on optimisations for high-level constructs, freeing programmers to be more productive, could be more worthwhile.