An evaluation of the uses of program control flow information in instruction memory hierarchies: prefetching and predictability

Keith Boland

An evaluation of the uses of program control flow information in instruction memory hierarchies: prefetching and predictability

1993

Keith Boland

Compiler techniques for the optimization of memory system performance have only recently begun to be investigated. This thesis extends the boundary of knowledge about the nature of the program control flow information that is available at compile time. Specifically, program control flow information is used to quantify the cost and performance of prefetching instructions into primary caches and to explore the potential for reducing the variance in the miss ratios of instruction caches for real-time system applications. The cost of instruction prefetching, which is related to the branching factor of programs, is quantified by the formal definition of a function called Lookahead(n), where n is the depth to which exhaustive instruction prefetching is to be implemented. It is shown that for a suite of realistic benchmark programs, the worst-case complexity of the Lookahead(n) function is significantly less than what might be expected in the static case and clearly deviates from an exponential trend in the dynamic case. The performance of instruction prefetching is modeled through the development of a theoretical model called DCP, for Demand-based Cluster Prefetching. By extending the domain of the model, it is shown that the performance of any instruction prefetching algorithm will be at best only slightly better than a two-level cache hierarchy at memory latencies above about 50 processor clock cycles. The motivation for reducing the variance in the miss ratios of instruction caches can be found in the area of real-time systems, where it is important to know the bounds of the worst case performance of the memory system. An algorithm is developed to restructure programs so that their instruction cache mappings yield significantly reduced variances in the miss ratios produced over their input domains. The restructuring algorithm is then applied to a sample application for a range of cache sizes, yielding miss ratio variance reductions that are typically greater than 60%. It is then shown through the use of extreme value theory that the reduction in the variance of the miss ratios yields a much tighter bound on the "worst case" miss ratio of the cache.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations