
Online Performance Auditing: Using Hot Optimizations Without Getting Burned

[HPM][PLDI][JIT][JVM][not yet finished reading]

@article{1134010,
author = {Jeremy Lau and Matthew Arnold and Michael Hind and Brad Calder},
title = {Online performance auditing: using hot optimizations without getting burned},
journal = {SIGPLAN Not.},
volume = {41},
number = {6},
year = {2006},
issn = {0362-1340},
pages = {239--251},
doi = {http://doi.acm.org/10.1145/1133255.1134010},
publisher = {ACM},
address = {New York, NY, USA},
}

Abstract

  • Although optimizations typically perform well on average, they often have unpredictable impact on running time, sometimes degrading performance significantly.
  • an online framework for evaluating the effectiveness of optimizations, enabling an online system to automatically identify and correct performance anomalies that occur at runtime.

Introduction

  • a low-overhead, completely automatic online system
  • an offline feasibility study demonstrating the robustness of the statistical technique used in the framework

Background

  Why Modeling Is Not Enough

  • Examples of compiler optimizations that have used explicit models are inlining, instruction selection, instruction scheduling, and register allocation
  • Implicit models are more prevalent than explicit models.
  • Most optimizations that use profiling information (often referred to as feedback-directed optimization, or FDO) still employ a model, whether explicit or implicit.
  • they generally do not measure program performance directly
  • e.g., optimizations are often designed assuming that branch predictors will predict forward and backward branches in a certain way
  • one way to address the shortcomings of performance models is to refine the models to more accurately predict performance, but we believe this is an impossible, never-ending task.
  • with the current trend of adding more levels of complexity and virtualization to all levels of the execution stack, the problem will only become more difficult as true performance becomes ever harder to predict

  Examples to Motivate Searching the Optimization Space

  • evidence that demonstrates the difficulty of modeling the performance impact of optimizations

Experimental Setup

  • modified IBM's J9 JIT compiler to read the hardware cycle counter on method entry and exit, measuring the cycles spent in a given method (a minimal timing sketch follows this list)
    • this infrastructure allows us to evaluate the impact of optimizations on individual methods in the application
  • to evaluate code quality, these experiments measure steady-state performance
    • i.e., a configuration of the benchmark that excludes the early phase of execution during which all (dynamic) compilation occurs.
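
  A minimal sketch of per-method entry/exit timing. The paper reads the hardware cycle counter inside a modified J9 JIT; plain Java has no direct RDTSC access, so System.nanoTime() stands in here, and the MethodTimer name is a hypothetical illustration, not the paper's infrastructure.

    import java.util.ArrayList;
    import java.util.List;

    // Records the time spent in one code region, one sample per invocation.
    final class MethodTimer {
        private final List<Long> samples = new ArrayList<>();

        // Wraps one invocation of the timed region: "entry" and "exit" readings.
        long timeInvocation(Runnable body) {
            long start = System.nanoTime();            // method entry
            body.run();
            long elapsed = System.nanoTime() - start;  // method exit
            samples.add(elapsed);
            return elapsed;
        }

        List<Long> samples() { return samples; }
    }

  Usage would be something like new MethodTimer().timeInvocation(() -> hotMethod()), collecting one sample per call.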

Method Inlining

  • method inlining has a fairly clear cost-benefit model
    • the potential benefit of inlining is the performance gained by removing call overhead and optimizing the callee in the context of the caller.
    • the potential cost of inlining is a decrease in instruction cache locality and increased pressure on downstream optimizations such as register allocation (both sides are illustrated in the sketch below).
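
  An illustration only (hypothetical code, not from the paper) of what the cost-benefit model weighs: inlining removes the call and lets the callee's body be optimized together with the caller, at the price of a larger method body.

    // Illustrative Java example of the inlining trade-off.
    final class InliningExample {
        private static int square(int x) { return x * x; }

        // Before inlining: one call per iteration, and square() is compiled
        // without knowledge of the caller's context.
        static int sumOfSquaresCall(int n) {
            int sum = 0;
            for (int i = 0; i < n; i++) sum += square(i);
            return sum;
        }

        // After inlining: the call overhead is gone and the callee's body can
        // be optimized with the surrounding loop; the cost is a bigger method
        // body, which can hurt i-cache locality and register allocation.
        static int sumOfSquaresInlined(int n) {
            int sum = 0;
            for (int i = 0; i < n; i++) sum += i * i;
            return sum;
        }
    }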

The Performance Auditor

  • Performance Auditor, a framework that enables online performance evaluation
    • the framework tracks the bottom-line performance of multiple implementations of a region of code to determine which one is fastest.
    • this work uses method boundaries to delimit code regions and focuses on comparing only two versions of each method at a time
    • we refer to the process of identifying the faster method variant as a performance bakeoff
      • the component that requests and uses the bakeoff results is called the client
  • client of our framework is responsible for
    • providing the alternative optimized implementations of the code region to be compared
    • deciding what action to take with the outcome of the bakeoff
  • our solution has 2 components
    • a technique to collect execution times of an implementation of a code region
      • the client provides N optimized versions of the code region to evaluate
      • the key idea is to randomly select one of these implementations whenever the code region is entered, and record the amount of time spent
    • a statistical mechanism to determine which set of timings is fastest
      • the executions most likely occur with different program state (parameters and/or memory values)
      • this implies that the different implementations of this code region could have taken different paths and operated on different data during their executions.
      • this also relates to how an algorithm may be selected based on its input values.
      • to calculate our confidence metric, the mean and variance are computed for each set of timings (see the sketch after this list)
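
  A hedged sketch of such a confidence metric, assuming a normal approximation to the difference of the two sample means (the paper computes a confidence from the means and variances of the two timing sets; the exact statistic is not reproduced in these notes, and the class and method names here are hypothetical).

    // Given two sets of timings, estimate the confidence that version A is
    // faster than version B using a z-statistic on the difference of means.
    final class BakeoffConfidence {
        static double mean(double[] xs) {
            double s = 0;
            for (double x : xs) s += x;
            return s / xs.length;
        }

        static double variance(double[] xs, double mu) {
            double s = 0;
            for (double x : xs) s += (x - mu) * (x - mu);
            return s / (xs.length - 1);
        }

        // Returns an approximation of P(mean time of A < mean time of B).
        static double confidenceAFaster(double[] a, double[] b) {
            double ma = mean(a), mb = mean(b);
            double se = Math.sqrt(variance(a, ma) / a.length + variance(b, mb) / b.length);
            if (se == 0) return 0.5;                 // no variation observed yet
            return phi((mb - ma) / se);
        }

        // Standard normal CDF via Phi(z) = 0.5 * (1 + erf(z / sqrt(2))),
        // with an Abramowitz-Stegun polynomial approximation of erf.
        private static double phi(double z) {
            double x = Math.abs(z) / Math.sqrt(2.0);
            double t = 1.0 / (1.0 + 0.47047 * x);
            double erf = 1.0 - (0.3480242 * t - 0.0958798 * t * t + 0.7478556 * t * t * t)
                    * Math.exp(-x * x);
            return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
        }
    }

  A client might, for example, declare a winner once confidenceAFaster crosses a threshold such as 0.95 and keep sampling otherwise.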

Methodology

  • using a development version of IBM's J9 VM and its high-performance optimizing JIT compiler
  • 20 benchmarks (SPECjvm98, SPECjbb2000, DaCapo)
  • hot methods are defined as methods that consume enough cycles to be selected by J9 for aggressive feedback-directed optimizations
  • timings use the Pentium 4's read time-stamp counter (RDTSC) instruction

Offline Convergence Study

  Experimental Setup

  • our goal is to determine how many samples are needed to accurately detect a performance difference between two compilations of a method
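
  A hedged sketch of how such a convergence check could look: feed timing samples for two compilations in pairs and report the first sample count at which a simple z-test on the difference of means crosses a threshold. The Welford-style online update and the names are illustrative assumptions, not the paper's implementation.

    // Returns the number of sample pairs consumed before the difference of
    // means is judged significant (|z| >= zThreshold), or -1 if it never is.
    final class ConvergenceStudy {
        static int samplesToConverge(double[] a, double[] b, double zThreshold) {
            double meanA = 0, m2A = 0, meanB = 0, m2B = 0;
            int pairs = Math.min(a.length, b.length);
            for (int n = 1; n <= pairs; n++) {
                // Welford's online update of mean and sum of squared deviations.
                double dA = a[n - 1] - meanA;
                meanA += dA / n;
                m2A += dA * (a[n - 1] - meanA);
                double dB = b[n - 1] - meanB;
                meanB += dB / n;
                m2B += dB * (b[n - 1] - meanB);
                if (n < 2) continue;                  // need two samples for a variance
                double se = Math.sqrt(m2A / (n - 1) / n + m2B / (n - 1) / n);
                if (se == 0) continue;
                double z = Math.abs(meanA - meanB) / se;
                if (z >= zThreshold) return n;        // converged at this sample count
            }
            return -1;
        }
    }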

  Fixed Number of Samples

  • prior online systems that performed a bakeoff between different compiler optimizations took either one or two timing samples for each optimized version of the code

  Statistical Approach

  Using Yield Points for Timing

Online Performance Auditor

  Adaptive Optimization in Virtual Machines

  • VMs use the JIT compiler's multiple optimization levels to trade off the cost of high-level optimizations against their benefit
  • for feedback-directed optimization, a method is typically compiled twice: once to insert instrumentation that gathers detailed profile information about the method, and a second time to take advantage of the profile information after the method has run for some duration

  Method Dispatcher

  • we must intercept the method dispatch for the method being timed (M) so that execution can jump to one of its implementations at random, as in the sketch below
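
  A Java-level stand-in for the dispatcher idea (the real system intercepts dispatch inside the VM; the interface-based indirection, class names, and use of System.nanoTime() here are assumptions for illustration): on every entry to the audited method, pick a variant at random, run it, and file the elapsed time under that variant.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.function.IntUnaryOperator;

    // Randomly dispatches each call of the audited method M to one of its
    // optimized variants and keeps a separate timing bucket per variant.
    final class RandomizingDispatcher {
        private final IntUnaryOperator[] variants;   // the N optimized versions of M
        private final List<List<Long>> timings;      // one bucket of samples per variant

        RandomizingDispatcher(IntUnaryOperator... variants) {
            this.variants = variants;
            this.timings = new ArrayList<>();
            for (int i = 0; i < variants.length; i++) timings.add(new ArrayList<>());
        }

        int invoke(int arg) {
            int pick = ThreadLocalRandom.current().nextInt(variants.length);
            long start = System.nanoTime();
            int result = variants[pick].applyAsInt(arg);
            timings.get(pick).add(System.nanoTime() - start);
            return result;
        }

        List<Long> timingsFor(int variant) { return timings.get(variant); }
    }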

  Data Processing

  Overhead

  Inlining Client: Full Online Optimization

  • we modified the recompilation logic to perform a bakeoff for these hot methods after they have been instrumented
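
  A hedged sketch of that client logic, assuming hypothetical Compiler and CompiledMethod interfaces and a 0.95 confidence threshold (none of these names come from the paper): once the auditor reports the bakeoff outcome for a hot method, recompile it with or without aggressive inlining and install the winner.

    // Client of the Performance Auditor: acts on the bakeoff result for a hot method.
    final class InliningClient {
        interface CompiledMethod { void install(); }
        interface Compiler { CompiledMethod compile(String method, boolean aggressiveInlining); }

        private final Compiler compiler;

        InliningClient(Compiler compiler) { this.compiler = compiler; }

        // confAggressiveFaster: the auditor's confidence that the aggressively
        // inlined version was the faster one during the bakeoff.
        void onBakeoffFinished(String method, double confAggressiveFaster) {
            boolean useAggressive = confAggressiveFaster >= 0.95;   // hypothetical threshold
            CompiledMethod winner = compiler.compile(method, useAggressive);
            winner.install();   // keep the win, or correct the performance anomaly
        }
    }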

Discussion

  • The approach taken by this paper - comparing the performance of multiple versions of a region of code without attempting to hold inputs and program state constant - is often initially dismissed as infeasible.
  • One technical challenge was engineering the system to ensure that the collection and processing of data did not skew the results
  • one source of noise is timings polluted by VM activity, such as JIT compilation or garbage collection; these timings can be identified and eliminated fairly easily (one possible filter is sketched below).
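
  One possible filtering mechanism, sketched under the assumption that GC pollution is detected via the standard GarbageCollectorMXBean collection counts (the paper only states that such samples can be identified and eliminated; this particular approach and the class name are not from the paper):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.OptionalLong;

    // Discards a timing sample if a garbage collection occurred while it was taken.
    final class NoiseFilteredTimer {
        private static long totalGcCount() {
            long total = 0;
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                long c = gc.getCollectionCount();
                if (c > 0) total += c;    // -1 means "undefined" for this collector
            }
            return total;
        }

        // Returns the elapsed nanoseconds, or empty if GC polluted the sample.
        static OptionalLong timeOnce(Runnable body) {
            long gcBefore = totalGcCount();
            long start = System.nanoTime();
            body.run();
            long elapsed = System.nanoTime() - start;
            return totalGcCount() == gcBefore ? OptionalLong.of(elapsed) : OptionalLong.empty();
        }
    }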

Related Work

  Adaptive Compilation with Empirical Search

  Dynamic Optimization Using Heuristics and Feedback Directed Profiling

  Offline Empirical Search

  Other Techniques for Improving Optimization Decisions

Conclusions