[Cell][Heterogeneous][JVM]
- Our solution consists of a virtual machine layer that sits on top of the heterogeneous hardware and automatically distributes work to the different multiprocessing cores.
- Of particular note is our solution to the data latency problem.
- Main memory has a latency of upward of 1000 processor cycles.
- The PPE is a dual-issue design that does not dynamically reorder instructions at issue time (in-order issue).
- The PPE is capable of executing two threads simultaneously and can be viewed as a two-way multiprocessor with shared data flow
- 32-KB first-level (L1) instruction and data caches
- a 512-KB second-level (L2) cache
- The PPE's units: the instruction unit (IU), a fixed-point execution unit (XU), and a vector scalar unit (VSU).
- The IU fetches four instructions per cycle per thread into a buffer and dispatches them from that buffer.
- Up to two instructions per cycle can be executed.
- The XU consists of a 32-entry, 64-bit-wide general-purpose register file per thread, a fixed-point execution unit, and a load/store unit with a 16-entry store queue.
- The VSU has a 32-entry, 64-bit-wide register file per thread as well as a ten-stage double-precision pipeline.
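
To get a feel for why the data latency problem matters, the figures quoted in the notes can be combined in a rough back-of-the-envelope calculation. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes the worst case in which nothing at all can issue while a memory access is outstanding.

```java
// Back-of-the-envelope sketch; class and method names are hypothetical.
public class PpeLatencySketch {
    // Figures quoted in the notes.
    static final long MEMORY_LATENCY_CYCLES = 1000; // "upward of 1000", so a lower bound
    static final int ISSUE_WIDTH = 2;               // dual-issue design
    static final int REGISTERS_PER_FILE = 32;       // entries per register file
    static final int REGISTER_WIDTH_BITS = 64;      // width of each entry

    /** Issue slots that go unused if nothing can issue for stallCycles cycles. */
    static long lostIssueSlots(long stallCycles, int issueWidth) {
        return stallCycles * issueWidth;
    }

    /** Size in bytes of one register file with the given geometry. */
    static int registerFileBytes(int entries, int widthBits) {
        return entries * widthBits / 8;
    }

    public static void main(String[] args) {
        System.out.println("Issue slots lost per memory stall: "
                + lostIssueSlots(MEMORY_LATENCY_CYCLES, ISSUE_WIDTH));
        System.out.println("Bytes per register file: "
                + registerFileBytes(REGISTERS_PER_FILE, REGISTER_WIDTH_BITS));
    }
}
```

Under these assumptions a single access to main memory costs at least 2000 issue slots, while each per-thread register file holds only 256 bytes. In practice the second hardware thread may fill some of those slots, so the first number is an upper bound on the waste per stall rather than a measurement.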