
Changes to Diary/2009-1-4

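!Returning to Tokyo
By Shinkansen.
A train had just left when I got up to the platform,
so I ended up queuing for one full train interval.
A TV camera was set up on a tripod right beside me, filming,
so I am surely not in the footage at all.
I could not get a window seat with a power outlet,
but I did manage to secure a seat, and off to Tokyo.
The aisles were packed with people, though, so even getting to the toilet was a struggle.
Well, just being able to sit down makes me quite fortunate.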

!Programming with Tiles
{{category 論文読み}}
Adds two new classes, dynamic partitioning and overlapped tiling,
to the Hierarchically Tiled Array proposed at PPoPP 2006.
Evaluated by comparing ease of programming and performance.

::HTA
Hierarchically Tiled Arrays (HTAs) are arrays that may be partitioned into tiles. These tiles can be conventional arrays or lower-level HTAs. Tiles can be distributed across processors in a distributed-memory machine or be stored in a single machine according to a user-specified layout.

The C++ implementation of the HTA class is a library with ~18000 lines of code. It only contains header files, as most classes in the library are C++ templates to facilitate inlining.
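To make the "tiles can be arrays or lower-level HTAs" idea concrete for myself, here is a minimal Python sketch. It is not the HTA library's API: `make_hta`, `flatten`, and `cyclic_layout` are hypothetical names, nested lists stand in for tiles, and the round-robin layout is just one assumed example of a user-specified distribution.

```python
def make_hta(data, tile_sizes):
    """Recursively split `data` into tiles (hypothetical helper).

    tile_sizes[0] is the size of each top-level tile, tile_sizes[1] the
    size of the tiles inside those, and so on. With no sizes left, the
    data is returned as a conventional (leaf) array.
    """
    if not tile_sizes:
        return list(data)
    size = tile_sizes[0]
    if len(data) % size != 0:
        raise ValueError("length must be a multiple of the tile size")
    return [make_hta(data[i:i + size], tile_sizes[1:])
            for i in range(0, len(data), size)]


def flatten(hta):
    """Return the elements of a (possibly nested) tiled array in order."""
    if hta and isinstance(hta[0], list):
        return [x for tile in hta for x in flatten(tile)]
    return list(hta)


def cyclic_layout(num_tiles, num_procs):
    """Assumed layout: map top-level tile index to a processor, round-robin."""
    return [t % num_procs for t in range(num_tiles)]
```

For example, `make_hta(list(range(8)), [4, 2])` gives two top-level tiles, each of which is itself split into two leaf tiles of two elements.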

::Dynamic partitioning
Cache-oblivious algorithms and FLAME require dynamic changes of the tile layout.
Adds part/rmPart.
← TBB requires more lines of code, variables, and data types than the HTA to express the same problem
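A rough sketch of what changing the tile layout at run time could look like, in one dimension. The names `part` and `rm_part` only echo the operations noted above; the signatures and the cut-point representation are my own assumptions, not the HTA library's interface.

```python
import bisect


class Partitioned:
    """1D array whose tiling is a sorted list of cut positions (illustrative)."""

    def __init__(self, data):
        self.data = list(data)
        self.cuts = []  # interior cut positions, kept sorted

    def part(self, pos):
        """Add a partition line before index `pos` (hypothetical signature)."""
        if not 0 < pos < len(self.data):
            raise ValueError("cut must fall strictly inside the array")
        if pos not in self.cuts:
            bisect.insort(self.cuts, pos)

    def rm_part(self, pos):
        """Remove an existing partition line, merging the two adjacent tiles."""
        self.cuts.remove(pos)

    def tiles(self):
        """Return the current tiles as a list of lists."""
        bounds = [0] + self.cuts + [len(self.data)]
        return [self.data[a:b] for a, b in zip(bounds, bounds[1:])]
```

The data itself never moves; only the list of cuts changes, which is the point of a dynamic partition.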

::Overlapped tiling
Stencil codes benefit from tiling, because it increases locality and determines data distribution when running in parallel.
← programmers create a shadow or ghost region around each tile that contains a copy of the elements of the neighboring tiles
← these copies are updated either automatically or manually
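The shadow-region idea can be sketched for a 1D 3-point stencil. This is a manual version written for illustration: the function names, the one-cell ghost width, and the fixed boundary value are all assumptions of this sketch, not the HTA overlapped-tiling interface.

```python
def split_with_ghosts(data, tile_size, boundary=0.0):
    """Split data into tiles, each padded with one ghost cell per side."""
    tiles = []
    for start in range(0, len(data), tile_size):
        end = start + tile_size
        left = data[start - 1] if start > 0 else boundary
        right = data[end] if end < len(data) else boundary
        tiles.append([left] + list(data[start:end]) + [right])
    return tiles


def update_ghosts(tiles, boundary=0.0):
    """Refresh every ghost cell from the neighboring tile's edge element."""
    for i, tile in enumerate(tiles):
        tile[0] = tiles[i - 1][-2] if i > 0 else boundary
        tile[-1] = tiles[i + 1][1] if i + 1 < len(tiles) else boundary


def jacobi_step(tiles):
    """One 3-point averaging step on the interior of each tile.

    Ghost cells are copied through unchanged and are stale afterwards,
    so update_ghosts must run before the next step.
    """
    new_tiles = []
    for tile in tiles:
        interior = [(tile[j - 1] + tile[j] + tile[j + 1]) / 3.0
                    for j in range(1, len(tile) - 1)]
        new_tiles.append([tile[0]] + interior + [tile[-1]])
    return new_tiles
```

Each tile can be updated independently because everything it reads is local; the price is the explicit `update_ghosts` call between steps, which is exactly the bookkeeping an automatic scheme would hide.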

::Evaluation
* Performance evaluation
** sequential (matrix multiplication, LU decomposition, 3D Jacobi)
** parallel (Parallel Merge, MG/LU NAS)
* Readability/Productivity
** the programming effort[17]
** the cyclomatic number[22]
** the cyclomatic number[22]
** lines of code
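As a reminder of what the cyclomatic number measures: for a control-flow graph with E edges, N nodes, and P connected components it is V(G) = E - N + 2P. A small sketch follows; the function names and the edge-list representation are my own, and the second helper assumes a single connected component.

```python
def cyclomatic_number(num_edges, num_nodes, num_components=1):
    """V(G) = E - N + 2P for a control-flow graph."""
    return num_edges - num_nodes + 2 * num_components


def cyclomatic_from_edges(edges):
    """Compute V(G) from (src, dst) pairs, assuming one connected component."""
    nodes = {n for edge in edges for n in edge}
    return cyclomatic_number(len(edges), len(nodes))
```

A single if/else (condition, two branches, join) has 4 edges and 4 nodes, so V(G) = 2; straight-line code has V(G) = 1.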

!A Portable Runtime Interface For Multi-Level Memory Hierarchies
{{category 論文読み}}
A runtime interface for moving data and computation through parallel machines with multi-level memory hierarchies.
Targets multi-core/SMP machines, the Cell B.E., and distributed-memory clusters.

:: The Runtime Interface
adaptation of the Sequoia compiler
* initialization/setup of the machine, including communication resources and resources at all levels where tasks can be executed
* data transfers between memory levels using asynchronous bulk transfers between arrays
* task execution at specified levels of the machine

* Strengthens bulk transfers
* Disk and memory are handled through the same interface
* Top API and Bottom API
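The three responsibilities listed above (setup, asynchronous bulk transfers between arrays, task execution at a level) can be mocked up in a few lines. Everything here is hypothetical: `MemoryLevel`, `Runtime`, `transfer`, `run_task`, and `process_in_chunks` are names invented for this sketch, a thread pool stands in for the asynchronous transfer engine, and none of it reflects the paper's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor


class MemoryLevel:
    """One level of the hierarchy holding named arrays (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.arrays = {}

    def alloc(self, key, size):
        self.arrays[key] = [0] * size


class Runtime:
    """Toy runtime: setup, asynchronous bulk transfer, task execution."""

    def __init__(self):
        # Setup: a single worker thread plays the role of the transfer engine.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def transfer(self, src, src_key, src_off, dst, dst_key, dst_off, count):
        """Start an asynchronous bulk copy; returns a future to wait on."""
        def copy():
            chunk = src.arrays[src_key][src_off:src_off + count]
            dst.arrays[dst_key][dst_off:dst_off + count] = chunk
        return self._pool.submit(copy)

    def run_task(self, level, fn, key):
        """Execute fn on an array that lives at the given level."""
        level.arrays[key] = fn(level.arrays[key])

    def shutdown(self):
        self._pool.shutdown()


def process_in_chunks(data, chunk, fn):
    """Stream data from a parent level through a small child buffer."""
    parent, child = MemoryLevel("parent"), MemoryLevel("child")
    parent.arrays["a"] = list(data)
    child.alloc("buf", chunk)
    rt = Runtime()
    for off in range(0, len(data), chunk):
        n = min(chunk, len(data) - off)
        rt.transfer(parent, "a", off, child, "buf", 0, n).result()  # wait
        rt.run_task(child, fn, "buf")
        rt.transfer(child, "buf", 0, parent, "a", off, n).result()  # wait
    rt.shutdown()
    return parent.arrays["a"]
```

Since `transfer` only sees levels and arrays, the parent could just as well be a disk-backed store as main memory, which is how I read the "same interface" point. This sketch waits on each transfer immediately; overlapping transfers with computation would mean keeping the futures around and double-buffering.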

!Multiscalar Processors
{{category 論文読み}}
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of ILP from ordinary high-level language programs.