!A Task-centric...(PACT'09)
{{category 論文読み}}
 @inproceedings{kelm-pact09,
 author = {John H. Kelm and Daniel R. Johnson and Steven S. Lumetta and Matthew I. Frank and Sanjay J. Patel},
 title = {A Task-centric Memory Model for Scalable Accelerator Architectures},
 booktitle = {PACT '09: Proceedings of the 18th international conference on Parallel architectures and compilation techniques},
 year = {2009},
 pages = {???--???},
 location = {Raleigh, North Carolina},
 }

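About memory management for running MIMD programs (not SIMD-like, traditionally considered a poor fit for accelerators) on 1024 cores.
Analyzes accesses to shared data in visual programs.
Defines a protocol for managing the caches as a whole in S/W rather than in H/W.
Tasks are managed with queues, and a barrier is realized at the point where the tasks complete -> [6]
The results were not a comparison against other task-management/memory-management approaches.
I wonder how instruction loads are handled?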
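
The queue-based task management with a barrier at completion can be sketched roughly as follows. This is only a minimal illustration in Python threads, not Rigel's actual task runtime; `run_interval` and `num_workers` are hypothetical names, and `queue.Queue.join()` stands in for the barrier.

```python
import queue
import threading


def run_interval(tasks, num_workers=4):
    """Run one interval: workers drain a shared task queue, and the
    interval ends (barrier) once every enqueued task has completed."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:           # sentinel: no more work for this worker
                q.task_done()
                return
            value = task()
            with lock:                 # protect the shared result list
                results.append(value)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    q.join()                           # barrier: every task of the interval is done
    for _ in threads:
        q.put(None)                    # shut the workers down
    for t in threads:
        t.join()
    return results
```

Results arrive in completion order, so a caller that needs a fixed order has to sort or index them itself.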

The following is taken from the paper.

::Abstract
* task-centric memory model
** uses a software protocol
** working in collaboration with hardware caches
** to maintain a coherent, single-address space view of memory w/o HW support
* for 1024-core MIMD accelerator; Rigel
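
A toy model of what "coherent view w/o HW support" implies: private caches that nobody keeps coherent, so software must write back and invalidate explicitly. This is a sketch under my own assumptions; the class `PrivateCache` and the operation names `flush` / `invalidate` are hypothetical and are not Rigel's instructions or the paper's protocol.

```python
class PrivateCache:
    """Toy private cache with no hardware coherence (hypothetical model)."""

    def __init__(self, global_mem):
        self.global_mem = global_mem   # shared dict standing in for global $ / DRAM
        self.lines = {}                # addr -> locally cached value
        self.dirty = set()             # addrs written locally, not yet written back

    def load(self, addr):
        if addr not in self.lines:     # miss: fetch from global memory
            self.lines[addr] = self.global_mem.get(addr, 0)
        return self.lines[addr]        # hit: the value may be stale

    def store(self, addr, value):
        self.lines[addr] = value       # the write stays local until flushed
        self.dirty.add(addr)

    def flush(self):
        """Write back dirty lines so other cores can see them."""
        for addr in self.dirty:
            self.global_mem[addr] = self.lines[addr]
        self.dirty.clear()

    def invalidate(self):
        """Drop clean cached copies so later loads re-fetch fresh data."""
        self.lines = {a: v for a, v in self.lines.items() if a in self.dirty}


def demo():
    mem = {0x10: 1}
    producer, consumer = PrivateCache(mem), PrivateCache(mem)
    before = consumer.load(0x10)       # consumer caches the old value
    producer.store(0x10, 2)
    stale = consumer.load(0x10)        # still old: nothing enforces coherence
    producer.flush()                   # software actions, e.g. at a barrier
    consumer.invalidate()
    fresh = consumer.load(0x10)
    return before, stale, fresh
```

In `demo()`, the consumer only observes the producer's write after both the write-back and the invalidation have happened.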

::Introduction
* task-centric memory model
** hw/sw protocol for maintaining a coherent view of shared memory for accelerators
* targets visual computing
** developed using a form of bulk sync. processing
*** between barriers (an interval), independent units of parallel work (tasks) execute in parallel
** according to the analysis, the workloads have well-structured sharing patterns
* Similar to DSM. The difference:
** it is a single-chip processor with private $, so the cost of accessing the shared-global $ is small
* targets Rigel, a 1024-core accelerator
** a single cacheable address space
** w/o hardware-enforced $ coherence across all cores on the chip
* Contributions
** observation of data sharing patterns for a class of emerging workloads
** a scalable task-centric memory model (for 1000-cores)
** optimization
*** prefetching from DRAM is unimpeded and most beneficial to perf.
** overhead of the task-centric model can be minimal

:: Motivation/Background
* want to consider not only the data-parallel execution model but also irregular task-parallel computation
* Application Characterization
** Parallelism Structure(Programming styles)
*** bulk sync. processing
***- the tasks exchange little or no data within an interval
***- at the barrier, modified shared data is made globally visible
***- mostly-data-parallel, task-based shared-memory programming model, coherence management is required to enable sharing
***- do not depend on the HW support
*** the programmer's attempt to create scalable code(minimum sharing)

** Sharing Patterns
*** sync. characteristics
*** benchmarks
***- MRI benchmark(VISBench)
***- CG, Sobel edge detection, k-means clustering, DMM(Rigel kernel benchmark suite)
***- GJK collision detection benchmark(a freely-available seq.)
***- Heat (Cilk)
*** Fig.1 and Fig.2 show the freq. of non-private loads/stores
***- the majority of non-private loads are reads to data produced before the current interval began
***- both conflict reads and writes to data shared
** Accelerator Workload Characteristics
*** characteristics
***- read shared data is present within an interval
***- sync. is coarse-grained
***- small amounts of write-shared data within an interval
***- Fine-grained sync. (ex. atomic updates to shared data) is present but rare
***- write sharing within an interval is rare
*** little coherence management is required
** Cache Coherence Management
*** weakly consistent memory models
*** explicit local and global memory operations
*** task-based programming model
*** as a substitute for HW $
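
The kind of sharing-pattern breakdown noted under Sharing Patterns (loads of data produced before the current interval vs. conflicts within an interval) could be computed from a memory trace roughly like this. The trace format, the function `classify_loads`, and the category names are my own assumptions; the notes do not record how the paper actually derived Fig.1 and Fig.2.

```python
from collections import defaultdict


def classify_loads(trace):
    """Count loads by sharing category.

    trace: iterable of (interval, task, op, addr), with op in {"ld", "st"},
    listed in execution order (hypothetical format).
    """
    last_writer = {}                   # addr -> (interval, task) of the latest store
    counts = defaultdict(int)
    for interval, task, op, addr in trace:
        if op == "st":
            last_writer[addr] = (interval, task)
            continue
        writer = last_writer.get(addr)
        if writer is None or writer[1] == task:
            # Simplification: data last written by the same task, or never
            # written in the trace, is counted as private/unwritten.
            counts["private_or_unwritten"] += 1
        elif writer[0] < interval:
            # Written by another task before the current interval began.
            counts["produced_in_earlier_interval"] += 1
        else:
            # Written by another task inside the same interval.
            counts["conflict_within_interval"] += 1
    return dict(counts)
```

Only the loads in the last category would need coherence actions inside an interval; the second category is handled by making data visible at the barrier.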

* Related Work
** bulk-sync. parallel(BSP) model $→$ CUDA, OpenCL
** OpenMP, Intel's TBB
** Workload(PARSEC, ALPBench)
** Memory Models

::Rigel Architecture and Task Model