
Diary/2009-8-25

Assorted notes on simulation environments

I skimmed a number of papers on simulation environments.

How to simulate 1000 cores
@article{1577133,
author = {Monchiero, Matteo and Ahn, Jung Ho and Falc\'{o}n, Ayose and Ortega, Daniel and Faraboschi, Paolo},
title = {How to simulate 1000 cores},
journal = {SIGARCH Comput. Archit. News},
volume = {37},
number = {2},
year = {2009},
issn = {0163-5964},
pages = {10--19},
doi = {http://doi.acm.org/10.1145/1577129.1577133},
publisher = {ACM},
address = {New York, NY, USA},
}
  • functional-first full-system simulation
  • trace-driven
  • the framework extracts threads from a multi-threaded application running in the full-system simulator and feeds them to a timing simulator of many-core architecture
    • mapping application threads to simulated cores requires identifying the different threads running in the functional simulator
    • insert a special instruction in the OS scheduler code, telling the simulator the process ID (PID) and thread ID (TID) of the next task
  • benchmarks show that it scales up to 1024 cores
    • simulation overhead of only 30% with respect to the single-core simulation
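The thread-extraction step in the notes above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the `("SWITCH", pid, tid)` marker record, the `demux_threads` and `assign_cores` helpers, and the round-robin core mapping are all assumptions made up for the example.

```python
from collections import defaultdict

def demux_threads(stream):
    """Split one interleaved instruction stream into per-thread traces.

    `stream` yields either ("SWITCH", pid, tid) markers (a stand-in for the
    special instruction inserted in the OS scheduler) or ("INSN", text).
    """
    traces = defaultdict(list)
    current = None  # (pid, tid) of the task currently running
    for record in stream:
        if record[0] == "SWITCH":
            current = (record[1], record[2])
        elif current is not None:
            traces[current].append(record[1])
        # Instructions seen before the first marker cannot be attributed
        # to any thread, so this sketch simply drops them.
    return dict(traces)

def assign_cores(traces, num_cores):
    """Map each extracted thread to a simulated core (hypothetical round-robin policy)."""
    return {thread: i % num_cores for i, thread in enumerate(sorted(traces))}

stream = [
    ("SWITCH", 1, 10), ("INSN", "ld r1"), ("INSN", "add r2"),
    ("SWITCH", 1, 11), ("INSN", "st r3"),
    ("SWITCH", 1, 10), ("INSN", "mul r4"),
]
traces = demux_threads(stream)
cores = assign_cores(traces, num_cores=2)
```

Each per-thread trace could then be fed to the timing model of the core it was assigned to.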

The M5 Simulator: Modeling Networked Systems
@article{1159085,
author = {Binkert, Nathan L. and Dreslinski, Ronald G. and Hsu, Lisa R. and Lim, Kevin T. and Saidi, Ali G. and Reinhardt, Steven K.},
title = {The M5 Simulator: Modeling Networked Systems},
journal = {IEEE Micro},
volume = {26},
number = {4},
year = {2006},
issn = {0272-1732},
pages = {52--60},
doi = {http://dx.doi.org/10.1109/MM.2006.82},
publisher = {IEEE Computer Society Press},
address = {Los Alamitos, CA, USA},
}

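  • object-oriented simulation environment
    • upper layers in Python, lower layers in C++
  • full-system simulation focused on network I/O
  • GEMS and SimFlex were developed around the same time; M5, however, does not depend on Simics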

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
@article{1105747,
author = {Martin, Milo M. K. and Sorin, Daniel J. and Beckmann, Bradford M. and Marty, Michael R. and Xu, Min and Alameldeen, Alaa R. and Moore, Kevin E. and Hill, Mark D. and Wood, David A.},
title = {Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset},
journal = {SIGARCH Comput. Archit. News},
volume = {33},
number = {4},
year = {2005},
issn = {0163-5964},
pages = {92--99},
doi = {http://doi.acm.org/10.1145/1105734.1105747},
publisher = {ACM},
address = {New York, NY, USA},
}
  • full-system evaluation in cooperation with Simics
  • timing-first simulation approach
  • Overview
    • Ruby: multiprocessor memory system
    • random tester modules
    • micro-benchmark module
    • Simics: functional simulator, in-order processor with no pipeline stalls
    • Opal: dynamically scheduled SPARC v9 processor

Simics: A Full System Simulation Platform
@article{621909,
author = {Magnusson, Peter S. and Christensson, Magnus and Eskilson, Jesper and Forsgren, Daniel and H{\aa}llberg, Gustav and H\"{o}gberg, Johan and Larsson, Fredrik and Moestedt, Andreas and Werner, Bengt},
title = {Simics: A Full System Simulation Platform},
journal = {Computer},
volume = {35},
number = {2},
year = {2002},
issn = {0018-9162},
pages = {50--58},
doi = {http://dx.doi.org/10.1109/2.982916},
publisher = {IEEE Computer Society Press},
address = {Los Alamitos, CA, USA},
}

A methodology and a case-study for network-on-chip based MP-SoC architectures
@inproceedings{1459303,
author = {Tota, Sergio V. and Casu, Mario R. and Motto, Paolo and Roch, Massimo Ruo and Zamboni, Maurizio},
title = {A methodology and a case-study for network-on-chip based MP-SoC architectures},
booktitle = {Nano-Net '07: Proceedings of the 2nd international conference on Nano-Networks},
year = {2007},
isbn = {978-963-9799-10-3},
pages = {1--5},
location = {Catania, Italy},
publisher = {ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering)},
address = {ICST, Brussels, Belgium, Belgium},
}
  • a modular, scalable design methodology for SoCs
  • evaluated by mapping onto an FPGA
    • a parallel graphics ray-tracer rendering engine has been mapped onto an FPGA prototype board

MC-Sim: an efficient simulation tool for MPSoC designs
@inproceedings{1509541,
author = {Cong, Jason and Gururaj, Karthik and Han, Guoling and Kaplan, Adam and Naik, Mishali and Reinman, Glenn},
title = {MC-Sim: an efficient simulation tool for MPSoC designs},
booktitle = {ICCAD '08: Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design},
year = {2008},
isbn = {978-1-4244-2820-5},
pages = {364--371},
location = {San Jose, California},
publisher = {IEEE Press},
address = {Piscataway, NJ, USA},
}
  • a heterogeneous multi-core simulator framework
    • is capable of accurately simulating a variety of processor
  • a methodology to automatically generate fast, cycle-true behavioral, C-based simulators for coprocessors using a high-level synthesis tool and integrate them with MC-Sim
  • C-based simulators: 45x improvement over RTL-based simulation
  • employs a heavily modified version of the SESC simulator

Full-system timing-first simulation
@inproceedings{511349,
author = {Mauer, Carl J. and Hill, Mark D. and Wood, David A.},
title = {Full-system timing-first simulation},
booktitle = {SIGMETRICS '02: Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems},
year = {2002},
isbn = {1-58113-531-9},
pages = {108--116},
location = {Marina Del Rey, California},
doi = {http://doi.acm.org/10.1145/511334.511349},
publisher = {ACM},
address = {New York, NY, USA},
}
  • TFsim
  • full-system simulation for rapid prototyping
  • run the timing simulation first, then verify it against a functional simulation
  • TFsim's mostly correct functional implementation introduces a worst-case performance error of 4.8% for our commercial workloads
    • Simics, 18-36%
  • TFsim's absolute performance is comparable to previous simulators
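As a rough sketch of the timing-first idea in the notes above: the timing model executes each instruction itself, a functional simulator then checks the architectural state, and on a mismatch the timing model reloads the reference state. The toy ISA, the deliberately wrong `mul` in `timing_step`, and the latencies are assumptions invented for this example and do not come from TFsim.

```python
def reference_step(state, insn):
    """Functional (reference) semantics for a toy ISA: (op, dst, a, b)."""
    op, dst, a, b = insn
    new = dict(state)
    new[dst] = state[a] + state[b] if op == "add" else state[a] * state[b]
    return new

def timing_step(state, insn):
    """Timing model with a 'mostly correct' functional side: it gets 'mul' wrong."""
    op, dst, a, b = insn
    new = dict(state)
    new[dst] = state[a] + state[b]    # treats every op as an add (the deliberate bug)
    latency = 1 if op == "add" else 3  # hypothetical latencies in cycles
    return new, latency

def run_timing_first(program, init):
    timing_state, func_state = dict(init), dict(init)
    cycles = mismatches = 0
    for insn in program:
        # 1. The timing simulator runs first and accounts for the cycles.
        timing_state, latency = timing_step(timing_state, insn)
        cycles += latency
        # 2. The functional simulator executes the same instruction as a check.
        func_state = reference_step(func_state, insn)
        # 3. On a deviation, reload the timing model from the reference state.
        if timing_state != func_state:
            mismatches += 1
            timing_state = dict(func_state)
    return timing_state, cycles, mismatches

program = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r1"), ("add", "r4", "r3", "r0")]
init = {"r0": 2, "r1": 3, "r2": 0, "r3": 0, "r4": 0}
final, cycles, mismatches = run_timing_first(program, init)
```

The final architectural state stays correct despite the bug in the timing model; only the timing estimate is affected, which is the kind of bounded error the notes refer to.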

A practical FPGA-based framework for novel CMP research
@inproceedings{1216936,
author = {Wee, Sewook and Casper, Jared and Njoroge, Njuguna and Tesylar, Yuriy and Ge, Daxia and Kozyrakis, Christos and Olukotun, Kunle},
title = {A practical FPGA-based framework for novel CMP research},
booktitle = {FPGA '07: Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays},
year = {2007},
isbn = {978-1-59593-600-4},
pages = {116--125},
location = {Monterey, California, USA},
doi = {http://doi.acm.org/10.1145/1216919.1216936},
publisher = {ACM},
address = {New York, NY, USA},
}
  • ATLAS, the first prototype for CMPs with hardware support for Transactional Memory
    • uses the BEE2
    • with 8 PowerPC cores @100MHz, Linux
  • 100x performance improvement over a software simulator (compared against TASSLE)

Exploring Large-Scale CMP Architectures Using ManySim
@article{1308616,
author = {Zhao, Li and Iyer, Ravi and Moses, Jaideep and Illikkal, Ramesh and Makineni, Srihari and Newell, Don},
title = {Exploring Large-Scale CMP Architectures Using ManySim},
journal = {IEEE Micro},
volume = {27},
number = {4},
year = {2007},
issn = {0272-1732},
pages = {21--33},
doi = {http://dx.doi.org/10.1109/MM.2007.66},
publisher = {IEEE Computer Society Press},
address = {Los Alamitos, CA, USA},
}
  • trace-driven


ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
@article{1534925,
author = {Chung, Eric S. and Papamichael, Michael K. and Nurvitadhi, Eriko and Hoe, James C. and Mai, Ken and Falsafi, Babak},
title = {ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs},
journal = {ACM Trans. Reconfigurable Technol. Syst.},
volume = {2},
number = {2},
year = {2009},
issn = {1936-7406},
pages = {1--32},
doi = {http://doi.acm.org/10.1145/1534916.1534925},
publisher = {ACM},
address = {New York, NY, USA},
}
  • uses FPGAs to accelerate full-system multiprocessor simulation
  • facilitate high-performance instrumentation
  • virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA
    • previously: prior FPGA approaches prototype a complete system in hardware
    • components in a simulated system are selectively partitioned across both FPGA and software hosts
  • full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server (= BlueSPARC)
    • time-multiplexed interleaving
    • hybrid simulation with transplanting
    • On average, 38x speedup
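The time-multiplexed interleaving mentioned above can be pictured with a small software sketch: many logical processors take turns on a single execution engine, one instruction per turn. This is only an analogy for the idea, not ProtoFlex's hardware design; the `interleave` function, the context layout (`pc`, `acc`), and the "add a constant" instruction format are assumptions for illustration.

```python
from collections import deque

def interleave(programs):
    """Run many logical processors on one engine, one instruction per turn."""
    contexts = [{"pc": 0, "acc": 0} for _ in programs]
    ready = deque(range(len(programs)))  # round-robin queue of logical CPUs
    issue_order = []
    while ready:
        cpu = ready.popleft()
        ctx, program = contexts[cpu], programs[cpu]
        if ctx["pc"] >= len(program):
            continue  # this logical processor is done; do not requeue it
        ctx["acc"] += program[ctx["pc"]]  # toy instruction: add a constant
        ctx["pc"] += 1
        issue_order.append(cpu)
        ready.append(cpu)  # go to the back of the queue for the next turn
    return contexts, issue_order

contexts, issue_order = interleave([[1, 2, 3], [10], [5, 5]])
```

Here three logical processors share one engine, and `issue_order` records which context issued on each turn.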

@misc{cell,
title = {IBM Full-System Simulator for the Cell Broadband Engine Processor},
author = {http://www.alphaworks.ibm.com/tech/cellsystemsim},
year = {2005},
month = {11},
}

Performance analysis and visualization tools for cell/B.E. multicore environment
@inproceedings{1463777,
author = {Vianney, Duc and Haber, Gad and Heilper, Andre and Zalmanovici, Marcel},
title = {Performance analysis and visualization tools for cell/B.E. multicore environment},
booktitle = {IFMT '08: Proceedings of the 1st international forum on Next-generation multicore/manycore technologies},
year = {2008},
isbn = {978-1-60558-407-2},
pages = {1--12},
location = {Cairo, Egypt},
doi = {http://doi.acm.org/10.1145/1463768.1463777},
publisher = {ACM},
address = {New York, NY, USA},
}
  • about the Cell simulator and SDK


時間軸分割並列化による高速マイクロプロセッサシミュレーション
@article{高崎透:20050815,
author="高崎 透 and 中田 尚 and 津邑 公暁 and 中島 浩",
title="時間軸分割並列化による高速マイクロプロセッサシミュレーション(プロセッサシミュレーション)",
journal="情報処理学会論文誌. コンピューティングシステム",
ISSN="03875806",
publisher="社団法人情報処理学会",
year="2005",
month="8",
volume="46",
number="12",
pages="84-97",
URL="http://ci.nii.ac.jp/naid/110002769826/",
DOI="",
}
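  • split the simulation along the time axis
    • this makes it possible to parallelize the simulation
    • correctness is verified by checking that the states at the split points match, and via histories
  • with 8 nodes and 8 partitions, a 2.7x speedup over SimpleScalar (1.8x on average)
    • the instruction-level simulation is based on sim-cache (slow)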
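The time-axis partitioning in the notes above might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' simulator: the two-set direct-mapped cache model, the `warm_up` guess (simulating only the preceding chunk from an empty cache), the hit/miss costs, and the use of a thread pool in place of separate nodes are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

EMPTY = (None, None)  # state of a toy direct-mapped cache with 2 sets

def simulate(state, chunk):
    """Detailed simulation of one time slice: returns (end_state, cycles)."""
    sets, cycles = list(state), 0
    for addr in chunk:
        index, tag = addr % 2, addr // 2
        if sets[index] == tag:
            cycles += 1           # hit (hypothetical cost)
        else:
            cycles += 10          # miss (hypothetical cost)
            sets[index] = tag
    return tuple(sets), cycles

def warm_up(chunk):
    """Cheap guess of the state after `chunk`, ignoring everything before it."""
    return simulate(EMPTY, chunk)[0]

def time_partitioned(trace, parts):
    size = max(1, -(-len(trace) // parts))  # ceiling division
    chunks = [trace[i:i + size] for i in range(0, len(trace), size)]
    guesses = [EMPTY] + [warm_up(c) for c in chunks[:-1]]
    # Simulate all slices concurrently from their guessed start states.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(simulate, guesses, chunks))
    # Verify at each split point; re-simulate a slice whose guess was wrong.
    state, total, redone = EMPTY, 0, 0
    for guess, chunk, (end_state, cycles) in zip(guesses, chunks, results):
        if guess != state:
            end_state, cycles = simulate(state, chunk)
            redone += 1
        state, total = end_state, total + cycles
    return total, redone

trace = [1, 1, 0, 0, 2, 2]
total, redone = time_partitioned(trace, parts=3)
sequential = simulate(EMPTY, trace)[1]
```

In this run the guess for the third slice misses state left behind by the first slice, so that slice is re-simulated and the total still matches a plain sequential simulation.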

共有メモリ型マルチプロセッサの分散シミュレータShamanの実装と評価
@article{松尾治幸:20010725,
author="松尾 治幸 and 大野 和彦 and 中島 浩",
title="共有メモリ型マルチプロセッサの分散シミュレータShamanの実装と評価",
journal="情報処理学会研究報告. 計算機アーキテクチャ研究会報告",
ISSN="09196072",
publisher="社団法人情報処理学会",
year="2001",
month="7",
volume="2001",
number="76",
pages="1-6",
URL="http://ci.nii.ac.jp/naid/110002774920/",
DOI="",
}
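  • speeds up high-accuracy simulation, including caches, through parallel execution
  • the front end runs in parallel on software distributed shared memory and generates a memory-reference history
  • the back end simulates memory behavior, including caches, based on that history
  • evaluated with two SPLASH-2 kernels; 55 M-clock/sec with a 16-node front end
    • tends to saturate at 8 or more nodes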
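The front-end/back-end split described above can be illustrated with a toy sketch. Nothing here is taken from Shaman itself: the `front_end` kernel, the `(node, kind, addr)` record format, the small per-node LRU caches, the write-invalidate rule, and the simple concatenation of per-node histories are all assumptions for the example.

```python
from collections import OrderedDict

def front_end(node, n, result_addr=100):
    """Toy kernel: read a[0..n-1], then write a shared result; log every reference."""
    history = [(node, "R", i) for i in range(n)]
    history.append((node, "W", result_addr))
    return history

def back_end(history, capacity):
    """Replay a reference history through per-node LRU caches with write-invalidate."""
    caches = {}  # node -> OrderedDict of cached addresses, oldest first
    stats = {"hit": 0, "miss": 0}
    for node, kind, addr in history:
        cache = caches.setdefault(node, OrderedDict())
        if addr in cache:
            stats["hit"] += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            stats["miss"] += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
        if kind == "W":
            # A write invalidates the copies held by every other node.
            for other, other_cache in caches.items():
                if other != node:
                    other_cache.pop(addr, None)
    return stats

# Assumption: histories are simply concatenated rather than merged by time.
history = front_end(0, 2) + front_end(1, 2) + front_end(0, 2)
stats = back_end(history, capacity=4)
```

Because the back end only consumes the recorded history, the cache model can be changed and replayed without rerunning the application.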

時間軸分割並列マイクロプロセッサシミュレータの高速化と評価
@article{矢野聖宗:20070301,
author="矢野 聖宗 and 高崎 透 and 中田 尚 and 中島 浩",
title="時間軸分割並列マイクロプロセッサシミュレータの高速化と評価(シミュレーション・エミュレーション,「ハイパフォーマンスコンピューティングとアーキテクチャの評価」に関する北海道ワークショップ(HOKKE-2007))",
journal="情報処理学会研究報告. 計算機アーキテクチャ研究会報告",
ISSN="09196072",
publisher="社団法人情報処理学会",
year="2007",
month="3",
volume="2007",
number="17",
pages="187-192",
URL="http://ci.nii.ac.jp/naid/110006249891/",
DOI="",
}
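  • split the simulation along the time axis
    • this makes it possible to parallelize the simulation
    • correctness is verified by checking that the states at the split points match, and via histories
  • speeds up the instruction-level simulation part using workload-optimized simulation techniques
    • this is the part that approximates the machine state
  • versus sim-outorder: up to 9.41x, 6.4x on average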

A simulation methodology for reliability analysis in multi-core SoCs
@inproceedings{1127933,
author = {Ayse K. Coskun and Tajana Simunic Rosing and Yusuf Leblebici and Giovanni De Micheli},
title = {A simulation methodology for reliability analysis in multi-core SoCs},
booktitle = {GLSVLSI '06: Proceedings of the 16th ACM Great Lakes symposium on VLSI},
year = {2006},
isbn = {1-59593-347-6},
pages = {95--99},
location = {Philadelphia, PA, USA},
doi = {http://doi.acm.org/10.1145/1127908.1127933},
publisher = {ACM},
address = {New York, NY, USA},
}
  • the first to provide system-on-chip-level fine-grained reliability analysis
  • the reliability effects of design choices such as thermal packaging and placement


Multi-processor operating system emulation framework with thermal feedback for systems-on-chip
@inproceedings{1228787,
author = {Salvatore Carta and Andrea Acquaviva and Pablo G. Del Valle and David Atienza and Giovanni De Micheli and Fernando Rincon and Luca Benini and Jose M. Mendias},
title = {Multi-processor operating system emulation framework with thermal feedback for systems-on-chip},
booktitle = {GLSVLSI '07: Proceedings of the 17th ACM Great Lakes symposium on VLSI},
year = {2007},
isbn = {978-1-59593-605-9},
pages = {311--316},
location = {Stresa-Lago Maggiore, Italy},
doi = {http://doi.acm.org/10.1145/1228784.1228787},
publisher = {ACM},
address = {New York, NY, USA},
}
  • a new MPSoC OS emulation framework that enables the study of thermal management strategies
    • at the architectural- and OS-levels with the help of a standard FPGA

COTSon: infrastructure for full system simulation
@article{1496921,
author = {Argollo, Eduardo and Falc\'{o}n, Ayose and Faraboschi, Paolo and Monchiero, Matteo and Ortega, Daniel},
title = {COTSon: infrastructure for full system simulation},
journal = {SIGOPS Oper. Syst. Rev.},
volume = {43},
number = {1},
year = {2009},
issn = {0163-5980},
pages = {52--61},
doi = {http://doi.acm.org/10.1145/1496909.1496921},
publisher = {ACM},
address = {New York, NY, USA},
}
  • fast and accurate evaluation of current and future computing systems
  • covering the full software stack and complete hardware model
  • functional emulators and timing models
  • uses AMD's SimNow simulator
  • distributes the simulation of the different cores over multiple hosts


Today's mutterings


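  • Related work keeps turning up, one paper after another. I expected as much, though. (Tue Aug 25 22:09:09 2009)
  • I guess I'll follow the references from here... I wonder how many papers I can read by tomorrow morning... (Tue Aug 25 21:27:09 2009)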
  • "How to simulate 1000 cores" (Tue Aug 25 21:26:10 2009)
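  • I want one of those schedule-management things. (Tue Aug 25 20:24:38 2009)
  • So IrfanView can pixelate part of an image. Handy. (Tue Aug 25 17:27:00 2009)
  • Hastily rearranged my desk setup. Yes, it's escapism. (Tue Aug 25 17:26:30 2009)
  • Having the external display on my right seems to have thrown my body off balance, so I think I'll try moving it to the left (Tue Aug 25 16:12:15 2009)
  • This is the second time my professor has told me to manage my schedule properly. There won't be a next time. Maybe I should buy an iPhone... (no, that's not it) (Tue Aug 25 14:12:37 2009)
  • Switched to sun-java5-jdk with update-alternatives --config java (Tue Aug 25 14:04:16 2009)
  • It was pointed out that I misspell it far too often. Zaurus is the correct spelling. (Tue Aug 25 14:01:20 2009)
  • @ororog It was there, so I read it. In the magazine, huh. Too bad. (Tue Aug 25 13:37:30 2009)
  • For now, I guess I'll set up an environment where I can compile the kernel myself... (Tue Aug 25 13:34:59 2009)
  • @ororog The new one? Volume 8? (Tue Aug 25 13:34:38 2009)
  • Grabbed an Android build and put it on the Zaurusu. Never mind Japanese input; even the touch panel didn't work. (Tue Aug 25 13:34:18 2009)
  • Hungry (Tue Aug 25 12:19:19 2009)
  • Maybe this is the moment to put Android on the Zaurus? I hope the wireless LAN card works... Feeble. (Tue Aug 25 12:03:23 2009)
  • Plugged a wireless LAN card into the Zarusu and thought I had found happiness... only to discover that JavaScript doesn't run in the built-in NetFront. (Tue Aug 25 12:01:19 2009)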