Changes to Diary/2020-12-3

!Ubuntu/ZCU106
Built the same way as for the ZCU111.
I turned the build into a script: https://github.com/miyo/build-zcu106-linux
All that remains is to write the required files to an SD card.

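:: Preparing the SD card
Create a FAT partition of about 200 MB at the start of the card and an ext4 partition in the remaining space.
Set the type of the FAT partition to c (W95 FAT32 (LBA)) and the type of the ext4 partition to 83 (Linux).
Then format them with, for example,
 mkfs.vfat -F 32 -n boot /dev/sdX1
 mkfs.ext4 -L root /dev/sdX2
Replace the X with whatever matches your environment.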
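
The partitioning above can also be scripted. The following is a minimal sketch, assuming sfdisk from util-linux is available and that the partitions show up as ${dev}1 and ${dev}2 (true for /dev/sdX, not for /dev/mmcblk0, which uses p1 and p2). The function names sd_layout and prepare_sd are made up for this example. prepare_sd is destructive and is only defined here, never called.

```shell
#!/bin/sh
# Print an sfdisk script: a FAT32 (LBA) partition of $1 MiB at the start,
# then a Linux partition covering the rest of the card.
sd_layout() {
    fat_mb="${1:-200}"
    printf 'label: dos\n'
    printf ',%sM,c\n' "$fat_mb"   # type c  = W95 FAT32 (LBA)
    printf ',,83\n'               # type 83 = Linux, takes the remaining space
}

# Partition and format the device given as $1.
# DESTRUCTIVE: overwrites the partition table. Call as: prepare_sd /dev/sdX
prepare_sd() {
    dev="$1"
    sd_layout 200 | sfdisk "$dev" &&
    mkfs.vfat -F 32 -n boot "${dev}1" &&
    mkfs.ext4 -L root "${dev}2"
}

# Only show the layout that would be applied.
sd_layout 200
```

Check the target device with lsblk before calling prepare_sd, since picking the wrong disk wipes it.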

:: Copying
Copy the contents of ${WORK}/image to the FAT partition created at the start of the SD card.
Extract the root filesystem built with QEMU onto the second partition.
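
A rough sketch of the copy step follows. The notes do not say in what form the QEMU-built root filesystem is kept, so the tarball name rootfs.tar.gz and the mount points /mnt/boot and /mnt/root are assumptions, as are the function names.

```shell
#!/bin/sh
# Copy everything under $1 (e.g. ${WORK}/image) into the mounted FAT partition $2.
copy_boot_files() {
    cp -r "$1"/. "$2"/
}

# Unpack a root filesystem tarball $1 into the mounted ext4 partition $2,
# keeping file permissions (run as root on the real card).
extract_rootfs() {
    tar -xpf "$1" -C "$2"
}

# Typical use, assuming /dev/sdX1 and /dev/sdX2 from the previous step:
#   mount /dev/sdX1 /mnt/boot && copy_boot_files "${WORK}/image" /mnt/boot
#   mount /dev/sdX2 /mnt/root && extract_rootfs rootfs.tar.gz /mnt/root
#   sync && umount /mnt/boot /mnt/root
```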

:: USB-UART
On Linux, /dev/ttyUSB{0,1,2,3} show up. Connect to /dev/ttyUSB0.
(Note that on the ZCU111 it was /dev/ttyUSB1.)
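
To avoid mixing up the two boards, the port choice can be wrapped in a small helper. This is only a sketch: uart_port is a made-up name, and the 115200 baud rate in the usage comment is an assumption that the notes do not state.

```shell
#!/bin/sh
# Print the console port observed for each board in these notes.
uart_port() {
    case "$1" in
        zcu106) echo /dev/ttyUSB0 ;;
        zcu111) echo /dev/ttyUSB1 ;;
        *) echo "unknown board: $1" >&2; return 1 ;;
    esac
}

# Example (baud rate assumed, not taken from the notes):
#   screen "$(uart_port zcu106)" 115200
uart_port zcu106
```

The port numbering depends on which other USB serial devices are attached, so check ls /dev/ttyUSB* if nothing appears on the console.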

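!Building rftool
I copied rftool from a PetaLinux project to the actual board and tried building it there.
* The build asks for "rfdc.h", so I changed the include to <rfdc.h>
* Copied the rfdc header files to /usr/local/include/rfdc
* Added CFLAGS = -I/usr/local/include/rfdc to the Makefile
With these changes the build at least succeeds.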
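
The include rewrite can be done with sed instead of by hand. A minimal sketch, assuming GNU sed (for -i) and that the include appears literally as #include "rfdc.h"; fix_rfdc_include is a made-up helper name, and RFDC_SRC in the comment is a placeholder for wherever the rfdc headers live in the PetaLinux project.

```shell
#!/bin/sh
# Rewrite  #include "rfdc.h"  to  #include <rfdc.h>  in the given C files, in place.
fix_rfdc_include() {
    for f in "$@"; do
        sed -i 's|#include[[:space:]]*"rfdc\.h"|#include <rfdc.h>|' "$f"
    done
}

# Header copy from the list above (RFDC_SRC is a placeholder path):
#   sudo mkdir -p /usr/local/include/rfdc
#   sudo cp "$RFDC_SRC"/*.h /usr/local/include/rfdc/
# Then, in the rftool source directory:
#   fix_rfdc_include *.c *.h
```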

!fpgautil is missing
When I tried to run rftool, fpgautil was not installed.
 git clone https://github.com/Xilinx/meta-xilinx-tools.git
Then,
 cd meta-xilinx-tools/recipes-bsp/fpga-manager-script/files
 make fpgautil
was all it took.
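
A quick way to check for the tool before running rftool is sketched below. have_tool is a made-up helper, and the install destination in the comment is an assumption; the notes only cover the build.

```shell
#!/bin/sh
# Succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool fpgautil; then
    echo "fpgautil found: $(command -v fpgautil)"
else
    echo "fpgautil not found; build it from meta-xilinx-tools first"
fi

# After "make fpgautil", one possible way to put it on PATH (assumed location):
#   sudo install -m 755 fpgautil /usr/local/bin/
```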

!ZCU106 benchmarks
To start with, BYTE UNIX Benchmarks:

 Benchmark Run: Thu Dec 03 2020 05:46:53 - 06:14:57
 4 CPUs in system; running 1 parallel copy of tests
 
 Dhrystone 2 using register variables        6372358.5 lps   (10.0 s, 7 samples)
 Double-Precision Whetstone                     1156.6 MWIPS (9.8 s, 7 samples)
 Execl Throughput                               1674.6 lps   (30.0 s, 2 samples)
 File Copy 1024 bufsize 2000 maxblocks        168388.8 KBps  (30.0 s, 2 samples)
 File Copy 256 bufsize 500 maxblocks           50505.4 KBps  (30.0 s, 2 samples)
 File Copy 4096 bufsize 8000 maxblocks        438224.3 KBps  (30.0 s, 2 samples)
 Pipe Throughput                              397549.8 lps   (10.0 s, 7 samples)
 Pipe-based Context Switching                  73965.6 lps   (10.0 s, 7 samples)
 Process Creation                               4355.1 lps   (30.0 s, 2 samples)
 Shell Scripts (1 concurrent)                   2905.9 lpm   (60.0 s, 2 samples)
 Shell Scripts (8 concurrent)                    963.4 lpm   (60.0 s, 2 samples)
 System Call Overhead                         608372.4 lps   (10.0 s, 7 samples)
 
 System Benchmarks Index Values               BASELINE       RESULT    INDEX
 Dhrystone 2 using register variables         116700.0    6372358.5    546.0
 Double-Precision Whetstone                       55.0       1156.6    210.3
 Execl Throughput                                 43.0       1674.6    389.4
 File Copy 1024 bufsize 2000 maxblocks          3960.0     168388.8    425.2
 File Copy 256 bufsize 500 maxblocks            1655.0      50505.4    305.2
 File Copy 4096 bufsize 8000 maxblocks          5800.0     438224.3    755.6
 Pipe Throughput                               12440.0     397549.8    319.6
 Pipe-based Context Switching                   4000.0      73965.6    184.9
 Process Creation                                126.0       4355.1    345.6
 Shell Scripts (1 concurrent)                     42.4       2905.9    685.3
 Shell Scripts (8 concurrent)                      6.0        963.4   1605.7
 System Call Overhead                          15000.0     608372.4    405.6
                                                                    ========
 System Benchmarks Index Score                                         430.0
 
 ------------------------------------------------------------------------
 Benchmark Run: Thu Dec 03 2020 06:14:57 - 06:43:02
 4 CPUs in system; running 4 parallel copies of tests
 
 Dhrystone 2 using register variables       25487070.9 lps   (10.0 s, 7 samples)
 Double-Precision Whetstone                     4627.6 MWIPS (9.8 s, 7 samples)
 Execl Throughput                               6123.9 lps   (30.0 s, 2 samples)
 File Copy 1024 bufsize 2000 maxblocks        318714.5 KBps  (30.0 s, 2 samples)
 File Copy 256 bufsize 500 maxblocks           88220.4 KBps  (30.0 s, 2 samples)
 File Copy 4096 bufsize 8000 maxblocks        914718.4 KBps  (30.0 s, 2 samples)
 Pipe Throughput                             1596831.6 lps   (10.0 s, 7 samples)
 Pipe-based Context Switching                 287838.8 lps   (10.0 s, 7 samples)
 Process Creation                              12114.5 lps   (30.0 s, 2 samples)
 Shell Scripts (1 concurrent)                   7948.9 lpm   (60.0 s, 2 samples)
 Shell Scripts (8 concurrent)                   1031.4 lpm   (60.1 s, 2 samples)
 System Call Overhead                        2331185.6 lps   (10.0 s, 7 samples)
 
 System Benchmarks Index Values               BASELINE       RESULT    INDEX
 Dhrystone 2 using register variables         116700.0   25487070.9   2184.0
 Double-Precision Whetstone                       55.0       4627.6    841.4
 Execl Throughput                                 43.0       6123.9   1424.2
 File Copy 1024 bufsize 2000 maxblocks          3960.0     318714.5    804.8
 File Copy 256 bufsize 500 maxblocks            1655.0      88220.4    533.1
 File Copy 4096 bufsize 8000 maxblocks          5800.0     914718.4   1577.1
 Pipe Throughput                               12440.0    1596831.6   1283.6
 Pipe-based Context Switching                   4000.0     287838.8    719.6
 Process Creation                                126.0      12114.5    961.5
 Shell Scripts (1 concurrent)                     42.4       7948.9   1874.8
 Shell Scripts (8 concurrent)                      6.0       1031.4   1719.0
 System Call Overhead                          15000.0    2331185.6   1554.1
                                                                    ========
 System Benchmarks Index Score                                        1187.7
 
 user@zcu106:~/byte-unixbench/UnixBench$
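
The INDEX column and the final score can be reproduced from the numbers above. Each index is the result divided by the baseline, times 10, and the overall score matches the geometric mean of the twelve index values. The sketch below uses awk; ub_index and geomean are made-up helper names.

```shell
#!/bin/sh
# index = result / baseline * 10  (arguments: baseline, result)
ub_index() {
    awk -v b="$1" -v r="$2" 'BEGIN { printf "%.1f\n", r / b * 10 }'
}

# Geometric mean of the numbers on stdin, one per line.
geomean() {
    awk '{ s += log($1); n++ } END { if (n) printf "%.1f\n", exp(s / n) }'
}

# Dhrystone, single copy: prints 546.0
ub_index 116700.0 6372358.5

# The twelve single-copy index values: prints 430.0
printf '%s\n' 546.0 210.3 389.4 425.2 305.2 755.6 \
              319.6 184.9 345.6 685.3 1605.7 405.6 | geomean
```

By the same arithmetic, running four copies raises the overall score from 430.0 to 1187.7, about 2.8 times.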

STREAM results:

 user@zcu106:~/STREAM-master$ gcc -DSTREAM_ARRAY_SIZE=40000000 -O2 -fopenmp -o stream stream.c
 user@zcu106:~/STREAM-master$ ./stream
 -------------------------------------------------------------
 STREAM version $Revision: 5.10 $
 -------------------------------------------------------------
 This system uses 8 bytes per array element.
 -------------------------------------------------------------
 Array size = 40000000 (elements), Offset = 0 (elements)
 Memory per array = 305.2 MiB (= 0.3 GiB).
 Total memory required = 915.5 MiB (= 0.9 GiB).
 Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
 -------------------------------------------------------------
 Number of Threads requested = 4
 Number of Threads counted = 4
 -------------------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds.
 Each test below will take on the order of 119349 microseconds.
    (= 119349 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 -------------------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time     Min time     Max time
 Copy:            9058.9     0.071932     0.070649     0.074034
 Scale:           7858.5     0.084209     0.081440     0.090125
 Add:             7354.7     0.131362     0.130529     0.132072
 Triad:           5933.6     0.162834     0.161791     0.163859
 -------------------------------------------------------------
 Solution Validates: avg error less than 1.000000e-13 on all three arrays
 -------------------------------------------------------------
 user@zcu106:~/STREAM-master$ OMP_NUM_THREADS=1 ./stream
 -------------------------------------------------------------
 STREAM version $Revision: 5.10 $
 -------------------------------------------------------------
 This system uses 8 bytes per array element.
 -------------------------------------------------------------
 Array size = 40000000 (elements), Offset = 0 (elements)
 Memory per array = 305.2 MiB (= 0.3 GiB).
 Total memory required = 915.5 MiB (= 0.9 GiB).
 Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
 -------------------------------------------------------------
 Number of Threads requested = 1
 Number of Threads counted = 1
 -------------------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds.
 Each test below will take on the order of 298444 microseconds.
    (= 298444 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 -------------------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time     Min time     Max time
 Copy:            4044.8     0.163095     0.158229     0.170722
 Scale:           2209.2     0.291470     0.289698     0.293948
 Add:             2277.1     0.421664     0.421598     0.421831
 Triad:           1804.5     0.532179     0.532003     0.532374
 -------------------------------------------------------------
 Solution Validates: avg error less than 1.000000e-13 on all three arrays
 -------------------------------------------------------------
 user@zcu106:~/STREAM-master$ OMP_NUM_THREADS=2 ./stream
 -------------------------------------------------------------
 STREAM version $Revision: 5.10 $
 -------------------------------------------------------------
 This system uses 8 bytes per array element.
 -------------------------------------------------------------
 Array size = 40000000 (elements), Offset = 0 (elements)
 Memory per array = 305.2 MiB (= 0.3 GiB).
 Total memory required = 915.5 MiB (= 0.9 GiB).
 Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
 -------------------------------------------------------------
 Number of Threads requested = 2
 Number of Threads counted = 2
 -------------------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds.
 Each test below will take on the order of 160329 microseconds.
    (= 160329 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 -------------------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 -------------------------------------------------------------
 Function    Best Rate MB/s  Avg time     Min time     Max time
 Copy:            7377.2     0.090815     0.086754     0.093672
 Scale:           4324.7     0.150540     0.147987     0.152486
 Add:             4442.9     0.216218     0.216074     0.216463
 Triad:           3487.0     0.276336     0.275306     0.277016
 -------------------------------------------------------------
 Solution Validates: avg error less than 1.000000e-13 on all three arrays
 -------------------------------------------------------------
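
The "Best Rate" column follows from the minimum time: STREAM counts 8 bytes per element, two arrays touched for Copy and Scale, three for Add and Triad, and reports MB as 10^6 bytes. The sketch below recomputes two of the rates above and the thread scaling; stream_rate and speedup are made-up helper names.

```shell
#!/bin/sh
# MB/s = 1e-6 * arrays_touched * 8 bytes * elements / min_time
# (arguments: arrays_touched, elements, min_time_in_seconds)
stream_rate() {
    awk -v k="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", 1e-6 * k * 8 * n / t }'
}

# Ratio of two rates (arguments: single-thread rate, multi-thread rate)
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

stream_rate 2 40000000 0.070649   # Copy,  4 threads: prints 9058.9
stream_rate 3 40000000 0.161791   # Triad, 4 threads: prints 5933.6
speedup 1804.5 5933.6             # Triad, 1 -> 4 threads: prints 3.29
```

Copy scales less well than Triad here (4044.8 to 9058.9 MB/s, about 2.2 times from 1 to 4 threads).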