
Diary/2020-12-3

Ubuntu/ZCU106

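Built it the same way as for the ZCU111.
I turned the steps into a script: https://github.com/miyo/build-zcu106-linux
After that, the only thing left is to write the required files to the SD card.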

Preparing the SD card

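Create a FAT partition of about 200 MB at the start of the card and an ext4 partition in the remaining space.
Set the type of the FAT partition to c (W95 FAT32 (LBA)) and the type of the ext4 partition to 83 (Linux).
Then format the partitions with commands such as: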

mkfs.vfat -F 32 -n boot /dev/sdX1
mkfs.ext4 -L root /dev/sdX2

Replace the X with the device letter for your environment.
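
The partition layout described above can also be created non-interactively. The following is a minimal sketch, assuming sfdisk from util-linux is available. The helper name sd_layout and the exact size of 200MiB are illustrative choices, not part of the original procedure, and the destructive step is left commented out.

```shell
# Hypothetical helper: print an sfdisk script describing the layout above.
# $1 is the size of the FAT partition (assumed default: 200MiB).
sd_layout() {
  printf 'label: dos\n'
  printf 'size=%s, type=c\n' "${1:-200MiB}"   # partition 1: W95 FAT32 (LBA)
  printf 'type=83\n'                           # partition 2: Linux, rest of the card
}

# Review the generated script first.
sd_layout

# Then apply it. This destroys the existing partition table on the device,
# so double-check what X is before uncommenting.
# sd_layout | sudo sfdisk /dev/sdX
```

After this, the mkfs.vfat and mkfs.ext4 commands shown earlier apply unchanged.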

Copying the files

Next, copy the contents of ${WORK}/image to the FAT partition created at the start of the SD card.
Extract the root filesystem built with QEMU onto the second partition.
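
The two copy steps might look like the sketch below. It assumes the root filesystem is available as a tar archive, which the notes do not state. The mount points, the archive name rootfs.tar, and the helper names are all placeholders.

```shell
# Copy the boot files (the contents of ${WORK}/image) onto the mounted FAT partition.
copy_boot_files() {
  cp -r "$1"/. "$2"/
}

# Unpack a root filesystem archive onto the mounted ext4 partition,
# keeping permissions (-p). Run as root so ownership is preserved as well.
extract_rootfs() {
  tar -xpf "$1" -C "$2"
}

# Example invocation, run as root (mount points and archive name are assumptions):
# mount /dev/sdX1 /mnt/boot && mount /dev/sdX2 /mnt/root
# copy_boot_files "${WORK}/image" /mnt/boot
# extract_rootfs rootfs.tar /mnt/root
# umount /mnt/boot /mnt/root
```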

USB-UART

On Linux, /dev/ttyUSB{0,1,2,3} show up. Connect to /dev/ttyUSB0.
(Note that on the ZCU111 it was /dev/ttyUSB1.)
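
Since the port differs between the two boards, a tiny lookup can save a wrong guess. This is only a sketch: the helper name uart_dev is made up, and the 115200 baud rate in the commented command is an assumption that the notes do not confirm.

```shell
# Hypothetical helper: map a board name to the USB-UART device noted above.
uart_dev() {
  case "$1" in
    zcu106) echo /dev/ttyUSB0 ;;
    zcu111) echo /dev/ttyUSB1 ;;
    *) echo "unknown board: $1" >&2; return 1 ;;
  esac
}

# Open the console with screen. The baud rate is an assumption.
# screen "$(uart_dev zcu106)" 115200
```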

Trying to build rftool

I decided to copy rftool from a PetaLinux project onto the board and build it there.

  • The source asks for "rfdc.h", so change the include to <rfdc.h>
  • Copy the rfdc header files to /usr/local/include/rfdc
  • Add CFLAGS = -I/usr/local/include/rfdc to the Makefile

With these changes the build went through, at least.
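
The include rewrite from the first step can be scripted. The sketch below assumes GNU sed (for in-place editing with -i) and that the directive is written as #include "rfdc.h". The helper name and the paths in the commented commands are placeholders.

```shell
# Rewrite quoted includes of rfdc.h to angle-bracket form in the given files.
# Relies on in-place editing as provided by GNU sed (-i).
fix_rfdc_include() {
  sed -i 's|#include[[:space:]]*"rfdc\.h"|#include <rfdc.h>|' "$@"
}

# Example invocation (the header source path is a placeholder):
# sudo mkdir -p /usr/local/include/rfdc
# sudo cp path/to/rfdc/headers/*.h /usr/local/include/rfdc/
# fix_rfdc_include *.c *.h
# Then add "CFLAGS = -I/usr/local/include/rfdc" to the Makefile and run make.
```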

fpgautil is missing

When I tried to run rftool, fpgautil turned out to be missing. Cloning the repository

git clone https://github.com/Xilinx/meta-xilinx-tools.git

and then building it with

cd meta-xilinx-tools/recipes-bsp/fpga-manager-script/files
make fpgautil

was all it took.

ZCU106 benchmarks

To start with, BYTE UNIX Benchmarks (UnixBench):

Benchmark Run: Thu Dec 03 2020 05:46:53 - 06:14:57
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        6372358.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1156.6 MWIPS (9.8 s, 7 samples)
Execl Throughput                               1674.6 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        168388.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           50505.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        438224.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                              397549.8 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  73965.6 lps   (10.0 s, 7 samples)
Process Creation                               4355.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2905.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    963.4 lpm   (60.0 s, 2 samples)
System Call Overhead                         608372.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    6372358.5    546.0
Double-Precision Whetstone                       55.0       1156.6    210.3
Execl Throughput                                 43.0       1674.6    389.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     168388.8    425.2
File Copy 256 bufsize 500 maxblocks            1655.0      50505.4    305.2
File Copy 4096 bufsize 8000 maxblocks          5800.0     438224.3    755.6
Pipe Throughput                               12440.0     397549.8    319.6
Pipe-based Context Switching                   4000.0      73965.6    184.9
Process Creation                                126.0       4355.1    345.6
Shell Scripts (1 concurrent)                     42.4       2905.9    685.3
Shell Scripts (8 concurrent)                      6.0        963.4   1605.7
System Call Overhead                          15000.0     608372.4    405.6
                                                                   ========
System Benchmarks Index Score                                         430.0

------------------------------------------------------------------------
Benchmark Run: Thu Dec 03 2020 06:14:57 - 06:43:02
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       25487070.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4627.6 MWIPS (9.8 s, 7 samples)
Execl Throughput                               6123.9 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        318714.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           88220.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        914718.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1596831.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 287838.8 lps   (10.0 s, 7 samples)
Process Creation                              12114.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7948.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1031.4 lpm   (60.1 s, 2 samples)
System Call Overhead                        2331185.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   25487070.9   2184.0
Double-Precision Whetstone                       55.0       4627.6    841.4
Execl Throughput                                 43.0       6123.9   1424.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     318714.5    804.8
File Copy 256 bufsize 500 maxblocks            1655.0      88220.4    533.1
File Copy 4096 bufsize 8000 maxblocks          5800.0     914718.4   1577.1
Pipe Throughput                               12440.0    1596831.6   1283.6
Pipe-based Context Switching                   4000.0     287838.8    719.6
Process Creation                                126.0      12114.5    961.5
Shell Scripts (1 concurrent)                     42.4       7948.9   1874.8
Shell Scripts (8 concurrent)                      6.0       1031.4   1719.0
System Call Overhead                          15000.0    2331185.6   1554.1
                                                                   ========
System Benchmarks Index Score                                        1187.7

user@zcu106:~/byte-unixbench/UnixBench$
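
As a cross-check on the tables above: each INDEX value is RESULT / BASELINE × 10, and the overall score is the geometric mean of the per-test index values. Here is a minimal sketch with awk (the helper names are illustrative). Fed the twelve index values of the single-copy run, it should land close to the reported 430.0, with small differences possible because the inputs are already rounded.

```shell
# Index of one test: result / baseline * 10.
ub_index() {
  awk -v r="$1" -v b="$2" 'BEGIN { printf "%.1f\n", r / b * 10 }'
}

# Geometric mean of numbers read from stdin, one per line.
geomean() {
  awk '{ s += log($1); n++ } END { printf "%.1f\n", exp(s / n) }'
}

# Dhrystone, single copy: RESULT 6372358.5 against BASELINE 116700.0
ub_index 6372358.5 116700.0

# Overall score of the single-copy run from its twelve index values
printf '%s\n' 546.0 210.3 389.4 425.2 305.2 755.6 \
              319.6 184.9 345.6 685.3 1605.7 405.6 | geomean
```

Going by the reported scores, four parallel copies reach about 2.76 times the single-copy index (1187.7 / 430.0) on this four-core system.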

STREAM results:

user@zcu106:~/STREAM-master$ gcc -DSTREAM_ARRAY_SIZE=40000000 -O2 -fopenmp -o stream stream.c
user@zcu106:~/STREAM-master$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 40000000 (elements), Offset = 0 (elements)
Memory per array = 305.2 MiB (= 0.3 GiB).
Total memory required = 915.5 MiB (= 0.9 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 119349 microseconds.
   (= 119349 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            9058.9     0.071932     0.070649     0.074034
Scale:           7858.5     0.084209     0.081440     0.090125
Add:             7354.7     0.131362     0.130529     0.132072
Triad:           5933.6     0.162834     0.161791     0.163859
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
user@zcu106:~/STREAM-master$ OMP_NUM_THREADS=1 ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 40000000 (elements), Offset = 0 (elements)
Memory per array = 305.2 MiB (= 0.3 GiB).
Total memory required = 915.5 MiB (= 0.9 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 298444 microseconds.
   (= 298444 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            4044.8     0.163095     0.158229     0.170722
Scale:           2209.2     0.291470     0.289698     0.293948
Add:             2277.1     0.421664     0.421598     0.421831
Triad:           1804.5     0.532179     0.532003     0.532374
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
user@zcu106:~/STREAM-master$ OMP_NUM_THREADS=2 ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 40000000 (elements), Offset = 0 (elements)
Memory per array = 305.2 MiB (= 0.3 GiB).
Total memory required = 915.5 MiB (= 0.9 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads counted = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 160329 microseconds.
   (= 160329 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            7377.2     0.090815     0.086754     0.093672
Scale:           4324.7     0.150540     0.147987     0.152486
Add:             4442.9     0.216218     0.216074     0.216463
Triad:           3487.0     0.276336     0.275306     0.277016
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
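
The Best Rate column follows from the Min time column. Copy and Scale touch two arrays per iteration, Add and Triad three, each array holds 40000000 eight-byte elements, and STREAM counts 1 MB as 10^6 bytes. A small awk sketch (the helper name is illustrative) recomputes a rate from a minimum time, which makes it easy to compare the runs:

```shell
# Bandwidth in MB/s: arrays touched * 8 bytes * elements / seconds / 1e6.
stream_rate() {
  awk -v a="$1" -v n="$2" -v t="$3" \
    'BEGIN { printf "%.1f\n", a * 8 * n / t / 1e6 }'
}

stream_rate 2 40000000 0.070649   # Copy, 4 threads
stream_rate 3 40000000 0.161791   # Triad, 4 threads
```

Going from one to four threads, the reported Triad rate rises from 1804.5 to 5933.6 MB/s, roughly 3.3 times, while Copy rises from 4044.8 to 9058.9 MB/s, roughly 2.2 times.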