Broadcom VideoCore V QPU Instruction Set


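The Existence of the VideoCore V QPU

In MagPi Issue 35, published in July 2015, Eben Upton revealed the existence of VideoCore V [1]. The article gave no details about VideoCore V, but it did reveal that Eric Anholt, who had moved from Intel to Pi Towers, was writing an open-source driver for VideoCore.

On 3 February 2017, Eric Anholt published an unfinished driver for VideoCore V in his mesa repository on GitHub. Its commits include the code of a disassembler for the VideoCore V QPU [2] and a description of the VideoCore V architecture [3].

On 6 July 2017, Eric Anholt announced this repository on the Mesa-dev mailing list [4]. The email states that regfiles A and B, which were separate in the VideoCore IV QPU, are unified in the VideoCore V QPU, and that half-precision floating-point support is planned for Mesa.

This article describes the features and instruction set of the VideoCore V QPU as they can be read from the disassembler code mentioned above.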


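VideoCore V Architecture

Whereas the V3D in VideoCore IV was version 2.1 or 2.6 (see footnote 1), the V3D in VideoCore V is version 3.3. V3D 3.3 adds an MMU to the QPU and supports OpenGL 3.1. Because of the MMU, memory regions used for input and output no longer need to be physically contiguous, and invalid memory reads and writes can be detected and blocked.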


VideoCore V QPU Instruction Set

VideoCore V QPU instructions fall into two broad classes: ALU instructions (Table 1) and branch instructions (Table 2).

Table 1. ALU Instruction
Bit(s) | Field | Description | Reference
63:58 | op_mul | Mul ALU opcode | Table 4. Mul ALU opcode
57:53 | sig | Signaling bits | Table 7. Instruction signal
52:46 | cond | ALU push/uniform(?)/insn. condition code | Table 5. ALU push/uniform(?)/insn. cond
45 | mm | If 0, mul ALU writes to normal regfile, else special function register | Table 8. Special function registers
44 | ma | If 0, add ALU writes to normal regfile, else special function register
43:38 | waddr_m | Write address for mul output
37:32 | waddr_a | Write address for add output
31:24 | op_add | Add ALU opcode | Table 3. Add ALU opcode
23:21 | mul_b | Input mux control for B port of mul ALU | Table 9. Mux
20:18 | mul_a | Input mux control for A port of mul ALU
17:15 | add_b | Input mux control for B port of add ALU
14:12 | add_a | Input mux control for A port of add ALU
11:6 | raddr_a | Read address A of register | Table 8. Special function registers
5:0 | raddr_b | Read address B of register
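
As an illustration of the layout in Table 1, the following Python sketch splits a 64-bit ALU instruction word into its fields. The names ALU_FIELDS, extract and unpack_alu are hypothetical helpers introduced here and are not taken from Mesa; only the bit positions come from Table 1.

```python
# Bit ranges transcribed from Table 1: field name -> (high bit, low bit), inclusive.
ALU_FIELDS = {
    "op_mul":  (63, 58),
    "sig":     (57, 53),
    "cond":    (52, 46),
    "mm":      (45, 45),
    "ma":      (44, 44),
    "waddr_m": (43, 38),
    "waddr_a": (37, 32),
    "op_add":  (31, 24),
    "mul_b":   (23, 21),
    "mul_a":   (20, 18),
    "add_b":   (17, 15),
    "add_a":   (14, 12),
    "raddr_a": (11, 6),
    "raddr_b": (5, 0),
}


def extract(word, high, low):
    """Return bits high..low (inclusive) of word as an unsigned integer."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def unpack_alu(word):
    """Split a 64-bit ALU instruction word into the fields of Table 1."""
    if not 0 <= word < (1 << 64):
        raise ValueError("instruction word must fit in 64 bits")
    return {name: extract(word, hi, lo) for name, (hi, lo) in ALU_FIELDS.items()}


# Hand-assembled example: op_mul=15 with mul_b=7, and op_add=187 with
# add_a=add_b=0. Tables 3 and 4 list both combinations as nop.
word = (15 << 58) | (187 << 24) | (7 << 21)
fields = unpack_alu(word)
print(fields["op_mul"], fields["op_add"], fields["mul_b"])  # prints: 15 187 7
```

The fourteen fields cover all 64 bits without gaps or overlaps, so a dictionary of bit ranges is enough to describe the whole ALU encoding.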

Table 2. Branch Instruction
Bit(s) | Field | Description | Reference
57:56 | sig | bit[57]=1 implies branch.
55:35 | addr_low | addr[23:3].
34:32 | cond_br | Branch condition. | Table 6. Branch cond
31:24 | addr_high | addr[31:24].
22:21 | msfign | 0=none: Ignore multisample flags when determining branch condition. 1=p: If no multisample flags are set in the lane (a pixel in the FS, a vertex in the VS), ignore the lane's condition when computing the branch condition. 2=q: If no multisample flags are set in a 2x2 quad in the FS, ignore the quad's a/b conditions.
20:18 | Reserved
17:15 | bdu | Selects how to compute the new uniforms pointer if the branch is taken (ABS/REL implicitly load a uniform and use that). 0=abs, 1=rel, 2=link_reg, 3=regfile.
14 | ub | If set, then udest determines how the uniform stream will branch, otherwise the uniform stream is left as is.
13:12 | bdi | Selects how to compute the new IP if the branch is taken. 0=abs, 1=rel, 2=link_reg, 3=regfile.
11:6 | raddr_a | Read address of register | Table 8. Special function registers
5:0 | Reserved
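
The branch layout in Table 2 can be sketched the same way. The helper names below (extract, unpack_branch, BRANCH_COND) are hypothetical. The sketch also assumes that the three address bits not stored in the instruction, addr[2:0], are zero; Table 2 does not state this explicitly.

```python
# Branch condition names from Table 6 (code 1 is listed as "Reserved?").
BRANCH_COND = {0: "always", 2: "a0", 3: "na0", 4: "alla",
               5: "anyna", 6: "anya", 7: "allna"}


def extract(word, high, low):
    """Return bits high..low (inclusive) of word as an unsigned integer."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def unpack_branch(word):
    """Split a 64-bit branch instruction word into the fields of Table 2."""
    if not 0 <= word < (1 << 64):
        raise ValueError("instruction word must fit in 64 bits")
    addr_low = extract(word, 55, 35)    # holds addr[23:3]
    addr_high = extract(word, 31, 24)   # holds addr[31:24]
    return {
        # Assumption: addr[2:0] is zero, since those bits are not encoded.
        "addr": (addr_high << 24) | (addr_low << 3),
        "cond_br": extract(word, 34, 32),
        "msfign": extract(word, 22, 21),
        "bdu": extract(word, 17, 15),
        "ub": extract(word, 14, 14),
        "bdi": extract(word, 13, 12),
        "raddr_a": extract(word, 11, 6),
    }


# Hand-assembled example: target 0x12345678, cond_br=2 (a0), bdi=1 (rel).
target = 0x12345678
word = ((1 << 57)
        | (((target >> 3) & 0x1FFFFF) << 35)
        | (2 << 32)
        | ((target >> 24) << 24)
        | (1 << 12))
b = unpack_branch(word)
print(hex(b["addr"]), BRANCH_COND.get(b["cond_br"], "reserved"))  # prints: 0x12345678 a0
```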

Table 3. Add ALU opcode (see footnote 2)
Instruction | Opcode | Description
fadd/faddnf | 0-47
Reserved? | 48-52
vfpack | 53-55, 57-59, 61-63
add | 56
sub | 60
fsub | 64-111
Reserved? | 112-119
min | 120
max | 121
umin | 122
umax | 123
shl | 124
shr | 125
asr | 126
ror | 127
fmin/fmax | 128-175
vfmin | 176-180
and | 181
or | 182
xor | 183
vadd | 184
vsub | 185
not | 186 | add_b=0.
neg | 186 | add_b=1.
flapush | 186 | add_b=2.
flbpush | 186 | add_b=3.
flbpop | 186 | add_b=4.
setmsf | 186 | add_b=6.
setrevf | 186 | add_b=7.
nop | 187 | add_b=0, add_a=0.
tidx | 187 | add_b=0, add_a=1.
eidx | 187 | add_b=0, add_a=2.
lr | 187 | add_b=0, add_a=3.
vfla | 187 | add_b=0, add_a=4.
vflna | 187 | add_b=0, add_a=5.
vflb | 187 | add_b=0, add_a=6.
vflnb | 187 | add_b=0, add_a=7.
fxcd | 187 | add_b=1, add_a=0-2.
xcd | 187 | add_b=1, add_a=3.
fycd | 187 | add_b=1, add_a=4-6.
ycd | 187 | add_b=1, add_a=7.
msf | 187 | add_b=2, add_a=0.
revf | 187 | add_b=2, add_a=1.
vdwwt | 187 | add_b=2, add_a=2.
tmuwt | 187 | add_b=2, add_a=5.
vpmwt | 187 | add_b=2, add_a=6.
vpmsetup | 187 | add_b=3.
Reserved? | 188-191
msf | N/A
revf | N/A
iid | N/A
sampid | N/A
patchid | N/A
ldvpmv | N/A
ldvpmd | N/A
ldvpmp | N/A
ldvpmg | N/A
fcmp | 192-239
vfmax | 240-244
fround | 245 | add_b=0-2
ftoin | 245 | add_b=3
ftrunc | 245 | add_b=4-6
ftoiz | 245 | add_b=7
ffloor | 246 | add_b=0-2
ftouz | 246 | add_b=3
fceil | 246 | add_b=4-6
ftoc | 246 | add_b=7
fdx | 247 | add_b=0-2
fdy | 247 | add_b=4-6
stvpmv/stvpmd/stvpmp | 248 | Distinguished by the waddr_a field.
Reserved? | 249-251
itof | 252 | add_b=0-2
clz | 252 | add_b=3
utof | 252 | add_b=4-6
Reserved? | 253-255
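
Opcodes 186 and 187 in Table 3 show how otherwise unused mux fields select the operation. The sketch below looks up the mnemonic for just those two opcodes. ADD_MUX_OPS and add_op_name are hypothetical names, and the sketch assumes that a row which gives no add_a constraint accepts any add_a value.

```python
ANY = frozenset(range(8))  # a 3-bit mux field may hold 0..7

# (opcode, allowed add_b values, allowed add_a values, mnemonic),
# transcribed from the opcode 186 and 187 rows of Table 3.
ADD_MUX_OPS = [
    (186, {0}, ANY, "not"),
    (186, {1}, ANY, "neg"),
    (186, {2}, ANY, "flapush"),
    (186, {3}, ANY, "flbpush"),
    (186, {4}, ANY, "flbpop"),
    (186, {6}, ANY, "setmsf"),
    (186, {7}, ANY, "setrevf"),
    (187, {0}, {0}, "nop"),
    (187, {0}, {1}, "tidx"),
    (187, {0}, {2}, "eidx"),
    (187, {0}, {3}, "lr"),
    (187, {0}, {4}, "vfla"),
    (187, {0}, {5}, "vflna"),
    (187, {0}, {6}, "vflb"),
    (187, {0}, {7}, "vflnb"),
    (187, {1}, {0, 1, 2}, "fxcd"),
    (187, {1}, {3}, "xcd"),
    (187, {1}, {4, 5, 6}, "fycd"),
    (187, {1}, {7}, "ycd"),
    (187, {2}, {0}, "msf"),
    (187, {2}, {1}, "revf"),
    (187, {2}, {2}, "vdwwt"),
    (187, {2}, {5}, "tmuwt"),
    (187, {2}, {6}, "vpmwt"),
    (187, {3}, ANY, "vpmsetup"),
]


def add_op_name(op_add, add_a, add_b):
    """Return the mnemonic for opcode 186/187, or None if Table 3 lists no match."""
    for opcode, b_values, a_values, name in ADD_MUX_OPS:
        if op_add == opcode and add_b in b_values and add_a in a_values:
            return name
    return None


print(add_op_name(187, 0, 0))  # prints: nop
print(add_op_name(187, 5, 1))  # prints: fycd
print(add_op_name(186, 2, 4))  # prints: flbpop
```

Combinations that Table 3 does not list, such as opcode 186 with add_b=5, return None rather than a guessed mnemonic.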

Table 4. Mul ALU opcode (see footnote 2)
Instruction | Opcode | Description
add | 1
sub | 2
umul24 | 3
vfmul | 4-8
smul24 | 9
multop | 10
fmov | 14
fmov | 15 | mul_b=0-3
mov | 15 | mul_b=4, mul_a=0
nop | 15 | mul_b=7
fmul | 16-63
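
Table 4 is small enough to transcribe completely. The following sketch maps an (op_mul, mul_a, mul_b) triple to a mnemonic. MUL_OPS and mul_op_name are hypothetical names, and rows without a mux constraint are assumed to accept any mux value.

```python
ANY = frozenset(range(8))  # a 3-bit mux field may hold 0..7

# (first opcode, last opcode, allowed mul_b, allowed mul_a, mnemonic),
# transcribed row by row from Table 4.
MUL_OPS = [
    (1, 1, ANY, ANY, "add"),
    (2, 2, ANY, ANY, "sub"),
    (3, 3, ANY, ANY, "umul24"),
    (4, 8, ANY, ANY, "vfmul"),
    (9, 9, ANY, ANY, "smul24"),
    (10, 10, ANY, ANY, "multop"),
    (14, 14, ANY, ANY, "fmov"),
    (15, 15, {0, 1, 2, 3}, ANY, "fmov"),
    (15, 15, {4}, {0}, "mov"),
    (15, 15, {7}, ANY, "nop"),
    (16, 63, ANY, ANY, "fmul"),
]


def mul_op_name(op_mul, mul_a, mul_b):
    """Return the mul ALU mnemonic, or None if Table 4 lists no matching row."""
    for first, last, b_values, a_values, name in MUL_OPS:
        if first <= op_mul <= last and mul_b in b_values and mul_a in a_values:
            return name
    return None


print(mul_op_name(15, 0, 7))  # prints: nop
print(mul_op_name(15, 0, 4))  # prints: mov
print(mul_op_name(6, 1, 2))   # prints: vfmul
```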

Table 5. ALU push/uniform(?)/insn. cond
Value | Description
0b0000000 | None.
0b00000xx | Add ALU push condition.
0b000xxxx | Add ALU uniform? condition.
0b0010000 | Invalid.
0b00100xx | Mul ALU push condition.
0b001xxxx | Mul ALU uniform? condition.
0b010xxyy | Add ALU instruction condition (xx) and mul ALU push condition (yy).
0b011xxyy | Mul ALU instruction condition (xx) and add ALU push condition (yy).
0b1xx00yy | Mul ALU instruction condition (xx) and add ALU instruction condition (yy).
0b1xxyyyy | Mul ALU instruction condition (xx) and add ALU uniform? condition (yyyy).

Table 5.1. ALU push condition
Value | Description
0 | None.
1 | PUSHZ
2 | PUSHN
3 | PUSHC

Table 5.2. ALU uniform? condition
Value | Description
0 | Invalid.
1 | ANDZ
2 | ANDNZ
3 | NORNZ
4 | NORZ
5 | ANDN
6 | ANDNN
7 | NORNN
8 | NORN
9 | ANDC
10 | ANDNC
11 | NORNC
12 | NORC

Table 5.3. ALU instruction condition
Value | Description
0 | IFA
1 | IFB
2 | IFNA
3 | IFNB
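
The patterns in Table 5 overlap, so they only make sense when tried from top to bottom with the first match winning. The sketch below decodes the 7-bit cond field under that assumption. decode_cond and the result keys are hypothetical names. The 4-bit uniform condition is returned as a raw value, because Table 5 does not say how it maps onto the names in Table 5.2.

```python
PUSH = {0: None, 1: "PUSHZ", 2: "PUSHN", 3: "PUSHC"}   # Table 5.1
INSN_COND = ("IFA", "IFB", "IFNA", "IFNB")             # Table 5.3


def decode_cond(cond):
    """Classify the 7-bit cond field by trying the rows of Table 5 in order."""
    if not 0 <= cond < 128:
        raise ValueError("cond is a 7-bit field")
    out = {"add_push": None, "mul_push": None,
           "add_uf": None, "mul_uf": None,      # raw 4-bit uniform conditions
           "add_cond": None, "mul_cond": None}
    lo2, lo4 = cond & 0x3, cond & 0xF
    if cond == 0:                       # 0b0000000: none
        pass
    elif cond >> 2 == 0:                # 0b00000xx
        out["add_push"] = PUSH[lo2]
    elif cond >> 4 == 0:                # 0b000xxxx
        out["add_uf"] = lo4
    elif cond == 0b0010000:             # listed as invalid
        raise ValueError("invalid cond encoding")
    elif cond >> 2 == 0b00100:          # 0b00100xx
        out["mul_push"] = PUSH[lo2]
    elif cond >> 4 == 0b001:            # 0b001xxxx
        out["mul_uf"] = lo4
    elif cond >> 4 == 0b010:            # 0b010xxyy
        out["add_cond"] = INSN_COND[(cond >> 2) & 0x3]
        out["mul_push"] = PUSH[lo2]
    elif cond >> 4 == 0b011:            # 0b011xxyy
        out["mul_cond"] = INSN_COND[(cond >> 2) & 0x3]
        out["add_push"] = PUSH[lo2]
    else:                               # 0b1xx....: bit 6 is set
        out["mul_cond"] = INSN_COND[(cond >> 4) & 0x3]
        if (cond >> 2) & 0x3 == 0:      # 0b1xx00yy
            out["add_cond"] = INSN_COND[lo2]
        else:                           # 0b1xxyyyy
            out["add_uf"] = lo4
    return out


d = decode_cond(0b0100110)
print(d["add_cond"], d["mul_push"])  # prints: IFB PUSHN
```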

Table 6. Branch cond
Cond | Code
always | 0
Reserved? | 1
a0 | 2
na0 | 3
alla | 4
anyna | 5
anya | 6
allna | 7

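Table 7. Instruction signal
Value | Writes to r3 | Writes to r4 | Writes to r5 | Misc
0 | - | - | - | -
1 | - | - | - | THRSW
2 | - | - | LDUNIF | -
3 | - | - | LDUNIF | THRSW
4 | - | LDTMU | - | -
5 | - | LDTMU | - | THRSW
6 | - | LDTMU | LDUNIF | -
7 | - | LDTMU | LDUNIF | THRSW
8 | LDVARY | - | - | -
9 | LDVARY | - | - | THRSW
10 | LDVARY | - | LDUNIF | -
11 | LDVARY | - | LDUNIF | THRSW
12 | LDVARY | LDTMU | - | -
13 | LDVARY | LDTMU | - | THRSW
14 | LDVARY | - | - | SMIMM
15 | - | - | - | SMIMM
16 | LDTLB | - | - | -
17 | LDTLBU | - | - | -
18-21 | Reserved
22 | - | - | - | UCB
23 | - | - | - | ROT
24 | LDVPM | - | - | -
25 | LDVPM | - | - | THRSW
26 | LDVPM | - | LDUNIF | -
27 | LDVPM | - | LDUNIF | THRSW
28 | LDVPM | LDTMU | - | -
29 | LDVPM | LDTMU | - | THRSW
30 | LDVPM | - | - | SMIMM
31 | - | - | - | SMIMM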
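
Because the sig values in Table 7 are an enumeration and not a plain bit mask, a lookup table is the simplest way to decode them. In the sketch below, SIGNALS, WRITES_TO and decode_sig are hypothetical names; the contents are transcribed from Table 7.

```python
# sig value -> signals raised, transcribed from Table 7 (18-21 are reserved).
SIGNALS = {
    0: (),
    1: ("THRSW",),
    2: ("LDUNIF",),
    3: ("LDUNIF", "THRSW"),
    4: ("LDTMU",),
    5: ("LDTMU", "THRSW"),
    6: ("LDTMU", "LDUNIF"),
    7: ("LDTMU", "LDUNIF", "THRSW"),
    8: ("LDVARY",),
    9: ("LDVARY", "THRSW"),
    10: ("LDVARY", "LDUNIF"),
    11: ("LDVARY", "LDUNIF", "THRSW"),
    12: ("LDVARY", "LDTMU"),
    13: ("LDVARY", "LDTMU", "THRSW"),
    14: ("LDVARY", "SMIMM"),
    15: ("SMIMM",),
    16: ("LDTLB",),
    17: ("LDTLBU",),
    22: ("UCB",),
    23: ("ROT",),
    24: ("LDVPM",),
    25: ("LDVPM", "THRSW"),
    26: ("LDVPM", "LDUNIF"),
    27: ("LDVPM", "LDUNIF", "THRSW"),
    28: ("LDVPM", "LDTMU"),
    29: ("LDVPM", "LDTMU", "THRSW"),
    30: ("LDVPM", "SMIMM"),
    31: ("SMIMM",),
}

# Accumulator written by each load signal, from the column headers of Table 7.
WRITES_TO = {"LDVARY": "r3", "LDTLB": "r3", "LDTLBU": "r3", "LDVPM": "r3",
             "LDTMU": "r4", "LDUNIF": "r5"}


def decode_sig(sig):
    """Return the tuple of signal names for a 5-bit sig value."""
    if not 0 <= sig <= 31:
        raise ValueError("sig is a 5-bit field")
    if sig not in SIGNALS:
        raise ValueError("reserved sig value")
    return SIGNALS[sig]


for name in decode_sig(7):
    print(name, WRITES_TO.get(name, "-"))
# prints:
# LDTMU r4
# LDUNIF r5
# THRSW -
```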

Table 8. Special function registers
Addr | Name
0 | R0
1 | R1
2 | R2
3 | R3
4 | R4
5 | R5
6 | NOP
7 | TLB
8 | TLBU
9 | TMU
10 | TMUL
11 | TMUD
12 | TMUA
13 | TMUAU
14 | VPM
15 | VPMU
16 | SYNC
17 | SYNCU
18 | Reserved
19 | RECIP
20 | RSQRT
21 | EXP
22 | LOG
23 | SIN
24 | RSQRT2
25-31 | Reserved?

Table 9. Mux
Value | Description
0 | R0
1 | R1
2 | R2
3 | R3
4 | R4
5 | R5
6 | A
7 | B
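
A mux value from Table 9 can be turned into a readable operand as follows. This is only a sketch: it assumes that mux values A and B select the register-file entries read through raddr_a and raddr_b, which the tables suggest but do not state outright. The function name describe_operand and the rf[n] notation are made up for this example.

```python
ACCUMULATORS = ("R0", "R1", "R2", "R3", "R4", "R5")  # mux values 0..5


def describe_operand(mux, raddr_a, raddr_b):
    """Describe the input selected by a 3-bit mux value (Table 9)."""
    if not 0 <= mux <= 7:
        raise ValueError("mux is a 3-bit field")
    if mux <= 5:
        return ACCUMULATORS[mux]
    if mux == 6:
        # Assumption: A is the register-file entry addressed by raddr_a.
        return f"rf[{raddr_a}]"
    # Assumption: B is the register-file entry addressed by raddr_b.
    return f"rf[{raddr_b}]"


print(describe_operand(3, 10, 20))  # prints: R3
print(describe_operand(6, 10, 20))  # prints: rf[10]
print(describe_operand(7, 10, 20))  # prints: rf[20]
```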

References

  1. Raspberry Pi Foundation, MagPi Issue 35 p.30-33, "Inside VideoCore", https://www.raspberrypi.org/magpi-issues/MagPi35.pdf , July 2015
  2. Eric Anholt, "broadcom: Add V3D 3.3 QPU instruction pack, unpack, and disasm.", https://github.com/anholt/mesa/commit/1fb588e76d62e8bee34cf248141f120fa59598a4 , 3 Feb 2017
  3. Eric Anholt, "broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.", https://github.com/anholt/mesa/commit/6b0929535f370c493a342f1439cbbdcc112e19cb , 3 Feb 2017
  4. Eric Anholt, "BCM7268 V3D3.3 ("vc5") driver release", https://lists.freedesktop.org/archives/mesa-dev/2017-July/162087.html , 6 July 2017

Footnotes

1. The V3D in the VideoCore IV used on the Raspberry Pi is version 2.1.
2. If an operation doesn't use an arg or two, unused mux values may be used to identify the operation type.

Copyright notice

© 2017 Idein Inc. All rights reserved.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.