Basics

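General-purpose registers: r0-r30. The 32-bit view of each register is named w0-w30 and the 64-bit view x0-x30. Register number 31 is not a general-purpose register: depending on the instruction it encodes either the stack pointer (SP/WSP) or the zero register (XZR/WZR). Register roles under the procedure call standard: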

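  • r31: SP|WSP (stack pointer)
  • r30: LR (link register)
  • r29: FP (frame pointer)
  • r19~r28: callee-saved [all 64 bits must be preserved, even under the ILP32 model]; save the value before use and restore it afterwards
  • r18: platform register (inter-procedural state or PIC)
  • r16 = IP0, r17 = IP1: intra-procedure-call scratch registers
  • r9~r15: temporary registers
  • r8: indirect result location (pointer to the memory where the return value is written)
  • r0~r7: parameter/result registers. The first 8 arguments are passed in these registers; any further arguments are passed on the stack, laid out as if pushed from right to left (the first stack argument ends up at the lowest address).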

SIMD registers: v0-v31. Of these:

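  • v8-v15 are callee-saved: save them before use and restore them afterwards (the procedure call standard only requires the low 64 bits of each to be preserved).
  • The remaining registers can be used freely.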

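Instruction prefixes and suffixes

  • S: Signed
  • U: Unsigned
  • F: Floating-point
  • P: Polynomial, or a pairwise operation on adjacent elements
  • V: Across, i.e. an operation over all elements of the register
  • 2: usually means the instruction operates on the upper 64 bits (bits 64-127) of the register
  • H: Halving (the result is halved, as in SHADD) or High half (only the upper half of each result is kept, as in ADDHN)
  • N: Narrow
  • L: Long (widening)
  • instruction (by element): the "by element" form reads one operand from a given element index of a register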

Illustrated reference site

http://shell-storm.org/armv8-a/ISA_v85A_A64_xml_00bet8/xhtml/fpsimdindex.html

A64 – Base Instructions: instructions for the general-purpose registers

A64 – SIMD and Floating-point Instructions: instructions for the SIMD registers

Logical and comparison operations

AND, BIC, EOR, ORN and ORR (register): bitwise AND, bit clear, exclusive OR, OR NOT and OR

These instructions perform a bitwise logical operation between two registers and place the result in the destination register.

AND (vector): Bitwise AND (vector).

BIC (vector, register): Bitwise bit Clear (vector, register).

EOR (vector): Bitwise Exclusive OR (vector).

ORN (vector): Bitwise inclusive OR NOT (vector).

ORR (vector, register): Bitwise inclusive OR (vector, register).

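BIC and ORR (immediate): bitwise bit clear and OR with an immediate

BIC (vector, immediate): Bitwise bit Clear (vector, immediate). Takes each element of the destination vector, ANDs it with the complement of an immediate value, and writes the result back to the destination vector.

ORR (vector, immediate): Bitwise inclusive OR (vector, immediate). Takes each element of the destination vector, ORs it with an immediate value, and writes the result back to the destination vector.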

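BIF, BIT and BSL (Bitwise Insert if False, Bitwise Insert if True, and Bitwise Select)

BIT (Bitwise Insert if True): inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 1; otherwise the destination bit is left unchanged.

BIF (Bitwise Insert if False): inserts each bit of the first operand into the destination if the corresponding bit of the second operand is 0; otherwise the destination bit is left unchanged.

BSL (Bitwise Select): takes each destination bit from the first operand if the corresponding bit of the destination was 1, and from the second operand if it was 0.

BIF (vector): Bitwise Insert if False.

BIT (vector): Bitwise Insert if True.

BSL (vector): Bitwise Select.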
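The three select/insert rules above can be written as plain bitwise expressions. The following is a minimal Python sketch that treats a whole 128-bit register as one integer; the function names `bsl`, `bit` and `bif` are hypothetical helpers for illustration, not an existing API.

```python
MASK = (1 << 128) - 1  # one 128-bit vector register modelled as an integer

def bsl(vd, vn, vm):
    """BSL Vd, Vn, Vm: take Vn bits where the old Vd is 1, Vm bits where it is 0."""
    return ((vd & vn) | (~vd & vm)) & MASK

def bit(vd, vn, vm):
    """BIT Vd, Vn, Vm: insert Vn bits into Vd where Vm is 1."""
    return ((vd & ~vm) | (vn & vm)) & MASK

def bif(vd, vn, vm):
    """BIF Vd, Vn, Vm: insert Vn bits into Vd where Vm is 0."""
    return ((vd & vm) | (vn & ~vm)) & MASK
```

BSL is typically paired with a comparison mask in Vd, which turns it into a branch-free per-lane select.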

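CMEQ, CMGE, CMGT, CMLE and CMLT (compare)

A vector compare takes the value of each element in a vector and compares it with the value of the corresponding element of a second vector, or with zero. If the condition is true, every bit of the corresponding element in the destination vector is set to 1. Otherwise, every bit is set to 0.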

CMEQ (register): Compare bitwise Equal (vector).

CMEQ (zero): Compare bitwise Equal to zero (vector).

CMGE (register): Compare signed Greater than or Equal (vector).

CMGE (zero): Compare signed Greater than or Equal to zero (vector).

CMGT (register): Compare signed Greater than (vector).

CMGT (zero): Compare signed Greater than zero (vector).

CMHI (register): Compare unsigned Higher (vector).

CMHS (register): Compare unsigned Higher or Same (vector).

CMLE (zero): Compare signed Less than or Equal to zero (vector).

CMLT (zero): Compare signed Less than zero (vector).

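CMTST (test bits)

CMTST (vector test bits) takes each element of a vector and performs a bitwise AND with the corresponding element of a second vector. If the result is nonzero, every bit of the corresponding element in the destination vector is set to 1. Otherwise, every bit is set to 0.

CMTST: Compare bitwise Test bits nonzero (vector).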
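As a rough illustration of how the compare and test instructions produce all-ones or all-zeros lane masks, here is a scalar Python sketch. Lanes are unsigned bit patterns held in a list; `to_signed`, `TESTS` and `compare` are hypothetical names chosen for this example.

```python
def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

# One predicate per instruction; x and y are unsigned lane bit patterns.
TESTS = {
    "cmeq":  lambda x, y, bits: x == y,
    "cmge":  lambda x, y, bits: to_signed(x, bits) >= to_signed(y, bits),
    "cmgt":  lambda x, y, bits: to_signed(x, bits) > to_signed(y, bits),
    "cmhi":  lambda x, y, bits: x > y,          # unsigned higher
    "cmhs":  lambda x, y, bits: x >= y,         # unsigned higher or same
    "cmtst": lambda x, y, bits: (x & y) != 0,   # any common bit set
}

def compare(op, vn, vm, bits=8):
    """Return a mask vector: all ones where the test holds, zero elsewhere."""
    ones = (1 << bits) - 1
    return [ones if TESTS[op](x & ones, y & ones, bits) else 0
            for x, y in zip(vn, vm)]
```

The same inputs give different masks for signed (CMGT) and unsigned (CMHI) comparisons, which is why both families exist.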

Other bit operations

RBIT (vector): Reverse Bit order (vector).

General data-processing instructions

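SCVTF, UCVTF and FCVT* (between fixed-point or integer and floating-point): vector conversion between fixed-point or integer values and floating-point values.

The conversion instructions convert each element of a vector in one of the following ways and place the result in the destination vector:

  • floating-point to integer
  • integer to floating-point
  • floating-point to fixed-point
  • fixed-point to floating-point

Rounding

  • Integer or fixed-point to floating-point conversions (SCVTF, UCVTF) round according to the rounding mode in FPCR, which is round-to-nearest by default.
  • Floating-point to integer or fixed-point conversions with FCVTZS and FCVTZU round toward zero.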

SCVTF (scalar, fixed-point): Signed fixed-point Convert to Floating-point (scalar).

SCVTF (scalar, integer): Signed integer Convert to Floating-point (scalar).

SCVTF (vector, fixed-point): Signed fixed-point Convert to Floating-point (vector).

SCVTF (vector, integer): Signed integer Convert to Floating-point (vector).

UCVTF (scalar, fixed-point): Unsigned fixed-point Convert to Floating-point (scalar).

UCVTF (scalar, integer): Unsigned integer Convert to Floating-point (scalar).

UCVTF (vector, fixed-point): Unsigned fixed-point Convert to Floating-point (vector).

UCVTF (vector, integer): Unsigned integer Convert to Floating-point (vector).
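To make the fixed-point conversions concrete, the sketch below models one lane in Python. It assumes finite inputs, models the float side with Python's double precision rather than the register's actual format, and uses hypothetical function names (`fcvtzs_fixed`, `scvtf_fixed`).

```python
def fcvtzs_fixed(x, fbits, bits=32):
    """Float to signed fixed-point with `fbits` fraction bits.

    Rounds toward zero and saturates to the signed range of `bits` bits.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scaled = int(x * (1 << fbits))  # int() truncates toward zero
    return max(lo, min(hi, scaled))

def scvtf_fixed(n, fbits):
    """Signed fixed-point with `fbits` fraction bits to float."""
    return n / (1 << fbits)
```

For example, 1.75 with 8 fraction bits becomes 448, and converting 448 back yields 1.75.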

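DUP: copy a scalar to every lane of a vector.

DUP (vector duplicate) copies a scalar into every element of the destination vector. The source can be a vector element or a general-purpose register.

To fill a SIMD register with an immediate value: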

mov	w0, #imm
dup v0.8h, w0

DUP (element): Duplicate vector element to vector or scalar.

DUP (general): Duplicate general-purpose register to vector.

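EXT: extract.

EXT (vector extract) takes 8-bit elements from the low end of the second operand vector and the high end of the first operand, concatenates them, and places the result in the destination vector.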

EXT: Extract vector from pair of vectors.
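EXT can be pictured as a sliding window over the two source registers placed end to end. A minimal Python sketch, with each register modelled as a list of bytes where index 0 is the lowest byte (`ext` is a hypothetical helper name):

```python
def ext(vn, vm, index):
    """Model EXT Vd, Vn, Vm, #index on byte lists (index 0 = lowest byte).

    The result starts at byte `index` of Vn and continues into the low
    bytes of Vm until the register is full.
    """
    assert len(vn) == len(vm) and 0 <= index < len(vn)
    return (vn + vm)[index:index + len(vn)]
```

With `index` 0 the result is simply Vn; larger indices shift bytes in from Vm, which is handy for unaligned sliding windows in filters.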

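MOVI, MVNI (immediate) and MOV, MVN (register): move and move inverted.

MOVI (vector move immediate) and MVNI (vector move inverted immediate) generate an immediate value and place it in the destination register.

MOV (vector move, register) copies the value of the source register to the destination register.

MVN (vector move NOT, register) inverts every bit of the source register and places the result in the destination register.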

MOV (element): Move vector element to another vector element: an alias of INS (element).

MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).

MOV (scalar): Move vector element to scalar: an alias of DUP (element).

MOV (to general): Move vector element to general-purpose register: an alias of UMOV.

MOV (vector): Move vector: an alias of ORR (vector, register).

MOVI: Move Immediate (vector).

MVN: Bitwise NOT (vector): an alias of NOT.

NOT: Bitwise NOT (vector).

MVNI: Move inverted Immediate (vector).

XTN, XTL: vector narrowing and widening

SXTL, SXTL2: Signed extend Long: an alias of SSHLL, SSHLL2.

UXTL, UXTL2: Unsigned extend Long: an alias of USHLL, USHLL2.

XTN, XTN2: Extract Narrow.

SQXTN, SQXTN2: Signed saturating extract Narrow.

SQXTUN, SQXTUN2: Signed saturating extract Unsigned Narrow.

UQXTN, UQXTN2: Unsigned saturating extract Narrow.

Moving data between general-purpose and SIMD registers

INS (element): Insert vector element from another vector element.

INS (general): Insert vector element from general-purpose register.

SMOV: Signed Move vector element to general-purpose register.

UMOV: Unsigned Move vector element to general-purpose register.

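REV: reverse the elements within a vector.

REV16 (reverse within halfwords) reverses the order of the 8-bit elements within each halfword of the vector and places the result in the corresponding destination vector.

REV32 (reverse within words) reverses the order of the 8-bit or 16-bit elements within each word of the vector and places the result in the corresponding destination vector.

REV64 (reverse within doublewords) reverses the order of the 8-bit, 16-bit or 32-bit elements within each doubleword of the vector and places the result in the corresponding destination vector.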

REV16 (vector): Reverse elements in 16-bit halfwords (vector).

REV32 (vector): Reverse elements in 32-bit words (vector).

REV64 (vector): Reverse elements in 64-bit doublewords (vector).

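TBL, TBX: vector table lookup.

TBL (vector): Table vector Lookup. Uses the byte indices in a control vector to look up byte values in a table and builds a new vector from them. An out-of-range index returns 0.

TBX (vector): Table vector lookup extension. Works like TBL, except that an out-of-range index leaves the corresponding destination element unchanged.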
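The only difference between the two lookups is what happens to an out-of-range index, which a short Python sketch makes explicit. Here `table` stands for the bytes of the one to four table registers concatenated; `tbl` and `tbx` are hypothetical helper names.

```python
def tbl(table, indices):
    """TBL: out-of-range indices produce 0."""
    return [table[i] if i < len(table) else 0 for i in indices]

def tbx(dest, table, indices):
    """TBX: out-of-range indices keep the existing destination byte."""
    return [table[i] if i < len(table) else d
            for d, i in zip(dest, indices)]
```

Because TBX preserves the destination, several TBX lookups with offset indices can be chained to emulate a table larger than four registers.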

TRN: vector transpose.

TRN1 (vector) Transpose vectors (primary)

TRN2 (vector) Transpose vectors (secondary)

TRN (vector transpose) treats the elements of its operand vectors as the elements of 2 x 2 matrices and transposes those matrices.
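The 2 x 2 view becomes clearer when the element movement is spelled out: TRN1 gathers the even-indexed elements of both sources and TRN2 the odd-indexed ones. A small Python sketch on lists of lanes (`trn1` and `trn2` are hypothetical helper names):

```python
def trn1(vn, vm):
    """TRN1: even-indexed elements of Vn and Vm, interleaved."""
    out = []
    for i in range(0, len(vn), 2):
        out += [vn[i], vm[i]]
    return out

def trn2(vn, vm):
    """TRN2: odd-indexed elements of Vn and Vm, interleaved."""
    out = []
    for i in range(0, len(vn), 2):
        out += [vn[i + 1], vm[i + 1]]
    return out
```

If Vn = [a0, a1] and Vm = [b0, b1] are the rows of a matrix, `trn1` returns [a0, b0] and `trn2` returns [a1, b1], the rows of its transpose.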

UZP, ZIP: vector de-interleave and interleave.

ZIP (vector zip) interleaves the elements of two vectors.

UZP (vector unzip) de-interleaves the elements of two vectors.

UZP1 (vector) Unzip vectors (primary)

UZP2 (vector) Unzip vectors (secondary)

ZIP1 (vector) Zip vectors (primary)

ZIP2 (vector) Zip vectors (secondary)
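A lane-level sketch of the four instructions in Python, using lists of lanes with index 0 as the lowest element. The helper names are hypothetical and chosen to mirror the mnemonics.

```python
def zip1(vn, vm):
    """ZIP1: interleave the lower halves of Vn and Vm."""
    half = len(vn) // 2
    return [x for pair in zip(vn[:half], vm[:half]) for x in pair]

def zip2(vn, vm):
    """ZIP2: interleave the upper halves of Vn and Vm."""
    half = len(vn) // 2
    return [x for pair in zip(vn[half:], vm[half:]) for x in pair]

def uzp1(vn, vm):
    """UZP1: even-indexed elements of Vn followed by those of Vm."""
    return (vn + vm)[0::2]

def uzp2(vn, vm):
    """UZP2: odd-indexed elements of Vn followed by those of Vm."""
    return (vn + vm)[1::2]
```

UZP undoes ZIP: applying `uzp1` and `uzp2` to the pair of ZIP results recovers the original two vectors.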

Shift instructions

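SHL, SQSHL/UQSHL, SQSHLU and SSHLL/USHLL (immediate): shift left by an immediate value.

The vector shift left (immediate) instructions take each element of an integer vector, shift it left by an immediate value, and place the result in the destination vector.

For SHL (vector shift left), bits shifted out of the left of each element are lost.

For SQSHL and UQSHL (vector saturating shift left) and SQSHLU (signed saturating shift left with an unsigned result), the sticky QC flag is set if saturation occurs.

For SSHLL and USHLL (vector shift left long), values are sign-extended or zero-extended.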

SHL (vector) Shift left (immediate)

SQSHL (vector, immediate) Signed saturating shift left (immediate)

SQSHL (vector, register) Signed saturating shift left (register)

UQSHL (vector, immediate) Unsigned saturating shift left (immediate)

UQSHL (vector, register) Unsigned saturating shift left (register)

SQSHLU (vector) Signed saturating shift left unsigned (immediate)

SSHLL, SSHLL2 (vector) Signed shift left long (immediate)

USHLL, USHLL2 (vector) Unsigned shift left long (immediate)

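{Q}{R}SHL (register): shift left by a signed variable.

SSHL and USHL (vector shift left by signed variable) take each element of a vector, shift it by the signed value in the least significant byte of the corresponding element of a second vector, and place the result in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise it is a right shift.

The result can optionally be saturated, rounded, or both. If saturation occurs, the sticky QC flag is set.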

SSHL (vector) Signed shift left (register)

USHL (vector) Unsigned shift left (register)

SQSHL (vector, immediate) Signed saturating shift left (immediate)

SQSHL (vector, register) Signed saturating shift left (register)

UQSHL (vector, immediate) Unsigned saturating shift left (immediate)

UQSHL (vector, register) Unsigned saturating shift left (register)

SRSHL (vector) Signed rounding shift left (register)

URSHL (vector) Unsigned rounding shift left (register)

SQRSHL (vector) Signed saturating rounding shift left (register)

UQRSHL (vector) Unsigned saturating rounding shift left (register)
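The register-shift family is easiest to follow for a single signed lane. The Python sketch below is a simplified model, not the architectural pseudocode: it reads the signed shift amount from the low byte, shifts left or right accordingly, and optionally rounds and saturates. `sshl` and `to_signed` are hypothetical helper names; the QC flag is not modelled.

```python
def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sshl(value, shift_elem, bits=16, saturate=False, rounding=False):
    """One lane of SSHL / SQSHL / SRSHL / SQRSHL; returns the lane bit pattern."""
    shift = to_signed(shift_elem, 8)   # signed value of the lowest byte
    x = to_signed(value, bits)
    if shift >= 0:
        result = x << shift            # positive amount: shift left
    else:
        n = -shift                     # negative amount: shift right
        bias = (1 << (n - 1)) if rounding else 0
        result = (x + bias) >> n       # arithmetic shift on a Python int
    if saturate:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        result = max(lo, min(hi, result))
    return result & ((1 << bits) - 1)  # wrap to the element width
```

For instance, a shift element of 0xFE is read as -2, so the lane is shifted right by two bits.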

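{R}SHR{N}, {R}SRA (immediate): shift right by an immediate value.

{R}SHR{N} (vector shift right by immediate) takes each element of a vector, shifts it right by an immediate value, and places the result in the destination vector. The result can optionally be rounded, narrowed, or both.

{R}SRA (vector shift right by immediate and accumulate) takes each element of a vector, shifts it right by an immediate value, and accumulates the result into the destination vector. The result can optionally be rounded.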

SSHR (vector) Signed shift right (immediate)

USHR (vector) Unsigned shift right (immediate)

SHRN, SHRN2 (vector) Shift right narrow (immediate)

SRSHR (vector) Signed rounding shift right (immediate)

URSHR (vector) Unsigned rounding shift right (immediate)

RSHRN, RSHRN2 (vector) Rounding shift right narrow (immediate)

SSRA (vector) Signed shift right and accumulate (immediate)

USRA (vector) Unsigned shift right and accumulate (immediate)

SRSRA (vector) Signed rounding shift right and accumulate (immediate)

URSRA (vector) Unsigned rounding shift right and accumulate (immediate)
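Rounding in these instructions means adding half of the discarded range before shifting. A minimal single-lane Python sketch, assuming signed values held as Python ints and shift amounts of at least 1; `shr` and `sra` are hypothetical helper names.

```python
def shr(x, n, rounding=False):
    """SSHR (truncating) or SRSHR (rounding) of a signed value by n >= 1."""
    bias = (1 << (n - 1)) if rounding else 0
    return (x + bias) >> n

def sra(acc, x, n, bits=16, rounding=False):
    """SSRA / SRSRA: add the shifted value to the accumulator lane.

    The sum wraps to the element width; the lane bit pattern is returned.
    """
    return (acc + shr(x, n, rounding)) & ((1 << bits) - 1)
```

Shifting 7 right by 2 gives 1 when truncating and 2 when rounding, since 7/4 = 1.75.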

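Q{R}SHR{U}N (immediate): saturating shift right by an immediate value, with narrowing.

Q{R}SHR{U}N (vector saturating shift right narrow by immediate, with optional rounding) takes each element of a quadword vector of integers, shifts it right by an immediate value, saturates it to half the element width, and places the result in a doubleword vector.

If saturation occurs, the sticky QC flag is set.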

SQSHRN, SQSHRN2 (vector) Signed saturating shift right narrow (immediate)

UQSHRN, UQSHRN2 (vector) Unsigned saturating shift right narrow (immediate)

SQRSHRN, SQRSHRN2 (vector) Signed saturating rounded shift right narrow (immediate)

UQRSHRN, UQRSHRN2 (vector) Unsigned saturating rounded shift right narrow (immediate)

SQRSHRUN, SQRSHRUN2 (vector) Signed saturating rounded shift right unsigned narrow (immediate)

SQSHRUN, SQSHRUN2 (vector) Signed saturating shift right unsigned narrow (immediate)
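The variants differ only in the signedness of the input and of the saturation range. The following Python sketch models one lane and returns both the narrowed value and whether saturation occurred (the condition that would set QC). All function names are hypothetical.

```python
def shift_right_narrow(x, n, lo, hi, rounding=False):
    """Shift x right by n (n >= 1), then clamp to [lo, hi].

    Returns (result, saturated).
    """
    bias = (1 << (n - 1)) if rounding else 0
    shifted = (x + bias) >> n
    clamped = max(lo, min(hi, shifted))
    return clamped, clamped != shifted

def sqshrn(x, n, out_bits=8, rounding=False):
    """SQSHRN / SQRSHRN: signed input, signed result."""
    half = 1 << (out_bits - 1)
    return shift_right_narrow(x, n, -half, half - 1, rounding)

def uqshrn(x, n, out_bits=8, rounding=False):
    """UQSHRN / UQRSHRN: unsigned input, unsigned result."""
    return shift_right_narrow(x, n, 0, (1 << out_bits) - 1, rounding)

def sqshrun(x, n, out_bits=8, rounding=False):
    """SQSHRUN / SQRSHRUN: signed input, unsigned result."""
    return shift_right_narrow(x, n, 0, (1 << out_bits) - 1, rounding)
```

SQSHRUN is the usual choice when converting signed intermediate results back to unsigned 8-bit pixels, because negative values clamp to 0.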

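SLI and SRI: shift left and insert, shift right and insert.

SLI (vector shift left and insert) takes each element of a vector, shifts it left by an immediate value, and inserts the result into the destination vector. Bits shifted out of the left of each element are lost, and the low destination bits not covered by the shifted value are left unchanged.

SRI (vector shift right and insert) takes each element of a vector, shifts it right by an immediate value, and inserts the result into the destination vector. Bits shifted out of the right of each element are lost, and the high destination bits not covered by the shifted value are left unchanged.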

SLI (vector) Shift left and insert (immediate)

SRI (vector) Shift right and insert (immediate)
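"Insert" here means that destination bits outside the shifted field survive. A single-lane Python sketch on unsigned bit patterns (`sli` and `sri` are hypothetical helper names):

```python
def sli(dest, src, shift, bits=8):
    """SLI: shift src left and insert; the low `shift` bits of dest survive."""
    mask = (1 << bits) - 1
    keep = (1 << shift) - 1
    return ((src << shift) & mask) | (dest & keep)

def sri(dest, src, shift, bits=8):
    """SRI: shift src right and insert; the high `shift` bits of dest survive."""
    mask = (1 << bits) - 1
    inserted = mask >> shift  # bit positions written by the shifted source
    return ((src & mask) >> shift) | (dest & ~inserted & mask)
```

This makes SLI and SRI convenient for packing bit fields, for example merging two 4-bit values into one byte.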

General arithmetic instructions

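ABA{L} and ABD{L}: vector absolute difference and accumulate, and absolute difference.

ABA (vector absolute difference and accumulate) subtracts the elements of one vector from the corresponding elements of another and accumulates the absolute values of the results into the elements of the destination vector.

ABD (vector absolute difference) subtracts the elements of one vector from the corresponding elements of another and places the absolute values of the results in the elements of the destination vector.

Long forms of both instructions are available.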

SABA (vector) Signed absolute difference and accumulate

SABAL, SABAL2 (vector) Signed absolute difference and accumulate long

UABA (vector) Unsigned absolute difference and accumulate

UABAL, UABAL2 (vector) Unsigned absolute difference and accumulate long

SABD (vector) Signed absolute difference

SABDL, SABDL2 (vector) Signed absolute difference long

UABD (vector) Unsigned absolute difference

UABDL, UABDL2 (vector) Unsigned absolute difference long

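{Q}ABS and {Q}NEG: vector absolute value and negate.

ABS (vector absolute value) takes the absolute value of each element in a vector and places the results in a second vector. (The floating-point form, FABS, simply clears the sign bit.)

NEG (vector negate) negates each element in a vector and places the results in a second vector. (The floating-point form, FNEG, simply inverts the sign bit.)

Saturating forms of both instructions are available. If saturation occurs, the sticky QC flag (FPSR bit [27]) is set.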

ABS (vector) Absolute value

SQABS (vector) Signed saturating absolute value

NEG (vector) Negate

SQNEG (vector) Signed saturating negate

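{Q}ADD, ADDL, ADDW, {Q}SUB, SUBL and SUBW: vector add and subtract.

ADD (vector add) adds corresponding elements in two vectors and places the results in the destination vector.

SUB (vector subtract) subtracts the elements of one vector from the corresponding elements of another and places the results in the destination vector.

Saturating, long and wide forms are available. If saturation occurs, the sticky QC flag (FPSR bit [27]) is set.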

ADD (vector) Add

SQADD (vector) Signed saturating add

UQADD (vector) Unsigned saturating add

SADDL, SADDL2 (vector) Signed add long

UADDL, UADDL2 (vector) Unsigned add long

SADDW, SADDW2 (vector) Signed add wide

UADDW, UADDW2 (vector) Unsigned add wide

SUB (vector) Subtract

SQSUB (vector) Signed saturating subtract

UQSUB (vector) Unsigned saturating subtract

SSUBL, SSUBL2 (vector) Signed subtract long

USUBL, USUBL2 (vector) Unsigned subtract long

SSUBW, SSUBW2 (vector) Signed subtract wide

USUBW, USUBW2 (vector) Unsigned subtract wide
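Saturating arithmetic clamps to the element's range instead of wrapping. The Python sketch below models one lane and also reports whether the result was clamped, which is the condition under which QC would be set. The function names are hypothetical.

```python
def clamp(x, lo, hi):
    """Clamp x to [lo, hi]; return (result, saturated)."""
    result = max(lo, min(hi, x))
    return result, result != x

def sqadd(a, b, bits=8):
    """SQADD: signed saturating add of two signed values."""
    return clamp(a + b, -(1 << (bits - 1)), (1 << (bits - 1)) - 1)

def uqadd(a, b, bits=8):
    """UQADD: unsigned saturating add."""
    return clamp(a + b, 0, (1 << bits) - 1)

def uqsub(a, b, bits=8):
    """UQSUB: unsigned saturating subtract (floors at 0)."""
    return clamp(a - b, 0, (1 << bits) - 1)
```

With plain ADD on 8-bit lanes, 200 + 100 would wrap to 44; UQADD returns 255 instead.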

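{R}ADDHN and {R}SUBHN: vector add returning high half, and vector subtract returning high half.

{R}ADDHN (vector add, narrow, returning high half) adds corresponding elements in two vectors, selects the most significant half of each sum, and places the results in the destination vector. The results can be either rounded or truncated.

{R}SUBHN (vector subtract, narrow, returning high half) subtracts the elements of one vector from the corresponding elements of another, selects the most significant half of each difference, and places the results in the destination vector. The results can be either rounded or truncated.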

ADDHN, ADDHN2 (vector) Add returning high narrow

RADDHN, RADDHN2 (vector) Rounding add returning high narrow

SUBHN, SUBHN2 (vector) Subtract returning high narrow

RSUBHN, RSUBHN2 (vector) Rounding subtract returning high narrow
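A single-lane Python sketch of the "high half" selection, working on unsigned bit patterns of a `bits`-wide source element. The rounding forms add half of the discarded low half before truncating. `addhn` and `subhn` are hypothetical helper names.

```python
def addhn(a, b, bits=16, rounding=False):
    """ADDHN / RADDHN: upper half of the (wrapping) sum of two lanes."""
    half = bits // 2
    bias = (1 << (half - 1)) if rounding else 0
    total = (a + b + bias) & ((1 << bits) - 1)
    return total >> half

def subhn(a, b, bits=16, rounding=False):
    """SUBHN / RSUBHN: upper half of the (wrapping) difference of two lanes."""
    half = bits // 2
    bias = (1 << (half - 1)) if rounding else 0
    total = (a - b + bias) & ((1 << bits) - 1)
    return total >> half
```

For 16-bit inputs the result is the top byte of the sum, so 0x1234 + 0x0100 narrows to 0x13.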

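{R}HADD and HSUB: vector halving add and halving subtract.

HADD (vector halving add) adds corresponding elements in two vectors, shifts each result right by one bit, and places the results in the destination vector. The results can be either rounded or truncated.

HSUB (vector halving subtract) subtracts the elements of one vector from the corresponding elements of another, shifts each result right by one bit, and places the results in the destination vector. The results are always truncated.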

SHADD (vector) Signed halving add

UHADD (vector) Unsigned halving add

SRHADD (vector) Signed rounding halving add

URHADD (vector) Unsigned rounding halving add

SHSUB (vector) Signed halving subtract

UHSUB (vector) Unsigned halving subtract
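The point of the halving forms is that the intermediate sum keeps its extra bit, so the average never overflows. A single-lane Python sketch on integer values (signed or unsigned, as Python ints); the function names are hypothetical.

```python
def hadd(a, b):
    """SHADD / UHADD: (a + b) / 2, truncated (rounded toward minus infinity)."""
    return (a + b) >> 1

def rhadd(a, b):
    """SRHADD / URHADD: (a + b + 1) / 2, i.e. the rounded average."""
    return (a + b + 1) >> 1

def hsub(a, b):
    """SHSUB / UHSUB: (a - b) / 2, always truncated."""
    return (a - b) >> 1
```

URHADD on 8-bit lanes is the classic way to average two pixel rows without widening.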

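ADDP, ADDLP, ADALP: vector add pairwise, and vector add pairwise long and accumulate.

ADDP (vector add pairwise) adds adjacent pairs of elements of two vectors and places the results in the destination vector.

ADDLP (vector add long pairwise) adds adjacent pairs of elements of a vector, sign-extends or zero-extends the results to twice the original width, and places the final results in the destination vector.

ADALP (vector add and accumulate long pairwise) adds adjacent pairs of elements of a vector and accumulates the widened results into the elements of the destination vector.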

ADDP (vector) Add pairwise

SADDLP (vector) Signed add long pairwise

UADDLP (vector) Unsigned add long pairwise

SADALP (vector) Signed add and accumulate long pairwise

UADALP (vector) Unsigned add and accumulate long pairwise
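The three pairwise forms differ in where the pairs come from and how wide the results are. A Python sketch on lists of unsigned lanes, so it corresponds to ADDP, UADDLP and UADALP; the helper names are hypothetical.

```python
def addp(vn, vm, bits=16):
    """ADDP: pairwise sums over Vn followed by Vm, wrapping to `bits`."""
    src = vn + vm
    mask = (1 << bits) - 1
    return [(src[i] + src[i + 1]) & mask for i in range(0, len(src), 2)]

def addlp(vn):
    """UADDLP: pairwise sums of one vector; results are twice as wide,
    so nothing is lost and the vector has half as many lanes."""
    return [vn[i] + vn[i + 1] for i in range(0, len(vn), 2)]

def adalp(acc, vn, wide_bits=32):
    """UADALP: add the widened pairwise sums into the accumulator lanes."""
    mask = (1 << wide_bits) - 1
    return [(a + s) & mask for a, s in zip(acc, addlp(vn))]
```

Repeated ADALP steps are a common way to sum many narrow values into a few wide accumulators without overflow.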

Mixed signed/unsigned saturating accumulate

SUQADD (vector) Signed saturating accumulate of unsigned value

USQADD (vector) Unsigned saturating accumulate of signed value

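MAX, MIN, MAXP and MINP: vector maximum, vector minimum, vector pairwise maximum and vector pairwise minimum.

MAX (vector maximum) compares corresponding elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector.

MIN (vector minimum) compares corresponding elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector.

MAXP (vector pairwise maximum) compares adjacent pairs of elements in two vectors and copies the larger of each pair into the corresponding element of the destination vector. In A64 both 64-bit and 128-bit vector arrangements are available.

MINP (vector pairwise minimum) compares adjacent pairs of elements in two vectors and copies the smaller of each pair into the corresponding element of the destination vector. In A64 both 64-bit and 128-bit vector arrangements are available.

For a diagram of pairwise operations, see Figure 5-5 on page 5-63.

Floating-point maximum and minimum (FMAX, FMIN): max(+0.0, –0.0) = +0.0, min(+0.0, –0.0) = –0.0.
If either input is a NaN, the corresponding result element is a NaN (the default NaN when FPCR.DN is set). FMAXNM and FMINNM instead return the numeric operand when exactly one input is a quiet NaN.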

SMAX (vector) Signed maximum

UMAX (vector) Unsigned maximum

SMIN (vector) Signed minimum

UMIN (vector) Unsigned minimum

SMAXP (vector) Signed maximum pairwise

UMAXP (vector) Unsigned maximum pairwise

SMINP (vector) Signed minimum pairwise

UMINP (vector) Unsigned minimum pairwise

Across-vector (V) operations

Compute the sum, maximum or minimum over all elements of a vector.

ADDV (vector) Add across vector

SADDLV (vector) Signed add long across vector

UADDLV (vector) Unsigned sum long across vector

SMAXV (vector) Signed maximum across vector

UMAXV (vector) Unsigned maximum across vector

SMINV (vector) Signed minimum across vector

UMINV (vector) Unsigned minimum across vector

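CLS, CLZ and CNT: vector count leading sign bits, count leading zeros, and count set bits.

CLS (vector count leading sign bits) counts, in each element of a vector, the number of consecutive bits following the top bit that are the same as the top bit, and places the results in a second vector.

CLZ (vector count leading zeros) counts the number of consecutive zeros in each element of a vector, starting from the top bit, and places the results in a second vector.

CNT (vector count set bits) counts the number of bits that are 1 in each byte element of a vector and places the results in a second vector.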

CLS (vector) Count leading sign bits

CLZ (vector) Count leading zero bits

CNT (vector) Population count per byte
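The three counts can be checked against a small single-lane model. This is a Python sketch on unsigned bit patterns; `clz`, `cls` and `cnt` are hypothetical helper names mirroring the mnemonics.

```python
def clz(x, bits=8):
    """CLZ: number of zero bits above the highest set bit."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()

def cls(x, bits=8):
    """CLS: number of bits after the top bit that equal the top bit."""
    mask = (1 << bits) - 1
    x &= mask
    if x >> (bits - 1):       # negative: count leading ones instead
        x = ~x & mask
    return bits - x.bit_length() - 1

def cnt(x):
    """CNT: population count of one byte."""
    return bin(x & 0xFF).count("1")
```

Note that CLS never counts the top bit itself, so for an 8-bit lane the result is at most 7.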

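RECPE and RSQRTE: vector reciprocal estimate and reciprocal square root estimate.

RECPE (vector reciprocal estimate) finds an approximate reciprocal of each element in a vector and places the results in a second vector.

RSQRTE (vector reciprocal square root estimate) finds an approximate reciprocal square root of each element in a vector and places the results in a second vector.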

URECPE (vector) Unsigned reciprocal estimate

URSQRTE (vector) Unsigned reciprocal square root estimate

Multiply instructions

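MUL{L}, MLA{L} and MLS{L}: vector multiply, multiply-accumulate and multiply-subtract.

MUL (vector multiply) multiplies corresponding elements in two vectors and places the results in the destination vector.

MLA (vector multiply-accumulate) multiplies corresponding elements in two vectors and accumulates the results into the elements of the destination vector.

MLS (vector multiply-subtract) multiplies corresponding elements in two vectors, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.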

MUL (vector): Multiply (vector).

SMULL, SMULL2 (vector): Signed Multiply Long (vector).

UMULL, UMULL2 (vector): Unsigned Multiply long (vector).

MLA (vector): Multiply-Add to accumulator (vector).

SMLAL, SMLAL2 (vector): Signed Multiply-Add Long (vector).

UMLAL, UMLAL2 (vector): Unsigned Multiply-Add Long (vector).

MLS (vector): Multiply-Subtract from accumulator (vector).

SMLSL, SMLSL2 (vector): Signed Multiply-Subtract Long (vector).

UMLSL, UMLSL2 (vector): Unsigned Multiply-Subtract Long (vector).

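MUL{L}, MLA{L} and MLS{L} (by element): vector multiply, multiply-accumulate and multiply-subtract by a scalar.

MUL (vector multiply by scalar) multiplies each element in a vector by a scalar and places the results in the destination vector.

MLA (vector multiply-accumulate) multiplies each element in a vector by a scalar and accumulates the results into the corresponding elements of the destination vector.

MLS (vector multiply-subtract) multiplies each element in a vector by a scalar, subtracts the products from the corresponding elements of the destination vector, and places the final results in the destination vector.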

MUL (by element): Multiply (vector, by element).

SMULL, SMULL2 (by element): Signed Multiply Long (vector, by element).

UMULL, UMULL2 (by element): Unsigned Multiply Long (vector, by element).

MLA (by element): Multiply-Add to accumulator (vector, by element).

SMLAL, SMLAL2 (by element): Signed Multiply-Add Long (vector, by element).

UMLAL, UMLAL2 (by element): Unsigned Multiply-Add Long (vector, by element).

MLS (by element): Multiply-Subtract from accumulator (vector, by element).

SMLSL, SMLSL2 (by element): Signed Multiply-Subtract Long (vector, by element).

UMLSL, UMLSL2 (by element): Unsigned Multiply-Subtract Long (vector, by element).

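SQDMULL, SQDMLAL and SQDMLSL (vector or by element): vector saturating doubling multiply, multiply-accumulate and multiply-subtract.

The vector saturating doubling multiply instructions multiply their operands and double the results. SQDMULL places the results in the destination register. SQDMLAL adds the results to the values in the destination register. SQDMLSL subtracts the results from the values in the destination register.

If any result overflows, it is saturated. If saturation occurs, the sticky QC flag (FPSR bit [27]) is set.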

SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).

SQDMULL, SQDMULL2 (vector): Signed saturating Doubling Multiply Long.

SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).

SQDMLAL, SQDMLAL2 (vector): Signed saturating Doubling Multiply-Add Long.

SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).

SQDMLSL, SQDMLSL2 (vector): Signed saturating Doubling Multiply-Subtract Long.

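SQ{R}DMULH (vector or by element): vector saturating doubling multiply returning high half.

The vector saturating doubling multiply instructions multiply their operands and double the results. These instructions return only the high half of each result.

If any result overflows, it is saturated. If saturation occurs, the sticky QC flag (FPSR bit [27]) is set.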

SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).

SQDMULH (vector): Signed saturating Doubling Multiply returning High half.

SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).

SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.

SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).
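Doubling and keeping the high half is what makes these instructions a fixed-point multiply: for 16-bit lanes in Q15 format the result is again Q15. A single-lane Python sketch on signed values, returning the result and whether it saturated (`sqdmulh` is a hypothetical helper name):

```python
def sqdmulh(a, b, bits=16, rounding=False):
    """SQDMULH (or SQRDMULH when rounding=True) on signed lane values.

    Returns (result, saturated).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    bias = (1 << (bits - 1)) if rounding else 0
    product = (2 * a * b + bias) >> bits  # keep the high half
    result = max(lo, min(hi, product))
    return result, result != product
```

In Q15, 0x4000 represents 0.5, and `sqdmulh(0x4000, 0x4000)` yields 8192, which represents 0.25. The only saturating case is multiplying the most negative value by itself.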

Polynomial multiply

PMUL: Polynomial Multiply.

PMULL, PMULL2: Polynomial Multiply Long.

Dot product

SDOT (by element): Dot Product signed arithmetic (vector, by element).

SDOT (vector): Dot Product signed arithmetic (vector).

UDOT (by element): Dot Product unsigned arithmetic (vector, by element).

UDOT (vector): Dot Product unsigned arithmetic (vector).
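In the vector form, each 32-bit accumulator lane receives the dot product of the four 8-bit elements that share its position. A minimal Python sketch of SDOT, with the 8-bit lanes given as signed Python ints and the accumulator lanes as 32-bit patterns (`sdot` is a hypothetical helper name):

```python
def sdot(acc, vn, vm):
    """SDOT (vector): acc has one 32-bit lane per group of four bytes."""
    out = []
    for lane, a in enumerate(acc):
        group = range(4 * lane, 4 * lane + 4)
        total = a + sum(vn[i] * vm[i] for i in group)
        out.append(total & 0xFFFFFFFF)  # wrap to 32 bits
    return out
```

The by-element form differs only in that all accumulator lanes use the same group of four bytes from the second operand.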

Load/store

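LDn and STn (single n-element structure to one lane): these instructions can be used for almost all data accesses. A normal vector can be loaded (n = 1).

Vector load single n-element structure to one lane loads one n-element structure from memory into one lane of one or more vector registers. Register elements that are not loaded are left unchanged.

Vector store single n-element structure from one lane stores one n-element structure from one lane of one or more vector registers to memory.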

LD1 (single structure): Load one single-element structure to one lane of one register.

LD2 (single structure): Load single 2-element structure to one lane of two registers.

LD3 (single structure): Load single 3-element structure to one lane of three registers.

LD4 (single structure): Load single 4-element structure to one lane of four registers.

ST1 (single structure): Store a single-element structure from one lane of one register.

ST2 (single structure): Store single 2-element structure from one lane of two registers.

ST3 (single structure): Store single 3-element structure from one lane of three registers.

ST4 (single structure): Store single 4-element structure from one lane of four registers.

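LDnR (single n-element structure to all lanes)

Vector load single n-element structure to all lanes loads multiple copies of one n-element structure from memory into one or more vector registers.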

LD1R: Load one single-element structure and Replicate to all lanes (of one register).

LD2R: Load single 2-element structure and Replicate to all lanes of two registers.

LD3R: Load single 3-element structure and Replicate to all lanes of three registers.

LD4R: Load single 4-element structure and Replicate to all lanes of four registers.

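LDn and STn (multiple n-element structures)

Vector load multiple n-element structures loads multiple n-element structures from memory into one or more vector registers, de-interleaving them (unless n == 1). Every element of each register is loaded.

Vector store multiple n-element structures stores multiple n-element structures from one or more vector registers to memory, interleaving them (unless n == 1). Every element of each register is stored.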

LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.

LD2 (multiple structures): Load multiple 2-element structures to two registers.

LD3 (multiple structures): Load multiple 3-element structures to three registers.

LD4 (multiple structures): Load multiple 4-element structures to four registers.

ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.

ST2 (multiple structures): Store multiple 2-element structures from two registers.

ST3 (multiple structures): Store multiple 3-element structures from three registers.

ST4 (multiple structures): Store multiple 4-element structures from four registers.
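De-interleaving on load and interleaving on store can be modelled with list slicing. In this Python sketch `memory` is a flat list of elements and each register is a list of lanes; `ldn` and `stn` are hypothetical helper names.

```python
def ldn(memory, n):
    """LDn (multiple structures): element i of every structure goes to register i."""
    return [memory[i::n] for i in range(n)]

def stn(registers):
    """STn (multiple structures): interleave the registers back into memory order."""
    return [x for group in zip(*registers) for x in group]
```

With n = 3 and packed RGB pixels in memory, `ldn` yields one register of R values, one of G and one of B, which is the typical use of LD3.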

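NEON and VFP pseudo-instructions

VLDR pseudo-instruction (NEON and VFP)

The VLDR pseudo-instruction loads a constant value into every element of a 64-bit NEON vector, or into a VFP single-precision or double-precision register.

If an instruction (for example, VMOV) is available that can generate the constant directly in the register, the assembler uses it. Otherwise, the assembler generates a doubleword literal pool entry containing the constant and loads it with a VLDR instruction.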

LDR (literal, SIMD&FP): Load SIMD&FP Register (PC-relative literal).

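VLDR and VSTR (post-increment and pre-decrement) (NEON and VFP)

Pseudo-instructions that load or store extension registers with post-increment or pre-decrement.

For information on the VLDR and VSTR instructions without post-increment and pre-decrement, see VLDR and VSTR on page 5-23.

The post-increment form increments the base address in the register by the offset value after the transfer. The pre-decrement form decrements the base address in the register by the offset value and then performs the transfer using the new address in the register. These pseudo-instructions assemble to VLDM or VSTM instructions (see VLDM, VSTM, VPOP and VPUSH on page 5-24).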

LDR (immediate, SIMD&FP): Load SIMD&FP Register (immediate offset).

LDR (register, SIMD&FP): Load SIMD&FP Register (register offset).

STR (immediate, SIMD&FP): Store SIMD&FP register (immediate offset).

STR (register, SIMD&FP): Store SIMD&FP register (register offset).

Floating-point operations

FABD: Floating-point Absolute Difference (vector).

FABS (scalar): Floating-point Absolute value (scalar).

FABS (vector): Floating-point Absolute value (vector).

FACGE: Floating-point Absolute Compare Greater than or Equal (vector).

FACGT: Floating-point Absolute Compare Greater than (vector).

FADD (scalar): Floating-point Add (scalar).

FADD (vector): Floating-point Add (vector).

FADDP (scalar): Floating-point Add Pair of elements (scalar).

FADDP (vector): Floating-point Add Pairwise (vector).

FCADD: Floating-point Complex Add.

FCCMP: Floating-point Conditional quiet Compare (scalar).

FCCMPE: Floating-point Conditional signaling Compare (scalar).

FCMEQ (register): Floating-point Compare Equal (vector).

FCMEQ (zero): Floating-point Compare Equal to zero (vector).

FCMGE (register): Floating-point Compare Greater than or Equal (vector).

FCMGE (zero): Floating-point Compare Greater than or Equal to zero (vector).

FCMGT (register): Floating-point Compare Greater than (vector).

FCMGT (zero): Floating-point Compare Greater than zero (vector).

FCMLA: Floating-point Complex Multiply Accumulate.

FCMLA (by element): Floating-point Complex Multiply Accumulate (by element).

FCMLE (zero): Floating-point Compare Less than or Equal to zero (vector).

FCMLT (zero): Floating-point Compare Less than zero (vector).

FCMP: Floating-point quiet Compare (scalar).

FCMPE: Floating-point signaling Compare (scalar).

FCSEL: Floating-point Conditional Select (scalar).

FCVT: Floating-point Convert precision (scalar).

FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).

FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).

FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

FCVTL, FCVTL2: Floating-point Convert to higher precision Long (vector).

FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).

FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).

FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

FCVTN, FCVTN2: Floating-point Convert to lower precision Narrow (vector).

FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).

FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).

FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).

FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).

FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).

FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).

FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).

FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).

FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).

FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).

FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

FDIV (scalar): Floating-point Divide (scalar).

FDIV (vector): Floating-point Divide (vector).

FJCVTZS: Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.

FMADD: Floating-point fused Multiply-Add (scalar).

FMAX (scalar): Floating-point Maximum (scalar).

FMAX (vector): Floating-point Maximum (vector).

FMAXNM (scalar): Floating-point Maximum Number (scalar).

FMAXNM (vector): Floating-point Maximum Number (vector).

FMAXNMP (scalar): Floating-point Maximum Number of Pair of elements (scalar).

FMAXNMP (vector): Floating-point Maximum Number Pairwise (vector).

FMAXNMV: Floating-point Maximum Number across Vector.

FMAXP (scalar): Floating-point Maximum of Pair of elements (scalar).

FMAXP (vector): Floating-point Maximum Pairwise (vector).

FMAXV: Floating-point Maximum across Vector.

FMIN (scalar): Floating-point Minimum (scalar).

FMIN (vector): Floating-point minimum (vector).

FMINNM (scalar): Floating-point Minimum Number (scalar).

FMINNM (vector): Floating-point Minimum Number (vector).

FMINNMP (scalar): Floating-point Minimum Number of Pair of elements (scalar).

FMINNMP (vector): Floating-point Minimum Number Pairwise (vector).

FMINNMV: Floating-point Minimum Number across Vector.

FMINP (scalar): Floating-point Minimum of Pair of elements (scalar).

FMINP (vector): Floating-point Minimum Pairwise (vector).

FMINV: Floating-point Minimum across Vector.

FMLA (by element): Floating-point fused Multiply-Add to accumulator (by element).

FMLA (vector): Floating-point fused Multiply-Add to accumulator (vector).

FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).

FMLAL, FMLAL2 (vector): Floating-point fused Multiply-Add Long to accumulator (vector).

FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).

FMLS (vector): Floating-point fused Multiply-Subtract from accumulator (vector).

FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).

FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).

FMOV (general): Floating-point Move to or from general-purpose register without conversion.

FMOV (register): Floating-point Move register without conversion.

FMOV (scalar, immediate): Floating-point move immediate (scalar).

FMOV (vector, immediate): Floating-point move immediate (vector).

FMSUB: Floating-point Fused Multiply-Subtract (scalar).

FMUL (by element): Floating-point Multiply (by element).

FMUL (scalar): Floating-point Multiply (scalar).

FMUL (vector): Floating-point Multiply (vector).

FMULX: Floating-point Multiply extended.

FMULX (by element): Floating-point Multiply extended (by element).

FNEG (scalar): Floating-point Negate (scalar).

FNEG (vector): Floating-point Negate (vector).

FNMADD: Floating-point Negated fused Multiply-Add (scalar).

FNMSUB: Floating-point Negated fused Multiply-Subtract (scalar).

FNMUL (scalar): Floating-point Multiply-Negate (scalar).

FRECPE: Floating-point Reciprocal Estimate.

FRECPS: Floating-point Reciprocal Step.

FRECPX: Floating-point Reciprocal exponent (scalar).

FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).

FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).

FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).

FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).

FRINTM (scalar): Floating-point Round to Integral, toward Minus infinity (scalar).

FRINTM (vector): Floating-point Round to Integral, toward Minus infinity (vector).

FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).

FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).

FRINTP (scalar): Floating-point Round to Integral, toward Plus infinity (scalar).

FRINTP (vector): Floating-point Round to Integral, toward Plus infinity (vector).

FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).

FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).

FRINTZ (scalar): Floating-point Round to Integral, toward Zero (scalar).

FRINTZ (vector): Floating-point Round to Integral, toward Zero (vector).

FRSQRTE: Floating-point Reciprocal Square Root Estimate.

FRSQRTS: Floating-point Reciprocal Square Root Step.

FSQRT (scalar): Floating-point Square Root (scalar).

FSQRT (vector): Floating-point Square Root (vector).

FSUB (scalar): Floating-point Subtract (scalar).

FSUB (vector): Floating-point Subtract (vector).

Cryptographic instructions

AESD: AES single round decryption.

AESE: AES single round encryption.

AESIMC: AES inverse mix columns.

AESMC: AES mix columns.

SHA1C: SHA1 hash update (choose).

SHA1H: SHA1 fixed rotate.

SHA1M: SHA1 hash update (majority).

SHA1P: SHA1 hash update (parity).

SHA1SU0: SHA1 schedule update 0.

SHA1SU1: SHA1 schedule update 1.

SHA256H: SHA256 hash update (part 1).

SHA256H2: SHA256 hash update (part 2).

SHA256SU0: SHA256 schedule update 0.

SHA256SU1: SHA256 schedule update 1.

SHA512H: SHA512 Hash update part 1.

SHA512H2: SHA512 Hash update part 2.

SHA512SU0: SHA512 Schedule Update 0.

SHA512SU1: SHA512 Schedule Update 1.

SM3PARTW1: SM3PARTW1.

SM3PARTW2: SM3PARTW2.

SM3SS1: SM3SS1.

SM3TT1A: SM3TT1A.

SM3TT1B: SM3TT1B.

SM3TT2A: SM3TT2A.

SM3TT2B: SM3TT2B.

SM4E: SM4 Encode.

SM4EKEY: SM4 Key.

Other instructions

LDNP (SIMD&FP): Load Pair of SIMD&FP registers, with Non-temporal hint.

LDP (SIMD&FP): Load Pair of SIMD&FP registers.

LDUR (SIMD&FP): Load SIMD&FP Register (unscaled offset).

STNP (SIMD&FP): Store Pair of SIMD&FP registers, with Non-temporal hint.

STP (SIMD&FP): Store Pair of SIMD&FP registers.

STUR (SIMD&FP): Store SIMD&FP register (unscaled offset).