Contents

Fault symptoms

First log

Second log

Fault analysis

ESR (Exception Syndrome Register)

EC, bits [31:26]

DFSC, bits [5:0]

Arm support for unaligned access

Alignment concepts

Code review

Summary

References


Fault symptoms

First log

[  615.940400] Unable to handle kernel paging request at virtual address ffffffc011d20302
[  615.949460] Mem abort info:
[  615.953431]   ESR = 0x96000061
[  615.957615]   EC = 0x25: DABT (current EL), IL = 32 bits
[  615.964070]   SET = 0, FnV = 0
[  615.968256]   EA = 0, S1PTW = 0
[  615.972520] Data abort info:
[  615.976546]   ISV = 0, ISS = 0x00000061
[  615.981532]   CM = 0, WnR = 1
[  615.985636] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000008128c000
[  615.993465] [ffffffc011d20302] pgd=000000237ffff003, pud=000000237ffff003, pmd=000000236c95d003, pte=006800005ad38707
[  616.005217] Internal error: Oops: 96000061 [#1] SMP
---------------------------------------------------------------------------------------------------------------------------------
[  616.011269]  hid_generic
[  616.042652] PVR_K:  198: ------SetFrequency point @100000000
[  616.098274]   [last unloaded: net2280]
[  616.104969] PVR_K:  198: ----SetFrequency point succeed!
[  616.108703] CPU: 4 PID: 6665 Comm: ScpiApp Tainted: G           OE     5.4.18-42-generic #31-
[  616.121224] PVR_K:  198: ----SetVoltage not implemented by hw.
[  616.127637] Source Version: 894b1bceb73a8b114973b8180fdf69d991c16b6e

[  616.158906] pstate: 80000005 (Nzcv daif -PAN -UAO)
[  616.164821] pc : Ioctl+0x90/0xe8 [UsbDev]
[  616.171514] lr : Ioctl+0x78/0xe8 [UsbDev]
[  616.178204] sp : ffffffa1ca4fbd70
[  616.182635] x29: ffffffa1ca4fbd70 x28: ffffffa1ff7bbd00 
[  616.189065] x27: ffffffc010bf1000 x26: 000000000000001d 
[  616.195495] x25: 00000000000001b4 x24: ffffffa2d2e9ba00 
[  616.201925] x23: 0000000000000009 x22: ffffffa1ca4fbdac 
[  616.208354] x21: ffffffc011d20302 x20: 0000000000000000 
[  616.214783] x19: ffffffa1ff7bbd00 x18: 0000000000000000 
[  616.221213] x17: 0000007fa8166280 x16: ffffffc01028a648 
[  616.227642] x15: 000000007fffffde x14: 0000000000000010 
[  616.234071] x13: 0000000002010100 x12: 0000000000000007 
[  616.240500] x11: 0000007f9f7fc148 x10: 0000000000000000 
[  616.246929] x9 : 00000000ffffff80 x8 : ffffffa2d2e9baf8 
[  616.253358] x7 : 0000000000000009 x6 : ffffffa1ca4fbdb8 
[  616.259787] x5 : ffffffa1ca4fbdb8 x4 : 0000000000000001 
[  616.266216] x3 : 0000000000000302 x2 : 000000000000000c 
[  616.272645] x1 : 0000007f9f7fc3d4 x0 : 0000000000000002 
[  616.279074] Call trace:
[  616.282643]  Ioctl+0x90/0xe8 [UsbDev]
[  616.288987]  ioctl+0xa8/0x1c0 [UsbDev]
[  616.294899]  do_vfs_ioctl+0x370/0x7a8
[  616.299679]  ksys_ioctl+0x78/0xa8
[  616.304112]  sys_ioctl+0xc/0x18
[  616.308373]  el0_svc_naked+0x30/0x34
[  616.313068] Code: b94047e0 8b0002b5 d50332bf b94043e0 (b90002a0) 
[  616.320280] ---[ end trace 6c034d060f6d2e30 ]---
 

Second log

[   98.191092] Unable to handle kernel paging request at virtual address ffffffc011b18302
[   98.200149] Mem abort info:
[   98.204081]   ESR = 0x96000061
[   98.208260]   EC = 0x25: DABT (current EL), IL = 32 bits
[   98.214702]   SET = 0, FnV = 0
[   98.218883]   EA = 0, S1PTW = 0
[   98.223149] Data abort info:
[   98.227158]   ISV = 0, ISS = 0x00000061
[   98.232120]   CM = 0, WnR = 1
[   98.236231] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000008128c000
[   98.244061] [ffffffc011b18302] pgd=000000237ffff003, pud=000000237ffff003, pmd=000000236c961003, pte=006800005ad38707
[   98.255806] Internal error: Oops: 96000061 [#1] SMP
[   98.261807]  [last unloaded: net2280]
[   98.372669] Source Version: 894b1bceb73a8b114973b8180fdf69d991c16b6e
[   98.386743] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   98.406385] sp : ffffffa248cc3d70
[   98.410817] x29: ffffffa248cc3d70 x28: ffffffa2e8bf2dc0
[   98.417249] x27: ffffffc010bf1000 x26: 000000000000001d
[   98.423679] x25: 00000000000001b4 x24: ffffffa288e01000
[   98.430108] x23: 000000000000000a x22: ffffffa248cc3dac
[   98.436537] x21: ffffffa2e8bf2dc0 x20: ffffffc011b18302
[   98.442966] x19: 0000000000000000 x18: 0000000000000020
[   98.449395] x17: 0000007f8adf9280 x16: ffffffc01028a648
[   98.455824] x15: ffffffc01138f000 x14: ffffffc011472732
[   98.462254] x13: 0000000000000000 x12: ffffffc011471000
[   98.468683] x11: ffffffc01138f000 x10: 0000000000000000
[   98.475112] x9 : 0000000000000004 x8 : 0000000000001467
[   98.481540] x7 : 0000000000000001 x6 : 0000000000000001
[   98.487969] x5 : 0000000000000000 x4 : 0000000000000001
[   98.494397] x3 : 0000000000000006 x2 : e275b1801ba2b200
[   98.500826] x1 : 0000000000000000 x0 : 0000000000000002
[   98.507256] Call trace:

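Fault analysis

Environment: the failing code accesses a PCIe-to-USB device. After the device's PCIe BAR0 is mapped, the code reads and writes the device's registers through that mapping.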

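ESR (Exception Syndrome Register)

The register is described in the following section of the Arm manual:

[Figure: location of the ESR description in the Arm manual]

The register fields have the following meaning:

[Figure: ESR field layout]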

EC, bits [31:26]

Exception Class. Indicates the reason for the exception that this register holds information about.
For each EC value, the table references a subsection that gives information about:
• The cause of the exception, for example the configuration required to enable the trap.
• The encoding of the associated ISS.
In this case ESR = 0x96000061, so EC = 0b100101 (0x25). The Arm manual describes this value as follows:

EC == 0b100101
Data Abort taken without a change in Exception level.
Used for MMU faults generated by data accesses, alignment faults other than those
caused by Stack Pointer misalignment, and synchronous External aborts, including
synchronous parity or ECC errors. Not used for debug-related exceptions.
See ISS encoding for an exception from a Data Abort.

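In other words, this exception class covers:

1) MMU faults generated by data accesses;

2) alignment faults other than those caused by SP misalignment;

3) synchronous External aborts, including synchronous parity or ECC errors.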
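The field extraction can be checked mechanically. The following is a minimal sketch, not kernel code: the struct and function names (`esr_fields`, `esr_decode`) are made up for illustration, and only the bit positions come from the Arm manual (EC in bits [31:26], IL in bit [25], ISS in bits [24:0], and, for a Data Abort, WnR in ISS bit [6] and DFSC in ISS bits [5:0]).

```c
#include <stdint.h>

/* Hypothetical helper type: the ESR fields relevant to a Data Abort. */
struct esr_fields {
    uint32_t ec;    /* Exception Class, bits [31:26] */
    uint32_t il;    /* Instruction Length, bit [25] (1 = 32-bit instruction) */
    uint32_t iss;   /* Instruction Specific Syndrome, bits [24:0] */
    uint32_t wnr;   /* Write-not-Read, ISS bit [6] (1 = write) */
    uint32_t dfsc;  /* Data Fault Status Code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields above. */
static struct esr_fields esr_decode(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.iss  = esr & 0x1ffffff;
    f.wnr  = (f.iss >> 6) & 0x1;
    f.dfsc = f.iss & 0x3f;
    return f;
}
```

Calling `esr_decode(0x96000061)` yields EC = 0x25, IL = 1, ISS = 0x61 and WnR = 1, which matches the values the kernel printed in both logs, and DFSC = 0x21.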


DFSC, bits [5:0]

Data Fault Status Code.
0b000000 Address size fault, level 0 of translation or translation table base register.
0b000001 Address size fault, level 1.

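DFSC identifies the specific fault type. Here ISS = 0x61 = 0b1100001: bit 6 is WnR = 1 (the access was a write), and DFSC = bits [5:0] = 0b100001, which is the encoding for an alignment fault.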

DFSC value (bits [5:0])     Fault type
0b000000 - 0b000011         Address size fault
0b000100 - 0b000111         Translation fault
0b001001 - 0b001011         Access flag fault
0b001101 - 0b001111         Permission fault
0b100001                    Alignment fault
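As a rough illustration, the classification can be written as a small lookup. This is a sketch under assumptions: the enum and function names are hypothetical, it covers only the four ranges listed above plus the alignment fault code 0b100001 from the same DFSC field in the Arm manual, and every other encoding is lumped into "other".

```c
#include <stdint.h>

/* Hypothetical fault classes; only a subset of DFSC encodings is covered. */
enum dfsc_class {
    DFSC_ADDRESS_SIZE,
    DFSC_TRANSLATION,
    DFSC_ACCESS_FLAG,
    DFSC_PERMISSION,
    DFSC_ALIGNMENT,
    DFSC_OTHER          /* anything else: consult the Arm manual */
};

/* Map a 6-bit DFSC value to a coarse fault class. */
static enum dfsc_class dfsc_classify(uint32_t dfsc)
{
    dfsc &= 0x3f;
    if (dfsc <= 0x03)
        return DFSC_ADDRESS_SIZE;
    if (dfsc >= 0x04 && dfsc <= 0x07)
        return DFSC_TRANSLATION;
    if (dfsc >= 0x09 && dfsc <= 0x0b)
        return DFSC_ACCESS_FLAG;
    if (dfsc >= 0x0d && dfsc <= 0x0f)
        return DFSC_PERMISSION;
    if (dfsc == 0x21)
        return DFSC_ALIGNMENT;
    return DFSC_OTHER;
}
```

For the ESR in the logs, the low six bits of ISS = 0x61 are 0x21, so `dfsc_classify(0x61 & 0x3f)` returns `DFSC_ALIGNMENT`.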

 

Each fault type points to a different direction for debugging.

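Arm support for unaligned access

Memory types from Arm's point of view (source: Armv8-A memory model guide): for Device memory, such as the PCIe 3380 chip involved in this problem, Arm does not support unaligned accesses.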


When the address is not a multiple of the element size, the access is unaligned.

Unaligned accesses are allowed to addresses marked as Normal, but not to Device regions. An unaligned access to a Device region will trigger an exception (alignment fault).

Unaligned accesses to regions marked as Normal can be trapped by setting SCTLR_ELx.A. If this bit is set, unaligned accesses to Normal regions also generate alignment faults.
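The rule quoted above can be condensed into a small decision function. This is a simplified model for illustration only, not how the hardware is implemented: the names `mem_type` and `raises_alignment_fault` are hypothetical, it considers plain loads and stores only, and it ignores other cases the architecture treats specially.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-way split of the Armv8-A memory types. */
enum mem_type { MEM_NORMAL, MEM_DEVICE };

/*
 * Simplified model of the quoted rule for an access of `size` bytes
 * (size must be non-zero):
 *  - aligned accesses never raise an alignment fault;
 *  - unaligned accesses to Device memory always do;
 *  - unaligned accesses to Normal memory do only when SCTLR_ELx.A is set.
 */
static bool raises_alignment_fault(enum mem_type type, bool sctlr_a,
                                   uint64_t addr, unsigned size)
{
    bool unaligned = (addr % size) != 0;

    if (!unaligned)
        return false;
    if (type == MEM_DEVICE)
        return true;
    return sctlr_a;
}
```

With the address from the first log, `raises_alignment_fault(MEM_DEVICE, false, 0xffffffc011d20302, 4)` is true, while the same 4-byte access to Normal memory with SCTLR_ELx.A clear would be allowed.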

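Alignment concepts

12.1. Alignment (source: Armv8-A memory model guide)
An access is described as aligned if the address is a multiple of the element size.

That is, the address must be an integer multiple of the size of the element being accessed. For example, a 4-byte access requires a 4-byte-aligned address.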
For LDR and STR instructions, the element size is the size of the access. For example, a LDRH instruction loads a 16-bit value and must be from an address which is a multiple of 16 bits to be considered aligned.
The LDP and STP instructions load and store a pair of elements, respectively. To be aligned, the address must be a multiple of the size of the elements, not the combined size of both elements. For example:
LDP X0, X1, [X2]
This example loads two 64-bit values, so 128 bits in total. The address in X2 needs to be a multiple of 64 bits to be considered aligned.
The same principle applies to vector loads and stores.
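The definition translates into a one-line check. The sketch below assumes the element size is a power of two, which holds for the 1, 2, 4, 8 and 16-byte elements discussed here; `is_aligned` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when addr is a multiple of elem_bytes.
 * elem_bytes must be a non-zero power of two (1, 2, 4, 8, 16, ...).
 */
static bool is_aligned(uint64_t addr, unsigned elem_bytes)
{
    return (addr & (uint64_t)(elem_bytes - 1)) == 0;
}
```

For LDRH the element size is 2, so `is_aligned(0x1002, 2)` is true. For `LDP X0, X1, [X2]` the element size is 8 rather than 16, so an address such as 0x1008 counts as aligned even though it is not a multiple of the combined 16 bytes.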
 

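Code review

In our code, this peripheral is accessed through the writel interface, that is, with 4-byte accesses.

The oops logs show that the faulting address ends in 2 (for example ffffffc011b18302), so it is not a multiple of 4.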
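The arithmetic on the two logged addresses can be reproduced as follows. This is a minimal sketch with hypothetical helper names; the 4 KiB page size is taken from the "4k pages" line in the logs.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000ull  /* the logs report "4k pages" */

/* Remainder of addr modulo the access width; 0 means aligned. */
static unsigned misalignment(uint64_t addr, unsigned width_bytes)
{
    return (unsigned)(addr % width_bytes);
}

/* Offset of addr inside its 4 KiB page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_4K - 1);
}
```

Both ffffffc011d20302 and ffffffc011b18302 give a remainder of 2 for a 4-byte access, and both sit at offset 0x302 within their page, so the two crashes hit the same offset in the mapping.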

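Walking through the code again, we found that register offsets are derived from the following data structure:

struct pci_usb_reg {
    u32 reg0;
    u32 reg1;
    u16 reg2;
};

A u16 member such as reg2 is only guaranteed 2-byte alignment. When it is accessed with the 4-byte accessor, the address may not be a multiple of 4, which triggers the exception (4-byte alignment is expected).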
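One way to audit such a layout is to check every member offset with `offsetof`. The sketch below is hypothetical: the article does not show the full register map, so `struct pci_usb_reg_demo` copies the shape of the struct above and appends a second 16-bit member (`reg3`, an assumption) purely to show a member that lands on a 2-byte boundary. Fixed-width `uint32_t`/`uint16_t` stand in for the kernel's `u32`/`u16`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: reg0..reg2 mirror the article, reg3 is invented. */
struct pci_usb_reg_demo {
    uint32_t reg0;  /* offset 0 */
    uint32_t reg1;  /* offset 4 */
    uint16_t reg2;  /* offset 8 */
    uint16_t reg3;  /* offset 10: only 2-byte aligned */
};

/*
 * A 4-byte access such as writel() is only safe on Device memory when
 * the register offset (relative to a 4-byte-aligned base) is a multiple of 4.
 */
static int offset_ok_for_4byte_access(size_t offset)
{
    return (offset % 4) == 0;
}
```

In this layout `offsetof(struct pci_usb_reg_demo, reg3)` is 10, so a 4-byte write to it would be unaligned. Assuming the hardware register really is 16 bits wide, a 2-byte accessor such as the kernel's `writew()` would be the matching choice.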

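Summary

The code above runs on both x86 and Arm. x86 CPUs handle unaligned accesses themselves, which masked the problem; it only surfaced on Arm.

Besides raising exceptions on some platforms, unaligned accesses generally cost performance on every platform. Reference 1 covers this in more detail.

References

1. The impact of alignment on performance and consistency, and problems with unaligned accesses from multiple threads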