Abbreviations

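Abbreviation Full name and description
SRAO Software Recoverable Action Optional
SRAO errors are reported via MCE or CMCI. An SRAO error means that some data in the system is corrupted but has not been consumed. Software recovery is optional, and a recovery strategy can be chosen based on MCACOD.
SRAR Software Recoverable Action Required
SRAR errors are reported via MCE. An SRAR error means that some data in the system is corrupted and is being consumed. Software must take a recovery action before the task on this CPU is scheduled again (typically, but not only, killing the process currently running on that CPU). If recovery is impossible, for example because the address or task information cannot be obtained, the kernel should panic.
EDAC Error Detection And Correction
CE Corrected Error
UCE Uncorrected Error
UCR Uncorrected Recoverable Error (hardware cannot repair it on its own, but software can take action to recover)
MCA Machine Check Architecture
MCE Machine Check Exception (an exception)
CMCI Corrected Machine Check Interrupt (an interrupt)
This machine check error indicates that a correctable error occurred in memory, usually caused by a software or hardware fault. It does not crash the system but may degrade performance.
UCNA Uncorrected No Action Required
UCNA errors are reported via CMCI. A UCNA error means that some data in the system is corrupted but has not yet been consumed (that is, not read), the processor state is still valid, and the program can continue to run on the processor.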
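
The classes above can be told apart by flag bits in the IA32_MCi_STATUS register. The following is a minimal Python sketch of that mapping, not the kernel's implementation. The bit positions (VAL=63, UC=61, PCC=57, S=56, AR=55) are taken from the Intel SDM rather than from this article, and the function name classify_mce is made up for illustration. The sample value is the Bank 7 status from the dmesg excerpt in the kernel documentation section.

```python
# Bit positions in IA32_MCi_STATUS (assumed from the Intel SDM, not from this article).
VAL, UC, PCC, S, AR = 63, 61, 57, 56, 55


def bit(status: int, pos: int) -> int:
    """Return the value (0 or 1) of one bit of the status word."""
    return (status >> pos) & 1


def classify_mce(status: int) -> str:
    """Return a coarse error class for one MCi_STATUS value."""
    if not bit(status, VAL):
        return "invalid"        # no error is logged in this bank
    if not bit(status, UC):
        return "CE"             # corrected by hardware
    if bit(status, PCC):
        return "UC (fatal)"     # processor context corrupt, not recoverable
    # From here on the error is a UCR (uncorrected recoverable) error.
    if bit(status, S) and bit(status, AR):
        return "SRAR"
    if bit(status, S):
        return "SRAO"
    return "UCNA"               # S=0, AR=0; an SRAO signaled via CMCI may look the same


print(classify_mce(0x8c00004000010090))
```

A real handler would also look at MCACOD and the ADDRV/MISCV bits before deciding what to do; this sketch only covers the flag combinations named in the table.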

Tool download

https://kernel.googlesource.com/pub/scm/linux/kernel/git/aegl/rastools/

If EINJ is supported, the boot log contains an EINJ entry such as:
ACPI: EINJ 0x0000000049D553E0 000150 (v01 ALASKA A M I 00000001 INTL 00000001)

Installing the einj.ko kernel driver

Loading the einj.ko.xz driver requires enabling the BIOS option WHEA Error Injection Support.
1. Driver path: /lib/modules/xx/kernel/drivers/acpi/apei/einj.ko.xz
2. Install the driver: insmod /lib/modules/xx/kernel/drivers/acpi/apei/einj.ko.xz
3. List the einj nodes: ll /sys/kernel/debug/apei/einj/

--w------- 1 root root 0 Sep 11 10:37 error_inject
-rw------- 1 root root 0 Sep 11 10:37 error_type
-rw------- 1 root root 0 Sep 11 10:37 flags
-rw------- 1 root root 0 Sep 11 10:37 notrigger
-rw------- 1 root root 0 Sep 11 10:37 param1
-rw------- 1 root root 0 Sep 11 10:37 param2
-rw------- 1 root root 0 Sep 11 10:37 param3
-rw------- 1 root root 0 Sep 11 10:37 param4
-r-------- 1 root root 0 Sep 11 10:37 vendor
-rw------- 1 root root 0 Sep 11 10:37 vendor_flags
-r-------- 1 root root 0 Sep 11 10:37 available_error_type

$ insmod einj.ko.xz
The driver is built as a kernel module once the relevant kernel options are enabled.
After the driver loads successfully, the kernel log shows:

EINJ: Error INJection is initialized.
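
The checks above (is the EINJ table exposed, is the driver loaded, which nodes exist) can be scripted. Below is a small Python sketch under the assumption that the ACPI tables are under /sys/firmware/acpi/tables and debugfs is mounted at /sys/kernel/debug. The function name einj_status and its return format are made up for illustration. Reading the debugfs directory normally requires root, so an unprivileged run reports the driver as not loaded.

```python
import os


def einj_status(acpi_tables="/sys/firmware/acpi/tables",
                debugfs="/sys/kernel/debug"):
    """Report whether the EINJ table and the einj debugfs directory exist."""
    table_present = os.path.exists(os.path.join(acpi_tables, "EINJ"))
    einj_dir = os.path.join(debugfs, "apei", "einj")
    # isdir() is False when the directory is missing or not accessible.
    nodes = sorted(os.listdir(einj_dir)) if os.path.isdir(einj_dir) else []
    return {
        "table_present": table_present,
        "driver_loaded": bool(nodes),
        "nodes": nodes,
    }


print(einj_status())
```

If table_present is false, check the BIOS option first; if only driver_loaded is false, the module is probably not inserted or debugfs is not mounted.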

Kernel documentation

Documentation/acpi/apei/einj.txt

			APEI Error INJection
			~~~~~~~~~~~~~~~~~~~~

EINJ provides a hardware error injection mechanism. It is very useful
for debugging and testing APEI and RAS features in general.

You need to check whether your BIOS supports EINJ first. For that, look
for early boot messages similar to this one:

ACPI: EINJ 0x000000007370A000 000150 (v01 INTEL           00000001 INTL 00000001)

which shows that the BIOS is exposing an EINJ table - it is the
mechanism through which the injection is done.

Alternatively, look in /sys/firmware/acpi/tables for an "EINJ" file,
which is a different representation of the same thing.

It doesn't necessarily mean that EINJ is not supported if those above
don't exist: before you give up, go into BIOS setup to see if the BIOS
has an option to enable error injection. Look for something called WHEA
or similar. Often, you need to enable an ACPI5 support option prior, in
order to see the APEI,EINJ,... functionality supported and exposed by
the BIOS menu.

To use EINJ, make sure the following options are enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

The EINJ user interface is in <debugfs mount point>/apei/einj.

The following files belong to it:

- available_error_type

  This file shows which error types are supported:

  Error Type Value	Error Description
  ================	=================
  0x00000001		Processor Correctable
  0x00000002		Processor Uncorrectable non-fatal
  0x00000004		Processor Uncorrectable fatal
  0x00000008		Memory Correctable
  0x00000010		Memory Uncorrectable non-fatal
  0x00000020		Memory Uncorrectable fatal
  0x00000040		PCI Express Correctable
  0x00000080		PCI Express Uncorrectable fatal
  0x00000100		PCI Express Uncorrectable non-fatal
  0x00000200		Platform Correctable
  0x00000400		Platform Uncorrectable non-fatal
  0x00000800		Platform Uncorrectable fatal

  The file contents use the format above, but list only the error types
  that are actually available.

- error_type

  Set the value of the error type being injected. Possible error types
  are defined in the file available_error_type above.

- error_inject

  Write any integer to this file to trigger the error injection. Make
  sure you have specified all necessary error parameters, i.e. this
  write should be the last step when injecting errors.

- flags

  Present for kernel versions 3.13 and above. Used to specify which
  of param{1..4} are valid and should be used by the firmware during
  injection. Value is a bitmask as specified in ACPI5.0 spec for the
  SET_ERROR_TYPE_WITH_ADDRESS data structure:

	Bit 0 - Processor APIC field valid (see param3 below).
	Bit 1 - Memory address and mask valid (param1 and param2).
	Bit 2 - PCIe (seg,bus,dev,fn) valid (see param4 below).

  If set to zero, legacy behavior is mimicked where the type of
  injection specifies just one bit set, and param1 is multiplexed.

- param1

  This file is used to set the first error parameter value. Its effect
  depends on the error type specified in error_type. For example, if
  error type is memory related type, the param1 should be a valid
  physical memory address. [Unless "flags" is set - see above]

- param2

  Same use as param1 above. For example, if error type is of memory
  related type, then param2 should be a physical memory address mask.
  Linux requires page or narrower granularity, say, 0xfffffffffffff000.

- param3

  Used when the 0x1 bit is set in "flags" to specify the APIC id.

- param4

  Used when the 0x4 bit is set in "flags" to specify target PCIe device.

- notrigger

  The error injection mechanism is a two-step process. First inject the
  error, then perform some actions to trigger it. Setting "notrigger"
  to 1 skips the trigger phase, which *may* allow the user to cause the
  error in some other context by a simple access to the CPU, memory
  location, or device that is the target of the error injection. Whether
  this actually works depends on what operations the BIOS actually
  includes in the trigger phase.
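
Before writing error_type it helps to confirm that the wanted value is listed in available_error_type. The sketch below parses text in the two-column format shown for that file; the function name parse_error_types is made up for illustration, and the sample text is the listing from the injection example in this document.

```python
def parse_error_types(text: str) -> dict:
    """Map each error type value (int) to its description (str)."""
    types = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # value, then the rest of the line
        if len(parts) == 2 and parts[0].lower().startswith("0x"):
            types[int(parts[0], 16)] = parts[1].strip()
    return types


sample = "\n".join([
    "0x00000002\tProcessor Uncorrectable non-fatal",
    "0x00000008\tMemory Correctable",
    "0x00000010\tMemory Uncorrectable non-fatal",
])

supported = parse_error_types(sample)
print(0x8 in supported)   # correctable memory errors are offered
print(0x20 in supported)  # fatal memory errors are not offered in this sample
```

On a real system the text would come from reading <debugfs mount point>/apei/einj/available_error_type as root.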

BIOS versions based on the ACPI 4.0 specification have limited options
in controlling where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or boot
command line einj.param_extension=1). This allows the address and mask
for memory injections to be specified by the param1 and param2 files in
apei/einj.

BIOS versions based on the ACPI 5.0 specification have more control over
the target of the injection. For processor-related errors (type 0x1, 0x2
and 0x4), you can set flags to 0x3 (param3 for bit 0, and param1 and
param2 for bit 1) so that you have more information added to the error
signature being injected. The actual data passed is this:

	memory_address = param1;
	memory_address_range = param2;
	apicid = param3;
	pcie_sbdf = param4;

For memory errors (type 0x8, 0x10 and 0x20) the address is set using
param1 with a mask in param2 (0x0 is equivalent to all ones). For PCI
express errors (type 0x40, 0x80 and 0x100) the segment, bus, device and
function are specified using param1:

         31     24 23    16 15    11 10      8  7        0
	+-------------------------------------------------+
	| segment |   bus  | device | function | reserved |
	+-------------------------------------------------+
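
The layout in the diagram can be packed and unpacked with shifts and masks. This is a sketch of the bit arithmetic only; the helper names pack_sbdf and unpack_sbdf are made up, and the sample device (bus 0x3a, function 1) is arbitrary.

```python
def pack_sbdf(segment: int, bus: int, device: int, function: int) -> int:
    """Pack segment[31:24], bus[23:16], device[15:11], function[10:8]."""
    if not (0 <= segment <= 0xFF and 0 <= bus <= 0xFF
            and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("field out of range")
    return (segment << 24) | (bus << 16) | (device << 11) | (function << 8)


def unpack_sbdf(value: int) -> tuple:
    """Inverse of pack_sbdf; the reserved low byte is ignored."""
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF,
            (value >> 11) & 0x1F, (value >> 8) & 0x7)


value = pack_sbdf(segment=0, bus=0x3A, device=0, function=1)
print(hex(value))
print(unpack_sbdf(value))
```

The packed number is what gets written to the parameter file that carries the PCIe target (param1 in the legacy case described here, param4 when bit 2 of flags is set).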

Anyway, you get the idea, if there's doubt just take a look at the code
in drivers/acpi/apei/einj.c.

An ACPI 5.0 BIOS may also allow vendor-specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
the vendor-specific extension to tell that they are running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors
creativity in using this feature expands beyond our expectations).


An error injection example:

# cd /sys/kernel/debug/apei/einj
# cat available_error_type		# See which errors can be injected
0x00000002	Processor Uncorrectable non-fatal
0x00000008	Memory Correctable
0x00000010	Memory Uncorrectable non-fatal
# echo 0x12345000 > param1		# Set memory address for injection
# echo $((-1 << 12)) > param2		# Mask 0xfffffffffffff000 - anywhere in this page
# echo 0x8 > error_type			# Choose correctable memory error
# echo 1 > error_inject			# Inject now
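
The same sequence can be driven from a script. The Python sketch below only prints the writes by default (dry run); set dry_run=False and run as root to actually inject. The function names are made up for illustration, and the directory is assumed to be /sys/kernel/debug/apei/einj as in the example above. The mask is written in hex, which is the 64-bit value the shell expression $((-1 << 12)) stands for.

```python
import os

PAGE_SHIFT = 12  # 4 KiB pages, as in the example above


def page_mask(shift: int = PAGE_SHIFT) -> int:
    """64-bit mask with the low `shift` bits cleared."""
    return (-1 << shift) & 0xFFFFFFFFFFFFFFFF


def injection_plan(address: int, error_type: int) -> list:
    """Ordered (file, value) writes; error_inject must come last."""
    return [
        ("param1", hex(address)),
        ("param2", hex(page_mask())),
        ("error_type", hex(error_type)),
        ("error_inject", "1"),
    ]


def inject(plan, einj_dir="/sys/kernel/debug/apei/einj", dry_run=True):
    """Print the writes, or perform them when dry_run is False."""
    for name, value in plan:
        path = os.path.join(einj_dir, name)
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            with open(path, "w") as node:
                node.write(value)


inject(injection_plan(0x12345000, 0x8))  # dry run: only prints the writes
```

Keeping the plan as data makes it easy to log exactly what was injected before the machine potentially goes down.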

You should see something like this in dmesg:

[22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
[22715.834759] EDAC sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
[22715.834759] EDAC sbridge MC3: TSC 0
[22715.834759] EDAC sbridge MC3: ADDR 12345000 EDAC sbridge MC3: MISC 144780c86
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
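
To confirm that the reported error matches the injected address, the final EDAC line can be parsed and the physical address rebuilt from page and offset. A Python sketch, assuming the line format shown above (other EDAC drivers may format the line differently) and 4 KiB pages; the names PATTERN and parse_edac_line are made up for illustration.

```python
import re

LINE = ("[22716.616173] EDAC MC3: 1 CE memory read error on "
        "CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 "
        "offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 "
        "socket:0 channel_mask:1 rank:0)")

PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .* on (?P<label>\S+) "
    r"\(.*page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+)")


def parse_edac_line(line: str, page_shift: int = 12):
    """Extract controller, count, kind, DIMM label and physical address."""
    m = PATTERN.search(line)
    if m is None:
        return None
    address = (int(m.group("page"), 16) << page_shift) | int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "kind": m.group("kind"),
        "label": m.group("label"),
        "address": address,
    }


info = parse_edac_line(LINE)
print(info["kind"], hex(info["address"]))
```

For the sample line the rebuilt address equals the 0x12345000 written to param1.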

For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.

Memory UE/CE fault injection types

./einj_mem_uc: invalid option -- '-'
Usage: ./einj_mem_uc [-a][-c count][-d delay][-f][-i][j][k] [-m runup:size:align][testname]
  Testname Fatal Description
  single   no    Single read in pipeline to target address, generates SRAR machine check
  double   no    Double read in pipeline to target address, generates SRAR machine check
  split    YES   Unaligned read crosses cacheline from good to bad. Probably fatal
  THP      no    Try to inject in transparent huge page, generates SRAR machine check
  hugetlb  no    Try to inject in hugetlb page, generates SRAR machine check
  store    no    Write to target address. Should generate a UCNA/CMCI
  prefetch no    Prefetch data into L1 cache. Should generate CMCI
  memcpy   YES   Streaming read from target address. Probably fatal
  instr    no    Instruction fetch. Generates SRAR that OS should transparently fix
  patrol   no    Patrol scrubber, generates SRAO machine check
  thread   no    Single read by two threads to target address at the same time, generates SRAR machine check
  share    no    Share memory is read by two tasks to target address, generates SRAR machine check
  overflow YES   Read to two target addresses at the same time, Probably fatal
  llc      no    Cache write-back, generates SRAO machine check
  copyin   YES   Kernel copies data from user. Probably fatal
  copyout  YES   Kernel copies data to user. Probably fatal
  copy-on-write YES   Kernel copies user page. Probably fatal
  futex    YES   Kernel access to futex(2). Probably fatal
  mlock    no    mlock target page then inject/read to generates SRAR machine check
  core_ce  no    Core corrected error
  core_non_fatal no    Core deferred error
  core_fatal YES   Core uncorrected error. Should fatal
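Single read in pipeline: as I understand it, "pipeline" here means the CPU pipeline.
split: an unaligned read crosses a cache line from good data to bad data; probably fatal.
THP: Transparent Huge Pages are huge pages allocated dynamically at run time, whereas standard HugePages are preallocated at boot and do not change while the system is running.
hugetlb: hugetlb acts as the manager of huge pages; this module is responsible for allocating and releasing them.
prefetch: data (residing in memory) is prefetched into the L1 cache.
patrol: patrol scrubbing.
futex: The futex() system call provides a method for waiting until a certain condition becomes true. It is typically used as a blocking construct in the context of shared-memory synchronization. When using futexes, the majority of the synchronization operations are performed in user space.
mlock: the mlock family of system calls lets a program lock part or all of its address space into physical memory. This prevents Linux from moving that memory out to swap space, even if the program has not accessed it for some time.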

MCE

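An MCE (Machine Check Exception) is an error detected by the CPU. There are two main kinds: notice/warning and fatal exception. A warning writes a message such as "Machine Check Event logged" to the system log, and Linux tools can then be used to examine the logged details. A fatal MCE makes the machine stop responding, and the MCE details are printed on the system console.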

What causes MCE errors?

EDAC: https://zhuanlan.zhihu.com/p/29013350
RAS feature analysis: https://www.zhihu.com/people/helloxiao-cui/posts
https://blog.csdn.net/leoufung/article/details/48784191?ydreferer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8%3D

Memory page types

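File-backed Page

The cached pages counted in the cache column of free are all file-backed pages; they hold data from files on the system. A page with no corresponding file is called an anonymous page. When memory is short, file-backed pages can be released (page-out), with modified pages written back to their files on disk first, and the data is read from disk again when needed. For example, echo 3 > /proc/sys/vm/drop_caches releases most of the cache.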

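Anonymous Page

Pages used for an application's heap, stack, data segment and so on have no file behind them and are called anonymous pages. Because they do not exist as files, they cannot be exchanged with a file on disk, but they can be swapped out to a swap partition or swap file set aside on disk (swap-out). Anonymous pages live and die with the user process: when the process exits they are freed, whereas the page cache can stay cached after the process has exited.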

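Dirty Page

A memory page that an application has modified but whose data has not yet been written to disk is called a dirty page. Such a page must be written to disk before it can be freed. Dirty pages are usually written to disk in one of two ways: the application calls fsync to flush them, or it leaves the job to the system, where kernel flusher threads (pdflush in older kernels) write them out.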
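
The first of the two write-back paths can be shown in a few lines. This Python sketch writes a scratch file and forces its dirty pages to disk with os.fsync; the helper name write_durably and the temporary file are illustrative only.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data and do not return until the kernel has flushed it to disk."""
    with open(path, "wb") as f:
        f.write(data)          # may still sit in Python's userspace buffer
        f.flush()              # hand the data to the kernel; the pages are now dirty
        os.fsync(f.fileno())   # ask the kernel to write the dirty pages to disk


path = os.path.join(tempfile.mkdtemp(), "demo.bin")  # hypothetical scratch file
write_durably(path, b"hello")
print(os.path.getsize(path))
```

Without the fsync call the data would still reach the disk eventually, but only when the kernel's periodic write-back gets to it.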

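Hugepages

To reduce the probability of TLB misses, Linux introduced the Hugepages mechanism, which allows a page size of 2MB or 1GB. With 2MB hugepages, the same 256GB of memory needs only 256GB/2MB = 131072 page table entries, which take about 1MB at 8 bytes per entry, so the hugepage page table can be held entirely in the CPU cache. sysctl -w vm.nr_hugepages=1024 sets the number of hugepages to 1024, for a total of 2GB with 2MB pages. Note that setting hugepages makes the system allocate contiguous 2MB blocks and reserve them (they can no longer serve normal allocations), so if the system has been running for a while and memory is fragmented, requesting more hugepages may fail.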
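
The arithmetic behind these numbers is easy to check. The sketch below counts leaf page table entries for 256GB with 4KB and with 2MB pages, and the memory reserved by vm.nr_hugepages=1024. The 8-byte entry size is an assumption (it holds on x86-64) and is not stated in the text; the function name leaf_entries is made up.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30
PTE_SIZE = 8  # bytes per page table entry; assumed (x86-64), not from the text


def leaf_entries(memory: int, page_size: int) -> int:
    """Number of last-level page table entries needed to map `memory`."""
    return memory // page_size


for name, size in (("4KiB", 4 * KiB), ("2MiB", 2 * MiB)):
    n = leaf_entries(256 * GiB, size)
    print(name, "pages:", n, "entries,", n * PTE_SIZE // MiB, "MiB of entries")

reserved = 1024 * 2 * MiB
print("vm.nr_hugepages=1024 reserves", reserved // GiB, "GiB")
```

The count ignores the upper levels of the page table, which are small by comparison.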

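Transparent Huge Pages (THP)

Because huge pages are hard to manage by hand and usually require significant code changes to use effectively, Transparent Huge Pages (THP) were introduced. THP is an abstraction layer that automatically creates, manages and uses huge pages. Standard huge pages are preallocated, whereas transparent huge pages are allocated dynamically.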

LRU

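LRU stands for Least Recently Used. The principle is that when memory runs short, the memory that has gone unused for the longest time is evicted first, which costs the least system performance.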
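
The eviction rule can be illustrated with a tiny cache. This is a textbook LRU in Python built on OrderedDict; it illustrates the policy only and is not how the kernel implements page reclaim, which approximates LRU with its own list structures.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # over capacity: "b" is evicted
print(list(cache.items))
```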

vdso page

vDSO is short for virtual dynamic shared object. It means that this mapping actually contains an ELF shared object file, commonly known as a .so.

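Classification of memory errors

By type, common memory errors are mainly CE (Correctable Error) and UCE (Un-Correctable Error). By scenario, they are mainly memory read/write errors and memory patrol errors.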

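Type

Type Description
CE An error occurred while the server was running, but it could be corrected by ECC (Error Checking and Correcting), so CE errors are sometimes also called ECC errors. Occasional address/command errors, single-chip multi-bit errors on x4 DRAM, and single-chip single-bit errors on x8 DRAM can all cause ECC errors. A CE has no effect on the system.
CE Storm The BIOS records a timestamp for every SMI it handles. Each time two consecutive SMIs are less than 1 minute apart, a running count is incremented; when the count reaches 10, a CE storm is declared.
CE Overflow Correctable memory errors have a count threshold configured per rank. When the correctable errors within a rank reach the threshold and overflow, an SMI is triggered. To take time into account, a hardware leaky bucket is added: for every fixed interval that passes without a CE, the fault counter is decremented by 1.
UCE An error occurred while the server was running and could not be corrected by ECC. Multi-bit errors on x8 DRAM, multi-chip multi-bit errors on x4 DRAM, persistent address/command errors, and a damaged register chip can all cause a UCE.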
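
The CE Storm and CE Overflow rules in the table are small counting algorithms. The Python sketch below models both under stated assumptions: the storm counter counts close-together SMI pairs and restarts after a gap of a minute or more (the table does not say what happens on a gap), and the leaky bucket drains based on the time since the previous CE. Class names, thresholds in the demo, and timestamps are illustrative and do not describe any particular BIOS.

```python
class CEStormDetector:
    """Counts consecutive SMIs that arrive less than `window_s` apart."""

    def __init__(self, window_s: float = 60.0, threshold: int = 10):
        self.window_s = window_s
        self.threshold = threshold
        self.last = None
        self.count = 0

    def on_smi(self, t: float) -> bool:
        """Feed one SMI timestamp in seconds; True means a storm is declared."""
        if self.last is not None and t - self.last < self.window_s:
            self.count += 1
        else:
            self.count = 0  # assumption: a longer gap restarts the count
        self.last = t
        return self.count >= self.threshold


class LeakyBucket:
    """Per-rank CE counter that drains by one for every quiet interval."""

    def __init__(self, threshold: int, leak_interval_s: float):
        self.threshold = threshold
        self.leak_interval_s = leak_interval_s
        self.count = 0
        self.last = None

    def on_ce(self, t: float) -> bool:
        """Record one corrected error; True means the threshold overflowed."""
        if self.last is not None:
            leaked = int((t - self.last) // self.leak_interval_s)
            self.count = max(0, self.count - leaked)
        self.count += 1
        self.last = t
        return self.count >= self.threshold


storm = CEStormDetector()
print([storm.on_smi(10 * i) for i in range(11)])

bucket = LeakyBucket(threshold=3, leak_interval_s=100)
print([bucket.on_ce(t) for t in (0, 10, 250, 260, 270)])
```

The leaky bucket is what keeps a rank with one CE per day from ever reaching the threshold, while a burst of errors still trips it quickly.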

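Scenario

Memory read/write error (Corrected read/write Error) While the server is running and data is being exchanged during workload processing, a memory fault corrupts the data; the Intel CPU detects this during transfer and reports an alarm.
Memory patrol error While the server is running, the Intel CPU patrols memory and reports an alarm to the OS if it finds a memory UCE. In many cases, however, the memory has not actually failed: a latent bug in the data check mechanism produces a false report, which can be downgraded and handled as a CE.
Corrected Patrol Scrub Error Memory contents are read while the system is idle. If the data read contains a correctable error, the corrected data is written back to memory (an uncorrectable error is reported as a Downgraded Uncorrected Patrol Scrubbing Error).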

Source: https://info.support.huawei.com/compute/docs/zh-cn/kunpeng-knowledge/typical-scenarios-1/zh-cn_topic_0000001137846921.html

Original article: https://blog.csdn.net/qq_42931917/article/details/132780258
