HW-SW Interface Design and Implementation for Error Logging and Reporting for RAS Improvement
Nicasio Canino (first author); Stefano Di Matteo; Daniele Rossi; Sergio Saponara
2024-01-01
Abstract
When designing a resilient computing system, the desired degree of Reliability, Availability, and Serviceability (RAS) must be assessed and guaranteed. This article presents a Hardware-Software (HW-SW) Interface for Error Logging and Reporting that is independent of any specific Instruction Set Architecture (ISA) and aims to improve RAS in computing systems. A HW-SW Interface defines the facilities by which detected hardware errors are logged into a dedicated set of registers (the Error Record) and then reported to system software. System software can then promptly address and recover from those errors, preventing system failures. Our architecture offers flexible and configurable Error Logging and Reporting features: specific features can be selectively included or removed to satisfy the requirements of different application scenarios. We report the most relevant synthesis results on FPGA (Xilinx UltraScale+ MPSoC) and standard-cell technologies (45 nm and 7 nm libraries) and discuss how resource utilization depends on error logging capability. The principal findings show that the developed module would not limit the system operating frequency and that its area can be readily configured to match the logging and reporting features to be implemented. We then validate the Error Logging and Reporting features of our architecture by developing a test SoC on FPGA that emulates a computing system, including a 32-bit RISC-V core and two memories protected by Error Correcting Code (ECC). The proposed HW-SW Interface is not limited to ECC-protected memories: it can monitor any system module that incorporates error control logic.
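To make the logging and reporting flow concrete, the following is a minimal behavioral sketch in Python, not the article's design. The record fields (`valid`, `severity`, `address`, `overflow`), the two severity levels, the overwrite policy, and the `report_corrected` option are all assumptions introduced for illustration; the abstract states only that errors are logged into an Error Record and reported to system software, and that features are configurable.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Severity(IntEnum):
    """Hypothetical severity levels, ordered from least to most severe."""
    CORRECTED = 1      # e.g. an ECC error fixed by hardware
    UNCORRECTED = 2    # e.g. an ECC error detected but not fixed


@dataclass
class ErrorRecord:
    """Hypothetical register set holding one logged error."""
    valid: bool = False
    severity: Optional[Severity] = None
    address: int = 0
    overflow: bool = False   # another error arrived while the record was valid


class ErrorLoggingInterface:
    """Toy model: one Error Record per monitored module (assumed layout)."""

    def __init__(self, num_sources: int, report_corrected: bool = False):
        self.records = [ErrorRecord() for _ in range(num_sources)]
        self.report_corrected = report_corrected   # assumed configuration knob
        self.pending: List[int] = []                # models a pending interrupt

    def log_error(self, source: int, severity: Severity, address: int) -> None:
        """Hardware side: log a detected error and decide whether to report it."""
        rec = self.records[source]
        if rec.valid:
            rec.overflow = True
            if severity <= rec.severity:
                return   # keep the existing record; only note the overflow
        rec.valid, rec.severity, rec.address = True, severity, address
        if severity == Severity.UNCORRECTED or self.report_corrected:
            if source not in self.pending:
                self.pending.append(source)

    def service(self) -> List[Tuple[int, Severity, int]]:
        """Software side: read every reported record, then clear it."""
        handled = []
        while self.pending:
            source = self.pending.pop(0)
            rec = self.records[source]
            handled.append((source, rec.severity, rec.address))
            self.records[source] = ErrorRecord()
        return handled


# Two monitored memories; corrected errors are logged but not reported.
iface = ErrorLoggingInterface(num_sources=2)
iface.log_error(0, Severity.CORRECTED, 0x1000)
iface.log_error(1, Severity.UNCORRECTED, 0x2004)
handled = iface.service()
```

In this sketch the corrected error stays in its record for later polling, while the uncorrected error is reported, read by software, and cleared. Enabling `report_corrected` is one way such a model could mimic trading extra reporting features against a simpler configuration.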