Error Correction System Based on COTS Microcontrollers Working in Redundancy
Abstract
This paper presents the implementation, testing and analysis of an error detection and correction architecture for critical systems based on majority voting among several commercial off-the-shelf (COTS) microcontroller units (MCUs). The architecture uses at least three MCUs that operate simultaneously and exchange information over a controller area network (CAN) bus, with no master/slave hierarchy. All MCUs run the same software and maintain a Health Table with relevant information about each other. They differ only by a unique identifier, set either in hardware or in software at compile time. At any point during code execution, a vote involving all MCUs can be initiated to verify a result. Each vote uses exactly three of the system's MCUs to try to reach a majority, which gives high flexibility in choosing which units take part in the voting process. An error-tracking system allows MCUs that inject too many errors to be isolated, so that only the most reliable units are used in the final vote. Once a majority is reached, all units can compare their own results with it and correct themselves if an error is detected. A real implementation of this architecture was built with four ARM-M4 MCUs, for testing and for use in an academic CubeSat. The flexible software implementation presented allows error verification to be executed as many times as needed and on any number of variables. For votes on different variables to be completely independent of each other, each votable variable needs its own Health Table. A flexible Health Table structure in software makes it easy to add or remove votable variables and to use exactly the same voting function for all of them. The messages exchanged over the CAN bus contain the sender, destination, message type and variable to be voted, so the same message structure can be reused.
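The voting step described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `health_entry_t`, `health_table_t`, `select_voters`, `vote` and `error_limit` are hypothetical, the policy of picking the three non-isolated units with the fewest recorded errors is an assumed reading of "only the most reliable units", and charging an error to every disagreeing unit (voter or not) is also an assumption. One `health_table_t` instance would exist per votable variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_MCUS 4  /* assumption: four units, as in the CubeSat prototype */
#define VOTERS   3  /* every vote uses exactly three units */

/* Hypothetical Health Table entry: one per MCU, per votable variable. */
typedef struct {
    uint32_t value;    /* result this MCU reported for the current vote */
    uint32_t errors;   /* times this MCU disagreed with the majority */
    bool     isolated; /* set once errors exceed error_limit */
} health_entry_t;

/* One Health Table per votable variable keeps votes independent. */
typedef struct {
    health_entry_t mcu[NUM_MCUS];
    uint32_t error_limit; /* hypothetical isolation threshold */
} health_table_t;

/* Choose the non-isolated MCUs with the fewest recorded errors
   (ties go to the lower identifier). Returns how many were found. */
static int select_voters(const health_table_t *t, int out[VOTERS]) {
    bool used[NUM_MCUS] = {false};
    int n = 0;
    for (int k = 0; k < VOTERS; k++) {
        int best = -1;
        for (int i = 0; i < NUM_MCUS; i++) {
            if (used[i] || t->mcu[i].isolated) continue;
            if (best < 0 || t->mcu[i].errors < t->mcu[best].errors) best = i;
        }
        if (best < 0) break; /* fewer than three healthy units left */
        used[best] = true;
        out[n++] = best;
    }
    return n;
}

/* Two-out-of-three majority vote over the selected units. On success,
   stores the majority in *majority, charges an error to every
   non-isolated unit that disagrees with it, and isolates units that
   exceed error_limit. Returns false when no majority can be formed. */
static bool vote(health_table_t *t, uint32_t *majority) {
    assert(t != NULL && majority != NULL);
    int v[VOTERS];
    if (select_voters(t, v) < VOTERS) return false;

    uint32_t a = t->mcu[v[0]].value;
    uint32_t b = t->mcu[v[1]].value;
    uint32_t c = t->mcu[v[2]].value;
    if (a == b || a == c) *majority = a;
    else if (b == c)      *majority = b;
    else                  return false; /* all three disagree */

    for (int i = 0; i < NUM_MCUS; i++) {
        health_entry_t *e = &t->mcu[i];
        if (!e->isolated && e->value != *majority && ++e->errors > t->error_limit)
            e->isolated = true;
    }
    return true;
}
```

For example, if MCUs 0, 1 and 3 report 541 while MCU 2 reports 999, the vote returns 541 and MCU 2's error counter rises to 1; MCU 2 would then overwrite its own copy with 541. Breaking ties by the lowest identifier keeps the selection deterministic, so, assuming every unit holds an identical copy of the Health Table, all units running the same code choose the same three voters without needing a master.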
Combined with several internal security mechanisms, this message structure allows the system to detect faulty messages and keep operating even if the CAN bus itself corrupts them. To test the architecture and its implementation, an error generator was created using a pseudorandom number generator (PRNG). Each MCU can generate corrupted results for the voting process according to an individual, user-defined probability. This allowed several test cases to be prepared in which the error rate of each MCU was increased individually in 5% increments, starting with only one MCU generating errors and then adding the others one by one until all four were generating errors. The task chosen for the MCUs was to calculate the first one hundred prime numbers. Each test case was repeated one hundred times to reduce the influence of the PRNG on the results. The different test cases and results are presented and analyzed. The proposed architecture proved fully functional, allowing the system to detect and correct most of the errors injected by the MCUs. © 2022 IEEE.
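The test setup described above can be sketched in C under several assumptions. The abstract does not name the PRNG, so xorshift32 stands in for it; corruption is modeled as flipping one pseudorandom bit of the result; and `first_primes`, `inject_error` and `error_pct` are hypothetical names. Each MCU would compute the benchmark task and pass its result through `inject_error`, with its own probability, before submitting it to the vote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PRIMES 100

/* xorshift32: an assumed stand-in, since the abstract does not name its PRNG. */
static uint32_t xorshift32(uint32_t *state) {
    assert(*state != 0u); /* a zero state would stay zero forever */
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Benchmark task: fill out[] with the first NUM_PRIMES primes by trial division. */
static void first_primes(uint32_t out[NUM_PRIMES]) {
    int count = 0;
    for (uint32_t n = 2; count < NUM_PRIMES; n++) {
        bool prime = true;
        for (uint32_t d = 2; d * d <= n; d++) {
            if (n % d == 0) { prime = false; break; }
        }
        if (prime) out[count++] = n;
    }
}

/* Hypothetical error injector: returns value unchanged, or, with probability
   error_pct/100 (the per-MCU setting: 0, 5, 10, ...), a copy with one
   pseudorandom bit flipped so it is guaranteed to differ. */
static uint32_t inject_error(uint32_t value, uint32_t error_pct, uint32_t *rng) {
    if (xorshift32(rng) % 100u < error_pct) {
        return value ^ ((uint32_t)1 << (xorshift32(rng) % 32u));
    }
    return value;
}
```

A sweep in this style would start with `error_pct = 0` on every unit, raise one unit's setting in steps of 5, and bring the other units in one at a time, repeating each configuration one hundred times (presumably with different seeds). Because a bit flip always changes the value, every injected error is visible to the vote; a corruption model that could reproduce the correct value by chance would understate the effective error rate.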
- Computer hardware
- Control system synthesis
- Controllers
- Error correction
- Error detection
- Health
- Network architecture
- Number theory
- Random number generation
- Redundancy
- Tracking (position)
- Verification
- Commercial off the shelves
- Commercial off-the shelves
- Commercial-off-the-shelf
- Controller-area-network bus
- Errors correction
- Implementation analysis
- Implementation testing
- Microcontroller unit
- Pseudorandom number generators
- Test case
- Microcontrollers
URI
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85137611822&doi=10.1109%2fAERO53065.2022.9843682&partnerID=40&md5=cc9e0f77db9d7934763917ec26c584f4
https://repositorio.maua.br/handle/MAUA/711