Comments (77)
- dmk: The quote from the CMU guy about modern Agile and DevOps approaches challenging architectural discipline is a nice way of saying most of us have completely forgotten how to build deterministic systems. Time-triggered Ethernet with strict frame scheduling feels like it's from a parallel universe compared to how we ship software now.
- __d: Does anyone have pointers to some real information about this system? CPUs, RAM, storage, the networking, what OS, what language is used for the software, etc.?
  I’d love to know how often one of the FCMs has “failed silent”, and where they were in the route and so on too, but it’s probably a little soon for that.
- georgehm:
  > Effectively, eight CPUs run the flight software in parallel. The engineering philosophy hinges on a “fail-silent” design. The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds.
  >
  > “A faulty computer will fail silent, rather than transmit the ‘wrong answer,’” Uitenbroek explained. This approach simplifies the complex task of the triplex “voting” mechanism that compares results.
  >
  > Instead of comparing three answers to find a majority, the system uses a priority-ordered source selection algorithm among healthy channels that haven’t failed-silent. It picks the output from the first available FCM in the priority list; if that module has gone silent due to a fault, it moves to the second, third, or fourth.

  One part that seems omitted from the explanation is what happens if both CPUs in a pair, for whatever reason, perform the same erroneous calculation and their results match: how will that source be silenced without comparing its results against other sources?
- y1n0: NASA didn't build this; Lockheed Martin and their subcontractors did. Articles and headlines like this make people think that NASA does a lot more than it actually does. This is like a CEO claiming credit for everything a company does.
- geomark: I sure wish they would talk about the hardware. I spent a few years developing a radiation-hardened, fault-tolerant computer back in the day. Adding redundancy at multiple levels was the usual solution. But there is another clever check on transient errors during process execution that we implemented that didn't involve any redundancy. It doesn't seem like they did anything like that, but I can't tell since they don't mention the processor(s) they used.
- vhiremath4:
  > “Along with physically redundant wires, we have logically redundant network planes. We have redundant flight computers. All this is in place to cover for a hardware failure.”

  It would be really cool to see a visualization of redundancy measures/utilization over the course of the trip to get a more tangible feel for its importance. I'm hoping a bunch of interesting data is made public after this mission!
- jbritton: I wonder how often problems happen that the redundancy solves. Is radiation actually flipping bits, and at what frequency? Can a solar flare cause all the computers to go haywire?
- starkparker: Headline needs its how-dectomy reverted to make sense.
- object-a: How big of a challenge are hardware faults and radiation for orbital data centers? It seems like you’d eat a lot of capacity if you need 4x redundancy for everything.
- spaceman123: Probably the same way they built the fault-tolerant toilet.
- SeanAnderson: Typo in the first sentence of the first paragraph is oddly comforting since AI wouldn't make such a typo, heh.
  Typo in the first sentence of the second paragraph is sad though. C'mon, proofread a little.
- nickpsecurity: The ARINC scheduler, RTOS, and redundancy have been used in safety-critical systems for decades; ARINC goes back to the '90s. Most safety-critical microkernels, like INTEGRITY-178B and LynxOS-178B, came with a layer for that.
  Their redundancy architecture is interesting. I'd be curious what innovations went into rad-hard fabrication, too. Sandia Secure Processor (aka Score) was a neat example of rad-hard, secure processors.
  Their simulation systems might be helpful for others, too. We've seen more interest in that from FoundationDB to TigerBeetle.
- seemaze
- hulitu: They run 2 Outlook instances. For redundancy. /s
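
For readers following the article excerpt quoted in georgehm's comment, here is a minimal Python sketch of how a self-checking pair and priority-ordered source selection could fit together. It is only an illustration of the idea as quoted, not the actual flight software: the names `SelfCheckingPair`, `lane_a`, `lane_b`, and `select_source` are hypothetical, and comparing results with exact equality is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class SelfCheckingPair:
    """Hypothetical model of one flight computer built from two lanes (CPUs)."""
    name: str
    lane_a: Callable[[float], float]
    lane_b: Callable[[float], float]

    def output(self, sensor: float) -> Optional[float]:
        a = self.lane_a(sensor)
        b = self.lane_b(sensor)
        # Fail silent: if the lanes disagree, emit nothing at all
        # rather than risk transmitting a wrong answer.
        return a if a == b else None


def select_source(
    pairs: Sequence[SelfCheckingPair], sensor: float
) -> Optional[Tuple[str, float]]:
    """Return (name, value) from the first pair in priority order that is not silent."""
    for pair in pairs:
        value = pair.output(sensor)
        if value is not None:
            return pair.name, value
    return None  # every channel has gone silent


if __name__ == "__main__":
    healthy = lambda x: 2.0 * x
    upset = lambda x: 2.0 * x + 1.0  # stands in for a lane hit by a bit flip

    channels = [
        SelfCheckingPair("FCM1", healthy, upset),    # lanes disagree, goes silent
        SelfCheckingPair("FCM2", healthy, healthy),  # next in the priority list
    ]
    print(select_source(channels, 3.0))
```

Note that no cross-channel voting happens here: the selector trusts any channel that is still talking. In this sketch, a pair whose two lanes make the same mistake would pass its own check and be selected, which is exactly the case georgehm asks about; the quoted text does not say how the real system handles it.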