Comments (11)

  • kelseyfrog
    If you squint, it's a fixed-iteration ODE solver. I'd love to see a generalization of this, and of the Universal Transformer mentioned, re-envisioned as flow-matching/optimal-transport models.
  • the8472
    Does the training process ensure that all the intermediate steps remain interpretable, even on larger models? Or could we end up with some alien gibberish in all but the final step?
  • lukebechtel
    so it's: output = layers(layers(layers(layers(input)))) instead of the classical: output = layer4(layer3(layer2(layer1(input))))
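
A toy sketch of the contrast in the comment above, for illustration only. It assumes a "layer" is just a function from a vector to a vector; `make_layer`, `apply_recurrent`, and `apply_stacked` are hypothetical names, and the affine map stands in for a real transformer block. It is not the model's actual implementation.

```python
from typing import Callable, Sequence

Vector = list[float]
Layer = Callable[[Vector], Vector]


def make_layer(scale: float, shift: float) -> Layer:
    """Hypothetical stand-in for a layer: an elementwise affine map."""
    def layer(x: Vector) -> Vector:
        return [scale * v + shift for v in x]
    return layer


def apply_recurrent(layers: Layer, x: Vector, steps: int) -> Vector:
    """Reuse ONE set of weights `steps` times: layers(layers(...(x)))."""
    for _ in range(steps):
        x = layers(x)
    return x


def apply_stacked(stack: Sequence[Layer], x: Vector) -> Vector:
    """Classical stack: each layer has its own weights, applied in order."""
    for layer in stack:
        x = layer(x)
    return x


shared = make_layer(2.0, 1.0)
recurrent_out = apply_recurrent(shared, [1.0], steps=4)

distinct = [make_layer(1.0, 1.0), make_layer(2.0, 0.0),
            make_layer(1.0, -1.0), make_layer(3.0, 0.0)]
stacked_out = apply_stacked(distinct, [0.0])
```

In this sketch, the recurrent form is the special case of the stacked form where every entry in the stack is the same function, so the number of steps can be changed without adding parameters.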