
Comments (27)

  • luizfelberti
    A bit dated in the sense that for Linux you'd probably use io_uring nowadays, but otherwise it's a timeless design.

    Still, I'm conflicted on whether separating stages per thread (accept on one thread and the client loop on another) is a good idea. It sounds like the gains would be minimal or non-existent even in ideal circumstances, and on some workloads where there aren't a lot of clients or much connection churn it would waste an entire core on handling a low-volume event.

    I'm open to contrarian opinions on this though; maybe I'm not seeing something...
  • kogus
    Slightly tangential, but why is the first diagram duplicated at .1 opacity?
  • ratrocket
    discussed in 2016: https://news.ycombinator.com/item?id=10872209 (53 comments)
  • bee_rider
    > One thread per core, pinned (affinity) to separate CPUs, each with their own epoll/kqueue fd

    > Each major state transition (accept, reader) is handled by a separate thread, and transitioning one client from one state to another involves passing the file descriptor to the epoll/kqueue fd of the other thread.

    So this seems like a little pipeline that all of the requests go through, right? For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement? And does it create a load-balancing problem? I’d expect some stages to be quite cheap…
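    The handoff the quoted passage describes can be sketched with Python's stdlib epoll wrapper (Linux-only; CPU pinning and error handling are omitted, and all names here are illustrative, not from the article): the accept thread owns one epoll fd, the reader thread owns another, and "transitioning" a client is just a registration on the other thread's epoll instance.

```python
import select
import socket
import threading

# Bind an ephemeral port so the sketch is self-contained.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
HOST, PORT = listener.getsockname()
listener.listen()
listener.setblocking(False)

accept_ep = select.epoll()  # stage 1: owned by the accept thread
reader_ep = select.epoll()  # stage 2: owned by the reader thread
accept_ep.register(listener.fileno(), select.EPOLLIN)

clients = {}  # fd -> socket; lets the reader thread look fds back up

def accept_loop():
    while True:
        for _fd, _ev in accept_ep.poll():
            conn, _addr = listener.accept()
            conn.setblocking(False)
            clients[conn.fileno()] = conn
            # The "state transition": hand the client fd over to the
            # reader thread's epoll instance.
            reader_ep.register(conn.fileno(), select.EPOLLIN)

def reader_loop():
    while True:
        for fd, _ev in reader_ep.poll():
            data = clients[fd].recv(4096)
            if data:
                clients[fd].sendall(data)  # trivial echo "work" stage
            else:
                reader_ep.unregister(fd)
                clients.pop(fd).close()

threading.Thread(target=accept_loop, daemon=True).start()
threading.Thread(target=reader_loop, daemon=True).start()
```

    A real server along these lines would pin each thread with os.sched_setaffinity and replace the shared dict with a proper handoff queue; this is just the shape of the pipeline.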
  • password4321
    Always interesting to review the latest TechEmpower web framework benchmarks, though it's been a year: https://www.techempower.com/benchmarks/#section=data-r23&tes...
  • rot13maxi
    I haven't seen an sdf1.org URL in a looooong time. Lovely to see it's still around.
  • fao_
    This is more or less what Erlang does, and part of why Erlang is so easy to scale.
  • epicprogrammer
    It’s an interesting throwback to SEDA, but physically passing file descriptors between cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing. Every time you hand off that connection, you immediately invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures like NGINX: having a single pinned thread handle the entire lifecycle of a request keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.
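    For contrast, the shared-nothing layout this comment argues for can be sketched with the same stdlib tools (Linux-only; the placeholder socket and all names are my own illustration, not the article's code): with SO_REUSEPORT every worker owns its own listening socket and epoll fd, the kernel spreads new connections across the workers, and a connection never leaves the thread that accepted it.

```python
import select
import socket
import threading

# Reserve a shared port for the sketch: the placeholder sets SO_REUSEPORT
# so each worker below may bind the same address (Linux-only behaviour).
placeholder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
placeholder.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
placeholder.bind(("127.0.0.1", 0))
HOST, PORT = placeholder.getsockname()

def worker():
    # Shared-nothing: this thread owns its listening socket, its epoll fd,
    # and every connection it accepts, for the connection's whole lifetime.
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    lsock.bind((HOST, PORT))
    lsock.listen()
    lsock.setblocking(False)
    ep = select.epoll()
    ep.register(lsock.fileno(), select.EPOLLIN)
    conns = {}  # thread-local: never shared with another worker
    while True:
        for fd, _ev in ep.poll():
            if fd == lsock.fileno():
                conn, _addr = lsock.accept()
                conn.setblocking(False)
                conns[conn.fileno()] = conn
                ep.register(conn.fileno(), select.EPOLLIN)
            else:
                data = conns[fd].recv(4096)
                if data:
                    conns[fd].sendall(data)  # echo, all on this one thread
                else:
                    ep.unregister(fd)
                    conns.pop(fd).close()

# Two workers here; a real server would start one per core and pin each
# with os.sched_setaffinity.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()
```

    Nothing crosses a thread boundary after accept, which is exactly the cache-locality property being claimed.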