Comments (76)

  • yardstick
    It’s been decades, why doesn’t getaddrinfo have a standardised way to specify a timeout? Set a timeout to 10 seconds and life becomes a lot easier. Yes, I know in Linux you can set the timeout in a config file. But really the DNS setting should be configurable by the calling code. Some code requires fast lookups and doesn’t mind failing, while other code won’t mind waiting longer. It’s not a one-size-fits-all thing.
  • rwmj
    Netscape used to start a new thread (or maybe it was a subprocess?) to handle DNS lookups, because the API at the time (gethostbyname) was blocking. It's kind of amazing that we're 30 years on and this is still a problem.
  • pizlonator
    At first I wondered if musl does it better, so I checked, and the version I have disables cancellation in the guts of `getaddrinfo`. I've always thought APIs like `pthread_cancel` are too nasty to use. Glad to see well documented evidence of my crank opinion.
  • senderista
  • comex
    pthread_cancel is not a good design because it operates entirely separately from normal mechanisms of error handling and unwinding. (That is, if you’re using C. If you’re using C++ it can integrate with exception handling.)

    A better approach would have been to mimic how kernels internally handle signals received during syscalls. Receiving a signal is supposed to cancel the syscall. But from the kernel’s perspective, a syscall implementation is just some code. It can call other functions, acquire locks, wait for conditions, and do anything else you would expect code to do. All of that needs to be cleanly cancelled and unwound to avoid breaking the rest of the system.

    So it works like this: when a signal is sent to a thread, a persistent “interrupted” flag is set for that thread. Like with pthread_cancel, this doesn’t immediately interrupt the thread, but only has an effect once the thread calls one of a specific set of functions. For pthread_cancel, that set consists of a bunch of syscalls and other “cancellation points”. For kernel-internal code, it consists of most functions that wait for a condition. The difference is in what happens afterwards. In pthread_cancel’s case, the thread is immediately aborted with only designated cleanups running. In the kernel, the condition-waiting function simply returns an error code. The caller is expected to handle this like any other error code, i.e. by performing any necessary cleanup and then returning the same error code itself. This continues until the entire chain of calls has been unwound. Classic C manual error handling. It’s nothing special, but because interruption works the same way as regular error handling, it’s more likely to “just work”. Once everything is unwound, the “interrupted” flag is cleared and the original signal can be handled.

    (The error code for interruption is usually EINTR, but don’t confuse this with EINTR handling in userspace, which is a mess. The difference is because userspace generally doesn’t want to abort operations upon receiving EINTR, and because from userspace’s perspective there’s no persistent flag.)

    pthread_cancel could have been designed the same way: cancellation points return an error code rather than forcibly unwinding. Admittedly, this system might not work quite as well in userspace as it does in kernels. Kernel code already needs to be scrupulous about proper error handling, whereas userspace code often just aborts if a syscall fails. Still, the system would work fine for well-written userspace code, which is more than can be said for pthread_cancel.
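
    A minimal userspace sketch of the "cancellation points return an error code" idea described above. Everything here is an assumption made for illustration: `struct cancel_token`, `token_cancel`, `token_wait_ready`, and `lookup_step` are hypothetical names, and `-ECANCELED` stands in for whatever error code such a design would pick. It is not how pthreads or any libc actually behaves.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cancellation token: a persistent flag plus a condition
 * variable, so a blocked waiter can be woken when cancellation is requested. */
struct cancel_token {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            cancelled;  /* stays set once requested */
    bool            ready;      /* the condition being waited for */
};

#define CANCEL_TOKEN_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false }

/* Request cancellation: set the flag and wake any waiter. */
static void token_cancel(struct cancel_token *t)
{
    pthread_mutex_lock(&t->lock);
    t->cancelled = true;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->lock);
}

/* The "cancellation point": waits for `ready`, but returns -ECANCELED
 * instead of forcibly unwinding the thread if cancellation was requested. */
static int token_wait_ready(struct cancel_token *t)
{
    int rc = 0;
    pthread_mutex_lock(&t->lock);
    while (!t->ready && !t->cancelled)
        pthread_cond_wait(&t->cond, &t->lock);
    if (t->cancelled)
        rc = -ECANCELED;
    pthread_mutex_unlock(&t->lock);
    return rc;
}

static int live_buffers;  /* counts allocations not yet freed */

/* A caller in the middle of the chain: ordinary C error handling.
 * It releases what it acquired and passes the same error code upward. */
static int lookup_step(struct cancel_token *t)
{
    assert(t != NULL);
    char *buf = malloc(256);
    if (!buf)
        return -ENOMEM;
    live_buffers++;

    int rc = token_wait_ready(t);

    free(buf);
    live_buffers--;
    return rc;
}

/* Thread entry point: returns the status code through the thread result. */
static void *worker(void *arg)
{
    return (void *)(intptr_t)lookup_step(arg);
}
```

    Cancelling is then just `token_cancel(&t)` followed by `pthread_join`: the worker returns `-ECANCELED` through the normal return path, and the buffer is freed by the same code that handles every other error.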
  • albertzeyer
    Why not use getaddrinfo_a / getaddrinfo_async_start / GetAddrInfoExW? Or just use some standalone DNS resolver code or library (which basically replicates getaddrinfo but supports this in an async way)? See also the discussion here: https://github.com/crystal-lang/crystal/issues/13619
  • pajko
    This is clearly an implementation error in getaddrinfo(). It should set up cleanup functions: https://man7.org/linux/man-pages/man3/pthread_cleanup_push.3...
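
    For reference, a small sketch of how `pthread_cleanup_push` lets a cancelled thread release a resource. `blocking_lookup`, `free_buffer`, and `run_and_cancel` are hypothetical names, and the `sleep()` loop is only a stand-in for a blocking call; this says nothing about what glibc's getaddrinfo does internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int cleanup_ran;  /* read by the caller only after pthread_join */

/* Cleanup handler: runs if the thread is cancelled while the handler is
 * pushed, or when it is popped with a nonzero argument. */
static void free_buffer(void *p)
{
    assert(p != NULL);
    free(p);
    cleanup_ran = 1;
}

/* Hypothetical stand-in for a lookup that holds a resource while blocked. */
static void *blocking_lookup(void *arg)
{
    (void)arg;
    char *buf = malloc(4096);
    if (!buf)
        return NULL;

    pthread_cleanup_push(free_buffer, buf);
    for (;;)
        sleep(1);            /* sleep() is a cancellation point */
    pthread_cleanup_pop(1);  /* not reached; required to pair with the push */
    return NULL;
}

/* Start the lookup, cancel it, and report whether it ended by cancellation.
 * Returns 1 if cancelled, 0 if it exited some other way, -1 on failure. */
static int run_and_cancel(void)
{
    pthread_t th;
    void *ret = NULL;

    if (pthread_create(&th, NULL, blocking_lookup, NULL) != 0)
        return -1;
    pthread_cancel(th);      /* deferred: acted on at the next sleep() */
    pthread_join(th, &ret);
    return ret == PTHREAD_CANCELED;
}
```

    The catch is that every function between the push and the cancellation point has to do this for every resource it holds, which is the burden the thread is discussing.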
  • hacker_homie
    What's old is new again. I loved Java in the early 2000s, trying to remotely stop a thread: Thread.destroy(), Thread.stop(), Thread.suspend(). So much potential for corrupted state.
  • Aardwolf
    Maybe this is naive, but could there just be some amount of worker threads that run forever, wait for and take jobs when needed, and message when the jobs are done? Don't need to be canceled, don't block
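
    A rough sketch of that arrangement with one long-lived worker and a job queue. All names (`struct job`, `job_submit`, `job_wait`, `resolver_worker`) are made up for illustration, and `AI_NUMERICHOST` is used only so the example never touches the network; a real resolver would drop that flag and block inside the worker instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical job record: one lookup request and its result. */
struct job {
    const char      *host;
    struct addrinfo *result;  /* owned by the submitter once done */
    int              status;  /* getaddrinfo() return value */
    int              done;
    struct job      *next;
};

static pthread_mutex_t q_lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  job_done   = PTHREAD_COND_INITIALIZER;
static struct job     *q_head, *q_tail;

/* Long-lived worker: sleeps until a job arrives and is never cancelled. */
static void *resolver_worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (!q_head)
            pthread_cond_wait(&q_nonempty, &q_lock);
        struct job *j = q_head;
        q_head = j->next;
        if (!q_head)
            q_tail = NULL;
        pthread_mutex_unlock(&q_lock);

        /* The potentially slow call happens outside the lock. */
        struct addrinfo hints = { .ai_flags = AI_NUMERICHOST,
                                  .ai_socktype = SOCK_STREAM };
        int rc = getaddrinfo(j->host, NULL, &hints, &j->result);

        pthread_mutex_lock(&q_lock);
        j->status = rc;
        j->done = 1;
        pthread_cond_broadcast(&job_done);  /* "message when the job is done" */
        pthread_mutex_unlock(&q_lock);
    }
    return NULL;
}

/* Queue a job; returns immediately without waiting for the lookup. */
static void job_submit(struct job *j)
{
    assert(j != NULL && j->host != NULL);
    j->result = NULL;
    j->status = 0;
    j->done = 0;
    j->next = NULL;
    pthread_mutex_lock(&q_lock);
    if (q_tail)
        q_tail->next = j;
    else
        q_head = j;
    q_tail = j;
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* Block until the job has finished and return its getaddrinfo() status. */
static int job_wait(struct job *j)
{
    pthread_mutex_lock(&q_lock);
    while (!j->done)
        pthread_cond_wait(&job_done, &q_lock);
    pthread_mutex_unlock(&q_lock);
    return j->status;
}
```

    `job_wait` is only here to keep the sketch short; a caller that must not block would check `done` under the lock or be notified through its event loop. Note that a job stuck in a slow lookup still ties up the worker, so a real pool would need more than one thread.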
  • Someone
    > Then it needs to sort them if there is more than one address. And in order to do that it needs to read /etc/gai.conf

    I don’t see why glibc would have to do that inside a call to getaddrinfo. Can’t it do that once at library initialization? If it has to react to changes to that file while a process is running, couldn’t it have a separate thread for polling that file for changes, or use inotify for a separate thread to be called when it changes? Swapping in the new config atomically might be problematic, but I would think that is solvable. Even ignoring the issue mentioned, it seems wasteful to open, parse, and close that file repeatedly.
  • jart
    Why can't they help fix the C library in question? Cancelation is really tricky to implement for the C library author. It's one of those concepts that, like fork, has implications that pervade everything. Please give your C library maintainers a little leeway if they get cancelation wrong. Especially if it's just a memory leak.
  • nly
    Why is running the DNS resolution thread a problem? It should be dequeuing resolution requests, pushing responses, and sleeping when there is nothing to do. When someone kills off the curl context, surely you simply set a suicide flag on the thread and wake it up so it can be joined.
  • charcircuit
    Why isn't DNS in a service on the operating system instead of libc? You'll want requests to be locally cached anyway. This also makes it easier to just abandon an RPC instead of stopping a thread you don't control.
  • gary_0
    [deleted]
  • throwaway81523
    There might be a way to getaddrinfo asynchronously with io_uring by now. Otherwise just call the synchronous version in another thread and let it time out so the thread exits normally, right? Why bother with pthread_cancel?
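
    A sketch of that second option: run the synchronous getaddrinfo in a detached thread, wait for it with a deadline, and let the thread finish on its own if the caller gives up. `struct lookup`, `lookup_unref`, and `lookup_with_timeout` are hypothetical names, and reporting a timeout as `EAI_AGAIN` is a choice made for this example only. The shared request is reference-counted so whichever side finishes last frees it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical request shared by the caller and the lookup thread.
 * Whoever drops the last reference frees it. */
struct lookup {
    pthread_mutex_t  lock;
    pthread_cond_t   cond;
    char            *host;
    int              flags;   /* ai_flags passed to getaddrinfo() */
    struct addrinfo *result;
    int              status, done, refs;
};

static void lookup_unref(struct lookup *l)
{
    pthread_mutex_lock(&l->lock);
    int last = (--l->refs == 0);
    pthread_mutex_unlock(&l->lock);
    if (!last)
        return;
    if (l->result)
        freeaddrinfo(l->result);  /* a result nobody waited for */
    pthread_mutex_destroy(&l->lock);
    pthread_cond_destroy(&l->cond);
    free(l->host);
    free(l);
}

static void *lookup_thread(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints = { .ai_flags = l->flags, .ai_socktype = SOCK_STREAM };
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(l->host, NULL, &hints, &res);  /* may block */

    pthread_mutex_lock(&l->lock);
    l->result = res;
    l->status = rc;
    l->done = 1;
    pthread_cond_signal(&l->cond);
    pthread_mutex_unlock(&l->lock);
    lookup_unref(l);  /* the thread returns normally; it is never cancelled */
    return NULL;
}

/* Returns the getaddrinfo() status, or EAI_AGAIN if the wait timed out or
 * the thread could not be started. On 0, free *out with freeaddrinfo(). */
static int lookup_with_timeout(const char *host, int flags, int timeout_ms,
                               struct addrinfo **out)
{
    assert(host && out && timeout_ms >= 0);
    struct lookup *l = calloc(1, sizeof *l);
    if (!l || !(l->host = strdup(host))) {
        free(l);
        return EAI_MEMORY;
    }
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->flags = flags;
    l->refs = 2;  /* one for the caller, one for the thread */

    pthread_t th;
    if (pthread_create(&th, NULL, lookup_thread, l) != 0) {
        l->refs = 1;
        lookup_unref(l);
        return EAI_AGAIN;
    }
    pthread_detach(th);

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = EAI_AGAIN, err = 0;
    pthread_mutex_lock(&l->lock);
    while (!l->done && err == 0)
        err = pthread_cond_timedwait(&l->cond, &l->lock, &deadline);
    if (l->done) {
        rc = l->status;
        *out = l->result;
        l->result = NULL;  /* ownership moves to the caller */
    }
    pthread_mutex_unlock(&l->lock);
    lookup_unref(l);
    return rc;
}
```

    For example, `lookup_with_timeout("127.0.0.1", AI_NUMERICHOST, 2000, &res)` resolves without touching the network. The cost of this approach is that an abandoned thread lingers until the resolver itself gives up, which is presumably why cancelling it looked attractive in the first place.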