Need help?
<- Back

Comments (49)

  • mananaysiempre
    Worth mentioning (haven’t checked if the paper talks about this) that while the industry mostly forgot about derivatives and extended REs (i.e. REs with intersection and negation), academia did not. Unfortunately, there have been some pretty discouraging results: the DFA for an extended RE (including a lazy DFA implemented using derivatives, as here) is worst-case doubly exponential in the length of the expression[1], not just exponential as for normal REs. So there is a potential reason not to support intersections in one’s RE matcher, even if they are enticingly easy to implement in terms of derivatives (and even if I personally like to see experimentation in this direction).[1] https://www.sciencedirect.com/science/article/pii/S030439751...
  • noelwelsh
    I love regular expression derivatives. One neat thing about regular expression derivatives is they are continuation-passing style for regular expressions. The derivative is "what to do next" after seeing a character, which is the continuation of the re. It's a nice conceptual connection if you're into programming language theory.Low-key hate the lack of capitalization on the blog, which made me stumble over every sentence start. Great blog post a bit marred by unnecessary divergence from standard written English.
  • balakk
    Finally, an article about humans programming some computers. Thank you!
  • agnishom
    The author mentions that they found Mamouras et al. (POPL 2024), but not the associated implementation. While the Rust implementation is not public, a Haskell implementation can be found here: https://github.com/Agnishom/lregex
  • klibertp
    If you claim it's the fastest, how does it compare to one-more-re-nightmare?- https://github.com/telekons/one-more-re-nightmare- https://applied-langua.ge/posts/omrn-compiler.htmlOMRN is a regex compiler that leverages Common Lisp's compiler to produce optimized assembly to match the given regex. It's incredibly fast. It does omit some features to achieve that, though.
  • gbacon
    That’s beautiful work. Check out other examples in the interactive web app:https://ieviev.github.io/resharp-webapp/Back in the Usenet days, questions came up all the time about matching substrings that do not contain whatever. It’s technically possible without an explicit NOT operator because regular languages are closed under complement — along with union, intersection, Kleene star, etc. — but a bear to get right by hand for even simple cases.Unbounded lookarounds without performance penalty at search time are an exciting feature too.
  • sieep
    Cool stuff. Reminds me of the content you used to see on here all the time before AI took over
  • masfuerte
    This is very impressive.> how does RE# find the leftmost-longest match efficiently? remember the bidirectional scanning we mentioned earlier - run the DFA right to left to find all possible match starts, then run a reversed DFA left to right to find the ends. the leftmost start paired with the rightmost end gives you leftmost-longest. two linear DFA scans, no backtracking, no ambiguity.I'm pretty sure that should say "the leftmost start paired with the leftmost end".This also implies that the algorithm has to scan the entire input to find the first match, and the article goes on to confirm this. So the algorithm is a poor choice if you just want the first match in a very long text. But if you want all matches it is very good.
  • sourcegrift
    I've had nothing but great experience with F#. If it wasn't associated with Microsoft, it'd be more popular than haskell
  • andriamanitra
    This is very interesting. I'm a bit skeptical about the benchmarks / performance claims because they seem almost too good to be true but even just the extended operators alone are a nice improvement over existing regex engines.The post mentions they also have a native library implemented in Rust without dependencies but I couldn't find a link to it. Is that available somewhere? I would love to try it out in some of my projects but I don't use .NET so the NuGET package is of no use to me.
  • anentropic
    Fantastic stuff!FYI some code snippets are unreadable in 'light mode' ("what substrings does the regex (a|ab)+ match in the following input?")
  • feldrim
    You got me at TalTech. Great job and the paper is high quality. I'll have to learn F# but I believe it is worth it.
  • andix
    It should be possible to directly compile this library to native code and use it in any other language. Maybe adding some C-style wrappers will be needed.
  • FrustratedMonky
    F# is one of the biggest 'What could have beens'. Great language, that just didn't hit the right time, or reach critical mass of the gestalt of the community.
  • meindnoch
    @burnsushi is that true?
  • sourcegrift
    [flagged]
  • nanoxide
    Calling this RE# (resharp), when there is a much more popular and established product already named R# (ReSharper, by JetBrains) in the. NET world will probably hurting your SEO and/or could potentially cause some legal grief.