Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

Anon84

Comments (43)

contextfree
It seems ridiculous that, for example, Copilot running in Visual Studio on a C# codebase finds things by grepping around instead of using the Roslyn-driven code symbol and semantic database built into Visual Studio. I'm guessing it's because the people they get to work on AI stuff are AI People who probably only write Python.
softwaredoug
In my research, grep is fine if you don’t care about tokens and you have fewer than 100k files. The direct corpus interaction paper [1] shows it breaking down past that scale. In my personal experience, grep plus an agent gives slightly better relevance than a BM25 search engine, but it requires you to eat tokens.
If you think grep is great, it’s because you’ve been socially engineered to organize your content to be findable. We document why something is useful to an agent. We put it in a logical place.
Just organizing content is at least half of building search, agentic or not. It’s one reason Google is successful: we’re all trying to make our content findable by the search engine. It’s not all technology :)
[1] https://arxiv.org/abs/2605.05242
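For reference, the BM25 baseline being compared against ranks documents by term frequency, saturated by a parameter k1 and normalized by document length via b, weighted by inverse document frequency. A minimal sketch, assuming naive whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the smoothed IDF variant Lucene uses; a production engine adds analyzers, stemming, and an inverted index:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document for the given query.

    Tokenization is a naive lowercase whitespace split, an assumption made
    for illustration; real engines add analyzers, stemming, and an index.
    """
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 so empty docs don't divide by zero.
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            freq = tf[term]
            if freq == 0:
                continue
            # Smoothed IDF (the Lucene-style variant), always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            denom = freq + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores
```

For example, `bm25_scores("grep regex", ["grep searches files for regex matches", "vector search ranks by embedding similarity"])` gives the first document a positive score and the second 0.0, since it shares no query terms. Unlike grep, BM25 ranks matches instead of just returning them, but it still only sees exact tokens.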
quinncom
Don’t presume this study has anything to do with programming. They measured an agent’s ability to search long conversations, not code.
> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.
alexrigler
Combining regex filtering with semantic ranking using multi-vector embeddings has yielded good results for me. I use ColGREP from the LightOn team as a daily driver: https://github.com/lightonai/next-plaid/blob/main/colgrep/RE...
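The two-stage idea in the comment above can be sketched as a regex prefilter followed by late-interaction (ColBERT-style "MaxSim") ranking: each query vector takes its best dot product against any of a document's vectors, and those maxima are summed. This is an illustration of the general technique under stated assumptions, not ColGREP's actual API or internals; the `regex_then_maxsim` helper, the `(text, token_vectors)` corpus format, and the toy 2-d vectors are all hypothetical stand-ins for a real multi-vector encoder.

```python
import re


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take its best dot
    product against any document vector, then sum over query vectors."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def regex_then_maxsim(pattern, query_vecs, corpus):
    """Hypothetical two-stage search over (text, token_vectors) pairs.

    Stage 1: keep only documents whose text matches the regex.
    Stage 2: rank the survivors by MaxSim against the query vectors.
    """
    rx = re.compile(pattern)
    candidates = [(text, vecs) for text, vecs in corpus if rx.search(text)]
    ranked = sorted(candidates, key=lambda c: maxsim(query_vecs, c[1]), reverse=True)
    return [text for text, _ in ranked]


# Toy corpus: in practice the vectors would come from a multi-vector encoder.
corpus = [
    ("def parse_config(path):", [[1.0, 0.0], [0.2, 0.9]]),
    ("def load_config(path):", [[0.9, 0.1], [0.8, 0.6]]),
    ("class Renderer:", [[1.0, 1.0]]),
]
query = [[1.0, 0.0], [0.0, 1.0]]
print(regex_then_maxsim(r"config", query, corpus))
```

Here the regex drops `class Renderer:` before any scoring happens, and MaxSim ranks `parse_config` (1.9) above `load_config` (1.5). The cheap exact filter bounds how many documents need embedding comparisons at all.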
piekvorst
I have always used traditional grep to search codebases. It serves me better than an IDE when there are lots of scattered, frequent queries. grep’s design holds up surprisingly well and still exceeds expectations to this day.
SkyPuncher
Tables 2 and 3 tell you basically all you need to know. When you use a harness tuned for programming (Codex and Claude Code), grep wins. When you use a neutral harness, vector search wins.
So far, every Grep vs RAG discussion I've seen conflates overlapping factors. The most common is simply that a company rebuilt its pipeline from scratch and fixed a bunch of problems along the way. The worst is when they go from one-shot RAG to multi-step Grep and completely miss the fact that multi-step RAG would likely get them similar results.
At the end of the day, the most important thing is knowing the _product features_ your users care about and making sure they're represented in the pipeline.
gbacon
This is a surprising result. With structured inputs like source code, I’d expect grep to outperform semantic search, but natural language’s errors and inconsistencies seem to leave so many cracks for information to fall through.
jeffchuber
If you are truly bitter-lesson pilled, give the agent all the tools and let it decide which to use:
- regex (grep)
- hybrid search (BM25 + vector)
This X vs Y framing is uninteresting when the answer can be both.
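Hybrid search needs some way to merge a keyword ranking with a vector ranking. One common choice, named here as an illustration rather than as what any particular product does, is reciprocal rank fusion (RRF): every document earns 1 / (k + rank) from each list it appears in. A minimal sketch, assuming each retriever returns document IDs in ranked order and using the conventional k = 60; the file names are hypothetical:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank is 1-based); documents absent from a list earn nothing from it.
    Returns IDs sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["a.py", "b.py", "c.py"]    # hypothetical keyword results
vector_hits = ["b.py", "d.py", "a.py"]  # hypothetical embedding results
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

Because RRF only uses ranks, it sidesteps the fact that BM25 scores and cosine similarities live on incomparable scales. Documents found by both retrievers (`b.py`, `a.py`) rise above ones found by only one.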
hmokiguess
Tangential: I have a hook that rewrites grep to rg, but lately I wonder if this is actually wasteful, since the model is so biased toward grep. Is there a way to shim or alias it instead?
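Whatever the mechanism (hook, shim, or wrapper), the rewrite itself has to be conservative, because grep and rg flags do not line up one-to-one (for instance, rg's `-E` means `--encoding`, not extended regex). A minimal sketch of a rewrite function a hook could call, assuming the hook receives the command as a single shell string; the flag table, the `rewrite_grep_to_rg` name, and the "pass through anything unfamiliar" policy are hypothetical choices, and wiring it into a specific harness is not shown. Note that rg also skips hidden and `.gitignore`d files by default, so results can still differ from `grep -r`.

```python
import shlex

# Hypothetical grep-to-rg mapping for short flags. None means "drop it":
# rg recurses by default, and its regex syntax is closer to grep -E
# (POSIX ERE) than to grep's default basic regex (BRE).
FLAG_MAP = {"r": None, "R": None, "E": None,
            "n": "n", "i": "i", "l": "l", "w": "w", "v": "v", "F": "F"}

# Characters whose meaning can differ between BRE and rg's regex syntax.
BRE_RISKY = set("|+?(){}\\")


def rewrite_grep_to_rg(command):
    """Rewrite a simple `grep FLAGS PATTERN PATH...` command to rg.

    Anything this sketch does not understand is returned unchanged.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return command
    if not argv or argv[0] != "grep":
        return command

    flags, positional, extended = [], [], False
    for arg in argv[1:]:
        if arg.startswith("-") and len(arg) > 1:
            if positional:
                return command  # flag after the pattern: ambiguous, skip
            for ch in arg[1:]:
                if ch not in FLAG_MAP:
                    return command  # unknown or long option: don't guess
                extended = extended or ch in "EF"
                if FLAG_MAP[ch] is not None:
                    flags.append(FLAG_MAP[ch])
        else:
            positional.append(arg)

    # Require a pattern plus at least one path so stdin handling never differs.
    if len(positional) < 2:
        return command
    pattern = positional[0]
    if not extended and BRE_RISKY & set(pattern):
        return command  # BRE pattern whose meaning might change under rg

    new_argv = ["rg"] + (["-" + "".join(flags)] if flags else []) + positional
    return shlex.join(new_argv)


print(rewrite_grep_to_rg("grep -rn TODO src"))
```

This prints `rg -n TODO src`. Failing open (returning the original grep command) keeps the hook from ever making a search worse, at the cost of leaving some commands unrewritten.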
piker
I recently watched the demo of the new Palantir + Kirkland & Ellis fund formation platform, and I was surprised by how effective unified structured data was in an agent harness. We're used to dealing with flat files and, as here, comparing basic ways of searching what are essentially long strings. But with Palantir's "Ontology" graph framework, I think Kirkland will be able to achieve some exceptional, differentiating outcomes in legal tech. The whole idea assumes they already have great structured data, and perhaps that's the real, valuable unknown, but giving an agent those tools is super powerful.
I wrote about it [1] and personally came away with a different view of both Palantir and the future of agentic workflows.
[1] Sorry, LinkedIn: https://www.linkedin.com/pulse/fund-managements-killer-app-d...
stephantul
This paper oversells with its title. What is Chronos? Which embedding model was used? Which reranker, and how was the reranking done? Why is Chronos so much better than Claude Code?
liminal
Is <blank> the only ML paper title?
yodon
Feels important, but I wish they also had compared against something like MeiliSearch or Algolia.
kwillets
I'm curious to see what patterns it's grepping.
sys_64738
Surely 'strings' would be even better?
greenavocado
This has been posted before, but a dead-simple pattern that helps enormously with steering the model to the right code area is a DESIGN.md that it creates, updates, and references periodically.