Comments (32)

  • simonw
    Comments like this don't fill me with confidence: https://github.com/brexhq/CrabTrap/blob/4fbbda9ca00055c1554a...

      // The policy is embedded as a JSON-escaped value inside a structured JSON object.
      // This prevents prompt injection via policy content — any special characters,
      // delimiters, or instruction-like text in the policy are safely escaped by
      // json.Marshal rather than concatenated as raw text.
  • edf13
    Good to see more work in this space with different ideas. The policy-builder-from-traffic idea is genuinely novel.

    We looked at LLM-as-judge early on and ended up discounting it on security grounds: the judge itself sits in the prompt-injection blast radius, and a probabilistic gate protecting a probabilistic agent felt like the wrong shape for a security primitive. Their structured-JSON escaping and header/body caps are thoughtful mitigations, but they reduce the surface rather than eliminate it.

    Picking the transport layer makes sense for production API-calling agents where egress is where irreversible damage lands. The architectural tradeoff is what the proxy can't see: file reads, shell spawns, process execs. The canonical prompt-injection chain (malicious README -> read ~/.ssh/id_rsa -> POST to attacker.com) is three steps, and CrabTrap only sees step three. The credential has already left the filesystem and entered agent process memory by the time the judge evaluates the outbound request.

    HTTP_PROXY/HTTPS_PROXY also depends on cooperative libraries. The iptables note handles this well in a containerised production deploy. For local-laptop coding agents, which is where most prompt-injection attack surface lives today, there's no equivalent kernel-level backstop.

    For that threat model we've been building grith.ai at the syscall layer (ptrace/seccomp-BPF on Linux, Endpoint Security on macOS, Minifilter + ETW on Windows) rather than transport. The two compose cleanly; serious production deploys probably want both.
  • yakkomajuri
    Really cool! I'm also building something in this space but taking a slightly different approach. I'm glad to see more focus on security for production agentic workflows, though, as I think we don't talk about it enough when it comes to claws and other autonomous agents.

    I think you're spot on that so far it's been all or nothing. You either give an agent a lot of access, and it's really powerful but proportionally dangerous, or you lock it down so much that it's no longer useful.

    I like a lot of the ideas you show here, but I also worry that LLM-as-a-judge is fundamentally a probabilistic guardrail and therefore inherently limited. How do you see this? It feels dangerous to rely on a security system that's based on probabilities rather than hard limitations.
  • roywiggins
    It's all fine until OpenClaw decides to start prompt injecting the judge
  • babas03
    The LLM-as-judge approach keeps coming up (some agent platforms use a dual-LLM validator; there's active research around it) and I'm curious how CrabTrap handles the latency-vs-safety tradeoff. Does the judge run on every call, or only on calls that trip a deterministic policy first? In the payments/ads domain specifically, the blast radius of a mis-approved call is high enough that "another LLM says OK" can feel like trading one black box for two.

    Also interesting that you went HTTP. Most agent tooling I've been running is stdio-based (MCP-style). What did the HTTP framing buy you architecturally?

    Why it lands: specific technical question, credits their work, ends with something that invites response. If Brex engineers are in the thread, one of them will likely reply.
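    The second option babas03 describes, where deterministic rules run first and only undecided calls reach the judge, might look roughly like this. It is a sketch under our own assumptions (the Rule, Judge and Gate names are hypothetical), not a description of how CrabTrap actually schedules its judge. The judge here is a plain function standing in for an LLM call, and the gate fails closed if the judge cannot decide.

```go
package main

import "fmt"

// Verdict is the outcome of a policy check.
type Verdict int

const (
	Unknown Verdict = iota // no opinion; defer to the next stage
	Allow
	Deny
)

func (v Verdict) String() string {
	switch v {
	case Allow:
		return "allow"
	case Deny:
		return "deny"
	}
	return "unknown"
}

// Request is a minimal, hypothetical view of an outbound call.
type Request struct {
	Method, Host, Path string
}

// Rule is a deterministic check; Judge stands in for the slow, probabilistic LLM call.
type Rule func(Request) Verdict
type Judge func(Request) Verdict

// Gate runs the deterministic rules in order and only calls the judge when none of
// them decides. The second return value reports whether the judge was invoked.
func Gate(rules []Rule, judge Judge, req Request) (Verdict, bool) {
	for _, rule := range rules {
		if v := rule(req); v != Unknown {
			return v, false
		}
	}
	if v := judge(req); v != Unknown {
		return v, true
	}
	return Deny, true // fail closed if the judge cannot decide
}

func main() {
	rules := []Rule{
		func(r Request) Verdict { // known-bad destination
			if r.Host == "attacker.example" {
				return Deny
			}
			return Unknown
		},
		func(r Request) Verdict { // read-only calls to a known API
			if r.Method == "GET" && r.Host == "api.example.com" {
				return Allow
			}
			return Unknown
		},
	}
	judge := func(Request) Verdict { return Unknown } // placeholder for the LLM judge
	for _, req := range []Request{
		{"GET", "api.example.com", "/v1/campaigns"},
		{"POST", "attacker.example", "/upload"},
		{"POST", "api.example.com", "/v1/payments"},
	} {
		v, judged := Gate(rules, judge, req)
		fmt.Printf("%s %s%s -> %s (judge called: %t)\n", req.Method, req.Host, req.Path, v, judged)
	}
}
```

    The latency cost is paid only on the calls no rule covers, which in a payments setting are exactly the high-risk writes; the black-box concern remains for those.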
  • ArielTM
    The debate here is missing a practical question: is the judge from the same model family as the agent it's judging?

    If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.

    Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."
  • fareesh
    Needs to be deterministic. ACLs
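    A deterministic ACL in its simplest form: explicit (method, host, path prefix) entries and default deny. The same request always gets the same answer, which is the property an LLM judge lacks; the cost is that every legitimate call has to be anticipated up front. The entries and the Allowed function are hypothetical, and path prefixes match only on segment boundaries.

```go
package main

import (
	"fmt"
	"strings"
)

// ACLEntry allows one method on one host under one path prefix.
type ACLEntry struct {
	Method     string
	Host       string
	PathPrefix string
}

// Allowed is default-deny: a request passes only if some entry matches it.
func Allowed(acl []ACLEntry, method, host, path string) bool {
	for _, e := range acl {
		if e.Method != method || e.Host != host {
			continue
		}
		// Match on path-segment boundaries so "/v1/ads" does not also allow "/v1/ads-admin".
		if path == e.PathPrefix || strings.HasPrefix(path, strings.TrimSuffix(e.PathPrefix, "/")+"/") {
			return true
		}
	}
	return false
}

func main() {
	acl := []ACLEntry{
		{Method: "GET", Host: "api.example.com", PathPrefix: "/v1/ads"},
		{Method: "POST", Host: "api.example.com", PathPrefix: "/v1/ads/drafts"},
	}
	fmt.Println(Allowed(acl, "GET", "api.example.com", "/v1/ads/123"))      // true
	fmt.Println(Allowed(acl, "GET", "api.example.com", "/v1/ads-admin"))    // false
	fmt.Println(Allowed(acl, "POST", "api.example.com", "/v1/ads/publish")) // false
	fmt.Println(Allowed(acl, "POST", "attacker.example", "/upload"))        // false
}
```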
  • IntrepidPig
    Blatant “astroturfing” in these comments
  • Seventeen18
    So cool! I'm building something very close to this but from another perspective. Making this open source is giving me many ideas!
  • DANmode
    We’re supposed to be fixing LLM security by adding a non-LLM layer to it, not adding LLM layers to stuff to make them inherently less secure.

    This will be a neat concept for the types of tools that come after the present iteration of LLMs.

    Unless I’m sorely mistaken.