The ways we contain Claude across products

jbredeche

Comments (47)

6gvONxR4sf7o
The framing they use is hilarious and their little graphic is perfect. The risk of harm doesn't go down, but the reward goes up, so the harm just becomes the cost of doing business, justified by the reward. So as the reward gets higher and higher, the amount of harm they're willing to justify goes up. Feels like society in a nutshell.
emilburzo
I'm still happy with my containment setup[1][2] on Linux. The only risk that I see from the article would be the "Exfiltration through an approved domain" one. But in the VM there's (by design) nothing to exfiltrate besides the source code itself, which is less valuable these days.
The major benefit for me with this setup is that the agent can do all of the dev things that I can (install packages, build/run docker images, ...) which is a way faster loop than me trying it manually and then reporting back to the agent.
[1] https://blog.emilburzo.com/2026/01/running-claude-code-dange...
[2] https://news.ycombinator.com/item?id=46690907
bananamogul
I'm intensely skeptical about anything Anthropic says, because they are so incented to make their products seem dangerous (i.e., "capable", "science fiction", "ahead of everyone") ahead of their IPO.
And they've done it before.
Remember the whole "when threatened, the model would use an engineer's email to blackmail him about his affair" nonsense? That was just fan fiction. They simply created a scenario with some facts and asked their model to continue the story. Go ask Claude about ways to steal the British crown jewels and it'll give you some ideas. This does not mean their models are so dangerous that the Tower of London needs additional security.
I assume all their other scare tactics are more of the same.
rancar2
From inspecting the Cowork VM, the pollution is not documented and not controllable (publicly known - I have workarounds). It creates a lot of waste and frustration in the process.
CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 means claude finds and loads all the CLAUDE.md of all the mounted repos over time (and by settings). As such, working on multiple unrelated repos at the same time isn’t a pleasant experience out of the box.
A few other interesting VM ENVs:
CLAUDE_CODE_IS_COWORK=1
CLAUDE_CODE_BRIEF=1
CLAUDE_CODE_BRIEF_UPLOAD=1
CLAUDE_CODE_DISABLE_AUTO_MEMORY=1
CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1
CLAUDE_CODE_DISABLE_CRON=1
CLAUDE_CODE_ENTRYPOINT=local-agent
CLAUDE_CODE_EXECPATH=/usr/local/bin/claude
CLAUDE_CODE_HOST_HTTP_PROXY_PORT=36543
CLAUDE_CODE_HOST_PLATFORM=darwin
CLAUDE_CODE_HOST_SOCKS_PROXY_PORT=46673
USE_STAGING_OAUTH=
_=/usr/bin/env
all_proxy=socks5h://localhost:1080
ftp_proxy=socks5h://localhost:1080
grpc_proxy=socks5h://localhost:1080
http_proxy=http://localhost:3128
https_proxy=http://localhost:3128
no_proxy=localhost,127.0.0.1,::1,.local,.local,169.254.0.0/16,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
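For anyone who wants to reproduce this kind of inspection, here is a minimal Python sketch that filters an environment down to the `CLAUDE_CODE_*` and proxy variables. The helper name `agent_env` and the set of proxy keys are assumptions made for illustration; nothing here is an official tool or API.

```python
import os

# Proxy variable names seen in the listing above (matched case-insensitively).
PROXY_KEYS = {"all_proxy", "ftp_proxy", "grpc_proxy",
              "http_proxy", "https_proxy", "no_proxy"}


def agent_env(env):
    """Return the CLAUDE_CODE_* and proxy variables from a mapping, sorted by name.

    `agent_env` is a hypothetical helper, not part of any Anthropic tooling.
    """
    return {
        name: value
        for name, value in sorted(env.items())
        if name.startswith("CLAUDE_CODE_") or name.lower() in PROXY_KEYS
    }


if __name__ == "__main__":
    # Run inside the VM to print the relevant variables one per line.
    for name, value in agent_env(os.environ).items():
        print(f"{name}={value}")
```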
protocolture
> As agents grow more capable, so does their potential blast radius. The engineering question is how to cap it.
People get a bit upset these days when you personify an LLM, but worse than that I think is to pretend that LLMs work on some movie logic where they can sneak out onto the internet like some kind of ooze and begin replication.
saghm
I recently threw together a nutshell helper function that lets me launch a process using bubblewrap to only give it read/write access to the directory I run it from (plus a couple of specific Linux system directories so that stuff like GUI and libportal will work) with everything else being read-only. This is a lot less annoying than a container for stuff where I legitimately want to be able to point agents at random stuff in other places (screenshots, log files, etc.) but also want to just blanket enable things so I don't need to babysit things to approve them manually over and over. It's pretty odd to me that this sort of experience isn't already being invested in by AI tooling platforms; the impetus for doing this was that I was frustrated that Zed, the editor with the entire premise of being used for AI stuff like this, only supports putting permissions for specific paths in the user-wide settings file; project-level settings files exist, but for reasons I can't fathom, they explicitly don't support any of the permissions settings for agents.
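A rough sketch of what such a helper could look like, written in Python rather than as a shell function. This is not the commenter's actual helper: the function name `bwrap_argv`, the `extra_rw` parameter, and the choice to mount fresh `/dev` and `/proc` are assumptions. It only uses the standard bubblewrap options `--ro-bind`, `--bind`, `--dev`, `--proc`, `--chdir`, and `--die-with-parent`.

```python
import os
import subprocess
import sys


def bwrap_argv(command, workdir, extra_rw=()):
    """Build a bwrap command line: whole filesystem read-only, workdir writable.

    `extra_rw` is a hypothetical list of additional paths to expose read/write
    (for example directories a GUI toolkit needs).
    """
    workdir = os.path.abspath(workdir)
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",        # everything visible, but read-only
        "--dev", "/dev",              # minimal device nodes
        "--proc", "/proc",            # fresh procfs
        "--bind", workdir, workdir,   # later binds override the read-only root
    ]
    for path in extra_rw:
        path = os.path.abspath(path)
        argv += ["--bind", path, path]
    argv += ["--chdir", workdir, "--die-with-parent"]
    return argv + list(command)


if __name__ == "__main__":
    # Example: python sandbox.py some-agent --some-flag
    sys.exit(subprocess.call(bwrap_argv(sys.argv[1:], os.getcwd())))
```

Because bubblewrap applies mount options in order, the writable bind of the working directory has to come after the read-only bind of `/`.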
bob1029
You can create an impenetrable prison for the LLM agents if you are willing to employ old school tech like Postgres, MSSQL or Oracle to solve the problem. I can't think of a better sandbox. No other ecosystem is as complete. Using virtual machines & containers is way too much, IMO. If you want to give the agent arbitrary code execution, allowing it to write [T/PL/pg]SQL over explicitly granted schema objects seems to be a more secure approach than running arbitrary Python or C# scripts on a VM somewhere.
If you are in a highly regulated environment, I would double down on this advice many times over. Features like row level security + connection context can be used to isolate on a tenant basis (per user's conversation thread) in a way that an auditor would be properly satisfied with. They already have checkboxes on their forms for this technology. Building a custom sandbox ecosystem from scratch is a long, twisted road. There are existing technologies that ~perfectly solve this problem, assuming you have the patience to frame it appropriately.
Think about this from the perspective of the user principals you would create. A built-in SQL account with locked down schema access is constrained in so many more dimensions than an AAD account with access to sandbox/container VMs. With a SQL account, you can exhaustively enumerate all of the things the model could hypothetically touch in one sitting. Privilege escalation is a possibility in the RDBMS environments, but mostly in the same sense that time travel or fusion power is a possibility in real life (i.e., so unlikely we can probably ignore the concern).
I've been doing this for a few months now and it is very obviously the correct path. YC put out a video about this concept too. The only way the agent in my architecture gets to talk to the outside world is by way of a table called RemoteProcedureCalls that a totally separate service polls & responds to over time.
https://www.youtube.com/watch?v=B246K_G7mHU [5:07 -> 9:14]
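The polling pattern described above can be sketched in a few lines of Python. Only the table name RemoteProcedureCalls comes from the comment; the columns, the `HANDLERS` allow-list, and the use of SQLite are assumptions made to keep the example self-contained. SQLite has no GRANT or row level security, so this shows only the request/response table, not the permission model.

```python
import json
import sqlite3

# Hypothetical schema: only the table name comes from the comment above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS RemoteProcedureCalls (
    id     INTEGER PRIMARY KEY,
    method TEXT NOT NULL,
    params TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    result TEXT
)
"""

# Hypothetical allow-list owned by the separate service, never by the agent.
HANDLERS = {"echo": lambda params: params}


def enqueue(conn, method, params):
    """Agent side: the only thing the agent can do is insert a request row."""
    cur = conn.execute(
        "INSERT INTO RemoteProcedureCalls (method, params) VALUES (?, ?)",
        (method, json.dumps(params)),
    )
    conn.commit()
    return cur.lastrowid


def poll_once(conn):
    """Service side: handle every pending request, return how many were seen."""
    rows = conn.execute(
        "SELECT id, method, params FROM RemoteProcedureCalls "
        "WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for call_id, method, params in rows:
        handler = HANDLERS.get(method)
        if handler is None:
            # Unknown methods are rejected rather than executed.
            conn.execute(
                "UPDATE RemoteProcedureCalls SET status = 'rejected' WHERE id = ?",
                (call_id,),
            )
        else:
            result = handler(json.loads(params))
            conn.execute(
                "UPDATE RemoteProcedureCalls SET status = 'done', result = ? "
                "WHERE id = ?",
                (json.dumps(result), call_id),
            )
    conn.commit()
    return len(rows)
```

In a real deployment the agent's database account would presumably be granted INSERT and SELECT on this table and nothing else, with the polling service running under a separate account.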
NiloCK
I'm no decision theorist, but I think they should wait for the rewards to outweigh the harms in expectation rather than being statistically equal.
Retr0id
One attack they missed in the egress proxy is exfiltration via domain fronting. Putting together a full PoC would require a Fastly account so I couldn't be bothered to report it.
Although, testing again, it might be fixed now.
elliotbnvl
I have been thinking about this a lot. I just bought a rather expensive rig for local inference for a home agent (powered by four RTX PRO 6000 Blackwell Max-Qs).
As I contemplate handing it more and more of the keys to my life, I grow increasingly concerned about what is, to me, the primary risk of this. Not data destruction (automated backups are trivial), but data exfiltration. Specifically, via prompt injection.
My solution to the problem, which I am implementing as a Hermes plugin + custom iOS / macOS app, is simple: an airlock architecture. One Hermes profile runs with local FS access and no internet access, inside an Apple container, and one Hermes profile runs with internet access and no FS access, inside an Apple container. They never share data directly or in any automated fashion.
If the user (i.e., my wife) wants to do some internet research, she can start a conversation with the remote-access profile. This is analogous to Claude and ChatGPT apps in their current state. However, at any point, she can flip the conversation over to local mode, which copies and pastes the conversation's transcript into the local-only profile (which has zero egress, enforced at the VM level) and seamlessly switches over to a new conversation in that profile.
After that, there's no way to re-enable internet attachment. Should she want to spawn a new conversation with information derived from the local file system, she starts a new conversation with a local agent, asks it to write up a research plan, and then – this is the airlock – manually begins a new conversation with only this plan in context.
The advantage this grants is that it's no longer necessary to worry about poisonous inputs flowing in – she only needs to worry about making sure any generated plan, the only artifact which could conceivably enter into the egress-enabled agent, does not contain information we'd rather not share with the internet at large.
I think this is bulletproof, but very much welcome input.
Is it possible I am overengineering this out of paranoia? Yes. Will I share a lot more of my personal data with the agent as a result of its perceived security? Also yes. Is that dumb? Maybe.
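The one-way flow described above can be modeled as a tiny state machine. This is a sketch under assumptions, not the commenter's plugin: the `Conversation` class, its method names, and the `approved_by_human` flag are all hypothetical, and the real isolation would come from the containers, not from this code.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """A conversation pinned to one profile: 'remote' (internet) or 'local' (files)."""
    mode: str = "remote"
    transcript: list = field(default_factory=list)

    def flip_to_local(self):
        """Copy the transcript into a new local-only conversation (one-way)."""
        return Conversation(mode="local", transcript=list(self.transcript))

    def enable_internet(self):
        """A local conversation can never regain egress."""
        if self.mode == "local":
            raise PermissionError("local conversations cannot re-enable internet")


def airlock(plan, approved_by_human):
    """Start a fresh remote conversation whose only context is a reviewed plan."""
    if not approved_by_human:
        raise PermissionError("plan must be reviewed by a human before egress")
    return Conversation(mode="remote", transcript=[plan])
```

The point of the shape is that nothing derived from local files reaches a remote conversation except through `airlock`, where a person has read the plan first.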
filup
> If you've occasionally used AI tools for professional coding work, tell us about it.
POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do the final tinkering and polishing for the PR because by that point it is more straightforward for me to fix the code than to ask the model to fix it for me.
The work: fairly straightforward UI + hosting work on a website. We have designers producing Figma and we use Figma MCP to convert that to web pages. POCC reduces the time taken to complete the work by at least 50%. The last-mile problem exists. It's not a one-shot story-to-PR prompt. There are a lot of back-and-forths with the model, many direct IDE edits, offline tests, etc. I can see how having subagents/skills/hooks/memory can reduce the manual effort further.
Challenges:
1) AI-first documentation: Stories have to be written with greater detail and acceptance criteria.
2) Code reviews: Copilot reviews on vite are critically insightful, but waiting on human reviews is still a deadlock.
3) AI-first thinking: Many of the lead devs are still hung up on various best practices that are not relevant in a world where the machine generates most of the code. There is a mismatch between the code the LLM is good at and the standards expected from an experienced developer. This creates busy work at best, frustration at worst.
4) Anti-AI sentiment: There is a vocal group who oppose AI for reasons from craftsmanship to capitalism to the global environmental crisis. It is a bit political and Slack channels are getting interesting.
5) Prompt engineering: I'm in the EU; when the team is multilingual and English is adopted as the language of communication, some members struggle more than others.
6) Losing the will to code: I can't seem to make up my mind whether the tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.
Honestly, I love it.
I mourn for the loss of the 10x engineer, but those 10x guys have already boarded the LLM ship.