AI coding at home without going broke

sbochins

Comments (212)

tunesmith
I feel like I must have plateaued and don't know what to do next to level up. I'm currently on the $100/month codex plan and it seems fine using 5.5-xhigh all the time. I think of what to do next, have a chat session to determine exactly what to ask for up to the point of being ready to implement, and then codex churns on a commit-sized task whereupon I briefly check it on my local dev server. If necessary I ask for a change. Then I ask it to commit and recommend the next step based off the spec. Oftentimes I have to "approve" an out-of-sandbox request anyway. I haven't found anything that requires running all night. I could tell it to one-shot a big plan but given how often I realize I want an intermediary thing to be slightly different it seems like a waste of effort. I'm guessing the next thing I should probably look into is some sort of machine vm I can tunnel my codex-gui requests to so I don't have to deal with the sandbox approvals (I don't want to give it "dangerous" access to my entire mac). I don't understand what people are doing with their side projects that is leading them to churn through tokens so quickly, to the point of requiring two $200/month subscriptions and a bunch of token charges besides.
dpcan
I cannot figure out what people are doing to spend all this money. I have used a $60 per month Cursor plan on auto, and have never come close to using up my included usage, and I probably have it planning and coding and working for me all through the evenings 4 nights a week. What on earth are people doing differently that it's costing them so much? Maybe enabling on-demand usage or other paid models, or on higher modes? What are you doing that requires this? The output from Auto for me is crazy good for the tasks I'm working on, and I have yet to run into an issue where it couldn't perform at a high enough level. We have been interviewing people at work to join our team and they tell us they use $2K per month in tokens with their current employers.... I can't even fathom what's going on here where that would be happening.
isatty
> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.
Power is not free. What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.
bachmeier
> The upfront cost is steep and the models you can actually run at home are weaker than what the frontier labs ship, so this only pays off if you can keep the rig busy with long running tasks where a slower, cheaper model grinds away overnight. Most people can’t keep a home machine that loaded, and the hardware you buy today may look like a bad bet in a year.
Oh, so this is not a post about AI coding at home. It's about vibe coding at home. There's a lot I disagree with in this post, but I'm posting this from a home computer with 64 GB of RAM and no GPU. I do lots of AI coding while spending very little money. I run Gemma 4 26b (mixture of experts) and Qwen 3 coder with Ollama. I use GitHub Copilot code completions. I use the Gemini and Mistral API free tiers. I have a Gemini paid API account. It's now prepaid, so you don't have to worry about an accidental $1000 bill. You can do a lot of things with Gemini Flash Lite 3.1. None of this is burning through tokens to create an expensive blob of spaghetti code, but it does qualify as AI coding.
jtr1
I've been running Claude Pro at home, supplemented with Deepseek configured in Claude Code. I've had decent luck throwing Opus (and briefly Fable, RIP) at architecture / product problems and producing plans to hand off to Deepseek (I personally find v4 to operate somewhere between Opus and Sonnet in capability). Lately I've been able to cut down on token usage with context-mode and codebase-memory to wring more out of my subscription, as well as doing things like making sure all terminal operations run in quiet mode. I've found codebase-memory particularly effective: it creates an index of your codebase that the agent can query for code tracing without reading all of the associated files, and I've also found it more accurate at analysis.
atreids
I find just going via Deepseek's platform API directly, using their V4 flash model, and hooking into a harness like Opencode more than acceptable. Think I've spent maybe $10 over a couple of weeks. I did explore self-hosting models but hardware right now is just too expensive.
mikgp
What are people doing at home? I have like 5 different apps I code on the $20/month Claude plan and like sure I can hit rate limits but - What are people doing to burn through $3k in tokens?
mwcampbell
I invested about $4,000 in an NVIDIA DGX Spark several months ago. 128 GB of unified RAM, and the NVIDIA GB10 chip. With the RAM, the several CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder, what's the most capable model, specifically for coding, that can run well on that hardware?
esalman
For me, investing in hardware seems to be the way to go. I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff. If LLMs and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 in hardware, like a Strix Halo PC.
vadansky
Can I run something comparable to Opus 4.6 locally yet? I keep hearing conflicting things. If I can spend 10k to do that I would cancel my subscription. The problem is I don’t wanna spend the money to find out myself.
janpeuker
The biggest issue I've seen with people burning through tokens is using very long sessions, especially starting with plan mode and then "iterating" over extended periods. I was burnt badly by extra usage so now I run on $20 Pro. I ruthlessly create new sessions/agents, always ask to create markdown files first (no plan mode) and minimise context aggressively - for example I have a lot of skills that use lazy loading and a small local MCP for lookups plus openrouter with a local model for image detection and fulltext search. Basically I use Claude Code in pi.dev style.
RomanPushkin
AI coding at home literally costs $100/month. I'm wondering where the $400 is coming from. $100 is more than enough for "coding at home", IMO. I rarely hit the limits, and when I do it's just time for a quick walk anyway.
geophph
> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.
What does this look like after 6-12 months? Like, how much code are you trying to write total? Maybe it just doesn’t click in my mind, but sometimes I wonder about how much work people are trying to do and how they actually have enough to get done so quickly in such a short amount of time.
bredren
What is going broke for a programmer? This is US centric but a $200 Claude Code and $100 codex sub is a vast, vast amount of tokens. Enough to pay for itself many times over. It provides exposure to the very edge of harnesses and experience that is being hired for. Isn’t there an argument this is possibly the best price to available performance for frontier models? Both due to subsidies and the distance between open and accessible alternatives?
montroser
deepseekv4 pro via opencode go is $10/mo and has very generous limits. I use pi for the harness and go just as a model provider. It goes a good long way...
josh_p
It’s been very validating in this thread to see everyone questioning the massive token spend of influencers and the like. The opencode-go sub, at $10/mo, is amazing value. I’ve been using that and the assistant kagi offers for web-chat and research for months. For the smallish projects I work on at home those have been great.
nunez
> The second is to skip the hardware and rent those same open source models from a provider at API rates. For most people this is the right call. You avoid putting thousands of dollars on one GPU setup while configurations are still in flux, you skip the work of squeezing long running performance out of an open model, and you can switch to whatever is cheaper or better next month without reselling a box. Something like OpenRouter makes the move close to a one line change.
This will probably become the only option as the companies that publish open weights stop doing that. Very very few people have enough hardware to train/fine-tune at home.
pianopatrick
I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.
impure
I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.
MemoryHoleHQ
I've been thinking a lot about this and my personal take right now is that at some point in the near-to-medium future the models available to run at home and the hardware needed to use them will be enough. My baseline is sonnet 4.6. I think it's good enough for most tasks, sincerely. So, from what I see, we are already at a point where we don't need frontier models for serious coding and debugging. Give it a couple of years and that level will fit 120B models. At the same time, we saw the rise of direct access memory systems like DGX or Strix Halo that will allow running models of this size for "cheap" in the medium term. That's what I'm betting on: that in 2 years I can buy a system for about $2500 that will run a model that's similar to Sonnet 4.6 locally. I might be spectacularly wrong though. But I'm willing to wait and use subscriptions/API calls for now.
pshirshov
> and the hardware you buy today may look like a bad bet in a year.
3090s and 7900s are going well so far. Next year an Arc Pro B70 won't produce fewer tokens for you than it does today. They aren't fast but if you have flows where you can make money with them, they are a bargain in terms of price per GB.
andrewstuart
I feel like the author isn’t aware of the Anthropic fixed price subscriptions, any of which can give you a lot of home AI programming.
abc42
What kind of usage chews through Claude Max x20? I use several agents with max effort in parallel and usually end up with something like 50% weekly usage. Fable almost allowed me to get to 70% but then they started resetting the limits mid-week and of course now ended the whole thing.
hillj23
I think this is only going to become more relevant. I'm personally a $200/mo Claude Maxer and I know that the usage I'm getting on Opus 4.8 Max and (until they yoked it out from under me) Fable 5 is way, way more than what I'm paying them. At some point, this will turn usage-based and I will be hammered on it and probably forced to look at self-hosting. I think while the caps are there, even at $200, it's honestly not too bad if you're coding value into the market, but as soon as those caps come off for retail AI users, we're all going to have some tough choices to make.
quickthoughts
Ha just wrote a post[1] about a sort of 4th option - max out cheap compute to create more tangible things that can be used/run locally.
[1] https://news.ycombinator.com/item?id=48519181
closeparen
> Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling. The plans are metered, and any large AI native workflow will chew through the included tokens fast
I don't think that's true at all. I'm doing 8-12 PRs a week at work, all primarily Claude Code, and the usage at API billing has never broken $500/mo.
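The arithmetic behind this disagreement can be sketched roughly as follows. The function names are hypothetical, and the dollar figures are only the claims quoted in this thread, not measurements.

```python
def plan_multiple(plan_cost: float, api_value: float) -> float:
    """How many dollars of list-price API usage each plan dollar buys."""
    return api_value / plan_cost


def cheaper_option(plan_cost: float, api_spend: float) -> str:
    """Return "plan" if the flat fee beats pay-as-you-go, else "api"."""
    return "plan" if api_spend > plan_cost else "api"


# Figures as quoted in the thread; treat them as rough claims.
print(plan_multiple(400, 2800))   # the article's claim -> 7.0
print(cheaper_option(400, 500))   # usage peaking near $500/mo at API rates -> plan
print(cheaper_option(400, 150))   # a lighter month (assumed figure) -> api
```

The point of the sketch is that the headline multiple only matters if actual usage would exceed the plan price at API rates; below that, pay-as-you-go is the cheaper route.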
asdfasgasdgasdg
Use Gemini 3.5 flash on the $20 a month plan and be satisfied with only being 3x as productive as you’d be on your own.
dottchen
running 2 $200 codex subs seems to work for me. It's quite easy to run out of a full account's weekly usage if using xhigh and fast mode all the way, and i'm not using it for autonomous running, still mainly human reviewed actual work.
0xB0D
If your job becomes writing complex specs to make an LLM write code, you've not optimised anything. In fact all you've done is add a business cost.
spgorbatiuk
Hardware and provider juggling is a way to go, although I think it is also worth mentioning that the cost is not only the price per token but, first of all, the number of tokens used. Depending on what one builds, comprehensive documentation and applicable skills and memory tools often allow for a substantial reduction in the tokens the agent previously needed to comprehend and remember what is being built.
WhiteOwlLion
There are a lot of Xeon chips for $10 on eBay. Too bad there’s no drive for CPU-based inference. The data centers will need to swap out the older GPU clusters, so what does that do for hardware pricing on data center GPUs? H100s are cheap enough, but the power requirements make them a long-term net negative given how much you pay for power in California.
thomasjb
Opencode's free models have been fine for me, they're what I tried after Gemma 4 8B proved hard to persuade into usefulness (I want to revisit with 12B and messing with harnesses, but I'm happy for now).
iwontberude
As long as you use models trained or distilled for your use case, there is no need to waste compute on trillions of parameters. Anthropic and OpenAI are proving that the “everything” models are not a sustainable business model. Just-In-Time or dynamic precompute of distilled models has already begun reducing the use of these frontier models for task inference.
Kuyawa
This month I've spent only 15 cents using DeepSeek API and my own coding agent. Three apps delivered to clients and currently working on a tournament management app for pickleball, padel and beach tennis. I love DeepSeek.
singpolyma3
How is this even an article? The advice is just "pay for max"
dempedempe
Did you just copy-and-paste an AI response and post it on your blog?
sebastianconcpt
Pretty happy with oMLX running Qwen3.6-35B-A3B-8bit
dualvariable
Am I the only one happy on a $20/month pro plan? Yeah, every now and then you blow out the window limits. So you take a break and think about something else or go out and do something else...
whateveracct
am i the only one who codes by hand at work and for fun anymore?
devhe4d
Since when is $400/month justified as an "efficient" way of using a "nice-to-have" option? What a world we live in.
jacobgold
"Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling."
I realize this text is just slop but it never stops being a "real bargain" at any point. And it's more like $200/mo for $4000+/mo in tokens. You can also buy additional subscriptions. There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.
13415
I use copy & paste with a pro subscription. I guess I'm a bit behind in terms of tool use but it works great for me.
OutOfHere
Fixed-price monthly plans ought to be sufficient for most people who actually review their spec and code, for building production-grade software that stands the test of time. A careful spec+review+iteration takes time, resetting the usage quota. Granted, security audits use tokens too. If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.
Flere-Imsaho
Instead of openrouter (which is admittedly a good service) I've switched to EU-only servers via https://cortecs.ai/ If you hunt in the settings you can restrict your account to only use EU servers for inference... Which means you can't use a lot of the US frontier models, but you can use all the Chinese ones, albeit within EU GDPR, etc. This to me is a good compromise between privacy and cost.
jrm4
Is spending (metered money) even worth it? Perhaps for most I mean "beyond like 30 bucks a month," but for me I'm literally not spending more money beyond my very cheapo 16 GB video card. No clue what y'all are doing, perhaps because I'm hobbying, and also I'm old and can perhaps do more of this by hand. But I'm basically just doing what I did before, plus ollama self hosted and sometimes gemini and I feel like I'm going lightspeed beyond what I've ever done. And I suppose this is still very fine-grained. I have it make a draft, then just have them fix/change it step by step? I tried one of the bigger boys that can one-shot apps, which I guess is cool, but I'm finding it's just as hard to modify as if I just grabbed someone else's repo on github.
m3kw9
That’s easy, just use the plus plan and learn how to prompt efficiently
sesm
> Do that well and you can build what a team of twenty engineers would put out in a month for around a thousand dollars.
As usual, an extraordinary claim without extraordinary evidence: https://stephen.bochinski.dev/apps/
tamimio
You can have opencode and switch between multiple providers on the fly based on the tasks you are doing: normal tasks use deepseek, for example, and hard ones use gpt5 or opus4. Track the usage with something like codexbar or similar. Openrouter seems to charge extra on top of the API costs, same with zen ide, so keep that in mind.
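A minimal sketch of that routing idea, assuming a simple two-tier split. The labels "deepseek" and "opus4" are placeholders taken from the comment, not real API model identifiers, and `pick_model` and `UsageTracker` are hypothetical names, not part of opencode or codexbar.

```python
from collections import defaultdict

# Placeholder labels from the comment; swap in real model IDs for your provider.
ROUTES = {"normal": "deepseek", "hard": "opus4"}


def pick_model(difficulty: str) -> str:
    """Route a task to a model label, defaulting to the cheap tier."""
    return ROUTES.get(difficulty, ROUTES["normal"])


class UsageTracker:
    """Accumulates token counts per model so spend can be reviewed later."""

    def __init__(self) -> None:
        self.tokens = defaultdict(int)

    def record(self, model: str, tokens: int) -> None:
        self.tokens[model] += tokens


tracker = UsageTracker()
tracker.record(pick_model("normal"), 12_000)  # routine edit
tracker.record(pick_model("hard"), 3_000)     # tricky design question
print(dict(tracker.tokens))
```

Defaulting unknown task types to the cheap tier keeps an unlabeled task from silently landing on the expensive model.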
gaigalas
> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.
In the good ol' days, we bought machines not only to run stuff, but to experiment. I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible. *That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models. So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.
zuzululu
Another update for codex users: they let you accumulate resets, which greatly adds to the mileage. I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs.