The Future of Everything Is Lies, I Guess: Safety

<- Back

The Future of Everything Is Lies, I Guess: Safety

aphyr

Comments (173)

dredmorbius
Other articles in this series discussed over the past five days:1. Introduction: <https://news.ycombinator.com/item?id=47689648> (619 comments)2. Dynamics: <https://news.ycombinator.com/item?id=47693678> (0 comments)3. Culture: <https://news.ycombinator.com/item?id=47703528>4. Information Ecology: <https://news.ycombinator.com/item?id=47718502> (106 comments)5. Annoyances: <https://news.ycombinator.com/item?id=47730981> (171 comments)6. Psychological Hazards: <https://news.ycombinator.com/item?id=47747936> (0 comments)And this submission makes:7. Safety: <https://news.ycombinator.com/item?id=47754379> (89 comments, presently).There's also a comprehensive PDF version for those who prefer that kind of thing: <https://aphyr.com/data/posts/411/the-future-of-everything-is...> (PDF) 26 pp.(Derived from aphyr's comment: <https://news.ycombinator.com/item?id=47754834>.)
jagged-chisel
"Alignment"In what world would I ever expect a commercial (or governmental) entity to have precise alignment with me personally, or even with my own business? I argue those relationships are necessarily adversarial, and trusting anyone else to align their "AI" tool to my goals, needs, and/or desires is a recipe for having my livelihood completely reassigned into someone else's wallet.
philipkglass
In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.This is true, and I believe that the "sufficient funds" threshold will keep dropping too. It's a relief more than a concern, because I don't trust that big models from American or Chinese labs will always be aligned with what I need. There are probably a lot of people in the world whose interests are not especially aligned with the interests of the current AI research leaders."Don't turn the visible universe into paperclips" is a practically universal "good alignment" but the models we have can't do that anyhow. The actual refusal-guards that frontier models come with are a lot more culturally/historically contingent and less universal. Lumping them all under "safety" presupposes the outcome of a debate that has been philosophically unresolved forever. If we get hundreds of strong models from different groups all over the world, I think that it will improve the net utility of AI and disarm the possibility of one lab or a small cartel using it to control the rest of us.
Cynddl
> "Unavailable Due to the UK Online Safety Act"Anyone outside the UK can share what this is about?
macintux
Previous discussions from earlier posts on the topic:* https://news.ycombinator.com/item?id=47703528* https://news.ycombinator.com/item?id=47730981
rupayanc
The power asymmetry point is what gets missed in most alignment debates. An AI model doesn't need to be misaligned to cause harm. It just needs to be misaligned with users while aligned with whoever's paying for it. That's not a future risk. That's how every enterprise SaaS product works already.
weinzierl
Oh boy, that’s a very generous view of human nature.The cynic in me agrees with the article’s premise, but not because I believe "alignment is a joke", but because I doubt that humans are "biologically predisposed to acquire prosocial behavior."
ramoz
Aside from the sentiment and arguments made–You don't need to train new models. Every single frontier model is susceptible to the same jailbreaks they were 3 years ago.Only now, an agent reading the CEOs email is much more dangerous because it is more capable than it was 3 years ago.
schnitzelstoat
It's a tool, some people use the tool to do bad things. But they already did bad things before.Virtually all of the arguments here could also be applied against the Internet itself.
krishna3145
https://www.researchgate.net/publication/403780821_Adversari...
kwar13
I did not know about this: https://en.wikipedia.org/wiki/Saudi_infiltration_of_Twitter
quantified
The Garden of Eden story is an apocryphal fable. But it sort of has a relevant twang to it.Geoffrey Hinton will not have his liver pecked out every day like Prometheus does.
nzoschke
Excellent articles as expected from aphyr.I'm seeing that these tools are extremely powerful the hands of experts that already understand software engineering, security, observability, and system reliability / safety.And extremely dangerous in the hands of people that don't understand any of this.Perhaps reality of economics and safety will kick in, and inexperienced people will stop making expensive and dangerous mistakes.
cold_tom
Feels like people are mixing two different things here-alignment in small groups (family,teams) vs alignment at scale. The first happens naturally, the second almost always needs structure, incentives, and enforcement
anon
undefined
intended
> I know this because a part of my work as a moderator of a Mastodon instance is to respond to user reports, and occasionally those reports are for CSAM, and I am legally obligated to review and submit that content to the NCMEC.Oh ** that.I have moderated all sorts of crap, and I am grateful that my worst has only been murders, hate speech, NCII, assaults, gore, and other forms of violence.> I sometimes wish that the engineers working at OpenAI etc. had to see these images too. Perhaps it would make them reflect on the technology they are ushering into the world, and how “alignment” is working out in practiceThis is a great idea. I’ve heard of new leaders being dropped in, and being sure they have a better handle on safety than the T&S teams.Only after they engage with the issues, and have their assumptions challenged by uncaring reality, did they listen to the T&S teams.There are a lot of assumptions on speech online that do not translate into operational reality.On HN and Reddit, everyone complains about moderation and janitors, but I highly recommend coders take it as civic service and volunteer.How can you meaningfully fix a mess, if you do not actually know what the mess is about?
BloondAndDoom
I don’t even see the pint of alignment or anything about security in LLMs. I feel like this is how “some people” reacted to the internet when I was young (lots of censorship), how hackers don’t let it happen, then how we are back to that world in the hand of corporations and governments who “think of the children). LLMs are out of the bottle and not going back there, only option is building for the new world on the defender side, everything else is politics.LLMs can hack, but also nmap made hacking easier do we make nmap illegal? We already have drones who kills people, now there is less human involvement, results are same. LLM can also make defending easier (at least for cyber security) but I guess real world security is not that different. Now evil things can be done faster, easier and at more scale. Also good things have the properties.It’s another tool in the toolbox, the idea that some entity will able to censor or align it as naive as thinking internet can be controlled. Some will do and manage anyway, but it’s not any different china’s firewall.Alignment is sold to us by companies like OpenAI and Anthropic , not because they care, because that gives them power and more control. When was the last time a big corporation actually cared about soft topics like this? Yes, never.
GistNoesis
There is also the fact that it's very easy to plant backdoors in LLMs with plausible deniability :- You can just use the same tools you use to train them to make them behave in some specific ways if some specific preconditions are met.- You can also poison the training data, so that the LLMs are writing flawed code they are convinced is right because they saw it on some obscure blog but in fact it had some subtle flaw you planted.- You can poison the prompts as they are automatically injected from "skills" found online.You couple that with long running agents which may drift very var from the conditions where they were tested during the safety tests.You add the fact that in this AI race war, there is some premium to run agents capable of advanced offensive security with full permission, pushed using yolo dark-pattern.The training process is obscure and expensive so only really doable by big actors non replicable and non verifiable.And of course, now safe developers (aka those not taking the insane risk of running what really is and should be called malware), can't get jobs, get no visibility for any of their work, drown into a sea of AI slop made using a prompt and a credit card, and therefore they must sell their soul.md and hype for the madness.
cowpig
> I think it’s likely (at least in the short term) that we all pay the burden of increased fraud: higher credit card fees, higher insurance premiums, a less accurate court system, more dangerous roads, lower wages, and so on.I think the author is brushing against some larger system issues that are already in motion, and that the way AI is being rolled out are exacerbating, as opposed to a root cause of.There's a felony fraudster running the executive branch of the US, and it takes a lot of political resources to get someone elected president.
anon
undefined
agentic_lawyer
If lies are our future, we have the tools necessary to deal with them. Frankly, this question was answered over a century ago by Dostoyevsky in Crime and Punishment, and every experienced criminal lawyer, prosecutor, and judge I've met already understood this very basic fact to be true: even lies point to the truth.What is unacceptable, and what I've used my entire life as a deliberate strategy to obfuscate personal affairs, deflect unpleasant conversations, and deal with fools I come across, is to mix of a small amount of truth within a complex web of lies and misdirection.This approach deals with two main challenges of lying effectively: lying in a consistent way and resisting the urge to be caught out in the lie. The truth is an abyss, and it frequently finds its most trenchant opponents flinging themselves willingly into it.The most important, revealing truths can be disclosed without any risk of being discovered, hiding in plain sight. The philosophers knew this and applied these lessons judiciously since the times of Plato. Sometimes speaking the truth is dangerous.I sometimes wish LLMs displayed that cautious refrain when discussing difficult matters. In my estimation, AGI will not have been reached until the models can produce works as mischievous as Plato, Averroes, Rousseau, or Derrida.We are a long way from that. The vanilla brand of lies put out today by LLMs are barely worth mentioning, even if troublesome.It's when the lies mask a deeper and profound truth that we'll know the game is up.
imbus
[dead]
themafia
> They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombsSuch a fear mongering position. You can learn to build pipe bombs already. Take any chemical reaction that produces gas and heat and contain it. Congratulations, you have a pipe bomb.Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.> I see four moats that could prevent this from happening.Really? Because you just said:> human brains, which are biologically predisposed to acquire prosocial behaviorYou think you're going to constrain _human_ behavior by twiddling with the language models? This is foolishly naive to an extreme.If you put basic and well understood human considerations before corporate ones then reality is far easier to predict.
simianwords
The author is still grieving by watching a civilisation changing technology just passing by. Every single one of the problems they note applies to any technology that existed.The internet produced 4chan. Produced scammers. Produced fraud. Instrumental in spreading child porn. Caused suicides. Many people lost their lives due to bullying on the internet. Many develop have addictions to gaming.To anyone who has given it some thought, any sufficiently advanced technology usually affects both in good and bad ways. Its obvious that something that increases degrees of freedom in one direction will do so in others. Humans come in and align it.There's some social credit to gain by being cynical and by signalling this cynicism. In the current social dynamics - being cynical gives you an edge and makes you look savvy. The optimistic appear naive but the pessimists appear as if they truly understand the situation. But the optimists are usually correct in hindsight.We know how the internet turned out despite pessimists flagging potential problems with it. I know how AI will turn out. These kind of articles will be a dime a dozen and we will look at it the same way as we look at now at bygone internet-pessimists.This is response not just to this article, but a few others.
dgfl
The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!". This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:> Alignment is a JokeTrue, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interest, they may imprint a stance which other groups may not like (which this article confusingly calls "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people that have decided that those harmful capabilities are inevitable, as the article directly addresses.> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we will all have to gain for it. If digital stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?
throwway120385
At scale I think our society is slowly inching closer and closer to building HM.
ibrahimhossain
Alignment feels like an arms race that favors whoever spends the most on RLHF and red teaming. If even friendly models keep leaking dangerous capabilities, the real moat might be making systems that are fundamentally limited rather than trying to patch every possible failure mode. Interesting piece.
conquera_ai
Feels like we’re repeating classic distributed systems lessons: assume failure, constrain blast radiusand never trust components that can’t explain themselves reliably
Imnimo
>Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice.How did brains acquire this predisposition if there is nothing intrinsic in the mathematics or hardware? The answer is "through evolution" which is just an alternative optimization procedure.
jazzpush2
Every one of these posts is immediately pushed to the front page, this one within 4 minutes.
atleastoptimal
There really are only 3 options that don't involve human destruction:1. AI becomes a highly protected technology, a totalitarian world government retains a monopoly on its powers and enforces use, and offers it to those with preexisting connections: permanent underclass outcome2. Somehow the world agrees to stop building AI and keep tech in many fields at a permanent pre-2026 level: soft butlerian jihad3. Futurama: somehow we get ASI and a magical balance of weirdness and dance of continual disruption keeps apocalypse in check and we accept a constant steady-state transformation without paperclipocalypse
amarant
There's really only one thing we need to do to avoid the apocalypse, and that is to not hand over the launch codes to a LLM.Seems easy enough, I'm actually pretty confident in even the most incompetent of current world leaders in this particular task.