I think this is precisely the reason that you’d want to make sure the agent is engineered such that its utility function includes the utility of other agents, i.e., so that the ‘alignment goals’ are its goals rather than ‘goals other than [its] own.’ We suspect that this exact sort of architecture could actually exhibit a negative alignment tax, insofar as many other critical social competencies may require it as a foundation.
I think this risks getting into a definitions dispute about what concept the words ‘alignment tax’ should point at. Even if one grants the point about resource allocation being inherently zero-sum, our whole claim here is that some alignment techniques might indeed be the most cost-effective way to improve certain capabilities and that these techniques seem worth pursuing for that very reason.
Thanks for this comment! Definitely take your point that it may be too simplistic to classify an entire technique as exhibiting a negative alignment tax when tweaking its implementation slightly could feasibly produce misaligned behavior. It does still seem like there might be a relevant distinction between:
1. Techniques that can be applied to improve either alignment or capabilities, depending on how they’re implemented. Your example of ‘System 2 alignment’ would fall into this category, as would any other method with “the potential to be employed for both alignment and capabilities in ways so similar that the design/implementation costs are probably almost zero,” as you put it.
2. Techniques that, by their very nature, improve both alignment and capabilities simultaneously, where the improvement in capabilities is not just a potential side effect or alternative application but an integral part of how the technique functions. RLHF (for all of its shortcomings, as we note in the post) is probably the best concrete example of this: it is an alignment technique that is now used by all major labs (some of which seem to hardly care about alignment per se) by virtue of the fact that it so clearly improves capabilities on balance.
(To this end, I think the point about treating refusal to do unaligned stuff as a lack of capability might be a stretch: RLHF is much of what drives the behavioral differences between, e.g., gpt-4-base and gpt-4, and those differences go far beyond whether, to use your example, the model is using naughty words.)
We are definitely supportive of approaches that fall under both 1 and 2 (and acknowledge that 1-like approaches would not inherently have negative alignment taxes), but it does seem very likely that there are more undiscovered approaches out there with the general 2-like effect of “technique X got invented for safety reasons—and not only does it clearly help with alignment, but it also helps with other capabilities so much that, even as greedy capitalists, we have no choice but to integrate it into our AI’s architecture to remain competitive!” This seems like a real and entirely possible circumstance where we would want to say that technique X has a negative alignment tax.
Overall, we’re also sensitive to this all becoming a definitions dispute about what exactly is meant by terminology like ‘alignment taxes,’ ‘capabilities,’ etc., and the broader point that, as you put it,
you can advance capabilities and alignment at the same time, and should think about differentially advancing alignment
is indeed a key general takeaway.
Interesting relevant finding from the alignment researcher + EA survey we ran:
We also find in both datasets (most dramatically in the EA community sample, plotted below) that respondents vastly overestimate (≈2.5x) how much high intelligence is actually valued, and underestimate other cognitive features like a strong work ethic, the ability to collaborate, and people skills. One plausible interpretation of this finding is that EAs/alignment researchers actually believe that high intelligence is necessary but not sufficient for being impactful, yet perceive other EAs/alignment researchers as thinking high intelligence is basically sufficient. The community aligning on these questions seems of very high practical importance for hiring/grantmaking criteria and decision-making.
Will do!
Interesting. I wouldn't totally rule number 1 out though. Depending on how fast things go, the average time to successful IPO may decrease substantially.
Yes, excellent point, and thanks for the callout.
Note, though, that a fundamental part of this is that we at AE Studio do eventually intend to incubate alignment-driven startups as part of our skunkworks program.
We've seen that we can take excellent people, have them grow on client projects for some amount of time, get better (in a very high-accountability way) at things they don't even realize they need to get better at, and then be well positioned to found startups we incubate internally.
We haven't yet turned our attention to internally incubated startups for alignment specifically, but we hope to by later this year or early next.
Meanwhile, there are not many orgs like us, and for various reasons it's easier to start a startup than to start something doing what we do.
If you think you can start something like what we do, I'd generally recommend it. You're probably more likely to succeed, though, if you start with something more focused.
Also, when we started, we flailed a bit until we figured out we should get good at one thing at a time before taking on more and more.
We plan to announce further details in a later post.
Thanks, I appreciate your wanting these efforts not to be discouraged!
I agree there's certainly a danger of AI safety startups optimizing for what will appeal to investors (not just with respect to risk appetite, but in many other dangerous ways too) and Goodharting rather than focusing purely on the most impactful work.
VCs themselves tend not to think as long-term as they should (even for their own economic interests), but I'm hopeful we can build an ecosystem around AI safety where they do. The investors interested in AI safety will likely be inclined to think more long-term; the few early AI safety investors that exist today certainly are.
I do think it's crucial (and possible!) for founders in this space to be very thoughtful about their true long-term goals and incentives around alignment and to build the right structures around AI safety for-profit funding.
On your diversification point, for example, a windfall-trust-like structure through which all AI safety startups share in the value one another creates could make a lot of sense, considering that even a tiny bit of equity in the biggest winners may quickly be worth more than our entire economy today.
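To make that concrete, here's a purely illustrative sketch of how such a pooled windfall structure might work; the 1% pledge fraction, the even-split rule, and all names and numbers below are assumptions for illustration, not a proposal:

```python
# Purely illustrative: a toy model of a windfall-trust-style equity pool.
# The pledge fraction, the even-split rule, and every name/number here are
# hypothetical assumptions, not a concrete proposal.

def windfall_payouts(outcomes: dict[str, float], pledge_fraction: float = 0.01) -> dict[str, float]:
    """Each startup pledges `pledge_fraction` of its eventual value to a shared
    pool; the pool is then split evenly among all participating startups."""
    pool = sum(value * pledge_fraction for value in outcomes.values())
    share = pool / len(outcomes)
    return {
        name: value * (1 - pledge_fraction) + share
        for name, value in outcomes.items()
    }

# One runaway winner dwarfs everything else, so even a 1% pledge meaningfully
# de-risks the startups that failed or exited small.
example = {"startup_a": 1e12, "startup_b": 5e7, "startup_c": 0.0}
print(windfall_payouts(example))
```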
Also, yes, inadequate equilibria are unfortunate, but they apply to all orgs, not just startups. As we pointed out in the post above:
We think that as AI development and mainstream concern increase, there’s going to be a significant increase in safety-washing and incentives pushing the ecosystem from challenging necessary work towards pretending to solve problems. We think the way to win that conflict is by showing up, rather than lamenting other people’s incentives. This problem isn’t limited to business relationships; safety-washing is a known problem with nonprofits, government regulations, popular opinion, and so on. Every decision-maker is beholden to their stakeholders, and so decision quality is driven by stakeholder quality.
In fact, startups can be a powerful antidote to inadequate equilibria. I think the biggest opportunities for startups often lie in solving inadequate equilibria, especially by leveraging technology shifts and innovations, as with electric cars. Ideal new structures to facilitate and govern maximal AI safety innovation would help fast-track solutions to these inadequate equilibria. In contrast, established systems are more prone to yielding inadequate equilibria because of their resistance to change.
I also think we may be underestimating how much people may come together to try to solve these problems as they increasingly come to take them seriously. Today at LessOnline, I heard an interesting discussion about how surprised AI safety people are that the general public seems so naturally concerned about AI safety upon hearing about it.
This makes me hopeful we can create startups and new structures that help address inadequate equilibria and solve AI safety, and I think we ought to try.
Yes, you're right, and most startups do fail. That's how it works!
Still, the biggest opportunities are often the ones with the lowest probability of success, and startups are the best structures for capitalizing on them. This paradigm may fit AI safety well.
Ideally, we can engineer an ecosystem that creates enough startups that do succeed and substantially advance AI safety. It seems to me that aggressively expanding the AI safety startup ecosystem is one of the highest-value interventions available right now.
Meanwhile, strongly agreed that AI-safety-driven startups should be B corps, especially if they're raising money.
This is a great point. I also notice that a decent number of people's risk models change frequently with various news, and that's not ideal either, as it makes them less likely to stick with a particular approach that depends on some risk model. In an ideal world we'd have enough people pursuing enough approaches across most plausible risk models that it would make little sense for anyone to consider switching. Maybe the best we can approximate now is to discuss this less.
That would be great! And it's exactly the sort of thing we've dreamed about building at AE since the start.
Incidentally, I've practiced something (inferior) like this with my wife in the past and we've gotten good at speaking simultaneously and actually understanding multiple threads at the same time (though it seems to break down if one of the threads is particularly complex).
It seems like an MVP hyperphone could potentially just be a software project and not explicitly require BCI (though it would certainly be enhanced by it). We would definitely consider building it, at least as a Same Day Skunkworks. Are you aware of any existing tool that's at all like this?
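To gesture at what a software-only MVP might involve, here's a minimal sketch of the kind of branching-conversation data structure I'd imagine at its core; the names and API are made up, and it assumes the essential feature is side-threads that can fork off any utterance and be picked back up later:

```python
# Purely illustrative sketch of a "hyperphone"-style branching conversation:
# the assumption is that the core primitive is an utterance that can spawn
# side-threads which are tracked and returned to later. Names/API are made up.

from dataclasses import dataclass, field

@dataclass
class Utterance:
    speaker: str
    text: str
    children: list["Utterance"] = field(default_factory=list)

    def branch(self, speaker: str, text: str) -> "Utterance":
        """Start (or continue) a thread hanging off this utterance."""
        child = Utterance(speaker, text)
        self.children.append(child)
        return child

    def open_threads(self) -> list["Utterance"]:
        """Leaves of the conversation tree, i.e. threads still awaiting a reply."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.open_threads()]

# Example: a main line of discussion plus a tangent to revisit later.
root = Utterance("A", "What would an MVP hyperphone need?")
root.branch("B", "Probably just threaded, resumable voice or text.")
root.branch("B", "Tangent: transcription quality matters a lot.")
print([u.text for u in root.open_threads()])
```

Presumably the harder parts would be the audio capture, transcription, and UI layered on top, which this sketch doesn't touch.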
You might also enjoy this blog post, which talks about how easily good ideas can be lost and why a tool like this could be so valuable.
My favorite quotes from the piece:
1. "While ideas ultimately can be so powerful, they begin as fragile, barely formed thoughts, so easily missed, so easily compromised, so easily just squished."
2. "You need to recognize those barely formed thoughts, thoughts which are usually wrong and poorly formed in many ways, but which have some kernel of originality and importance and truth. And if they seem important enough to be worth pursuing, you construct a creative cocoon around them, a set of stories you tell yourself to protect the idea not just from others, but from your own self doubts. The purpose of those stories isn't to be an air tight defence. It's to give you the confidence to nurture the idea, possibly for years, to find out if there's something really there."
3. "And so, even someone who has extremely high standards for the final details of their work, may have an important component to their thinking which relies on rather woolly arguments. And they may well need to cling to that cocoon. Perhaps other approaches are possible. But my own experience is that this is often the case."
And another interesting one from the summit:
“There was almost no discussion around agents—all gen AI & model scaling concerns.
It’s perhaps because agent capabilities are mediocre today and thus hard to imagine, similar to how regulators couldn’t imagine GPT-3’s implications until ChatGPT.” - https://x.com/kanjun/status/1720502618169208994?s=46&t=D5sNUZS8uOg4FTcneuxVIg
Right now it seems to me that one of the highest impact things not likely to be done by default is substantially increased funding for AI safety.
I got https://www.pinkshoggoth.com/, inspired by Pink Shoggoths: What does alignment look like in practice?
Right now it's hosting a side project (that may wind up being replaced by new ChatGPT features). Feel free to DM me if you have a better use for it though!