LessWrong 2.0 Reader
Most of this seems to be subsumed in the general question of how you do research, and there's lots of advice, but it's (ironically) not at all a science. From my limited understanding of what goes on in the research groups inside these companies, it's a combination of research intuition, small-scale testing, checking with others and discussing the new approach, validating your ideas, and getting buy-in from people higher up that it's worth your and their time to try the new idea. Which is the same as research generally.
At that point, I'll speculate and assume whatever idea they have is validated in smaller but still relatively large settings. For things like sample efficiency, they might, say, train a GPT-3-size model, which now costs only a fraction of the researcher's salary to do. (Yes, I'm sure they all have very large compute budgets for their research.) If the results are still impressive, I'm sure there is lots more discussion and testing before actually using the method in training the next round of frontier models that cost huge amounts of money - and those decisions are ultimately made by the teams building those models, and management.
Brian Chesky, a co-founder of Airbnb, claimed that their company did get bloated and lose focus before Covid happened, and that they had to cut the fat or die. And he claimed this error is common amongst late-stage startups. From "The Social Radars: Brian Chesky, Co-Founder & CEO of Airbnb (Part II)". So I think turning into an octopus is something that happens to successful startups, and is probably what's happening to Dropbox.
davidmanheim on Zero-Sum Defeats Nash Equilibrium
It seems like you're not being clear about how you are thinking about the cases, or are misusing some of the terms. Nash equilibria exist in zero-sum games, so those aren't different things. If you're familiar with how to do game theory, I think you should carefully set up what you claim the situation is in a payoff matrix, and then check whether, given the set of actions you posit people have in each case, the scenario is actually a Nash equilibrium in the cases you're calling Nash equilibrium.
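For concreteness, here's a minimal sketch of that check, with made-up payoffs rather than anything from the post: write down the payoff matrix, then test whether either player gains from a unilateral deviation.

```python
import numpy as np

# Hypothetical zero-sum payoff matrix: rows are player 1's actions,
# columns are player 2's actions; player 2's payoffs are the negation.
u1 = np.array([[3.0, 1.0],
               [4.0, 2.0]])
u2 = -u1  # zero-sum

def is_pure_nash(profile, u1, u2):
    """True iff neither player can gain by unilaterally deviating."""
    r, c = profile
    return u1[r, c] >= u1[:, c].max() and u2[r, c] >= u2[r, :].max()

profiles = [(r, c) for r in range(2) for c in range(2)]
print([p for p in profiles if is_pure_nash(p, u1, u2)])
# -> [(1, 1)]: this zero-sum game *has* a pure Nash equilibrium,
# so "zero-sum" and "Nash equilibrium" aren't competing categories.
```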
florian-habermacher on Let's Design A School, Part 2.2 School as Education - The Curriculum (General)
Would really, really love to replace curricula with what you describe; kudos for proposing a reasonably simple yet consistent high-level plan that, at least to my mostly uneducated eyes, seems rather ideal!
Maybe an unnecessary detail here, but fwiw: in economics, within the Core Civilizational Requirements,
an understanding of supply and demand, specialization and trade, and how capitalism works
I'd try to make sure to provoke them with enough not-so-standard market cases to let them develop intuitions about where which interventions might be required or justified, for which reasons (or from which points of view), and where not. I teach that subject, and I deplore how our teaching tends to remain on the surface of things, without the opportunity to really sharpen students' minds on the slightly more intricate econ policy questions, where too-shallow demand-supply thinking just isn't much better than no econ at all.
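One concrete instance of such a case, with invented linear curves purely for illustration: a per-unit externality makes the unregulated equilibrium overshoot the social optimum, and the textbook intervention (a Pigouvian tax) closes the gap.

```python
# Sketch with made-up numbers: demand P = a - b*Q, private marginal
# cost P = c + d*Q, plus an external cost e per unit that sellers ignore.
a, b = 100.0, 1.0   # demand intercept and slope
c, d = 10.0, 1.0    # private supply intercept and slope
e = 20.0            # external cost per unit (e.g., pollution)

q_market = (a - c) / (b + d)       # demand meets *private* marginal cost
q_social = (a - c - e) / (b + d)   # demand meets *social* marginal cost

print(f"market quantity:  {q_market:.0f}")  # 45 -> overproduction
print(f"socially optimal: {q_social:.0f}")  # 35
print(f"a Pigouvian tax of {e:.0f}/unit restores the optimum")
```

Naive demand-supply reasoning says the market quantity is efficient; here it isn't, and both the direction and the size of the correction follow directly from the curves.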
davidmanheim on an effective ai safety initiative
...but there are a number of EAs working on cybersecurity in the context of AI risks, so one premise of the argument here is off.
And a rapid-response site for the public to report cybersecurity issues and account hacking generally would do nothing to address the problems facing the groups that most need to secure their systems, and it wouldn't even solve the narrower problem of reducing those hacks. So this seems like the wrong approach even given the assumptions you suggest.
I agree that your question is weird and confused, and agree that if that were the context, my post would be hard to understand. But I think it's a bad analogy! That's because there are people who have made analogies between AI and Bio very poorly, and it's misleading and leads to sloppy thinking. In my experience seeing discussions on the topic, either the comparisons are drawn carefully and the relevant dissimilarities are discussed clearly, or they are bad analogies.
To stretch your analogy, if the context were that I'd recently heard people say "Steve and David are both people I know, and if you don't like Steve, you probably won't like David," and also "Steve and David are both concerned about AI risks, so they agree on how to discuss the issue," I'd wonder if there was some confusion, and I'd feel comfortable saying that in general, Steve is an unhelpful analog for David, and all these people should stop and be much more careful in how they think about comparisons between us.
https://www.lesswrong.com/posts/aH9R8amREaDSwFc97/rapid-capability-gain-around-supergenius-level-seems [LW · GW] also seems relevant to this discussion.
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform
Also a positive update for me on interdisciplinary conceptual alignment being automatable differentially soon. This has seemed plausible to me for a long time: LLMs have 'read the whole internet', and interdisciplinary insights often seem (to me) to require relatively few inferential hops (plausibly because it's hard for humans to have [especially deep] expertise in many different domains), making them potentially feasible for LLMs differentially early (reliably making long inferential chains still seems among the harder things for LLMs).
This is an excellent point.
While LLMs seem (relatively) safe [LW · GW], we may very well blow right on by them soon.
I do think that many of the safety advantages of LLMs come from their understanding of human intentions (and therefore implied values). Those would be retained in improved architectures that still predict human language use. But if such a system's thought process were entirely opaque, we could no longer perform Externalized reasoning oversight [LW · GW] by "reading its thoughts".
But I think it might be possible to build a reliable agent from unreliable parts. I think humans are such an agent; evolution made us this way because it's a way to squeeze extra capability out of a set of base cognitive capacities.
Imagine an agentic set of scaffolding that merely calls the super-LLM for individual cognitive acts. Such an agent would use a hand-coded "System 2" thinking approach to solve problems, like humans do. That involves breaking a problem into cognitive steps. We also use System 2 for our biggest ethical decisions; we predict consequences of our major decisions, and compare them to our goals, including ethical goals. Such a synthetic agent would use System 2 for problem-solving capabilities, and also for checking plans for how well they achieve goals. This would be done for efficiency; spending a lot of compute or external resources on a bad plan would be quite costly. Having implemented it for efficiency, you might as well use it for safety.
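A minimal sketch of what that scaffolding loop might look like, where llm() and the prompt strings are hypothetical stand-ins rather than any real API:

```python
def llm(prompt: str) -> str:
    """Hypothetical single 'cognitive act': one call to the base model."""
    raise NotImplementedError("stand-in for a real model API")

def solve(problem: str, goals: list[str], max_revisions: int = 3) -> str:
    # System 2: break the problem into explicit cognitive steps rather
    # than relying on one opaque forward pass.
    steps = llm(f"List the sub-steps needed to solve: {problem}")
    plan = llm(f"Draft a plan for: {problem}\nUsing these steps:\n{steps}")

    for _ in range(max_revisions):
        # Predict consequences and compare them against ALL goals, ethical
        # goals included -- the efficiency check doubles as a safety check.
        consequences = llm(f"Predict the consequences of this plan:\n{plan}")
        verdict = llm(
            f"Do these consequences satisfy the goals {goals}?\n"
            f"Consequences:\n{consequences}\nAnswer 'OK' or 'REVISE: <why>'."
        )
        if verdict.startswith("OK"):
            return plan
        plan = llm(f"Revise the plan to address: {verdict}\nPlan:\n{plan}")

    raise RuntimeError("no plan passed the goal check; escalate to a human")
```

The design point is the one above: the consequence-prediction step exists for efficiency (committing resources to a bad plan is costly), and reusing that same step as a goal check is nearly free.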
This is just restating stuff I've said elsewhere, but I'm trying to refine the model, and work through how well it might work if you couldn't apply any external reasoning oversight, and little to no interpretability. It's definitely bad for the odds of success, but not necessarily crippling. I think.
This needs more thought. I'm working on a post on System 2 alignment, as sketched out briefly (and probably incomprehensibly) above.
azergante on Secrets of the eliminati
goals appear only when you make rough generalizations from its behavior in limited cases.
I am surprised no one brought up the usual map/territory distinction. In this case the territory is the set of observed behaviors. Humans look at the territory and, with their limited processing power, produce a compressed and lossy map, here called the goal.
The goal is a useful model for talking simply about the set of behaviors, but it has no existence outside the heads of the people discussing it.