seems to me to be a crux between "man, we're probably all going to die" and "we're really really fucked"
Sorry, what's the difference between these two positions? Is the second one meant to be a more extreme version of the first?
What’s the difference between “Alice is falling victim to confusions/reasoning mistakes about X” and “Alice disagrees with me about X”?
I feel like using the former puts undue social pressure on observers to conclude that you’re right, and makes it less likely they correctly adjudicate between the perspectives.
(Perhaps you can empathise with me here, since arguably certain people taking this sort of tone is one of the reasons AI x-risk arguments have not always been vetted as carefully as they should have been!)
I learned maths mostly by teachers at school writing on a whiteboard, university lecturers writing on a blackboard or projector, and to a lesser extent friends writing on pieces of paper.
There was a tiny supplement of textbook-reading at school and large supplement of printed-notes-reading at university.
I would guess only a tiny fraction learn exclusively via typed materials. If you have any kind of teacher, how could you? Nobody shows you how to rearrange an equation by live-typing LaTeX.
In Texas Hold ‘Em, the most popular form of poker, there is no drawing or discarding, just betting and folding.
This seems like strong evidence that those parts are where the skill lies — somebody came up with a version that removed the other parts, and everyone switched to it.
Not sure how that affects the metaphor. For me, I think it weakened the punch, since I had to stop and remember that there exist forms of poker with drawing and discarding.
Right, I understand it now, thanks. I missed the labels on the x axis.
I found your bar chart more confusing than illuminating. Does it make sense to mark the bottom 20% of people, and those people’s 43% probability of staying in the bottom 20%, as two different fractions of the same bar? The 43% is 43% of the 20%, not of the original 100% (i.e. only about 8.6% of everyone).
If many more people are extremely happy all the time than extremely depressed all the time, the bunch of people you describe would be managing their beliefs rationally. And indeed I think that’s probably the case.
Can anybody confirm whether Paul is likely to be systematically silenced re OpenAI?
I’m an adult from the UK and learnt the word ‘faucet’ like last year.
Thanks. Do you use this system for reading list(s) too?
When you say you use a kanban-style system, does that just refer to the fact that there are columns that you drag items between, or does it specifically mean that you also make use of an 'in progress' column?
If so, do you have one for each 'todo' column, or what?
And do you have a column for the 'capture' aspect of GTD, or do you do something else for that?
Are you interested in these debates in order to help form your own views, or convince others?
I feel like debates are inferior to reading people's writings for the former purpose, and for the latter they deal collateral damage by making the public conversation more adversarial.
I keep reading the title as Attention: SAEs Scale to GPT-2 Small.
Thanks for the heads up.
I think what I was thinking of is that words can have arbitrary consequences and be arbitrarily high cost.
In the apologising case, making the right social API call might be an action of genuine significance. E.g. it might mean taking the hit on lowering onlookers' opinion of my judgement, where if I'd argued instead that the person I wronged was talking nonsense I might have got away with preserving it.
John's post is about how you can gain respect for apologising, but it does often have costs too, and I think the respect is partly for being willing to pay them.
Words are a type of action, and I guess apologising and then immediately moving on to defending yourself is not the sort of action which signals sincerity.
Explaining my downvote:
This comment contains ~5 negative statements about the post and the poster without explaining what it is that the commenter disagrees with.
As such it seems to disparage without moving the conversation forward, and is not the sort of comment I'd like to see on LessWrong.
The second footnote seems to be accidentally duplicated as the intro. Kinda works though.
"Not invoking the right social API call" feels like a clarifying way to think about a specific conversational pattern that I've noticed that often leads to a person (e.g. me) feeling like they're virtuosly giving up ground, but not getting any credit for it.
It goes something like:
Alice: You were wrong to do X and Y.
Bob: I admit that I was wrong to do X and I'm sorry about it, but I think Y is unfair.
[discussion continues about Y, and Alice seems not to register Bob's apology]
It seems like maybe bundling in your apology for X with a protest against Y just doesn't invoke the right API call. I'm not entirely sure what the simplest fix is, but it might just be swapping the order of the protest and the apology.
Is it true that scaling laws are independent of architecture? I don’t know much about scaling laws but that seems surely wrong to me.
e.g. how does RNN scaling compare to transformer scaling?
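For reference (and I may be misremembering details), the kind of parametric fit people usually mean by a ‘scaling law’, e.g. in Kaplan et al. 2020, is roughly

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}$$

where $N$ is parameter count and $N_c$, $\alpha_N$ are empirically fitted constants. Nothing about that form guarantees the fitted constants come out the same across architectures, which is part of why I’d be surprised if RNNs and transformers scaled identically.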
Your example of a strong syllogism (‘if A, then B. A is true, therefore B is true’) isn’t one.
It’s instead of the form ‘If A, then B. A is false, therefore B is false’, which is not logically valid (and also not a Jaynesian weak syllogism).
If Fisher lived to 100 he would have become a Bayesian
Fisher died at the age of 72
———————————————————————————————————
Fisher died a Frequentist
You could swap the conclusion with the second premise and weaken the new conclusion to ‘Fisher died before 100’, or change the first premise to ‘Unless Fisher lived to 100 he would not have become a Bayesian’.
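Spelled out in symbols (just standard propositional logic, with $A$ = ‘Fisher lived to 100’ and $B$ = ‘Fisher became a Bayesian’):

$$\frac{A \to B \qquad A}{\therefore\ B} \quad \text{(modus ponens, valid)} \qquad\qquad \frac{A \to B \qquad \neg A}{\therefore\ \neg B} \quad \text{(denying the antecedent, invalid)}$$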
Augmenting humans to do better alignment research seems like a pretty different proposal to building artificial alignment researchers.
The former is about making (presumed-aligned) humans more intelligent, which is a biology problem, while the latter is about making (presumed-intelligent) AIs aligned, which is a computer science problem.
I don’t think that that’s the view of whoever wrote the paragraph you’re quoting, but at this point we’re doing exegesis.
Hm, I think that paragraph is talking about the problem of getting an AI to care about a specific thing of your choosing (here diamond-maximising), not about getting it to care about some particular thing or other with no control over what that is. The MIRI-esque view is that the former is hard and the latter happens inevitably.
Right, makes complete sense in the case of LLM-based agents, I guess I was just thinking about much more directly goal-trained agents.
I like the distinction but I don’t think either aimability or goalcraft will catch on as Serious People words. I’m less confident about aimability (doesn’t have a ring to it) but very confident about goalcraft (too Germanic, reminiscent of fantasy fiction).
Is words-which-won’t-be-co-opted what you’re going for (a la notkilleveryoneism), or should we brainstorm words-which-could-plausibly-catch-on?
Perhaps, or perhaps not? I might be able to design a gun which shoots bullets in random directions (not on random walks), without being able to choose the direction.
Maybe we can back up a bit, and you could give some intuition for why you expect goals to go on random walks at all?
My default picture is that goals walk around during training and perhaps during a reflective process, and then stabilise somewhere.
I think that’s a reasonable point (but fairly orthogonal to the previous commenter’s one)
A gun which is not easily aimable doesn't shoot bullets on random walks.
Or in less metaphorical language, the worry is mostly that it's hard to give the AI the specific goal you want to give it, not so much that it's hard to make it have any goal at all. I think people generally expect that naively training an AGI without thinking about alignment will get you a goal-directed system, it just might not have the goal you want it to have.
Sounds like the propensity interpretation of probability.
FiO?
Nice job
I like the idea of a public research journal a lot, interested to see how this pans out!
You seem to be operating on a model that says “either something is obvious to a person, or it’s useful to remind them of it, but not both”, whereas I personally find it useful to be reminded of things that I consider obvious, and I think many others do too. Perhaps you don’t, but could it be the case that you’re underestimating the extent to which it applies to you too?
I think one way to understand it is to disambiguate ‘obvious’ a bit and distinguish what someone knows from what’s salient to them.
If someone reminds me that sleep is important and I thank them for it, you could say “I’m surprised you didn’t know that already,” but of course I did know it already - it just hadn’t been salient enough to me to have as much impact on my decision-making as I’d like it to.
I think this post is basically saying: hey, here’s a thing that might not be as salient to you as it should be.
Maybe everything is always about the right amount of salient to you already! If so you are fortunate.
I think it falls into the category of 'advice which is of course profoundly obvious but might not always occur to you', in the same vein as 'if you have a problem, you can try to solve it'.
When you're looking for something you've lost, it's genuinely helpful when somebody says 'where did you last have it?', and not just for people with some sort of looking-for-stuff-atypicality.
I think I practice something similar to this with selfishness: a load-bearing part of my epistemic rationality is having it feel acceptable that I sometimes (!) do things for selfish rather than altruistic reasons.
You can make yourself feel that selfish acts are unacceptable and hope this will make you very altruistic and not very selfish, but in practice it also makes you come up with delusional justifications as to why selfish acts are in fact altruistic.
From an impartial standpoint we can ask how much of the latter is worth it for how much of the former. I think one of life's repeated lessons is that sacrificing your epistemics for instrumental reasons is almost always a bad idea.
Do people actually disapprove of and disagree with this comment, or do they disapprove of the use of said 'poetic' language in the post? If the latter, perhaps they should downvote the post and upvote the comment for honesty.
Perhaps there should be a react for "I disapprove of the information this comment revealed, but I'm glad it admitted it".
LLMs calculate pdfs, regardless of whether they calculate ‘the true’ pdf.
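To spell out the (trivial) sense I mean, here's a toy sketch: an LM head just turns logits into a normalised distribution over next tokens, true or not. The vocabulary and logits below are made up for illustration, not taken from any real model.

```python
import numpy as np

# Toy stand-in for an LLM's output layer: made-up logits over a tiny vocabulary.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([2.0, 0.5, -1.0, 0.1])

# Softmax turns the logits into a probability distribution over next tokens.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")

# It sums to 1 and is a perfectly good distribution, whether or not it
# matches the "true" distribution of text.
print("total:", probs.sum())
```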
Sometimes I think trying to keep up with the endless stream of new papers is like watching the news — you can save yourself time and become better informed by reading up on history (i.e. classic papers/textbooks) instead.
This is a comforting thought, so I’m a bit suspicious of it. But also it’s probably more true for a junior researcher not committed to a particular subfield than someone who’s already fully specialised.
Sometimes such feelings are your system 1 tracking real/important things that your system 2 hasn’t figured out yet.
I’d like to see more posts using this format, including for theoretical research.
I vote singular learning theory gets priority (if there was ever a situation where one needed to get priority). I intuitively feel like research agendas or communities need an acronym more than concepts. Possibly because in the former case the meaning of the phrase becomes more detached from the individual meaning of the words than it does in the latter.
Just wanted to say that I am a vegan and I’ve appreciated this series of posts.
I think the epistemic environment of my IRL circles has always been pretty good around veganism, and personally I recoil a bit from discussion of specific people or groups’ epistemic virtues or lack thereof (not sure if I think it’s unproductive or just find it aversive), so this particular post is of less interest to me personally. But I think your object-level discussion of the trade-offs of veganism has been consistently fantastic and I wanted to thank you for the contribution!
Are Self Control and Freedom.to for different purposes or the same? Should I try multiple app/website blockers till I find one that's right for me, or is there an agreed upon best one that I can just adopt with no experimentation?
Well, the joke does give a fair bit of information about both your politics and how widespread you think they are on LW. It might be very reasonable for someone to update their beliefs about LW politics based on seeing it. Then to what extent their conclusion mind-kills them is somewhat independent of the joke.
(I agree it’s a fairly trivial case, mostly discussing it out of interest in how our norms should work.)
A Yudphemism: Politics is the Mind-Killer.
My guess is that it's not that people are downvoting because they think you made a political statement which they oppose and they are mind-killed by it. Rather they think you made a political joke which has the potential to mind-kill others, and they would prefer you didn't.
That's why I downvoted, at least. The topic you mentioned doesn't arouse strong passions in me at all, and probably doesn't arouse strong passions in the average LW reader that much, but it does arouse strong passions in quite a large number of people, and when those people are here, I'd prefer such passions weren't aroused.
Even now I would like it if you added an edit at the start to make it clearer what you’re doing! Before reading the replying comment and realising the context, I was mildly shocked by such potentially inflammatory speculation and downvoted.
On the other hand, even the smallest of small towns in the UK has a wide variety of ethnic food. I think pretty much anywhere with a restaurant has a Chinese and an Indian, and usually a lot more.
Meta point: I think the forceful condescending tone is a bit inappropriate when you’re talking about a topic that you don’t necessarily know that much about.
You’ve flatly asserted that the entirety of game theory is built on an incorrect assumption, and whether or not you’re correct about that, it doesn’t seem like you’re that clued up on game theory.
Eliezer just about gets away with his tone because he knows whereof he speaks. But I would prefer it if he showed more humility, and I think if you’re writing about a topic while you’re learning the basics of it, you should definitely show more! If only because it makes it easier to change your mind as you learn more.
EDIT: I think this reads a bit more negative than I intended, so just wanted to say I did enjoy the post and appreciate your engagement in the comments!
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
The common knowledge of policy thing is true but is external to the game. We don’t assume that players in the prisoner’s dilemma know each other’s policies. As part of our analysis of the structure of the game, we might imagine that in practice some sort of iterative responding-to-each-other’s-policy thing will go on, perhaps because players face off regularly (but myopically), and so the policies selected will be optimal wrt each other. But this isn’t really a part of the game, it’s just part of our analysis. And we can analyse games in various different ways, e.g. by considering different equilibrium concepts.
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Certainly it’s natural to consider the case where the agents are used to playing against each other, so they have the chance to learn and react to each other’s policies. But a case where they each learn each other’s beliefs doesn’t feel that natural to me - might as well go full OSGT at that point.
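A toy sketch of what I mean by ‘iterative responding-to-each-other’s-policy’ (fictitious-play-style best responses on a standard prisoner’s dilemma; the payoff numbers and the 50-round horizon are just illustrative choices of mine):

```python
import numpy as np

# Row player's payoffs in a standard prisoner's dilemma; actions: 0 = cooperate, 1 = defect.
PAYOFF = np.array([[3, 0],
                   [5, 1]])

def best_response(opponent_freqs):
    """Best response to the empirical frequencies of the opponent's past actions."""
    expected = PAYOFF @ opponent_freqs  # expected payoff of each of my actions
    return int(np.argmax(expected))

# Each player only ever sees the other's past actions, never their policy or beliefs.
counts = [np.ones(2), np.ones(2)]  # uniform pseudo-counts to start
for _ in range(50):
    a0 = best_response(counts[1] / counts[1].sum())
    a1 = best_response(counts[0] / counts[0].sum())
    counts[0][a0] += 1
    counts[1][a1] += 1

print("empirical action frequencies (cooperate, defect):")
print([list(np.round(c / c.sum(), 2)) for c in counts])  # both drift towards defect
```

The point being that this kind of analysis only needs each agent to learn from observed play, not a crystal ball into the other’s policy or beliefs.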