I guess one's timelines might have gotten longer if one had very high credence that the paradigm opened by o1 is a blind alley (relative to the goal of developing human-worker-omni-replacement-capable AI) but profitable enough that OA gets distracted from its official most ambitious goal.
I'm not that person.
React suggestion/request: "not joint-carving"/"not the best way to think about this topic".
This is kind of "(local) taboo those words" but it's more specific.
Compositional agency?
terminal values in the first place, as opposed to active blind spots masquerading as terminal values.
Can't one's terminal values be exactly (mechanistically implemented as) active blind spots?
I predict that you would say something like "The difference is that active blind spots can be removed/healed/refactored 'just' by (some kind of) learning, so they're not unchanging as one's terminal values would be assumed to be."?
cover ups
Why do you think there are cover-ups?
More specifically, do you mean that people-in-the-know are not willing to report it or that there is some active silencing or [discouragement of those who would like to bring attention to it] going on?
There was one community alert about Zizians two years ago here. Before that, there was a discussion of Jessica Taylor's situation being downstream of Vassar's influence, but as far as I remember, Scott Alexander eventually retracted his claims about this.
In any case, I think this kind of stuff deserves a top-level alert post, like the one about Ziz.
Also: anybody have any recommendations for pundits/analysis sources to follow on the Taiwan situation? (there's Sentinel but I'd like something more in-depth and specifically Taiwan-related)
I think Mesa is saying something like "The missing pieces are too alien for us to expect to discover them by thinking/theorizing but we'll brute-force the AI into finding/growing those missing pieces by dumping more compute into it anyway." and Tsvi's koan post is meant to illustrate how difficult it would be to think oneself into those missing pieces.
Estonia. (Alternatively, Poland, in which case: PLN, not EUR.)
I'm considering donating. Any chance of setting up some tax deduction for Euros?
I think you meant to hide these two sentences in spoiler tags, but you didn't.
guilt-by-association
Not necessarily guilt-by-association, but maybe rather pointing out that the two arguments/conspiracy theories share a similar flawed structure, so if you discredit one, you should discredit the other.
Still, I'm also unsure how much structure they share, and even if they did, I don't think this would be discursively effective because I don't think most people care that much about (that kind of) consistency (happy to be updated in the direction of most people caring about it).
Reminds me of how a few years ago I realized that I don't feel some forms of stress but can infer I'm stressed by noticing reduction in my nonverbal communication.
FYI if you want to use o1-like reasoning, you need to check off "Deep Think".
It's predictably censored on CCP-sensitive topics.
(In a different chat.) After the second question, it typed two lines (something like "There have been several attempts to compare Winnie the Pooh to a public individual...") and then overwrote it with "Sorry...".
https://gwern.net/doc/existential-risk/2011-05-10-givewell-holdenkarnofskyjaantallinn.doc
glitch tokens are my favorite example
I directionally agree with the core argument of this post.
The elephant(s) in the room according to me:
- What is an algorithm? (inb4 a physical process that can be interpreted/modeled as implementing computation)
- How do you distinguish (hopefully in a principled way) between (a) the algorithm changing, and (b) you being confused about what algorithm the thing is actually running, the reality being more nuanced, so that what "naively" looks like a change of the algorithm is "actually" just a reparametrization of the same algorithm?
I haven't read the examples in this post super carefully, so perhaps you discuss this somewhere in the examples (though I don't think so because the examples don't seem to me like the place to include such discussion).
Thanks for the post! I expected some mumbo jumbo but it turned out to be an interesting intuition pump.
Based on my attending Oliver's talk, this may be relevant/useful:
I too have reservations about points 1 and 3 but not providing sufficient references or justifications doesn't imply they're not on SL1.
mentioned in the FAQ
(I see what podcasts you listen to.)
My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.
Re the three you pointed out: simulators I consider a useful insight; gradient hacking probably not (10% < p < 20%); and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.
Also, I'm curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?
- the approaches that have been attracting the most attention and funding are dead ends
I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).
I propose "token surprise" (as in type-token distinction). You expected this general type of thing but not that Ivanka would be one of the tokens instantiating it.
It's better but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of (level 1 and level 2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.
Similarly, hedging is not hypocrisy.
Do you think [playing in a rat race because it's the most locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?
Or even more broadly, defecting in a prisoner's dilemma while exposing an interface that would allow cooperation with other like-minded players?
I've had this concept for many years and it hasn't occurred to me to give it a name (How Stupid Not To Have Thought Of That) but if I tried to give it a name, I definitely wouldn't call it a kind of hypocrisy.
It's not clear to me how this results from "excess resources for no reasons". I guess the "for no reasons" part is crucial here?
I meant this strawberry problem.
Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but in the next 5.
I haven't listened to the pod yet but I guess you meant "but not in the next 5".
FWIW Oliver's presentation of (some fragment of) his work at ILIAD was my favorite of all the talks I attended at the conference.
I am not totally sure why he considers discrete models to be unable to describe initial states or state-transition programming.
AFAIU, he considers them inadequate because they rely on an external interpreter, whereas the model of reality should be self-interpreting because there is nothing outside of reality to interpret it.
Wheeler suggests some principles for constructing a satisfactory explanation. The first is that "The boundary of a boundary is zero": this is an algebraic topology theorem showing that, when taking a 3d shape, and then taking its 2d boundary, the boundary of the 2d boundary is nothing, when constructing the boundaries in a consistent fashion that produces cancellation; this may somehow be a metaphor for ex nihilo creation (but I'm not sure how).
See this as an operation that takes a shape and produces its boundary. It goes 3D shape -> 2D shape -> nothing. If you reverse the arrows you get nothing -> 2D shape -> 3D shape. (Of course, it's not quite right because (IIUC) all closed 2D shapes have empty boundary, so the reversed arrows don't pick out any particular shape, but I guess it's just meant as a rough analogy.)
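To make the forward chain concrete (this is just the standard algebraic-topology fact, nothing CTMU-specific): take the solid ball as the 3D shape.

```latex
% The solid ball B^3 has the sphere S^2 as its boundary,
% and S^2, being a closed surface, has empty boundary:
\[
  B^3 \;\xrightarrow{\;\partial\;}\; S^2 \;\xrightarrow{\;\partial\;}\; \varnothing,
  \qquad\text{i.e.}\qquad \partial\bigl(\partial B^3\bigr) = \varnothing .
\]
% In homology this becomes the operator identity
\[
  \partial \circ \partial = 0 .
\]
```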
He notes a close relationship between logic, cognition, and perception: for example, "X | !X" when applied to perception states that something and its absence can't both be perceived at once
This usage of logical operators is confusing. In the context of perception, he seems to want to talk about NAND: you never perceive both something and its absence but you may also not perceive either.
(note that "X | !X" is equivalent to "!(X & !X)" in classical but not intuitionistic logic)
Intuitionistic logic doesn't allow X & !X either.[1] It does allow (i.e., prove) !(X & !X), just not X | !X.
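A small Lean sketch of that asymmetry (my own illustration, not from the post): non-contradiction has a constructive proof, while excluded middle only comes in via the classical axiom.

```lean
-- ¬(X ∧ ¬X) is intuitionistically (constructively) provable:
theorem no_contradiction (X : Prop) : ¬(X ∧ ¬X) :=
  fun ⟨hx, hnx⟩ => hnx hx

-- X ∨ ¬X is not: in general it has to be imported via the classical axiom.
theorem excluded_middle (X : Prop) : X ∨ ¬X :=
  Classical.em X
```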
Langan contrasts between spatial duality principles ("one transposing spatial relations and objects") and temporal duality principles ("one transposing objects or spatial relations with mappings, functions, operations or processes"). This is now beyond my own understanding.
It's probably something like: if you have a spatial relationship between two objects X and Y, you can view it as an object with X and Y as endpoints. Temporally, if X causes Y, then you can see it as a function/process that, upon taking X produces Y.
The most confusing/unsatisfying thing for me about CTMU (to the extent that I've engaged with it so far) is that it doesn't clarify what "language" is. It points ostensively at examples: formal languages, natural languages, science, perception/cognition, which apparently share some similarities but what are those similarities?
[1] Though paraconsistent logic does.
Did they name it after the strawberry problem!?
Here are some axes along which I think there's some group membership signaling in philosophy (IDK about the extent and it's hard to disentangle it from other stuff):
- Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
- Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
- Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
- Biology: gene's-eye-view/modern synthesis vs extended evolutionary synthesis
Moreover, I don't think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share.
I would expect to see sophisticated ape/early-hominid-level culture in many more species if that were the case. For some reason humans went on the culture RSI trajectory whereas other animals didn't. Plausibly there was some seed cognitive ability (plus some other contextual enablers) that allowed a gene-culture "coevolution" cycle to start.
My feedback is that I absolutely love it. My favorite feature released since reactions or audio for all posts (whichever was later).
In other words, there's a question about how to think about truth in a way that honors perspectivalism, while also not devolving into relativism. And the way Jordan and I were thinking about this, was to have each filter bubble -- with their own standards of judgment for what's true and what's good -- to be fed the best content from the other filter bubbles by the standards from within each filter bubble, rather than the worst content, which is more like what we see with social media today.
Seems like Monica Anderson was trying to do something like that with BubbleCity. (pdf, podcast)
This is not quite an answer to your question but some recommendations in the comments to this post may be relevant: https://www.lesswrong.com/posts/SCs4KpcShb23hcTni/ideal-governance-for-companies-countries-and-more
It does for me
Sometimes I look up a tag/concept to ensure that I'm not spouting nonsense about it.
But most often I use them to find the posts related to a topic I'm interested in.
Possible problems with this approach
One failure mode you seem to have missed (which I'm surprised by) is that the SOO metric may be Goodhart-able. It might be the case (for all we know) that by getting the model to maximize SOO (subject to constraints of sufficiently preserving performance etc.), you incentivize it to encode the self-other distinction in some convoluted way that is not adequately captured by the SOO metric, but is sufficient for deception.
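A toy illustration of this worry, using a made-up stand-in for the real SOO metric (here just negative mean squared distance between paired "self" and "other" activations): a model can make the measured overlap essentially perfect while hiding a tiny, perfectly decodable self/other tag in a direction the metric barely notices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 1000  # activation dimension, number of paired prompts

# Stand-in for an SOO-style metric (NOT the paper's actual metric):
# negative mean squared distance between matched self/other activations.
def soo_metric(self_acts, other_acts):
    return -np.mean((self_acts - other_acts) ** 2)

# Shared "content" representation, with the last coordinate reserved (unused).
base = rng.normal(size=(n, d))
base[:, -1] = 0.0

eps = 1e-3                      # tiny self/other tag hidden in the reserved direction
self_acts = base.copy();  self_acts[:, -1] = +eps
other_acts = base.copy(); other_acts[:, -1] = -eps

# Measured overlap is essentially perfect...
print("SOO metric:", soo_metric(self_acts, other_acts))   # ~ -6e-8

# ...but a downstream linear read-off recovers self vs. other with 100% accuracy.
acts = np.vstack([self_acts, other_acts])
labels = np.array([1] * n + [0] * n)
pred = (acts[:, -1] > 0).astype(int)
print("self/other decoding accuracy:", (pred == labels).mean())  # 1.0
```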
Radical probabilist
Paperclip minimizer
Child of LDT
Dragon logician
Embedded agent
Hufflepuff cynic
Logical inductor
Bayesian tyrant
Asexual species universally seem to have come into being very recently. They likely go extinct due to lack of genetic diversity and attendant mutational load catastrophe and/or losing arms races with parasites.
Bdelloidea are an interesting counterexample: they evolved obligate parthenogenesis ~25 mya.
There's a famous prediction market about whether AI will get gold from the International Mathematical Olympiad by 2025.
Correction: it's by the end of 2025.
Also, they failed to provide the promised fraction of compute to the Superalignment team (and not because it was needed for non-Superalignment safety stuff).
Well, past events--before some time t--kind of obviously can't be included in the Markov blanket at time t.
As far as I understand it, the MB formalism captures only momentary causal interactions between "Inside" and "Outside" but doesn't capture a kind of synchronicity/fine-tuning-ish statistical dependency that doesn't manifest in the current causal interactions (across the Markov blanket) but is caused by past interactions.
For example, if you learned a perfect weather forecast for the next month and then went into a completely isolated bunker but kept track of what day it was, your beliefs and the actual weather would be strongly dependent even though there's no causal interaction (after you entered the bunker) between your beliefs and the weather. This is therefore omitted by MBs, and CBs want to capture it.
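A minimal sketch of the bunker example (my own toy, nothing specific to the CB formalism): after the forecast is read there is zero interaction between belief and weather, yet across sampled worlds the two remain maximally dependent, which is exactly the kind of dependency that the blanket's moment-to-moment causal structure doesn't register.

```python
import numpy as np

rng = np.random.default_rng(0)
n_worlds = 100_000

# Day-30 weather in each sampled world (0 = dry, 1 = rain), fixed at t = 0.
weather_day30 = rng.integers(0, 2, size=n_worlds)

# The agent reads a perfect forecast at t = 0, then enters an isolated bunker:
# from then on there is no causal interaction between belief and weather.
belief_day30 = weather_day30.copy()

def mutual_information(x, y):
    """Empirical mutual information (in bits) between two binary arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

# ~1 bit: belief and weather are maximally dependent on day 30,
# despite zero causal traffic across the bunker wall after t = 0.
print("I(belief; weather) =", mutual_information(belief_day30, weather_day30))
```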