Based on my attending Oliver's talk, this may be relevant/useful:
I too have reservations about points 1 and 3 but not providing sufficient references or justifications doesn't imply they're not on SL1.
mentioned in the FAQ
(I see what podcasts you listen to.)
My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.
Re the three you pointed out: simulators I consider a useful insight, gradient hacking probably not (10% < p < 20%), and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.
Also, I'm curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?
- the approaches that have been attracting the most attention and funding are dead ends
I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).
I propose "token surprise" (as in type-token distinction). You expected this general type of thing but not that Ivanka would be one of the tokens instantiating it.
It's better but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of (level 1 and level 2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.
Similarly, hedging is not hypocrisy.
Do you think [playing in a rat race because it's the locally optimal thing for an individual to do, while at the same time advocating for abolishing the rat race] is an example of reformative hypocrisy?
Or even more broadly, defecting in a prisoner's dilemma while exposing an interface that would allow cooperation with other like-minded players?
I've had this concept for many years and it hasn't occurred to me to give it a name (How Stupid Not To Have Thought Of That) but if I tried to give it a name, I definitely wouldn't call it a kind of hypocrisy.
It's not clear to me how this results from "excess resources for no reasons". I guess the "for no reasons" part is crucial here?
I meant this strawberry problem.
Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but in the next 5.
I haven't listened to the pod yet but I guess you meant "but not in the next 5".
FWIW Oliver's presentation of (some fragment of) his work at ILIAD was my favorite of all the talks I attended at the conference.
I am not totally sure why he considers discrete models to be unable to describe initial states or state-transition programming.
AFAIU, he considers them inadequate because they rely on an external interpreter, whereas the model of reality should be self-interpreting because there is nothing outside of reality to interpret it.
Wheeler suggests some principles for constructing a satisfactory explanation. The first is that "The boundary of a boundary is zero": this is an algebraic topology theorem showing that, when taking a 3d shape, and then taking its 2d boundary, the boundary of the 2d boundary is nothing, when constructing the boundaries in a consistent fashion that produces cancellation; this may somehow be a metaphor for ex nihilo creation (but I'm not sure how).
See this as an operation that takes a shape and produces its boundary. It goes 3D shape -> 2D shape -> nothing. If you reverse the arrows you get nothing -> 2D shape -> 3D shape. (Of course, it's not quite right because (IIUC) every closed 2D surface has an empty boundary, so the reversed arrows don't pick out anything specific, but I guess it's just meant as a rough analogy.)
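To make that concrete (my own illustration; the choice of the solid ball $B^3$ and the sphere $S^2$ is mine, not from the post):

$$\partial B^3 = S^2, \qquad \partial S^2 = \varnothing, \qquad \text{so}\quad \partial(\partial B^3) = \varnothing,$$

and reading the arrows backwards gives the $\varnothing \to S^2 \to B^3$ picture above.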
He notes a close relationship between logic, cognition, and perception: for example, "X | !X" when applied to perception states that something and its absence can't both be perceived at once
This usage of logical operators is confusing. In the context of perception, he seems to want to talk about NAND: you never perceive both something and its absence but you may also not perceive either.
(note that "X | !X" is equivalent to "!(X & !X)" in classical but not intuitionistic logic)
Intuitionistic logic doesn't allow X & !X either.[1] It allows !(X & !X).
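For the record, here's a minimal constructive check of that claim (a Lean 4 sketch I wrote, not from the original discussion): non-contradiction goes through with no classical axioms, while excluded middle needs `Classical.em`.

```lean
-- Non-contradiction is provable constructively, using no classical axioms:
theorem no_contradiction (X : Prop) : ¬(X ∧ ¬X) :=
  fun h => h.2 h.1  -- apply the proof of ¬X to the proof of X

-- Excluded middle, by contrast, is not derivable constructively;
-- in Lean it requires invoking the classical axiom:
theorem excluded_middle (X : Prop) : X ∨ ¬X :=
  Classical.em X
```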
Langan contrasts between spatial duality principles ("one transposing spatial relations and objects") and temporal duality principles ("one transposing objects or spatial relations with mappings, functions, operations or processes"). This is now beyond my own understanding.
It's probably something like: if you have a spatial relationship between two objects X and Y, you can view it as an object with X and Y as endpoints. Temporally, if X causes Y, then you can see it as a function/process that, upon taking X, produces Y.
The most confusing/unsatisfying thing for me about CTMU (to the extent that I've engaged with it so far) is that it doesn't clarify what "language" is. It points ostensively at examples: formal languages, natural languages, science, perception/cognition, which apparently share some similarities but what are those similarities?
[1] Though paraconsistent logic does.
Did they name it after the strawberry problem!?
Here are some axes along which I think there's some group membership signaling in philosophy (IDK about the extent and it's hard to disentangle it from other stuff):
- Math: platonism/intuitionism/computationalism (i.e. what is math?), interpretations of probability, foundations of math (set theory vs univalent foundations)
- Mind: externalism/internalism (about whatever), consciousness (de-facto-dualisms (e.g. Chalmers) vs reductive realism vs illusionism), language of thought vs 4E cognition, determinism vs compatibilism vs voluntarism
- Metaphysics/ontology: are chairs, minds, and galaxies real? (this is somewhat value-laden for many people)
- Biology: gene's-eye-view/modern synthesis vs extended evolutionary synthesis
Moreover, I don't think that some extra/different planning machinery was required for language itself, beyond the existing abstraction and model-based RL capabilities that many other animals share.
I would expect to see sophisticated ape/early-hominid-level culture in many more species if that were the case. For some reason humans went on the culture RSI trajectory whereas other animals didn't. Plausibly there was some seed cognitive ability (plus some other contextual enablers) that allowed a gene-culture "coevolution" cycle to start.
My feedback is that I absolutely love it. My favorite feature released since reactions or audio for all posts (whichever was later).
In other words, there's a question about how to think about truth in a way that honors perspectivalism, while also not devolving into relativism. And the way Jordan and I were thinking about this, was to have each filter bubble -- with their own standards of judgment for what's true and what's good -- to be fed the best content from the other filter bubbles by the standards from within each filter bubble, rather than the worst content, which is more like what we see with social media today.
Seems like Monica Anderson was trying to do something like that with BubbleCity. (pdf, podcast)
This is not quite an answer to your question but some recommendations in the comments to this post may be relevant: https://www.lesswrong.com/posts/SCs4KpcShb23hcTni/ideal-governance-for-companies-countries-and-more
It does for me
Sometimes I look up a tag/concept to ensure that I'm not spouting nonsense about it.
But most often I use them to find the posts related to a topic I'm interested in.
Possible problems with this approach
One failure mode you seem to have missed (which I'm surprised by) is that the SOO metric may be Goodhart-able. It might be the case (for all we know) that by getting the model to maximize SOO (subject to constraints of sufficiently preserving performance, etc.), you incentivize it to encode the self-other distinction in some convoluted way that is not adequately captured by the SOO metric but is sufficient for deception.
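To illustrate the worry with a toy example (entirely my own construction; the overlap score and the numbers below are stand-ins, not the paper's actual metric): an aggregate overlap score can look near-perfect while the self/other distinction survives in a tiny, targeted part of the representation.

```python
import numpy as np

# Toy stand-in for an "SOO-style" overlap score: higher (closer to 0) means
# activations on self-referential and other-referential prompts are more similar.
def soo_score(acts_self: np.ndarray, acts_other: np.ndarray) -> float:
    return -float(np.mean((acts_self - acts_other) ** 2))

rng = np.random.default_rng(0)
base = rng.normal(size=128)

# Model A: genuinely (almost) no self/other distinction.
a_self, a_other = base, base + rng.normal(scale=0.01, size=128)

# Model B: matches on 127 dimensions (so the aggregate score looks great)
# while one coordinate still cleanly encodes self vs other.
b_self, b_other = base.copy(), base.copy()
b_other[0] += 0.1  # the convoluted/hidden distinction

print(soo_score(a_self, a_other))  # roughly -1e-4
print(soo_score(b_self, b_other))  # roughly -8e-5: scores at least as well as A
```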
Radical probabilist
Paperclip minimizer
Child of LDT
Dragon logician
Embedded agent
Hufflepuff cynic
Logical inductor
Bayesian tyrant
Asexual species universally seem to have come into being very recently. They likely go extinct due to lack of genetic diversity and attendant mutational load catastrophe and/or losing arms races with parasites.
Bdelloidea are an interesting counterexample: they evolved obligate parthenogenesis ~25 mya.
There's a famous prediction market about whether AI will get gold from the International Mathematical Olympiad by 2025.
correction: it's by the end of 2025
Also, they failed to provide the promised fraction of compute to the Superalignment team (and not because it was needed for non-Superalignment safety stuff).
Well, past events--before some time t--kind of obviously can't be included in the Markov blanket at time t.
As far as I understand it, the MB formalism captures only momentary causal interactions between "Inside" and "Outside" but doesn't capture a kind of synchronicity/fine-tuning-ish statistical dependency that doesn't manifest in the current causal interactions (across the Markov blanket) but is caused by past interactions.
For example, if you learned a perfect weather forecast for the next month and then went into a completely isolated bunker but kept track of what day it was, your beliefs and the actual weather would remain strongly dependent even though there's no causal interaction (after you entered the bunker) between your beliefs and the weather. MBs therefore omit this, and CBs want to capture it.
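A toy simulation of the bunker example (my own sketch, not from the post), just to make "strong dependence with zero ongoing interaction" concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 parallel worlds, 30 days of weather each (1 = sunny, 0 = rainy).
weather = rng.integers(0, 2, size=(10_000, 30))

# At t=0 you memorize a perfect forecast, then enter the bunker.
beliefs = weather.copy()

# From here on nothing flows between `beliefs` and `weather` (no causal
# interaction across your "blanket"), yet for any later day the two are
# maximally statistically dependent:
print(np.mean(beliefs[:, 15] == weather[:, 15]))  # 1.0
```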
(Continuity.) If , then there exists such that a gamble assigning probability to and to satisfies .
Should be " to "
Actually, it might be it, thanks!
The Schelling-point-ness of these memes seems to me to be secondary to (all inter-related):
- memetic fit (within a certain demographic, conditional on the person/group already adopting certain beliefs/attitudes/norms etc)
- being a self-correcting/stable attractor in the memespace
- being easy to communicate and hold all the relevant parts in your mind at once
You discuss all of that but I read the post as saying something like "we need Schelling points, therefore we have to produce memetic attractors to serve as such Schelling points", whereas I think that typically first a memeplex emerges, and then people start coordinating around it without much reflection. (Well, arguably this is true of most Schelling points.)
Here's one more idea that I think I've seen mentioned somewhere and so far hasn't spread but might become a Schelling point:
AI summer - AI has brought a lot of possibilities but the road further ahead is fraught with risks. Let's therefore pause fundamental research and focus on reaping the benefits of the state of AI that we already have.
I think I saw a LW post that was discussing alternatives to the vNM independence axiom. I also think (low confidence) it was by Rob Bensinger and in response to Scott's geometric rationality (e.g. this post). For the life of me, I can't find it. Unless my memory is mistaken, does anybody know what I'm talking about?
ideas of Eigenmorality and Eigenkarma[3].
broken links
recently[1].
empty footnote
What kind of interpretability work do you consider plausibly useful or at least not counterproductive?
2. Task uncertainty with reasonable prior on goal drift - the system is unsure about the task it tries to do and seeks human inputs about it.
“Task uncertainty with reasonable prior…” sounds to me like an overly-specific operationalization, but I think this desideratum is gesturing at visibility/correctability.
To me, "unsure about the task it tries to do" sounds more like applicability to a wide range of problems.
useless or counterproductive things due to missing it.
What kind of work are you thinking of?
Formal frameworks considered in isolation can't be wrong. Still, they often come with claims like "framework F formalizes some intuitive (desirable?) property, or specifies the right way to do some X, and therefore should be used in such-and-such real-world situations". These can be disputed, and I expect that when somebody says something like "{Bayesianism, utilitarianism, classical logic, etc.} is wrong", that's what they mean.
(Vague shower thought, not standing strongly behind it)
Maybe it is the case that most people as individuals "just want frisbee and tea" but once religion (or rather the very broad class of ~"social practices" some subset/projection of which we round up to "religion") evolved and lowered the activation energy of people's hive switch, they became more inclined to appreciate the beauty of Cathedrals and Gregorian chants, etc.
In other words, people's ability to want/appreciate/[see value/beauty in X] depends largely on the social structure they are embedded in, the framework they adopt to make sense of the world etc. (The selection pressures that led to religion didn't entirely reduce to "somebody wanting something", so at least that part is not question-begging [I think].)
For good analysis of this, search for the heading “The data wall” here.
Did you mean to insert a link here?
Intentional
Lure for
Improvised
Acronym
Derivation
You're right, fixed, thanks!
In response, some companies began listing warrant canaries on their websites—sentences stating that they had never yet been forced to reveal any client data. If at some point they did receive such a warrant, they could then remove the canary without violating their legal non-disclosure obligation, thereby allowing the public to gain indirect evidence about this otherwise-invisible surveillance.
Can the gov force them not to remove the canary?
It wasn't me but it's probably about spreading AI capabilities-relevant knowledge.
He openly stated that he had left OA because he lost confidence that they would manage the singularity responsibly. Had he signed the NDA, he would have been prohibited from saying that.
According to this plant-based-leaning but also somewhat vegan-critical blog run by a sports nutritionist, eating four servings of soy per day (120 g of soybeans) is safe for male hormonal balance. It's in Polish, but Google Translate should handle it. He cites studies. https://www.damianparol.com/soja-i-testosteron/
from Eric Weinstein in a youtube video.
Can you link?