LessWrong 2.0 Reader
Alright, as I've mentioned before [LW(p) · GW(p)] I'm terrible at abstract thinking, so I went through the post and came up with a concrete example. Does this seem about right?
We are running a quantitative trading firm, and we care about the closing prices of the S&P 500 stocks. We have a forecasting system designed in a robust, "prediction market" style. Instead of having each model output a single precise forecast, each model M_j (for j = 1, …, k) outputs a set of probability distributions over tomorrow's closing prices. That is, for each action a (for example, "buy" or "sell" a particular stock, or more generally "execute a trade decision"), our model returns a set
M_j(a) ⊆ Δ(FuturePrices),
which means that M_j(a) is a nonempty, convex set of probability distributions over outcomes.
This definition allows "nature" (the market) to choose the distribution that actually occurs from within the set provided by our model. In our robust framework, we assume that the market might select the worst-case distribution in M_j(a). In other words, our multivalued model is a function
M : A → □(O),
with □(O) representing the collection of sets of probability distributions over outcomes. Instead of pinning down exactly what tomorrow’s price will be if we take a particular action, each model provides us a “menu” of plausible distributions over possible closing prices.
We do this because there is some hidden-to-us world state that we cannot measure directly, and this state might even be controlled adversarially by market forces. Instead of trying to infer or estimate the exact hidden state, we posit that there are a finite number of plausible candidates for what this hidden state might be. For each candidate hidden state, we associate one probability distribution over the closing prices. Thus, when we look at the model for an action, rather than outputting a single forecast, the model gives us a “menu” of distributions, each corresponding to one possible hidden state scenario. In this way, we deliberately refrain from committing to a single prediction about the hidden state, thereby preparing for a worst-case (or adversarial) realization.
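To make the "menu of distributions" idea concrete, here is a minimal Python sketch (my own illustration, not code from the post, and all numbers are invented): a model maps each action to a list of candidate distributions, one per hidden-state scenario, and a worst-case evaluation takes the minimum expected reward over that menu.

```python
# Illustrative sketch of a multivalued model M : A -> sets of distributions.
# Each action maps to a list of candidate distributions over outcomes
# (up / flat / down), one per hidden-state scenario; numbers are made up.
import numpy as np

model = {
    "buy":  [np.array([0.6, 0.3, 0.1]),   # hidden state 1: prices likely up
             np.array([0.3, 0.4, 0.3])],  # hidden state 2: roughly flat
    "sell": [np.array([0.2, 0.3, 0.5])],  # single scenario for "sell"
}

reward = {"buy":  np.array([1.0, 0.0, -1.0]),   # payoff per outcome
          "sell": np.array([-1.0, 0.0, 1.0])}

def worst_case_value(action):
    """Expected reward if nature picks the least favourable distribution in the menu."""
    return min(float(np.dot(p, reward[action])) for p in model[action])

print({a: worst_case_value(a) for a in model})
```

Acting to maximize this worst-case value is the maximin stance the robust framework takes toward the hidden state.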
Given a bettor B, MDP M and policy π we define bet^B_{M,π} : [0,1] × (states × acts × [0,1])^H → ℝ, the aggregate betting function, as
bet^B_{M,π}(r_0, s_1, a_1, …, r_H) := 1 + ∑_{h=0}^{H} ( lbet^{B,h,s_h,a_h}_{M,π}(r_h, s_{h+1}, a_{h+1}) − 1 )
where tr_t and π_t are the trajectory and policy in episode t, respectively.
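As a sanity check on the formula, here is a tiny sketch in Python. The `local_bet` callable is a hypothetical stand-in for lbet^{B,h,s_h,a_h}_{M,π} (whose real definition depends on the bettor, step, state and action); the point is just that the aggregate bet is one plus the sum of the local bets' gains.

```python
# Toy sketch of the aggregate betting function: bet = 1 + sum_h (local_bet_h - 1).
def aggregate_bet(r0, steps, local_bet):
    """steps: list of (s_next, a_next, r_next) tuples for h = 0..H (toy encoding)."""
    total = 1.0
    r = r0
    for h, (s_next, a_next, r_next) in enumerate(steps):
        total += local_bet(h, r, s_next, a_next) - 1.0
        r = r_next
    return total

# Example with a trivial local bet that always returns 1 (no gain, no loss):
print(aggregate_bet(0.5, [("s1", "a1", 0.7), ("s2", "a2", 0.2)],
                    lambda h, r, s, a: 1.0))  # -> 1.0
```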
Next, our trading system aggregates these imprecise forecasts via a prediction-market-like mechanism. Inside our algorithm we maintain a collection of "bettors". Each "bettor" (besides the pessimistic and uniform bettors, which do the obvious thing their names imply) corresponds to one of our underlying models (or to aspects of a model). Each bettor B is associated with its own preferred prediction (derived from its hypothesis set) and a current "wealth" (i.e. credibility). Rather than us simply choosing one model, every bettor places a bet based on (?)how well our market prediction aligns with its own view(?):
Robust Universal Estimator (RUE)

Parameters: Hypothesis class H, rounds T, reward function r, prior ζ_1
ε ← min(1/2, √(ln(2)/T))

Function estimate(ζ, a):
    return argmin_{μ ∈ ΔO} E_{B∼ζ}[ 2·D²_H(μ → M_B(a)) if B ≠ ∙, else ε·E_{o∼μ}[r(a,o)] ]

Function update(ζ, M̄, a, o):
    for B ∈ H ∪ {u}:
        μ_B ← argmin_{μ ∈ M_B(a)} D²_H(M̄(a), μ)
        ξ(B) ← ζ(B) · ( √(μ_B(o) / M̄(a)(o)) + D²_H(M̄(a) → M_B(a)) )
    ξ(∙) ← ζ(∙) · ( 1 + ε·(E_{o′∼M̄(a)}[r(a,o′)] − r(a,o)) )
    return ξ

for 1 ≤ t ≤ T:
    M̂_t ← λa. estimate(ζ_t, a)
    return M̂_t
    Receive a_t, o_t from the environment
    ζ_{t+1} ← update(ζ_t, M̂_t, a_t, o_t)
end
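Here is a rough Python sketch of how I would implement the above for a finite outcome space. This is entirely my own gloss, not code from the post: the crude grid search over the simplex stands in for the convex minimization, and the names (`estimate`, `update`, `"pessimist"` for the ∙ bettor) are my labels.

```python
# A minimal numerical sketch of RUE, assuming a finite outcome space.
# The grid search over the simplex stands in for a convex solver;
# "pessimist" plays the role of the ∙ bettor.
import itertools
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance between two distributions on a finite set."""
    return 0.5 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_sq_to_set(p, credal_set):
    """D_H^2(p -> M_B(a)): distance from p to the nearest member of the credal set."""
    return min(hellinger_sq(p, q) for q in credal_set)

def simplex_grid(n_outcomes, steps=10):
    """Coarse grid over the probability simplex (stand-in for convex optimization)."""
    for c in itertools.product(range(steps + 1), repeat=n_outcomes):
        if sum(c) == steps:
            yield np.array(c) / steps

def estimate(zeta, a, models, r, eps, n_outcomes):
    """Market prediction: minimize the wealth-weighted objective over mu in ΔO."""
    best_mu, best_val = None, np.inf
    for mu in simplex_grid(n_outcomes):
        val = 0.0
        for B, wealth in zeta.items():
            if B == "pessimist":           # the ∙ bettor bets on low expected reward
                val += wealth * eps * float(np.dot(mu, r[a]))
            else:                          # model bettors want mu close to their set
                val += wealth * 2 * hellinger_sq_to_set(mu, models[B](a))
        if val < best_val:
            best_mu, best_val = mu, val
    return best_mu

def update(zeta, M_bar, a, o, models, r, eps):
    """Reweight each bettor's wealth by how well it predicted the observed outcome o."""
    xi = {}
    for B, wealth in zeta.items():
        if B == "pessimist":
            xi[B] = wealth * (1 + eps * (float(np.dot(M_bar, r[a])) - r[a][o]))
        else:
            credal = models[B](a)
            mu_B = min(credal, key=lambda q: hellinger_sq(M_bar, q))
            xi[B] = wealth * (np.sqrt(mu_B[o] / max(M_bar[o], 1e-12))
                              + hellinger_sq_to_set(M_bar, credal))
    return xi
```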
To calculate our market prediction M̂, we solve a convex minimization problem that balances the differing opinions of all our bettors (weighted by their current wealth ζ) in such a way that it (?)minimizes their expected value on update(?).
The key thing here is that we separate the predictable / non-adversarial parts of our environment from the possibly-adversarial ones, so our market prediction M̂ reflects our best estimate of the outcomes of our actions if the parts of the universe we don't observe are out to get us.
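For completeness, here is a toy driver tying the sketch above together; the specific models, rewards, and the environment's true distribution are all invented for illustration.

```python
# Hypothetical driver loop for the sketch above: two model bettors plus the
# pessimist, a single "buy" action, three outcomes (up / flat / down).
n_outcomes, T = 3, 50
eps = min(0.5, np.sqrt(np.log(2) / T))
r = {"buy": np.array([1.0, 0.0, -1.0])}
models = {
    "M1": lambda a: [np.array([0.6, 0.3, 0.1]), np.array([0.4, 0.4, 0.2])],
    "M2": lambda a: [np.array([0.2, 0.5, 0.3])],
}
zeta = {"M1": 1.0, "M2": 1.0, "pessimist": 1.0}   # initial "wealth" of each bettor

rng = np.random.default_rng(0)
for t in range(T):
    a = "buy"
    M_hat = estimate(zeta, a, models, r, eps, n_outcomes)
    o = rng.choice(n_outcomes, p=[0.5, 0.3, 0.2])  # "nature" picks the outcome
    zeta = update(zeta, M_hat, a, o, models, r, eps)

print(zeta)  # relative wealth shifts toward bettors whose credal sets fit the data
```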
Is this a reasonable interpretation? If so, I'm pretty interested to see where you go with this.
Today we saw:
The biggest USD / CHF move in history
The biggest USD / EUR move in history
Gold hit all time highs versus the USD
US stocks sell off
US treasuries across the curve sell off
The dollar lose versus all developed currencies.
Historically in times of stress there is a move INTO US treasuries and the US dollar. This is the first time since I started investing professionally that there has been a clear unilateral move out of the US dollar and USD-denominated assets.
Yeah, this sort of thing, if it actually scales and can be adapted to other paradigms (like RNNs or transformers), would be the final breakthrough sufficient for AGI. As I've said, one of the things that keeps LLM agents from being better is their inability to hold memory/state, which cripples meta-learning (without expensive compute investment), and this new paper is possibly a first step towards the return of recurrence/RNN architectures.
elifland on Reactions to METR task length paper are insane
However, the authors of AI 2027 predict pretty radical superintelligence before 2030, which does not seem to be justified by the plot. Arguably, since the plot is focused on software engineering tasks, the most relevant comparison is actually their prediction for human-level software engineers, which I believe is around 2026-2028 (clearly inconsistent with the plot).
Our rationale for why we extend the trend in the way that we do can be found in our timelines forecast. [LW · GW] In short, we adjust for (a) the possible trend speedup to a ~4 month doubling time as in the 2024-2025 trend (b) the possibility of further superexponentiality (c) intermediate speedups from AIs that aren't yet superhuman coders. Fair if you disagree, but we do explain how we expect things to deviate from the plot you included.
mitchell_porter on xpostah's Shortform
complete surveillance of all citizens and all elites
Certainly at a human level this is unrealistic. In a way it's also overkill - if use of an AI is an essential step towards doing anything dangerous, the "surveillance" can just be of what AIs are doing or thinking.
This assumes that you can tell whether an AI input or output is dangerous. But the same thing applies to video surveillance - if you can't tell whether a person is brewing something harmless or harmful, having a video camera in their kitchen is no use.
At a posthuman level, mere video surveillance actually does not go far enough, again because a smart deceiver can carry out their dastardly plots in a way that isn't evident until it's too late. For a transhuman civilization that has values to preserve, I see no alternative to enforcing that every entity above a certain level of intelligence (basically, smart enough to be dangerous) is also internally aligned, so that there is no disposition to hatch dastardly plots in the first place.
This may sound totalitarian, but it's not that different to what humanity attempts to instill in the course of raising children and via education and culture. We have law to deter and punish transgressors, but we also have these developmental feedbacks that are intended to create moral, responsible adults that don't have such inclinations, or that at least restrain themselves.
In a civilization where it is theoretically possible to create a mind with any set of dispositions at all, from paperclip maximizer to rationalist bodhisattva, the "developmental feedbacks" need to extend more deeply into the processes that design and create possible minds, than they do in a merely human civilization.
gwern on AI #111: Giving Us Pause
I don't think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text?
At least for multimodal LLMs in the pure-token approach like Gato or DALL-E 1 (and probably GPT-4o and Gemini, although few details have been published), you would be able to do that by generating the tokens which embody an encoded image (or video!) of several shapes, well, rotating in parallel. Then you just look at them.
ozziegooen on abramdemski's Shortform
I'm curious whether you know of any examples in history where humanity purposefully and successfully steered towards a significantly less competitive [economically, militarily,...] technology that was nonetheless safer.
This sounds much like a lot of the history of environmentalism and safety regulations? As in, there's a long history of [corporations selling X, using a net-harmful technology], then governments regulating. Often this happens after the technology is sold, but sometimes before it's completely popular around the world.
I'd expect that there's similarly a lot of history of early product areas where some people realize that [popular trajectory X] will likely be bad and get regulated away, so they help further [safer version Y].
Going back to the previous quote:
"steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer"
I agree it's tough, but would expect some startups to exist in this space. Arguably there are already several claiming to be focusing on "Safe" AI. I'm not sure if people here would consider this technically part of the "modern generative AI paradigm" or not, but I'd imagine these groups would be taking some different avenues, using clear technical innovations.
There are worlds where the dangerous forms have disadvantages later on - for example, they are harder to control/oversee, or they get regulated. In those worlds, I'd expect there should/could be some efforts waiting to take advantage of that situation.
These are 6 sample titles I'm considering using. Any thoughts come to mind?
I strong-upvoted this just for the title alone. If AI takeover is at all gradual, it is very likely to happen via gradual disempowerment.
But it occurs to me that disempowerment can actually feel like empowerment! I am thinking here of the increasing complexity of what AI gives us in response to our prompts. I can enter a simple instruction and get back a video or a research report. That may feel empowering. But all the details are coming from the AI. This means that even in actions initiated by humans, the fraction that directly comes from the human is decreasing. We could call this relative disempowerment. It's not that human will is being frustrated, but rather that the AI contribution is an ever-increasing fraction of what is done.
Arguably, successful alignment of superintelligence produces a world in which 99+% of what happens comes from AI, but it's OK because it is aligned with human volition in some abstract sense. It's not that I am objecting to AI intentions and actions becoming most of what happens, but rather warning that a rising tide of empowerment-by-AI can turn into complete disempowerment thanks to deception or just long-term misalignment... I think everyone already knows this, but I thought I would point it out in this context.
benito on Benito's Shortform Feed
I wrote this because I am increasingly noticing that the rules for "which worlds to keep in mind/optimize" are often quite different from "which worlds my spreadsheets say are the most likely worlds". And that this is in conflict with my heuristics which would've said "optimize the world-models in your head for being the most accurate ones – the ones that will give you the most accurate answers to most questions" rather than something like "optimize the world-models in your head for being the most useful ones".
(Though the true answer is some more complicated function combining both practical utility and map-territory correspondence.)
I'm not sure I understand what's confusing about this.
I will note that what is confusing to one person need not be confusing to another person. In my experience it is a common state of affairs for one person in a conversation to be confused and the other not (whether it be because the latter person is pre-confused, post-confused, or simply because their path to understanding a phenomenon didn't pass through the state of their interlocutor).
It seems probable to me that I have found this subject more confusing than have others.