Comments

Comment by Past Account (zachary-robertson) on What's with all the bans recently? · 2024-04-06T14:32:22.924Z · LW · GW

Just wanted to offer some validation. I left a comment on this post a while ago pointing out how one (or apparently a few) users can essentially downvote you however they like to silence opinions they don't agree with. Moderation is tricky, and it is important to remember why: most users on a web forum are lurkers, which means any attempt to gather feedback on moderation policies has a biased sampling problem. The irony that I likely won't be able to leave another comment or engage in further discussion is not lost on me.

At first, I thought getting soft-banned meant my "contributions" weren't valuable. For context, I study AI and integrate it into my thinking, which hasn't been received well on this site. Ironically, not being able to interact with other people pushed me to explore deeper discussions with AI. For example, I gave this entire thread to Claude 3, and it agreed there were some changes to be made to the rate-limiting system.

It does seem concerning that, as a PhD student studying AI alignment, I was effectively pushed out of participating in discussions on LessWrong and the AI Alignment Forum by the automatic rate-limiting system and disagreements with senior users whose downvotes carry much more weight. On the other hand, compared to a few years ago during COVID, I now have colleagues and AI that I share far more context with than with users on this forum, so this just matters less to me. I return only because I am taking a class on social computing and am revisiting what makes for good and bad experiences.

Anyway, hopefully this gives you some solace. I would encourage you to seek other sources of validation. There are so many more options than you think! :)

Comment by Past Account (zachary-robertson) on Automatic Rate Limiting on LessWrong · 2023-07-28T22:00:14.864Z · LW · GW

Hi, I think this is incorrect. I had to wait 7 days to write this comment and then almost forgot to. I wrote a comment critiquing a very long post (which was later removed) and was downvoted (by a single user, I think) after explaining why I wrote the comment with AI assistance. My understanding is that a single user with enough karma power can effectively "silence" any opinion they don't like by downvoting a few comments in an exchange.

I think the site has changed enough over the last several months that I am considering leaving. For me personally, choosing between having a conversation with a random commenter on this site and one with an AI model is just about a wash. I even hesitate to write this comment given how overconfident your comment seemed, i.e., I won't be able to interact with this site again for another week.

Comment by zachary-robertson on [deleted post] 2023-07-19T19:20:21.147Z

This is my endorsed review of the article.

Comment by zachary-robertson on [deleted post] 2023-07-19T19:17:07.679Z

This seems like a rhetorical question. Both of our top-level comments seem to reach similar conclusions, but it seems you regret the time you spent engaging with the OP to write yours. This took me about 10 minutes, most of which was spent writing this comment. What is your point?

Comment by zachary-robertson on [deleted post] 2023-07-19T16:49:48.773Z

At over 15k tokens, reading the full article requires significant time and effort. While it aims to provide comprehensive detail on QACI, much of this likely exceeds what is needed to convey the core ideas. The article could be streamlined to more concisely explain the motivation, give mathematical intuition, summarize the approach, and offer brief examples. Unnecessary elaborations could be removed or included as appendices. This would improve clarity and highlight the essence of QACI for interested readers.

My takeaway is that the article summarizes a new AI alignment approach called QACI (Question-Answer Counterfactual Interval). QACI involves generating a factual "blob" tied to human values, along with a counterfactual "blob" that could replace it. Mathematical concepts like realityfluid and Loc() are used to identify the factual blob among counterfactuals. The goal is to simulate long reflection by iteratively asking the AI questions and improving its answers. QACI claims to avoid issues like boxing and embedded agency through formal goal specification.

While the article provides useful high-level intuition, closer review reveals limitations in QACI's theoretical grounding. Key concepts like realityfluid need more rigor, and details are lacking on how embedded agency is avoided. There are also potential issues around approximation and vulnerabilities to adversarial attacks that require further analysis. Overall, QACI seems promising but requires more comparison with existing alignment proposals and formalization to adequately evaluate. The article itself is reasonably well-written, but the length and inconsistent math notation create unnecessary barriers.

Comment by Past Account (zachary-robertson) on Nature < Nurture for AIs · 2023-06-06T04:48:22.015Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies · 2023-05-29T18:32:00.145Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Challenge: construct a Gradient Hacker · 2023-03-11T00:52:45.689Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on AI alignment researchers don't (seem to) stack · 2023-02-21T17:31:43.107Z · LW · GW

[Deleted]

Comment by zachary-robertson on [deleted post] 2022-12-11T19:56:35.927Z

So are you suggesting that ChatGPT gets aligned to the values of the human contractor(s) that provide data during finetuning, and then carries these values forward when interacting with users?

You are correct that this appears to stand in contrast to one of the key benefits of CIRL games, namely that they allow the AI to continuously update towards the user's values. The argument I present is that ChatGPT can still learn something about the preferences of the user it is interacting with through in-context value learning. During deployment, ChatGPT can then pick up preferences in-context, allowing for continuous updating towards the user's values, as in a CIRL game.
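
Here is a minimal sketch of what I mean by in-context value learning, assuming a plain prompting setup; the prompt format, the example preferences, and the `query_model` stand-in are hypothetical illustrations, not the actual ChatGPT training or deployment pipeline.

```python
# Hypothetical illustration of in-context value learning via prompting:
# the model sees a few of the current user's past preference judgments and
# is asked to apply them to a new request, so updating happens "in context"
# rather than through finetuning.

def build_preference_prompt(preference_examples, new_request):
    lines = ["Here are choices this user has preferred in the past:"]
    for option_a, option_b, preferred in preference_examples:
        lines.append(f"- Between '{option_a}' and '{option_b}', they chose '{preferred}'.")
    lines.append(f"With these preferences in mind, respond to: {new_request}")
    return "\n".join(lines)

examples = [
    ("a terse answer", "a detailed walkthrough", "a detailed walkthrough"),
    ("a formal tone", "a casual tone", "a casual tone"),
]
prompt = build_preference_prompt(examples, "Explain gradient descent.")
print(prompt)
# response = query_model(prompt)  # query_model stands in for any LLM API call
```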

Comment by zachary-robertson on [deleted post] 2022-12-11T04:36:04.806Z

The reward comes from the user, who ranks candidate responses from ChatGPT. This is discussed more in OpenAI's announcement. I edited the post to clarify this.
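
For concreteness, here is a minimal sketch of how ranked responses are commonly turned into a scalar reward (a pairwise Bradley-Terry-style objective of the kind used for InstructGPT's reward model); the notation is mine and is not spelled out in the announcement.

```latex
% Pairwise reward-model objective, where y_w is the response ranked above y_l
% for prompt x and r_theta is the learned reward function:
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
```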

Comment by Past Account (zachary-robertson) on Why Weren't Hot Air Balloons Invented Sooner? · 2022-10-18T04:15:53.341Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Clarifying the Agent-Like Structure Problem · 2022-10-01T05:37:00.768Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Simulators · 2022-09-11T18:42:03.762Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Simulators · 2022-09-09T03:33:49.362Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on What's the difference between newer Atari-playing AI and the older Deepmind one (from 2014)? · 2021-11-02T21:58:22.611Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on rohinmshah's Shortform · 2021-08-05T18:16:12.174Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-30T16:53:11.635Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-29T17:26:54.467Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-29T17:20:50.103Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on What are some triggers that prompt you to do a Fermi estimate, or to pull up a spreadsheet and make a simple/rough quantitative model? · 2021-07-26T20:14:09.355Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on The Mountaineer's Fallacy · 2021-07-18T20:30:49.656Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-31T20:59:11.912Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-30T13:21:16.905Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-29T18:02:59.714Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on MDP models are determined by the agent architecture and the environmental dynamics · 2021-05-29T00:12:33.511Z · LW · GW

[Deleted]

Comment by zachary-robertson on [deleted post] 2021-05-22T23:15:52.850Z

Because . They are the same. Does that help?

Comment by Past Account (zachary-robertson) on SGD's Bias · 2021-05-22T14:03:25.754Z · LW · GW

[Deleted]

Comment by zachary-robertson on [deleted post] 2021-05-21T14:23:42.924Z

The term is meant to be a posterior distribution after seeing data. If you have a good prior you could take . However, note could be high. You want a trade-off between the cost of updating the prior and the loss reduction.

For example, say we have a neural network. Then our prior would be the initialization and the posterior would be the distribution of outputs from SGD.
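
Here is a minimal way to write the trade-off I have in mind (P is the prior, Q the posterior, and beta weights the cost of updating; the notation is only for illustration, in the spirit of a PAC-Bayes objective).

```latex
% Trading loss reduction against the cost of moving away from the prior P
% (Q is the posterior over parameters, beta weights the update cost):
Q^{*} = \arg\min_{Q}\; \mathbb{E}_{\theta \sim Q}\big[L(\theta)\big] + \beta\, \mathrm{KL}\!\left(Q \,\|\, P\right)
% Taking Q = P costs nothing in the KL term but forgoes any loss reduction;
% a posterior far from the prior reduces loss at a high KL cost.
```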

(Btw thanks for the correction)

Comment by Past Account (zachary-robertson) on Open and Welcome Thread - May 2021 · 2021-05-06T14:11:58.548Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Open and Welcome Thread - May 2021 · 2021-05-06T13:46:09.388Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-24T15:48:20.571Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-24T01:28:22.324Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T20:39:29.614Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on NTK/GP Models of Neural Nets Can't Learn Features · 2021-04-23T16:36:35.235Z · LW · GW

[Deleted]

Comment by zachary-robertson on [deleted post] 2021-03-28T22:00:20.665Z

Your example is interesting and clarifies exchange rates. However,

> The shadow price quantifies the opportunity cost, so if I'm paid my shadow price, then that's just barely enough to cover my opportunity cost.

This is the interpretive point I'd like to focus on. When you move a constraint, in this case via the price, the underlying equilibrium of the optimization shifts. From this perspective, your use of the word 'barely' stops making sense to me: if you were to 'overshoot', you wouldn't be optimal in the new optimization problem.

At this point I understand that the cheerful price will be equal to or greater than the shadow price: you want to be able to shift the equilibrium point and have slack left over. It just seems obvious to me that the shadow price isn't exactly measurable in this context, so you'd naturally be led to form a confidence interval (a belief) over it, and the cheerful price is just the upper estimate of that interval. Hence, I'm surprised that this is being treated as a new, distinct concept.
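
To make the comparison concrete, here is a minimal sketch of how I'm thinking of the two quantities; the symbols are my own and only meant as an illustration.

```latex
% Shadow price: sensitivity of the optimal value to relaxing the constraint level b
% (by the envelope theorem, this equals the Lagrange multiplier at the optimum):
\lambda^{*} \;=\; \frac{\partial}{\partial b}\, \max_{x}\,\{\, u(x) \;:\; g(x) \le b \,\}
% Cheerful price read as an upper estimate of an uncertain shadow price
% (point estimate plus a margin reflecting the width of one's belief):
p_{\text{cheerful}} \;\approx\; \hat{\lambda} + k\,\hat{\sigma}_{\lambda}, \qquad k > 0
```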

Comment by zachary-robertson on [deleted post] 2021-03-28T21:50:25.374Z

I suppose this is the most correct answer. I'm not really updating very much though. From my perspective I'll continue to see cheerful price as a psychological/subjective reinvention of shadow price.

Edit: It seems clear that, in this context, the shadow price isn't exactly measurable; the cheerful price is just an upper estimate of the shadow price.

Comment by Past Account (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-04T05:48:33.717Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-02-28T18:22:55.507Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Recognizing Numbers · 2021-01-26T15:56:17.336Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on What is going on in the world? · 2021-01-18T22:22:18.967Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on The Good Try Rule · 2020-12-28T23:55:33.335Z · LW · GW

[Deleted]

Comment by zachary-robertson on [deleted post] 2020-12-08T16:06:09.644Z

> I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

I should apologize a bit for that. To a degree I wasn't really thinking about any of the concepts in the title and only saw the connection later.

> First, very clean proof of the lemma, well done there.

Thanks!

> Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output.

To be honest, I haven't thought about interpreting the monad beyond the equivalence with neural networks. One thing I noticed early on is that you can create sequences of activations that delete information in the limit. For example, the ReLU activation is the limit of the SoftMax (as you change the log base). I think something like this could be seen as abstracting away unnecessary data.
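
Here is a quick numeric check of the limit I have in mind, reading "SoftMax" as the log-sum-exp smooth maximum of 0 and x, so that changing the log base corresponds to the sharpness parameter beta below; this is only an illustrative sketch.

```python
import numpy as np

def smooth_relu(x, beta):
    """Log-sum-exp smooth maximum of 0 and x, written in a numerically
    stable form; approaches ReLU as beta grows."""
    return np.maximum(x, 0.0) + np.log1p(np.exp(-beta * np.abs(x))) / beta

x = np.linspace(-3.0, 3.0, 7)
for beta in (1.0, 10.0, 100.0):
    gap = np.max(np.abs(smooth_relu(x, beta) - np.maximum(x, 0.0)))
    print(f"beta={beta:>5}: max gap from ReLU = {gap:.4f}")
# The gap shrinks toward zero as beta increases, so the smooth values
# (and the information they carry about small differences) are discarded
# in the limit.
```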

> Better yet, it looks like the OP gives a recipe for unpacking those natural abstractions?

I'm not sure. I do think the method can justify the reuse of components (queries), and I wouldn't be surprised if this is a prerequisite for interpreting network outputs. Most of my interest comes from trying to formalize the (perhaps obvious) idea that anything reducible to a sequence of classifications lets you systematically translate high-level reasoning about those processes into a neural network.

I guess it's best to give an example of how I currently think about abstraction. Say we take the position that every object is completely determined by the information contained in a set of queries such that . For a picture, consider designing a game avatar (Mii character) by fiddling around with some knobs. The formalism lets us package observations as queries using return. Thus, we're hypothesizing that we can take a large collection of queries and make them equivalent to a small set of queries. Said another way, we can answer a large collection of queries by answering a much smaller set of 'principal' queries. In fact, if our activation were linear we'd be doing PCA. How we decide to measure success determines what abstraction is learned. If we only use the to answer a few queries then we're basically doing classification. However, if the have to be able to answer every query about then we're doing auto-encoding.
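
Here is a small sketch of the linear case, where answering a large collection of queries through a few 'principal' queries reduces to PCA; the shapes and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # 500 objects, each described by 50 query answers
X -= X.mean(axis=0)              # center the answers

# Learn k "principal" queries: the top principal directions of the answer matrix.
k = 5
U, S, Vt = np.linalg.svd(X, full_matrices=False)
principal_queries = Vt[:k]       # shape (k, 50)

# Answer all 50 original queries from only the k principal ones (auto-encoding).
codes = X @ principal_queries.T          # shape (500, k)
X_reconstructed = codes @ principal_queries
rel_error = np.linalg.norm(X - X_reconstructed) / np.linalg.norm(X)
print(f"relative reconstruction error with {k} principal queries: {rel_error:.3f}")
```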

Comment by Past Account (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-10-01T12:48:18.789Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Doing discourse better: Stuff I wish I knew · 2020-09-30T13:51:20.322Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Richard Ngo's Shortform · 2020-08-21T00:24:38.789Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T23:13:56.178Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T22:44:48.091Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Richard Ngo's Shortform · 2020-08-20T19:10:39.275Z · LW · GW

[Deleted]

Comment by Past Account (zachary-robertson) on Developmental Stages of GPTs · 2020-08-16T13:48:49.728Z · LW · GW

[Deleted]