Posts

Goal-misgeneralization is ELK-hard 2023-06-10T09:32:50.397Z
Hutter-Prize for Prompts 2023-03-24T21:26:41.810Z
The AGI needs to be honest 2021-10-16T19:24:09.780Z

Comments

Comment by rokosbasilisk on Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor · 2024-01-11T11:54:19.767Z · LW · GW

though as Geoff Hinton has pointed out, 'confabulations' might be a better word

I think Yann LeCun was the first to use this word: https://twitter.com/ylecun/status/1667272618825723909

Comment by rokosbasilisk on Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example · 2023-11-21T15:54:15.361Z · LW · GW

Not much information has been given about that so far; I was curious about that too.

Comment by rokosbasilisk on Alignment can improve generalisation through more robustly doing what a human wants - CoinRun example · 2023-11-21T13:16:06.811Z · LW · GW

"Algorithm for Concept Extrapolation"

Comment by rokosbasilisk on [deleted post] 2023-10-06T21:27:10.103Z
Comment by rokosbasilisk on Prizes for matrix completion problems · 2023-08-08T19:37:49.457Z · LW · GW

I don't see any recent publications from Paul Christiano related to this, so I guess the problem(s) remain open.

Comment by rokosbasilisk on Information Loss --> Basin flatness · 2023-06-04T21:18:56.673Z · LW · GW

parameters before L is less than ,

Should this be "after"?

Comment by rokosbasilisk on On AutoGPT · 2023-04-13T19:07:59.125Z · LW · GW

AutoGPT was created by a non-coding VC

It looks like you are confusing AutoGPT with BabyAGI, which was created by Yohei Nakajima, who is a VC. The creator of AutoGPT (Toran Bruce Richards) is a game developer with decent programming (game-development) experience. Even the figure shown here is from BabyAGI (https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/).

Comment by rokosbasilisk on interpreting GPT: the logit lens · 2023-04-09T10:07:39.895Z · LW · GW

47 layers layer

"47 layers later"?

Comment by rokosbasilisk on A central AI alignment problem: capabilities generalization, and the sharp left turn · 2023-04-08T14:25:44.906Z · LW · GW
Comment by rokosbasilisk on Hutter-Prize for Prompts · 2023-03-28T08:27:10.675Z · LW · GW

Really interesting idea.

Comment by rokosbasilisk on Hutter-Prize for Prompts · 2023-03-25T09:13:09.810Z · LW · GW

Regarding the first one, I am not expecting a single prompt to generate the entirety of enwik8/enwik9. I am more interested in finding a set of prompts, plus a lookup table if possible, to replicate the enwik data; a rough sketch of what I mean is below.
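
A minimal sketch of the idea, assuming a deterministic decoder; `generate` and the function names here are hypothetical stand-ins I made up, not a real API:

```python
# Hypothetical sketch: cover the enwik data with prompt-generated chunks,
# and keep a lookup table of literal patches wherever the model's output
# diverges from the real text. This is lossless as long as `generate` is
# deterministic; the "compressed size" is the prompts plus the patches.

def build_codebook(chunks, prompts, generate):
    """Store, for each chunk, its prompt plus (position, char) corrections."""
    codebook = []
    for chunk, prompt in zip(chunks, prompts):
        out = generate(prompt, n_chars=len(chunk))  # assumed deterministic
        patch = [(i, c) for i, (c, o) in enumerate(zip(chunk, out)) if c != o]
        codebook.append((prompt, len(chunk), patch))
    return codebook

def decode(codebook, generate):
    """Replay the prompts and apply the patches to rebuild the data exactly."""
    pieces = []
    for prompt, length, patch in codebook:
        out = list(generate(prompt, n_chars=length))
        for i, c in patch:
            out[i] = c
        pieces.append("".join(out))
    return "".join(pieces)
```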

Thanks for the pointer to the Chinchilla post, will look into it.

Comment by rokosbasilisk on [deleted post] 2023-03-25T07:37:49.463Z

Not yet.

Comment by rokosbasilisk on [deleted post] 2023-03-24T21:18:23.485Z
Comment by rokosbasilisk on Alignment By Default · 2023-01-30T06:19:53.541Z · LW · GW

This requires hitting a window - our data needs to be good enough that the system can tell it should use human values as a proxy, but bad enough that the system can’t figure out the specifics of the data-collection process enough to model it directly. This window may not even exist.

Are there any real-world examples of this? Not necessarily in a human-values setting.

Comment by rokosbasilisk on ARC's first technical report: Eliciting Latent Knowledge · 2022-03-22T14:02:40.031Z · LW · GW

From a complexity-theoretic viewpoint, how hard could ELK be? Is there any evidence that ELK is decidable?

Comment by rokosbasilisk on ELK prize results · 2022-03-13T13:43:18.585Z · LW · GW

Is there a separate post for the "train a reporter that is useful for another AI" proposal?

Comment by rokosbasilisk on ARC's first technical report: Eliciting Latent Knowledge · 2022-01-17T21:41:08.779Z · LW · GW
Comment by rokosbasilisk on Eliciting Latent Knowledge Via Hypothetical Sensors · 2022-01-10T09:56:12.863Z · LW · GW
Comment by rokosbasilisk on Visible Thoughts Project and Bounty Announcement · 2021-12-06T16:27:51.960Z · LW · GW

Silly idea: instead of thought-annotating AI Dungeon playthroughs, we could start by annotating thoughts for Akinator game runs (a toy record shape is sketched below).

Pros: a much easier and faster way to build a dataset, with less ambiguity.

Cons: somewhat more restricted than the original idea.
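
To make this concrete, here is a hypothetical shape for one annotated turn; the schema and field names are invented for illustration, not taken from any real dataset:

```python
# Hypothetical record for one annotated Akinator-style turn.
example_turn = {
    "question": "Is your character a real person?",
    "answer": "no",
    "annotated_thought": (
        "A 'no' eliminates historical figures, so fictional characters "
        "remain; the next most informative split is human vs. non-human."
    ),
}
```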

Comment by rokosbasilisk on The AGI needs to be honest · 2021-10-17T21:28:52.417Z · LW · GW

The proof need not be bogus; it can be a long valid proof, but since you are describing the problem in natural language, the proof generated by the AGI need not be for the problem that you described.

Comment by rokosbasilisk on The AGI needs to be honest · 2021-10-17T20:24:05.626Z · LW · GW

Also, the AGI can generate a long valid proof, but it may not be for the question you asked, since the assumption is that the problem is described in natural language and it's the AGI's job to understand it, convert it to a formal language, and then prove it.

I think instead of recursively asking for higher-level proofs, there should be a machine-checkable proof of the correctness of the AGI itself?
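
A toy illustration of that gap (my own example, not from the post): one English request admits two inequivalent formalizations, and a machine-checked proof certifies only the formal statement it is attached to.

```latex
% English request: "show that there is a prime bigger than any given number."
\begin{align*}
\text{(A) intended:}   \quad & \forall n \,\exists p \;\big(p > n \wedge \mathrm{prime}(p)\big) \\
\text{(B) misreading:} \quad & \exists p \,\forall n \;\big(p > n \wedge \mathrm{prime}(p)\big)
\end{align*}
% (A) is true and (B) is false, yet both are defensible readings of the
% English sentence. If the AGI formalizes your question differently from
% how you meant it, a perfectly valid proof still answers the wrong question.
```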

Comment by rokosbasilisk on The AGI needs to be honest · 2021-10-17T18:34:54.455Z · LW · GW

Verifying a proof may run in polynomial time, compared to the exponential time of finding one, but that doesn't rule out the possibility that there exists a proof large enough that checking it is hard in practice.

There are many algorithms that are polynomial in time but far worse to run in reality.
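
A back-of-the-envelope illustration, with made-up constants:

```latex
% "Polynomial" verification can still be infeasible in practice.
% Suppose checking a proof of length $n$ costs about $n^3$ elementary
% steps, and the verifier executes $10^9$ steps per second.
\[
n = 10^{7} \;\Rightarrow\; n^{3} = 10^{21} \text{ steps}
\;\approx\; 10^{12} \text{ seconds}
\;\approx\; 3 \times 10^{4} \text{ years}.
\]
```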

Comment by rokosbasilisk on The AGI needs to be honest · 2021-10-17T18:32:30.473Z · LW · GW

Language models, if they are AGI, would surely surpass human-level understanding of language. Humans need language for communication and bookkeeping, and "words" in any language mostly name abstractions that are interesting from a human point of view; a language model need not have any language of its own, since it doesn't have an internal dialogue like humans do.

As it reaches a certain level of intelligence, it starts forming increasingly complex abstractions that don't have (won't have) any vocabulary. It would be impossible to interpret its reasoning, and the only way left is to accept it.

Comment by rokosbasilisk on The AGI needs to be honest · 2021-10-17T15:46:27.614Z · LW · GW

Verifying a proof of the Riemann hypothesis is not harder than generating one, but suppose you have access to an alleged proof of the Riemann hypothesis that is long enough that verification itself is super-hard; then you have no evidence that the proof is correct unless you prove that the AGI generating it is indeed capable of producing such a proof and is being honest with you.