Posts

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default? 2022-10-25T20:48:50.895Z
Julian_R's Shortform 2020-10-23T22:15:15.351Z

Comments

Comment by Julian_R on Extended Interview with Zhukeepa on Religion · 2024-08-18T21:53:19.062Z · LW · GW

Thank you for recording and posting these. I feel like I learned a lot, both about how to have conversations and from lots of little details, like the restaurant thing as a proto preference synthesizer, the trauma/cancer analogy, the Muhammad story, and the disendorsing all judgements/resentments thing.

Comment by Julian_R on How to get nerds fascinated about mysterious chronic illness research? · 2024-05-29T02:25:18.600Z · LW · GW

I wonder if, just like young people not thinking clearly about mortality, it's just something healthy people don't tend to think about, partly because it's depressing.

(I'm also someone who got a lot more interested in this kind of thing after my own health issues.)

Comment by Julian_R on Am I going insane or is the quality of education at top universities shockingly low? · 2023-11-20T05:03:47.708Z · LW · GW

Re institutional incentives: I've heard that part of the US News rankings is based on asking survey respondents to evaluate other universities by reputation. Professors elsewhere can only (and do) evaluate other professors based on the quality of their research, not their teaching.

I'm curious, did you check what the quality of teaching would be like at your university before you went? If not, why not? If so, why did you pick it anyway?

Comment by Julian_R on Clarifying the palatability theory of obesity · 2022-02-11T13:48:28.472Z · LW · GW

To clarify, I don't understand why positive CICO can increase your weight set point but negative CICO can't decrease it.

Comment by Julian_R on Clarifying the palatability theory of obesity · 2022-02-11T13:46:57.288Z · LW · GW

Guyenet suspects that our brain's weight set point might never go down dramatically after living long enough in the modern world, even if we eventually stop eating palatable food altogether. If true, this would make his theory harder to test, and again, his theory would earn a penalty for being more unfalsifiable, but at the same time, we should be clear about what observations his theory strongly predicts, and rapid weight loss on unpalatable diets is just not one of them.

I don't understand how CICO can coexist with the idea of a weight set point. If the mechanism of gaining weight is CICO via overeating because food is so palatable, then it seems natural that on unpalatable food you would eat less, and thus I would expect rapid weight loss on unpalatable diets as a prediction of the theory.

Comment by Julian_R on Redwood Research’s current project · 2021-10-14T23:35:56.669Z · LW · GW

I was confused by Buck's response here because I thought we were going for worst-case quality until I realised:

  1. The model will have low quality on those prompts almost by definition - that's the goal.
  2. Given that, we also want to have a generally useful model - for which the relevant distribution is 'all fanfiction', not 'prompts that are especially likely to have a violent continuation'.

In between those two cases is 'snippets that were completed injuriously in the original fanfic ... but could plausibly have non-violent completions', which seems like the interesting case to me.

I suppose one possibility is to construct a human-labelled dataset of specifically these cases to evaluate on.