Comments

Comment by yc (AAA) on Drake Thomas's Shortform · 2024-10-23T20:04:16.852Z · LW · GW

Just saw that the OP replied in another comment that he is offering advice.

Comment by yc (AAA) on Matt Goldenberg's Short Form Feed · 2024-10-13T01:34:15.322Z · LW · GW

It’s probably based less on the entire internet and more on the RLHF guidelines (I imagine the human reviewers receive a guideline written with the advice of the LLM-training company’s policy, legal, and safety experts). I don’t disagree, though, that it could present a relatively more objective view on some topics than a particular individual would (depending on the definition of bias).

Comment by yc (AAA) on Language Models Model Us · 2024-10-09T21:17:01.222Z · LW · GW

Yeah for sure! 

For PII - A relatively recent survey paper: https://arxiv.org/pdf/2403.05156

For bias/fairness - survey paper: https://arxiv.org/pdf/2309.00770 

This is probably far from complete, but I think the references in these survey papers, and in the Staab et al. paper, should include some additional good ones as well.

Comment by yc (AAA) on Language Models Model Us · 2024-10-08T03:45:38.553Z · LW · GW

This is a relatively common topic in responsible AI; glad to see the reference to Staab et al., 2023! For PII (Personally Identifiable Information), RLHF is typically the go-to method for refusing such prompts, but since those refusals are easy to undo, effort has also gone into cleaning PII out of the pretraining data. Demographic inference seems to be bias-related as well.
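(Not part of the original point, but to make the "cleaning the pretraining data" step concrete: below is a minimal, hypothetical sketch of regex-based PII scrubbing run over documents before they enter a pretraining corpus. The pattern set, placeholder tokens, and function names are my own illustrative assumptions, not anyone's production pipeline.)

```python
import re

# Illustrative patterns for a few common PII types. These are hypothetical
# placeholders; a real pipeline would use learned PII/NER detectors with
# much broader coverage than simple regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text

def clean_corpus(docs):
    """Scrub each document before it enters the pretraining mix."""
    for doc in docs:
        yield scrub_pii(doc)

# Example:
# scrub_pii("Reach Jane at jane.doe@example.com or 555-123-4567.")
# -> "Reach Jane at [EMAIL_REDACTED] or [PHONE_REDACTED]."
```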

Comment by yc (AAA) on MichaelDickens's Shortform · 2024-10-04T19:16:32.016Z · LW · GW

No worries; thanks!

Comment by yc (AAA) on MichaelDickens's Shortform · 2024-10-04T17:10:36.603Z · LW · GW

Examples of right-leaning projects that got rejected by him due to his political affiliation, and whether these examples are AI safety related.

Comment by yc (AAA) on MichaelDickens's Shortform · 2024-10-04T07:27:07.286Z · LW · GW

Out of curiosity - regarding “it's because Dustin is very active in the democratic party and doesn't want to be affiliated with anything that is right-coded”: are these projects related to AI safety, or just in general? And what are some examples?

Comment by yc (AAA) on How to choose what to work on · 2024-09-26T05:50:47.507Z · LW · GW

1. It would probably be different for everyone, and it might be hard to have a standard formula for finding obsessions. Sometimes they come naturally through life events/observations/experiences. If no such experience exists yet, or one seems to be interested in multiple things, the advice I have received is to try different things and see what you like (I agree with it). Now that I think about it, it would also be fun to survey people and ask how they got their passion/came to do what they do (and to derive a standard formula/common elements if possible)!

2. I think maybe we can approach it with "the best of one's ability", and once we reach that, the rest may depend a lot on luck and other things too. Maybe we get better over time, or maybe some observation/insight happens by accident and we find a breakthrough point, given the right accumulation of previous experience/knowledge.

Comment by yc (AAA) on What is a world-model? · 2024-09-25T07:40:41.084Z · LW · GW

https://arxiv.org/pdf/1803.10122 - I have a similar question and found this paper. One thing I am not sure of is whether this is still the same (or a close enough) concept to the one people currently talk about, or whether this is its origin.

https://www.sciencedirect.com/science/article/pii/S0893608022001150 - this paper seems to suggest something, at least about multimodal perception in a reinforcement-learning/agent type of setup.

Comment by yc (AAA) on The alignment stability problem · 2024-09-25T07:28:14.570Z · LW · GW

“A direction: asking if and how humans are stably aligned.” I think this is a great direction, and the next step seems to be breaking out what humans are aligned to - the examples here seem to mention some internal value alignment, but I wonder whether it would also mean alignment to an external value system.