Sure - another way of phrasing what I'm saying is that I'm not super interested (as alignment research, at least) in adversarial training that involves looking at difficult subsets of the training distribution, or adversarial training where the proposed solution is to give the AI more labeled examples that effectively extend the training distribution to include the difficult cases.
It would be bad if we built an AI that wasn't robust on the training distribution, of course, but I think of this as a problem already being addressed by the field of ML without any need to look ahead to AGI.

stephen-voris on Prizes for ELK proposals
One more stupid question - how is this different from a "man-in-the-middle" attack? (A term from cryptography for situations where you cannot trust your communications because a malicious agent between you and your recipient is changing your messages.)
The currently recommended solution for those is encrypting your communication before you send it; I don't know of any extant solutions for noticing after the fact that you've got an MITM situation.
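For concreteness, here's a minimal sketch of why protecting messages before sending defeats tampering. It uses message authentication with a pre-shared key (an HMAC) rather than encryption, since authentication is the part that lets a recipient notice modification; the key and messages below are made-up illustration values, and only Python's standard library is used:

```python
import hashlib
import hmac

# Hypothetical pre-shared secret, known to sender and recipient but not to
# anyone sitting between them.
SHARED_KEY = b"pre-shared-secret"

def sign(message: bytes) -> str:
    """Compute a MAC that the sender attaches to the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Recipient recomputes the MAC; a message altered in transit fails."""
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

# The sender transmits (msg, sign(msg)); a man in the middle who changes
# the message cannot forge a matching tag without the shared key.
msg = b"meet at noon"
tag = sign(msg)
assert verify(msg, tag)
assert not verify(b"meet at midnight", tag)
```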
(I'm copying this into my original review comment so it's easily viewable on the /reviewVoting page, but posting here since it's a new set of thoughts)
A follow-up question I found myself asking is: "What do Simulacrum levels actually add to the conversation?" We knew that sometimes people lie. We already knew about Beliefs as Attire, Belief-in-Belief, Professing and Cheering, etc. We knew that social reality was pervasive and that politics makes us go funny in the head. Do Simulacra add anything?
After thinking a bit, here are my answers:
First, Baudrillard-style simulacrum levels point at a particular progression/mechanism that's interesting: you have object-level truth, which becomes distorted, then masked, then uncoupled completely from reality. This seems relevant to some subsets of Social Reality Woes (the original example of "Vice President of Drudgery" seemed to capture a real phenomenon that happened to business titles), but it's not obvious to me that this is usually what's going on. Zvi argues that the Baudrillard definition is still fairly intertwined with the Lion definition, but I didn't personally find that very persuasive. (I also don't find it cruxy for anything other than 'should we still call these simulacrum levels?', so I'm not that worried about it.)
Second, Simulacrum Level 4 exists. This is maybe vaguely alluded to in the original LW sequences and HPMOR, but I don't think it's really been spelled out. "Belief in Belief" covers Level 3, but it doesn't really address the level of cynicism that goes into someone who is dishonest about their belief-in-belief, and what that dishonesty feels like from the inside. This increases my desire to hear Zvi, Benquo, or others weigh in about how they think about Level 4 these days.

nicholaskross on (briefly) RaDVaC and SMTM, two things we should be doing
Good news: my mental model of SMTM could also have them just skimming/cherry-picking the butter quote cited by BasedProf.
Bad news: that's not much better than the scenario where SMTM is, in fact, lying on purpose. And I haven't looked deeply enough to tell whether there are reasonable alternatives.
More bad news: Yudkowsky and AppliedDivinity both seem to be on the giving-money-to-SMTM-without-checking-on-their-trustworthiness/responding-to-BasedProf's-criticisms-publicly grindset. And has SMTM responded to BasedProf publicly, either?

aphyer on NFTs Are Probably Not Beanie Babies
I think there's a distinction to be drawn here between two senses in which NFTs could "succeed":
1. NFTs-as-technology succeeding: NFTs providing some form of decentralized ownership for large markets, with lots of value being embedded in NFTs.
2. Currently-existing NFTs succeeding: the NFTs people hold today becoming worth a lot, and the people who hold them making a lot of money.
Even if NFTs succeed in sense #1, that doesn't necessarily imply that they will succeed in sense #2.

peter-loksa on Why Truth?
What about a moral duty to be curious?

peter-loksa on Why Truth?
Whenever I am curious about "how things are", I would like to also be curious about "what to do" (curious pragmatism).

daniel-kokotajlo on Truthful LMs as a warm-up for aligned AGI
Thanks for this!
I think that working on truthful LMs has a comparative advantage in worlds where:
--We have around 10-40 years until transformative AI
--Transformative AI is built using techniques that resemble modern deep learning
--There is a slow takeoff
--Alignment does not require vastly more theoretical insight (but may require some)
--Our current picture of the risks posed by transformative AI is incomplete
Can you elaborate on what you mean by slow takeoff here?
Also, what do you mean by the current picture of the risks being incomplete? What would it even mean for our picture to be complete?

christiankl on You Can Get Fluvoxamine
How do you go about deciding which overseas drug mail-order business to trust?

jacob_hilton on Truthful LMs as a warm-up for aligned AGI
one concrete thing I might hope for you to do...
I think this is included in what I intended by "adversarial training": we'd try to find tasks that cause the model to produce negligent falsehoods, train the model to perform better at those tasks, and aim for a model that is robust to someone searching for such tasks.
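To make that loop concrete, here's a minimal sketch of one round of that kind of adversarial training. The helpers passed in (propose_adversarial_tasks, is_negligent_falsehood, corrected_answer, finetune) are hypothetical placeholders for whatever search, truthfulness-evaluation, labeling, and fine-tuning machinery one actually has; this shows the shape of the procedure, not a specific implementation:

```python
def adversarial_training_round(model, propose_adversarial_tasks,
                               is_negligent_falsehood, corrected_answer,
                               finetune, n_candidates=1000):
    """One round of adversarial training against negligent falsehoods.

    All callables are hypothetical stand-ins supplied by the caller;
    `model.generate` is assumed to map a task/prompt to the model's answer.
    """
    # 1. Search for tasks that currently cause the model to fail.
    candidates = propose_adversarial_tasks(model, n=n_candidates)
    failures = [task for task in candidates
                if is_negligent_falsehood(task, model.generate(task))]

    # 2. Train the model to perform better on exactly those tasks.
    training_pairs = [(task, corrected_answer(task)) for task in failures]
    model = finetune(model, training_pairs)

    # 3. The number of failures found is a crude measure of how robust the
    #    model is to someone actively searching for such tasks; iterate
    #    until a fresh search turns up (almost) none.
    return model, len(failures)
```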