Posts

"You're the most beautiful girl in the world" and Wittgensteinian Language Games 2024-04-20T14:54:54.503Z
The argument for near-term human disempowerment through AI 2024-04-16T04:50:53.828Z
Reverse Regulatory Capture 2024-04-11T02:40:46.474Z
On the Confusion between Inner and Outer Misalignment 2024-03-25T11:59:34.553Z
The Best Essay (Paul Graham) 2024-03-11T19:25:42.176Z
Can we get an AI to do our alignment homework for us? 2024-02-26T07:56:22.320Z
What's the theory of impact for activation vectors? 2024-02-11T07:34:48.536Z
Notice When People Are Directionally Correct 2024-01-14T14:12:37.090Z
Are Metaculus AI Timelines Inconsistent? 2024-01-02T06:47:18.114Z
Random Musings on Theory of Impact for Activation Vectors 2023-12-07T13:07:08.215Z
Goodhart's Law Example: Training Verifiers to Solve Math Word Problems 2023-11-25T00:53:26.841Z
Upcoming Feedback Opportunity on Dual-Use Foundation Models 2023-11-02T04:28:11.586Z
On Having No Clue 2023-11-01T01:36:10.520Z
Is Yann LeCun strawmanning AI x-risks? 2023-10-19T11:35:08.167Z
Don't Dismiss Simple Alignment Approaches 2023-10-07T00:35:26.789Z
What evidence is there of LLM's containing world models? 2023-10-04T14:33:19.178Z
The Role of Groups in the Progression of Human Understanding 2023-09-27T15:09:45.445Z
The Flow-Through Fallacy 2023-09-13T04:28:28.390Z
Chariots of Philosophical Fire 2023-08-26T00:52:45.405Z
Call for Papers on Global AI Governance from the UN 2023-08-20T08:56:58.745Z
Yann LeCun on AGI and AI Safety 2023-08-06T21:56:52.644Z
A Naive Proposal for Constructing Interpretable AI 2023-08-05T10:32:05.446Z
What does the launch of x.ai mean for AI Safety? 2023-07-12T19:42:47.060Z
The Unexpected Clanging 2023-05-18T14:47:01.599Z
Possible AI “Fire Alarms” 2023-05-17T21:56:02.892Z
Google "We Have No Moat, And Neither Does OpenAI" 2023-05-04T18:23:09.121Z
Why do we care about agency for alignment? 2023-04-23T18:10:23.894Z
Metaculus Predicts Weak AGI in 2 Years and AGI in 10 2023-03-24T19:43:18.522Z
Wittgenstein's Language Games and the Critique of the Natural Abstraction Hypothesis 2023-03-16T07:56:18.169Z
The Law of Identity 2023-02-06T02:59:16.397Z
What is the risk of asking a counterfactual oracle a question that already had its answer erased? 2023-02-03T03:13:10.508Z
Two Issues with Playing Chicken with the Universe 2022-12-31T06:47:52.988Z
Decisions: Ontologically Shifting to Determinism 2022-12-21T12:41:30.884Z
Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018? 2022-12-14T23:28:06.941Z
How is the "sharp left turn defined"? 2022-12-09T00:04:33.662Z
What are the major underlying divisions in AI safety? 2022-12-06T03:28:02.694Z
AI Safety Microgrant Round 2022-11-14T04:25:17.510Z
Counterfactuals are Confusing because of an Ontological Shift 2022-08-05T19:03:46.925Z
Getting Unstuck on Counterfactuals 2022-07-20T05:31:15.045Z
Which AI Safety research agendas are the most promising? 2022-07-13T07:54:30.427Z
Is CIRL a promising agenda? 2022-06-23T17:12:51.213Z
Has there been any work on attempting to use Pascal's Mugging to make an AGI behave? 2022-06-15T08:33:20.188Z
Want to find out about our events? 2022-06-09T15:52:14.544Z
Want to find out about our events? 2022-06-09T15:48:37.789Z
AI Safety Melbourne - Launch 2022-05-29T16:37:52.538Z
The New Right appears to be on the rise for better or worse 2022-04-23T19:36:58.661Z
AI Alignment and Recognition 2022-04-08T05:39:36.015Z
Strategic Considerations Regarding Autistic/Literal AI 2022-04-06T14:57:11.494Z
Results: Circular Dependency of Counterfactuals Prize 2022-04-05T06:29:56.252Z
What are some ways in which we can die with more dignity? 2022-04-03T05:32:58.957Z

Comments

Comment by Chris_Leong on Scaling of AI training runs will slow down after GPT-5 · 2024-04-26T19:16:58.409Z · LW · GW

Only 33% confidence? It seems strange to state that X will happen if your odds are < 50%.

Comment by Chris_Leong on On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg · 2024-04-23T07:04:13.785Z · LW · GW

Do you have any thoughts on whether it would make sense to push for a rule that forces open-source or open-weight models to be deployed behind an API for a certain amount of time before the weights can be released to the public?

Comment by Chris_Leong on "You're the most beautiful girl in the world" and Wittgensteinian Language Games · 2024-04-21T12:16:48.704Z · LW · GW

Would be very curious to know why people are downvoting this post.

Is it:
a) Too obvious
b) Too pretentious
c) Poorly written
d) Unsophisticated analysis
e) Promoting dishonesty

Or maybe something else.

Comment by Chris_Leong on I'm open for projects (sort of) · 2024-04-20T17:40:11.417Z · LW · GW

You say counterfactuals in CLDT should correspond to consistent universes


That's not quite what I wrote in this article:

However, this now seems insufficient as I haven't explained why we should maintain the consistency conditions over comparability after making the ontological shift. In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb's Problem... My current approach now tends to put more focus on the evolutionary process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.

I'll respond to the other component of your question later.

Comment by Chris_Leong on Express interest in an "FHI of the West" · 2024-04-20T14:33:16.613Z · LW · GW

Just thought I'd add a second follow-up comment.

You'd have a much better idea of what made FHI successful than I would. At the same time, I would bet that in order to make this new project successful - and be its own thing - it'd likely have to break at least one assumption behind what made old FHI work well.

Comment by Chris_Leong on Express interest in an "FHI of the West" · 2024-04-20T14:31:23.754Z · LW · GW

Then much later, when we ran the AI Alignment Prize here on LW, I also noticed that the prize by itself wasn't too important; the interactions between newcomers and old-timers were a big part of what drove the thing.


Could you provide more detail?

Comment by Chris_Leong on Express interest in an "FHI of the West" · 2024-04-19T10:25:07.419Z · LW · GW

Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue.


I think my list appears more this way than I intended because I gave some examples of projects I would be excited by if they happened. I wasn't intending to stake out a strong position as to whether these should be projects chosen by the institute vs. examples of projects that it might be reasonable for a researcher to choose within that particular area.

Comment by Chris_Leong on I'm open for projects (sort of) · 2024-04-19T04:11:17.486Z · LW · GW

I'd love your feedback on my thoughts on decision theory.

If you're trying to get a sense of my approach in order to determine whether it's interesting enough to be worth your time, I'd suggest starting with this article (3 minute read).

I'm also considering applying for funding to create a conceptual alignment course.

Comment by Chris_Leong on Express interest in an "FHI of the West" · 2024-04-19T00:22:13.774Z · LW · GW

I strongly agree with Owen's suggestions about figuring out a plan grounded in current circumstances, rather than reproducing what was.

Here are some potentially useful directions to explore.

Just to be clear, I'm not claiming that it should adopt all of these. Indeed, an attempt to adopt all of these would likely be incoherent, pursuing too many different directions at the same time.

These are just possibilities, some subset of which is hopefully useful:

  • Rationality as more of a focus area: Given that Lightcone runs Less Wrong, an obvious path to explore is whether rationality could be further developed by providing people either a fellowship or a permanent position to work on developing the art:
    • Being able to offer such paid positions might allow you to draw contributions from people with rare backgrounds. For example, you might decide it would be useful to draw on anthropology as a way of better understanding other cultures and practices, and so you could directly hire an anthropologist to help with that.
    • It would also help with projects that would be valuable, but which would be a slog and require specific expertise. For example, it would be great to have someone update the sequences in light of more recent psychological research.
  • Greater focus on entrepreneurship:
    • You've already indicated your potential interest in taking it this direction by adding it as one of the options on your form.
    • This likely makes sense given that Lightcone is located in the Bay Area, the region with the most entrepreneurs and venture capitalists in the world.
    • Insofar as a large part of the impact of FHI was the projects it inspired elsewhere, it may make sense to more directly attempt this kind of incubation.
  • Response to the rise of AI:
    • One of the biggest shifts in the world since FHI was started has been the dramatic improvements in AI.
    • One response to this would be to focus more on the risks and impacts from AI. However, there are already a number of institutions focusing on this, so this might simply end up being a worse version of them:
      • You may also think that you might be able to find a unique angle. For example, given how Eliezer was motivated to create rationality in order to help people understand his arguments on AI safety, it might be valuable for there to be a research program which intertwines those two elements.
      • Or you might identify areas, such as AI consciousness, that are still massively neglected.
    • Another response would be to try to figure out how to leverage AI:
      • Would it make sense to train an AI agent on Less Wrong content?
      • As an example, how could AI be used to develop wisdom?
    • Another response would be to decide that better orgs are positioned to pursue these projects.
  • Is there anything in the legacy of MIRI, CFAR, or FHI that is particularly ripe for further development?:
    • For example, would it make sense for someone to try to publish an explanation of some of the ideas produced by MIRI on decision theory in a mainstream philosophical journal?
    • Perhaps some techniques invented by CFAR could be tested with a rigorous academic study?
  • Potential new sources of ideas:
    • There seems to have been a two-way flow of ideas between LW/EA and FHI.
    • While there may still be more ideas within these communities that are deserving of further exploration, it may also make sense to consider whether there are any new communities that could provide a novel source of ideas:
      • A few possibilities immediately come to mind: post-rationality, progress studies, sensemaking, meditation, longevity, predictions.
  • Less requirement for legibility than FHI:
    • While FHI leaned towards the speculative end of academia, there was still a requirement for projects to be at least somewhat academically legible. What is enabled by no longer having that kind of requirement?
  • Opportunities for alpha from philosophical rigour:
    • This was one of the strengths of FHI - bringing philosophical rigour to new areas. It may be worth exploring how this could be preserved/carried forward?
    • One of the strengths of academic philosophy - compared to the more casual writing that is popular on Less Wrong - is its focus on rigour and drawing out distinctions. If this institute were able to recruit people with strong philosophical backgrounds, are there any areas that would be particularly ripe for applying this style of thinking?
    • Pursuing this direction might be a mistake if you would struggle to recruit the right people. It may turn out that the placement of FHI within Oxford was vital for drawing philosophical talent of that calibre.

Comment by Chris_Leong on Transformers Represent Belief State Geometry in their Residual Stream · 2024-04-17T00:55:30.502Z · LW · GW

"The structure of synchronization is, in general, richer than the world model itself. In this sense, LLMs learn more than a world model" given that I expect this is the statement that will catch a lot of people's attention.

Just in case this claim caught anyone else's attention, what they mean by this is that it contains:
• A model of the world
• A model of the agent's process for updating its belief about which state the world is in

Comment by Chris_Leong on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-16T21:38:18.447Z · LW · GW

This strongly updates me towards expecting the institute to produce useful work.

Comment by Chris_Leong on What convincing warning shot could help prevent extinction from AI? · 2024-04-14T07:13:48.581Z · LW · GW

Agreed.

I would love to see more thinking about this.

We've already seen one moment dramatically change the strategic landscape: ChatGPT.

That shift could turn out to be small compared to the one that would follow an actual disaster.

Comment by Chris_Leong on A Gentle Introduction to Risk Frameworks Beyond Forecasting · 2024-04-12T13:18:30.551Z · LW · GW

I'll provide an example:

People sometimes dismiss exposure when studying GCR, since “everyone is exposed by definition”. This isn’t always true, and even when it is, it still points us towards interesting questions.


Even if there are some edge cases where this applies to existential risks, that doesn't necessarily mean it is prominent enough to be worth including as an element in an x-risk framework.

Comment by Chris_Leong on A Gentle Introduction to Risk Frameworks Beyond Forecasting · 2024-04-12T06:29:17.365Z · LW · GW

Thanks for this post.

My (hot-)take: lots of useful ideas and concepts, but also many examples of people thinking everything is a nail/wanting to fit risks inside their pre-existing framework.

Comment by Chris_Leong on Reverse Regulatory Capture · 2024-04-11T06:04:56.221Z · LW · GW

I think that sometimes it’s useful to just discuss a concept in the abstract. I’ll leave it to others to discuss this in the concrete.

Comment by Chris_Leong on RTFB: On the New Proposed CAIP AI Bill · 2024-04-11T03:38:19.069Z · LW · GW

My thoughts:
a) Some of the penalties seemed too weak
b) Uncertain whether we want license appeals decided by judges. I would want approval to be decided on technical grounds, but for judges to intervene to ensure that the process is fair. Or maybe a committee that is mostly technical, but which contains a non-voting legal expert to ensure compliance.
c) I would prefer a strong stand against dangerous open-weight models.

Comment by Chris_Leong on Was Releasing Claude-3 Net-Negative? · 2024-04-03T00:03:56.549Z · LW · GW

I currently believe it’s unlikely that Claude-3 will cause OpenAI to release their next model any sooner (they released GPT4 on Pi day after all), nor for future models

There's now a perception that Claude is better than ChatGPT and I don't believe that Sam Altman will allow that to persist for long.

Comment by Chris_Leong on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T00:10:24.279Z · LW · GW

What did he say that was dishonest in the China article? (It's paywalled).

Comment by Chris_Leong on On the Confusion between Inner and Outer Misalignment · 2024-03-26T01:55:51.319Z · LW · GW

That's an excellent point.

I agree. I think that's probably a better way of clarifying the confusion than what I wrote.

Comment by Chris_Leong on On the Confusion between Inner and Outer Misalignment · 2024-03-25T14:20:25.014Z · LW · GW

I was under the impression that this meant that a sufficiently powerful AI would be outer-aligned by default, and that this is what enables several of the kinds of deceptions and other dangers we're worried about.


This would be the case if inner alignment meant what you think it does, but it doesn't.

Is the difference between the goal being specified by humans vs being learned and assumed by the AI itself?

Yeah, outer alignment is focused on whether we can define what we want the AI to learn (i.e. write down a reward function). Inner alignment is focused on what the learned artifact (the AI) ends up learning to pursue.

Comment by Chris_Leong on On the Confusion between Inner and Outer Misalignment · 2024-03-25T14:15:00.457Z · LW · GW

See: DeepMind's How undesired goals can arise with correct rewards for an empirical example of inner misalignment.

From a quick skim, that post seems to only be arguing against scheming due to inner misalignment. Let me know if I'm wrong.

Comment by Chris_Leong on On the Confusion between Inner and Outer Misalignment · 2024-03-25T13:38:28.638Z · LW · GW

I don't think that's quite accurate:  any sufficiently powerful AI will know what we want it to do.

Comment by Chris_Leong on Habryka's Shortform Feed · 2024-03-25T08:24:45.921Z · LW · GW

I agree with Gwern. I think it's fairly rare that someone wants to write the whole entry themselves, or to write articles for all the concepts in a topic.

It's much more likely that someone just wants to add their own idiosyncratic takes on a topic. For example, I'd love to have a convenient way to write up my own idiosyncratic takes on decision theory. I tried including some of these in the main Wiki, but it (understandably) was reverted.

I expect that one of the main advantages of this style of content would be that you can just write a note without having to bother with an introduction or conclusion.

I also think it would be fairly important (though not at the start) to have a way of upweighting the notes added by particular users.

I agree with Gwern that this may result in more content being added to the main wiki pages when other users are in favour of this.

Comment by Chris_Leong on Habryka's Shortform Feed · 2024-03-25T08:20:32.228Z · LW · GW

  1. Users can just create pages corresponding to their own categories.
  2. Like Notion, we could allow two-way links between pages, so users would just tag the category in their own custom inclusions.

Comment by Chris_Leong on Habryka's Shortform Feed · 2024-03-25T08:17:08.902Z · LW · GW

Cool idea, but before doing this, one obvious addition would be to make it easier to tag LW articles, particularly your own articles, in posts by @including them.

Comment by Chris_Leong on ChatGPT can learn indirect control · 2024-03-25T08:12:00.905Z · LW · GW

I just dumped 100 mana on "no".

This comment indicates a major limitation which makes the result much less impressive.

Comment by Chris_Leong on ChatGPT can learn indirect control · 2024-03-25T08:10:49.393Z · LW · GW

Yeah, that's a pretty sharp limitation on the result.

I'd love to know if any other AI is able to pass this test when we exclude the tag.

Comment by Chris_Leong on D0TheMath's Shortform · 2024-03-24T04:38:22.818Z · LW · GW

Doesn't releasing the weights inherently involve releasing the architecture (unless you're using some kind of encrypted ML)? A closed-source model could release the architecture details as well, but one step at a time. Just to be clear, I'm trying to push things towards a policy that makes sense going forward, so even if what you said about not providing any interesting architectural insight is true, I still think we need to push these groups to define a point at which they're going to stop releasing open models.

Comment by Chris_Leong on ejenner's Shortform · 2024-03-15T06:01:01.515Z · LW · GW

Doing stuff manually might provide helpful intuitions/experience for automating it?

Comment by Chris_Leong on Explaining the AI Alignment Problem to Tibetan Buddhist Monks · 2024-03-15T04:31:21.201Z · LW · GW

I would be very interested to know what the monks think about this.

Comment by Chris_Leong on How I turned doing therapy into object-level AI safety research · 2024-03-14T14:00:34.780Z · LW · GW

I think it's much easier to talk about boundaries than preferences because true boundaries don't really contradict between individuals


I'm quite curious about this. What if you're stuck on an island with multiple people and limited food?

Comment by Chris_Leong on 'Empiricism!' as Anti-Epistemology · 2024-03-14T09:33:40.364Z · LW · GW

Very Wittgensteinian:

“What is your aim in Philosophy?”

“To show the fly the way out of the fly-bottle” (Philosophical Investigations)

Comment by Chris_Leong on jeffreycaruso's Shortform · 2024-03-14T00:08:18.641Z · LW · GW

Oh, they're definitely valid questions. The problem is that the second question is rather vague. You need to either state what a good answer would look like or explain why existing answers aren't satisfying.

Comment by Chris_Leong on jeffreycaruso's Shortform · 2024-03-13T16:16:55.207Z · LW · GW

I downvoted this post. I claim it's for the public good, maybe you find this strange, but let me explain my reasoning.

You've come on Less Wrong, a website that probably has more discussion of this than any other website on the internet. If you want to find arguments, they aren't hard to find. It's a bit like walking into a library and saying that you can't find a book to read.

The trouble isn't that you literally can't find any books/arguments; it's that you've got a bunch of unstated requirements that you want satisfied. Now that's perfectly fine, it's good to have standards. At the same time, you've asked the question in a maximally vague way. I don't expect you to be able to list all your requirements. That's probably impossible, and when it is possible, it's often a lot of work. At the same time, I do believe that it's possible to do better than maximally vague.

The problem with maximally vague questions is that they almost guarantee that any attempt to provide an answer will be unsatisfying both for the person answering and the person receiving the answer. Worse, you've framed the question in such a way that some people will likely feel compelled to attempt to answer anyway, lest people who think that there is such a risk come off as unable to respond to critics.

If that's the case, downvoting seems logical. Why support a game where no-one wins?

Sorry if this comes off as harsh, that's not my intent. I'm simply attempting to prompt reflection.

Comment by Chris_Leong on Wei Dai's Shortform · 2024-03-11T02:20:53.174Z · LW · GW

I have access to Gemini 1.5 Pro. Willing to run experiments if you provide me with an exact experiment to run, plus cover what they charge me (I'm assuming it's paid, I haven't used it yet).

Comment by Chris_Leong on TurnTrout's shortform feed · 2024-03-05T03:25:50.750Z · LW · GW

“But also this person doesn't know about internal invariances in NN space or the compressivity of the parameter-function map (the latter in particular is crucial for reasoning about inductive biases), then I become extremely concerned”

Have you written about this anywhere?

Comment by Chris_Leong on Wei Dai's Shortform · 2024-03-02T00:02:01.561Z · LW · GW

Have you tried talking to professors about these ideas?

Comment by Chris_Leong on Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds" · 2024-03-01T11:23:28.154Z · LW · GW

Is there anyone who understands GFlowNets who can provide a high-level summary of how they work?

Comment by Chris_Leong on Counting arguments provide no evidence for AI doom · 2024-02-29T05:30:48.685Z · LW · GW

Nabgure senzr gung zvtug or hfrshy:

Gurer'f n qvssrerapr orgjrra gur ahzore bs zngurzngvpny shapgvbaf gung vzcyrzrag n frg bs erdhverzragf naq gur ahzore bs cebtenzf gung vzcyrzrag gur frg bs erdhverzragf.

Fvzcyvpvgl vf nobhg gur ynggre, abg gur sbezre.

Gur rkvfgrapr bs n ynetr ahzore bs cebtenzf gung cebqhpr gur rknpg fnzr zngurzngvpny shapgvba pbagevohgrf gbjneqf fvzcyvpvgl.

Comment by Chris_Leong on Counting arguments provide no evidence for AI doom · 2024-02-28T21:15:42.413Z · LW · GW

I wrote up my views on the principle of indifference here:

https://www.lesswrong.com/posts/3PXBK2an9dcRoNoid/on-having-no-clue

I agree that it has certain philosophical issues, but I don’t believe that this is as fatal to counting arguments as you believe.

Towards the end I write:

“The problem is that we are making an assumption, but rather than owning it, we're trying to deny that we're making any assumption at all, ie. "I'm not assuming a priori A and B have equal probability based on my subjective judgement, I'm using the principle of indifference". Roll to disbelieve.”

I feel less confident in my post than when I wrote it, but it still feels more credible than the position articulated in this post.

Otherwise: this was an interesting post. Well done on identifying some arguments that I need to digest.

Comment by Chris_Leong on Benito's Shortform Feed · 2024-02-28T05:32:08.463Z · LW · GW

Maybe just say that you're tracking the possibility?

Comment by Chris_Leong on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-28T02:57:08.730Z · LW · GW

Is there going to be a link to this from somewhere to make it accessible?

Comment by Chris_Leong on Can we get an AI to do our alignment homework for us? · 2024-02-27T13:17:34.908Z · LW · GW

I think an important crux here is whether you think that we can build institutions which are reasonably good at checking the quality of AI safety work done by humans


Why is this an important crux? Is it necessarily the case that if we can reliably check AI safety work done by humans, we can reliably check AI safety work done by AIs which may be optimising against us?

Comment by Chris_Leong on Can we get an AI to do our alignment homework for us? · 2024-02-26T23:54:40.452Z · LW · GW

Updated

Comment by Chris_Leong on Can we get an AI to do our alignment homework for us? · 2024-02-26T16:14:12.072Z · LW · GW

Second, it is also possible to robustly verify the outputs of a superhuman intelligence without superhuman intelligence.


Why do you believe that a superhuman intelligence wouldn't be able to deceive you by producing outputs that look correct instead of outputs that are correct?

Comment by Chris_Leong on My guess at Conjecture's vision: triggering a narrative bifurcation · 2024-02-26T07:52:05.744Z · LW · GW

I guess the main doubt I have with this strategy is that even if we shift the vast majority of people/companies towards more interpretable AI, there will still be some actors who pursue black-box AI. Wouldn't we just get screwed by those actors? I don't see how CoEm can be of equivalent power to purely black-box automation.

That said, there may be ways to integrate CoEms into the Superalignment strategy.

Comment by Chris_Leong on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2024-02-25T22:33:01.398Z · LW · GW

GPT-J token embeddings inhabit a zone in their 4096-dimensional embedding space formed by the intersection of two hyperspherical shells


You may want to update the TLDR if you agree with the comments that indicate that this might not be accurate.

Comment by Chris_Leong on Mapping the semantic void: Strange goings-on in GPT embedding spaces · 2024-02-25T22:29:32.785Z · LW · GW

If there are 100 tokens for snow, that probably indicates it's a particularly important concept for that language.

Comment by Chris_Leong on How well do truth probes generalise? · 2024-02-25T14:51:35.412Z · LW · GW

For Linear Tomography and Principal Component Analysis, I'm assuming that by unsupervised you mean that you don't use the labels for finding the vector, but that you do use them for determining which sign is true and which is false. If so, this might be worth clarifying in the table.

Comment by Chris_Leong on Communication Requires Common Interests or Differential Signal Costs · 2024-02-25T09:14:33.424Z · LW · GW

Agreed. Good counter-example.

I'm very curious as to whether Zac has a way of reformulating his claim to save it.