Posts

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks 2024-05-20T17:53:25.985Z

Comments

Comment by debrevitatevitae (cindy-wu) on The Field of AI Alignment: A Postmortem, and What To Do About It · 2025-01-01T00:42:17.658Z · LW · GW

Some broad points:

  • My read is that those at fault are the field builders and funders, which is part of the reason I quit doing alignment. The entire funding landscape feels incredibly bait-and-switch: come work for us! We are desperate for talent! The alignment problem is the hardest issue of the century! (Cut to two years and an SBF later.) Erm, no, we don't fund AI safety startups or interp, and we want to see tangible results in a few narrow domains...
  • In particular, I advocate for endorsing work with a nonlinear apparent progress rate. Call it 'slow work' or something. Hard things often look like they're getting nowhere, but all the small failures add up to something big. This is also why I do not recommend MATS as a one-size-fits-all solution for people joining the alignment field: some people do better with slow work, carefully thinking about where they are heading with direction and intent, rather than putting their heads down to 'get something done'. In fact, that mindset gave me burnout earlier this year.
  • The people not at fault are those who are middle-of-the-pack undergrads or not Physicists Doing Real Things. This is a system-wide problem.

I agree with 'don't streetlight' and 'we should move the field towards riskier and harder projects'. For me, this means bets on individuals, not tangible projects. Much as Entrepreneur First makes bets on founders rather than products, funders should make bets on people who they believe, given enough time, can make meaningful progress orthogonal to existing directions.

I don't agree at all with the elitist take that 'only physics postdocs (or those of similar capability, whatever that means) are capable of doing real work'. This take is quite absurd to me, and frankly a little angering (because I have the impression it's quietly shared in certain circles, the same way some STEM majors look down on non-STEM majors), but I suspect that was the goal, so: achieved. In particular, here are the reasons I disagree:

  • It isn't true that this skill can only be found in physics postdocs. The ability to push through hard things, and technical fluency, can be gained from a good chunk of STEM degrees. Anyone can open a mathematics textbook and read.

  • Critical thinking matters more, in my opinion, than technical fluency for avoiding streetlights and for not deferring your opinions to others.

  • On a meta level, beliefs like this segregate the research community in an unhealthy way, and promoting further segregation is not ideal. There must be balance. I think any sensible human is capable of maintaining their own opinions while taking in those of others.

  • I agree with the comments about the car and driver. My current opinion is that physicists need to work with non-physicists. Otherwise there is a risk of working only on interesting problems, and maybe 'solving alignment' (with 139 caveats) by 2178.

Comment by debrevitatevitae (cindy-wu) on The ‘strong’ feature hypothesis could be wrong · 2024-08-18T23:08:43.573Z · LW · GW

Thanks for the post! This is fantastic stuff, and IMO should be required MI reading.

Does anyone who knows more about this than me have a view on whether SQ (statistical query) dimension is a good formal metric for grounding the concept of explicit vs. tacit representations? It appears to me that the only reason you can't reduce a system down further by compressing it into a 'feature' is that by default it relies on aggregation, requiring a 'bird's-eye view' of all the information in the network.

I mention this because I was revisiting some old readings on the inductive biases of NNs today and realised that one reason low-complexity functions can be arbitrarily hard for NNs to learn could be that they have high SQ dimension (the best example being binary parity).
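For concreteness, here is the standard result I have in mind, sketched in LaTeX (the notation is mine; the claim is the classic SQ lower bound for parities):

```latex
% Parity functions over subsets S of [n] = \{1, \dots, n\}:
\chi_S(x) = (-1)^{\sum_{i \in S} x_i}, \qquad x \in \{0,1\}^n .
% They are pairwise orthogonal under the uniform distribution:
\mathbb{E}_{x \sim U(\{0,1\}^n)}\big[\chi_S(x)\,\chi_T(x)\big] = 0
\quad \text{for } S \neq T ,
% so the class of all 2^n parities has SQ dimension 2^n, and any
% statistical-query learner needs exponentially many queries, even
% though each individual \chi_S is a very low-complexity function.
```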

Comment by debrevitatevitae (cindy-wu) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-18T14:40:34.088Z · LW · GW

So, I'm mostly referencing trends in e-commerce here. For example, first Amazon put storefronts out of business by allowing drop-shipping of cheaply manufactured goods with no warranty. Now Temu is competing with Amazon by exploiting import-tax loopholes, selling the same items at below production price, many of which contain phthalates and other chemical compounds at multiple times the safe limits. This is a standard monopolisation trick pulled by giants: once they have a stable user base, they jack the prices back up and start making a profit. Uber did this.

The drop in clothing standards is real, though, because fast fashion didn't really exist until the 2000s. Ten years is not far enough back: you need to go back about 25-30. If I want high-quality, fair-trade clothing, I now have to go to specialist websites like Good On You, which compile databases of very niche companies, and pay upwards of $100 for an item of clothing. I cannot get something I expect to last by walking into a department store.

Enshittification also exists in apps and services that were established via monopolisation or by acquiring an existing user base.

Comment by debrevitatevitae (cindy-wu) on You don't know how bad most things are nor precisely how they're bad. · 2024-08-17T18:43:10.824Z · LW · GW

I do worry we are already seeing this. To use the word exactly, the 'enshittification' of everything we can buy and every service we are provided is real. The best example of this is high-quality clothing, but pretty much everything you can buy on Amazon shows it too. It's important to be able to maintain quality independently of market dynamics, IMO, at least because some people value it (and consumers aren't really voting if there is no choice).

Comment by debrevitatevitae (cindy-wu) on cleanwhiteroom's Shortform · 2024-08-02T19:53:16.788Z · LW · GW

I think this is a very accurate abstraction of what is happening. During sleep, the brain consolidates (compresses and throws away) information. The equivalent would be summarising the context window and discussion so far, and adding it to a running 'knowledge graph'. I would be surprised if someone somewhere has not already tried this on LLMs: summarising the existing context and discussion, formalising it in an external knowledge graph, and letting the LLM do RAG over it during future inference. A rough sketch of what I mean is below.
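A minimal sketch of that consolidate-then-retrieve loop, assuming a generic chat-completion API (`call_llm` is a hypothetical stand-in, and the substring-based retrieval is a deliberate simplification; a real system would use embeddings):

```python
from collections import defaultdict

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    raise NotImplementedError

class KnowledgeGraph:
    def __init__(self):
        # adjacency: subject -> list of (relation, object) edges
        self.edges = defaultdict(list)

    def add_triple(self, subj: str, rel: str, obj: str) -> None:
        self.edges[subj].append((rel, obj))

    def retrieve(self, query: str) -> list:
        # Naive retrieval: return facts whose subject appears in the query.
        # A real system would use embedding similarity instead.
        facts = []
        for subj, rels in self.edges.items():
            if subj.lower() in query.lower():
                facts.extend(f"{subj} {rel} {obj}" for rel, obj in rels)
        return facts

def consolidate(transcript: str, graph: KnowledgeGraph) -> None:
    """The 'sleep' step: compress the transcript into triples and store them."""
    summary = call_llm(
        "Summarise this conversation as newline-separated "
        f"subject|relation|object triples:\n{transcript}"
    )
    for line in summary.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            graph.add_triple(*parts)

def answer(question: str, graph: KnowledgeGraph) -> str:
    """The inference step: RAG over the consolidated graph."""
    context = "\n".join(graph.retrieve(question))
    return call_llm(f"Known facts:\n{context}\n\nQuestion: {question}")
```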

Although, I do think LLM hallucinations and brain hallucinations arise via separate mechanisms. In particular, there is evidence that human hallucinations (sensory processing errors) occur when the brain's top-down inference (the Bayesian 'what I expect to see based on priors') fails to work correctly, with an increased reliance on bottom-up processing instead (https://www.neuwritewest.org/blog/why-do-humans-hallucinate-on-little-sleep).