Posts

"Safety as a Scientific Pursuit" (2024) 2024-01-23T12:40:13.902Z
Appendices to the live agendas 2023-11-27T11:10:32.187Z
Shallow review of live agendas in alignment & safety 2023-11-27T11:10:27.464Z
ActAdd: Steering Language Models without Optimization 2023-09-06T17:21:56.214Z
Announcing the Alignment of Complex Systems Research Group 2022-06-04T04:10:14.337Z
Case for emergency response teams 2022-04-05T12:45:08.371Z
Hinges and crises 2022-03-29T11:11:03.605Z
Experimental longtermism: theory needs data 2022-03-24T08:23:40.454Z
We have some evidence that masks work 2021-07-11T18:36:46.942Z
Self-help, hard and soft 2020-06-07T15:39:29.746Z
Automatic for the people 2018-07-08T14:23:08.787Z

Comments

Comment by technicalities on Are extreme probabilities for P(doom) epistemically justifed? · 2024-03-29T10:34:49.557Z · LW · GW

As of two years ago, the evidence for this was sparse. Looked like parity overall, though the pool of "supers" has improved over the last decade as more people got sampled.

There are other reasons to be down on XPT in particular.

Comment by technicalities on Least-problematic Resource for learning RL? · 2024-02-19T03:59:13.116Z · LW · GW

I like Hasselt and Meyn (extremely friendly, possibly too friendly for you)

Comment by technicalities on Dalcy's Shortform · 2024-02-19T03:55:19.635Z · LW · GW

Maybe he dropped the "c" because it changes the "a" phoneme from æ to ɑː and gives a cleaner division in sounds: "brac-ket" pronounced together collides with "bracket" where "braa-ket" does not. 

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-12-02T10:13:54.215Z · LW · GW

It's under "IDA". It's not the name people use much anymore (see scalable oversight and recursive reward modelling and critiques) but I'll expand the acronym.

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T19:38:36.225Z · LW · GW

The story I heard is that Lightspeed are using SFF's software and SFF jumped the gun in posting them and Lightspeed are still catching up. Definitely email.

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T19:37:14.268Z · LW · GW

d'oh! fixed

no, probably just my poor memory to blame

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T09:57:27.794Z · LW · GW

Yep, no idea how I forgot this. concept erasure!

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T09:52:41.806Z · LW · GW

Interesting. I hope I am the bearer of good news then

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T09:51:35.726Z · LW · GW

Thank you!

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-29T09:44:12.965Z · LW · GW

Not speaking for him, but for a tiny sample of what else is out there, ctrl+F "ordinary"

Comment by technicalities on Appendices to the live agendas · 2023-11-29T09:31:29.689Z · LW · GW

yeah you're right

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-28T10:51:11.902Z · LW · GW

If the funder comes through I'll consider a second review post I think

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-28T10:03:45.096Z · LW · GW

You're clearly right, thanks

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-28T09:43:38.611Z · LW · GW

Thanks!

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-28T09:43:13.979Z · LW · GW

Being named isn't meant as an honorific btw, just a basic aid to the reader orienting.

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-28T09:39:52.651Z · LW · GW

Ta! 

I've added a line about the ecosystems. Nothing else in the umbrella strikes me as direct work (Public AI is cool but not alignment research afaict). (I liked your active inference paper btw, see ACS.)

A quick look suggests that the stable equilibrium things aren't in scope - not because they're outgroup but because this post is already unmanageable without handling policy, governance, political economy and ideology. The accusation of site bias against social context or mechanism was perfectly true last year, but no longer, and my personal scoping should not be taken as indifference.

Of the NSF people, only Sharon Li strikes me as doing things relevant to AGI. 

Happy to be corrected if you know better!

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-27T18:34:51.748Z · LW · GW

I like this. It's like a structural version of control evaluations. Will think about where to put it.

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-27T16:32:19.579Z · LW · GW

One big omission is Bengio's new stuff, but the talk wasn't very precise. Sounds like Russell:

With a causal and Bayesian model-based agent interpreting human expressions of rewards reflecting latent human preferences, as the amount of compute to approximate the exact Bayesian decisions increases, we increase the probability of safe decisions.

Another angle I couldn't fit in is him wanting to make microscope AI, to decrease our incentive to build agents.

Comment by technicalities on Appendices to the live agendas · 2023-11-27T13:37:13.115Z · LW · GW

I care a lot! Will probably make a section for this in the main post under "Getting the model to learn what we want", thanks for the correction.

Comment by technicalities on Shallow review of live agendas in alignment & safety · 2023-11-27T12:29:30.672Z · LW · GW

Thanks!

Comment by technicalities on Thomas Kwa's Shortform · 2023-11-27T11:48:18.224Z · LW · GW

Here's an unstructured input for this

Comment by technicalities on Report on Frontier Model Training · 2023-09-12T08:02:37.885Z · LW · GW

I'm not seeing anything here about the costs of data collection (for licensed stuff) or curation (probably hundreds of thousands of cheap hours?), apart from one bullet on OAI's combined costs. As a total outsider I would guess this could move your estimates by 20-100%.

Comment by technicalities on Internal communication framework · 2022-11-15T16:44:25.549Z · LW · GW

ICF is the only such mental viz whizz technique that has ever worked for me, and I say that having done CFAR, a dedicated focussing retreat, a weekend vipassana retreat, and a dedicated circling retreat. 

Comment by technicalities on The Track Record of Futurists Seems ... Fine · 2022-07-02T12:54:44.937Z · LW · GW

From context I think he meant not fibre laser but "free-space optics", a then-hyped application of lasers to replace radio. I get this from him mentioning it in the same sentence as satellites and then comparing lasers to radio: "A continuing advance of communications satellites, and the use of laser beams for communication in place of electric currents and radio waves. A laser beam of visible light is made up of waves that are millions of times shorter than those of radio waves". So I don't think this rises above the background radiation (ha) of Asimov's vagueness.

As for 3D TV, if I expand the context you see it's an explicit replacement for screens: "wall screens will have replaced the ordinary set; but transparent cubes will be making their appearance in which three-dimensional viewing will be possible. In fact, one popular exhibit at the 2014 World's Fair will be such a 3-D TV, built life-size, in which ballet performances will be seen. The cube will slowly revolve for viewing from all angles." Also my understanding is that our 3D TVs don't allow any varying POV, let alone all angles. 

Thanks! Added these to the changelog.

Comment by technicalities on It’s Probably Not Lithium · 2022-07-02T12:03:16.436Z · LW · GW

Good reason to apply this with nearly equal intensity to mainstream medical arguments, though. (Applies to a lesser extent to evidence-based places like Cochrane, but sadly still applies.)

Comment by technicalities on The Track Record of Futurists Seems ... Fine · 2022-07-01T19:06:59.512Z · LW · GW

Good catch! The book is generally written as the history of the world leading up to 2000, and most of its predictions are about that year. But this is clearly an exception and the section offers nothing more precise than "By the year 3000, then, it may well be that Earth will be only a small part of the human realm." I've moved it to the "nonresolved" tab.

DM me for your bounty ($10)! I added your comment to the changelog. Thanks! 

Comment by technicalities on The Track Record of Futurists Seems ... Fine · 2022-07-01T18:24:00.995Z · LW · GW

Data collector here. Strongly agree with your general point: most of these entries are extremely far from modern "clairvoyant" (cleanly resolving) forecasting questions. 
 

Space travel. Disagree. In context he means mass space travel. The relevant lead-up is this:

"According to her, the Moon is a great place and she wants us to come visit her."

"Not likely!" his wife answers. "Imagine being shut up in an air-conditioned cave."

"When you are Aunt Jane's age, my honey lamb, and as frail as she is, with a bad heart thrown in, you'll go to the Moon and like it."

Re: footnote 1. He was a dishonest bugger in his old age so I don't doubt he would argue that.


Central piloting. Yep, you're right. We caught this before, but changed it in the wrong branch of the data. Going to make it 'ambiguous'; let me know if that seems wrong. 
 

Commercial interplanetary travel. Disagree - "C.O.D." is an old-timey word meaning something so normal and cheap that you don't even need to pay for your ticket upfront - which implies that "you" is a consumer, not a government. (But again I see what you're saying.)
 

DM me for your bounty ($10)! I've linked to your comment in the changelog. Thanks! 

Comment by technicalities on The Track Record of Futurists Seems ... Fine · 2022-07-01T17:10:23.343Z · LW · GW

Is the point that 1) AGI specifically is too weird for normal forecasting to work, or 2) that you don't trust judgmental forecasting in general, or 3) that respectability bias swamps the gains from aggregating a heavily selected crowd, spending more time, and debiasing in other ways?

The OpenPhil longtermists' respectability bias seems fairly small to me; their weirder stuff is comparable to Asimov (but not Clarke, who wrote a whole book about cryptids). 

And against this, you have to factor in the Big Three's huge bias towards being entertaining instead of accurate (as well as e.g. Heinlein's inability to admit error). 

Can you point at examples? (Bio anchors?)

Comment by technicalities on Lives of the Cambridge polymath geniuses · 2022-02-11T15:06:59.540Z · LW · GW

Bentham was nonzero discount apparently (fn6). (He used 5% but only as an example.)

Mill thought about personal time preference (and was extremely annoyed by people's discount there). Can't see anything about social rate of discounting.

Comment by technicalities on Lives of the Cambridge polymath geniuses · 2022-02-10T22:46:02.866Z · LW · GW

See here.

I think Ramsey is also the first (quantitative) longtermist ever (zero discount rate).
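
For anyone unfamiliar with the term, here is a minimal sketch of what a zero discount rate means, in textbook notation rather than Ramsey's own. The symbols $W_t$ (welfare in period $t$), $\rho$ (the pure rate of time preference) and $T$ (the horizon) are assumptions introduced for illustration:

```latex
V = \sum_{t=0}^{T} \frac{W_t}{(1+\rho)^{t}},
\qquad
\rho = 0 \;\Rightarrow\; V = \sum_{t=0}^{T} W_t
```

With $\rho = 0$ every period counts equally. At an illustrative $\rho = 0.05$, welfare a century out is weighted by $1/(1.05)^{100} \approx 0.0076$.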

Comment by technicalities on The Best Software For Every Need · 2021-09-18T09:55:15.382Z · LW · GW

Ooh that's more intense than I realised. There might be plugins for yEd, but I don't know 'em. Maybe Tetrad?

Comment by technicalities on The Best Software For Every Need · 2021-09-16T07:59:14.059Z · LW · GW

I love Sketchviz for 10 second prototypes, but it requires the DOT language, and if you need very specific label placements it's a nightmare.

For using a mouse, yEd is good. Exports to GraphML for version control.
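
To give a flavour of what "requires the DOT language" means, here is a minimal sketch in Python that emits DOT text for a small directed graph, ready to paste into a DOT renderer. The helper name `to_dot` and the node names are made up for illustration; only the basic `digraph { "a" -> "b"; }` syntax is standard DOT.

```python
def to_dot(edges, name="G"):
    """Build DOT source for a directed graph from (source, target) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Quote node names so that labels containing spaces stay valid DOT.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-node chain, purely for illustration.
dot_text = to_dot([("input", "model"), ("model", "output")])
print(dot_text)
```

Anything beyond nodes and edges (label placement in particular) means hand-editing attributes in the DOT source, which is where the pain starts.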

Comment by technicalities on We have some evidence that masks work · 2021-09-02T07:34:48.027Z · LW · GW

GiveWell's fine!

Thanks again for caring about this.

Comment by technicalities on We have some evidence that masks work · 2021-07-31T09:47:24.207Z · LW · GW

Sounds fine. Just noticed they have a cloth and a surgical treatment. Take the mean?

Comment by technicalities on We have some evidence that masks work · 2021-07-30T15:52:49.972Z · LW · GW

Great! Comment below if you like this wording and this can be our bond:

"Gavin bets 100 USD to GiveWell, to Mike's 100 USD to GiveWell that the results of NCT04630054 will show a median reduction in Rt > 15.0 % for the effect of a whole population wearing masks [in whatever venues the trial chose to study]."

Comment by technicalities on Fire Law Incentives · 2021-07-22T20:16:37.079Z · LW · GW

This is an interesting counterpoint (though I'd like to see a model of CO2 cost vs thinning cost if you have one), and it's funny we happen to have such a qualified person on the thread. But your manner is needlessly condescending and - around here - brandishing credentials as a club will seriously undermine you rather than buttressing you. 

Comment by technicalities on We have some evidence that masks work · 2021-07-19T16:12:44.205Z · LW · GW

Sadly Gelman didn't have time to destroy us. (He rarely does.)

Comment by technicalities on Critiques of the Agent Foundations agenda? · 2020-12-04T09:26:47.951Z · LW · GW

Stretching the definition of 'substantial' further:

Beth Zero was an ML researcher and Sneerclubber with some things to say. Her blog is down unfortunately but here's her collection of critical people. Here's a flavour of her thoughtful Bulverism. Her post on the uselessness of Solomonoff induction and the dishonesty of pushing it as an answer outside of philosophy was pretty good.

Sadly most of it is against foom, against short timelines, against longtermism, rather than anything specific about the Garrabrant or Demski or Kosoy programmes.

Comment by technicalities on Critiques of the Agent Foundations agenda? · 2020-12-04T09:15:10.334Z · LW · GW

Nostalgebraist (2019) sees it as equivalent to solving large parts of philosophy: a noble but quixotic quest. (He also argues against short timelines but that's tangential here.)

Here is what this ends up looking like: a quest to solve, once and for all, some of the most basic problems of existing and acting among others who are doing the same. Problems like “can anyone ever fully trust anyone else, or their future self, for that matter?” In the case where the “agents” are humans or human groups, problems of this sort have been wrestled with for a long time using terms like “coordination problems” and “Goodhart’s Law”; they constitute much of the subject matter of political philosophy, economics, and game theory, among other fields.

The quest for “AI Alignment” covers all this material and much more. It cannot invoke specifics of human nature (or non-human nature, for that matter); it aims to solve not just the tragedies of human coexistence, but the universal tragedies of coexistence which, as a sad fact of pure reason, would befall anything that thinks or acts in anything that looks like a world.

It sounds misleadingly provincial to call such a quest “AI Alignment.” The quest exists because (roughly) a superhuman being is the hardest thing we can imagine “aligning,” and thus we can only imagine doing so by solving “Alignment” as a whole, once and forever, for all creatures in all logically possible worlds. (I am exaggerating a little in places here, but there is something true in this picture that I have not seen adequately talked about, and I want to paint a clear picture of it.)

There is no doubt something beautiful – and much raw intellectual appeal – in the quest for Alignment. It includes, of necessity, some of the most mind-bending facets of both mathematics and philosophy, and what is more, it has an emotional poignancy and human resonance rarely so close to the surface in those rarefied subjects. I certainly have no quarrel with the choice to devote some resources, the life’s work of some people, to this grand Problem of Problems. One imagines an Alignment monastery, carrying on the work for centuries. I am not sure I would expect them to ever succeed, much less to succeed in some specified timeframe, but in some way it would make me glad, even proud, to know they were there.

I do not feel any pressure to solve Alignment, the great Problem of Problems – that highest peak whose very lowest reaches Hobbes and Nash and Kolmogorov and Gödel and all the rest barely began to climb in all their labors...

#scott wants an aligned AI to save us from moloch; i think i'm saying that alignment would already be a solution to moloch

Comment by technicalities on Rationalists from the UK -- what are your thoughts on Dominic Cummings? · 2020-11-22T20:03:31.659Z · LW · GW

Huh, works for me. Anyway I'd rather not repeat his nasty slander but "They're [just] a sex cult" is the gist.

Comment by technicalities on Rationalists from the UK -- what are your thoughts on Dominic Cummings? · 2020-11-22T18:32:09.853Z · LW · GW

https://books.google.co.uk/books?id=OLB1DwAAQBAJ&q=sex cult&f=false

Comment by technicalities on Rationalists from the UK -- what are your thoughts on Dominic Cummings? · 2020-11-22T11:38:22.282Z · LW · GW

The received view of him is as just another heartless Conservative with an extra helping of tech fetishism and deceit. In reality he is an odd accelerationist just using the Tories (Ctrl+F "metastasising"). Despite him quoting Yudkowsky in that blog post, and it getting coverage in all the big papers, people don't really link him to LW or rationality, because those aren't legible, even in the country's chattering classes. We are fortunate that he is such a bad writer, so that no one reads his blog.

Here's a speculative rundown of things he probably got implemented (but we won't really know until 2050 declassification):

  • Doubling of the already large state R&D budget (by 2025). This will make the government half of all UK R&D spending. £800m for an ARPA-like body. £300m of STEM funding already out.

  • Pushed the COVID science committee into an earlier lockdown. Lockdown sceptics / herd immunity types likely to gain influence now.

  • An uncapped immigration path for scientists

  • Tutoring in state schools

  • Data-driven reform of the civil service is incomplete and probably abortive. His remaining crew are "misfits" with little influence. Associated data science, superforecasting and evidence-based policy with racists and edgelords. (One of those is on record as having a ridiculously negative view of LW.) The weirdo hiring scheme may make Whitehall hiring even more staid in the short run.

  • Something something bullying, norms, deception, centralisation of power. Whipping the Treasury probably not a good precedent.

  • His hypocrisy probably weakened lockdown norms. This also wasted a huge amount of Boris Johnson's political capital during a public health crisis; I don't know how to evaluate that.

Comment by technicalities on Model Depth as Panacea and Obfuscator · 2020-11-09T09:27:05.881Z · LW · GW

Great post. Do you have a sense of

  1. how much of tree success can be explained / replicated by interpretable models;
  2. whether a similar analysis would work for neural nets?

You suggest that trees work so well because they let you charge ahead when you've misspecified your model. But in the biomedical/social domains where ML is most often deployed, we are always misspecifying the model. Do you think your new GLM would offer similar idiotproofing?

Comment by technicalities on [deleted post] 2020-10-03T09:07:35.226Z

Yeah, the definition of evidence you use (that results must single out only one hypothesis) is quite strong, what people call "crucial" evidence.

https://en.m.wikipedia.org/wiki/Experimentum_crucis

Comment by technicalities on Are there good ways to find expert reviews of popular science books? · 2020-06-09T15:44:06.544Z · LW · GW

I suspect there is no general way. ): Even the academic reviews tend to cherry-pick one or two flaws and gesture at the rest.

Partial solutions:

  1. Invest the time to follow the minority of Goodreads users who know their stuff. (Link is people I follow.)
  2. See if Stuart Ritchie has reviewed it for money.

Comment by technicalities on Most reliable news sources? · 2020-06-06T21:29:50.204Z · LW · GW

The Economist ($) for non-Western events and live macroeconomics. They generally foreground the most important thing that happens each week, wherever it happens to occur. They pack the gist into a two-page summary, "The World this Week". Their slant (pro-market, pro-democracy, pro-welfare, pro-rights) rarely gets in the way. The obituaries are often extremely moving.

https://www.economist.com/the-world-this-week/

Comment by technicalities on A revolution in philosophy: the rise of conceptual engineering · 2020-06-03T09:27:09.039Z · LW · GW

Raised in the old guard, Chalmers doesn't understand...

This amused me, given that in the 90s he was considered an outsider and an upstart, coming round here with his cognitive science, shaking things up. (" 'The Conscious Mind' is a stimulating, provocative and agenda-setting demolition-job on the ideology of scientific materialism. It is also an erudite, urbane and surprisingly readable plea for a non-reductive functionalist account of mind. It poses some formidable challenges to the tenets of mainstream materialism and its cognitivist offshoots" )

Not saying you're wrong about him in that lecture. Maybe he has socialised and hardened as he gained standing. A funny cycle, in that case.

Comment by technicalities on What are objects that have made your life better? · 2020-05-22T07:10:34.254Z · LW · GW

I did a full accounting, including vague cost-benefit ranking:

https://www.gleech.org/stuff

Ignoring the free ones, which you should just go and get now, I think the best are:

  • Sweet Dreams Contoured sleep mask. Massively improved sleep quality, without having to alter the room, close the windows, whatever. 100:1.

  • Bowflex SelectTech dumbbells. A cheap gym membership is £150 a year; using these a couple times a week for 2 years means I’ve saved hundreds of pounds and dozens of hours commuting. They should last 15 years, so maybe total 30:1. (During the present lockdown, with gyms closed, the dumbbells get a temporary massive boost too.)

  • [Queal, a complete food powder] once a day. Saves money (if a lunch would otherwise be £4) and time and the delivery vector means I actually use the other powders I buy (spirulina, creatine, beta-alanine). Big discount for verifiable EAs. Also a handy automatic prepper store. 10:1.

  • Filco Majestouch 2 Tenkeyless mechanical keyboard. Assuming this decreases my RSI risk by 1%, it will have paid off 10 times over. But also in comfort and fun alone. 10:1
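
The ratios above are rough benefit:cost figures. Here is a minimal sketch of the arithmetic behind the keyboard estimate, under loudly hypothetical numbers: the price and the lifetime cost of RSI below are placeholders rather than figures from the linked accounting, and `benefit_cost_ratio` is a name invented for illustration.

```python
def benefit_cost_ratio(risk_reduction, harm_cost, price):
    """Expected harm avoided, divided by the purchase price."""
    expected_benefit = risk_reduction * harm_cost
    return expected_benefit / price


# Hypothetical inputs: a 1% absolute risk reduction, a harm valued at
# 120,000 GBP over a career, and a keyboard priced at 120 GBP.
ratio = benefit_cost_ratio(risk_reduction=0.01, harm_cost=120_000, price=120)
print(f"{ratio:.0f}:1")
```

The ratio is linear in every input, so halving the assumed risk reduction halves the ratio; the point is the order of magnitude, not the second digit.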

Comment by technicalities on What are the relative speeds of AI capabilities and AI safety? · 2020-04-24T22:20:12.205Z · LW · GW

Some more ways:

If it turns out that capabilities and safety are not so dichotomous, and so robustness / interpretability / safe exploration / maybe even impact regularisation get solved by the capabilities lot.

If early success with a date-competitive performance-competitive safety programme (e.g. IDA) puts capabilities research onto a safe path.

Comment by technicalities on The Samurai and the Daimyo: A Useful Dynamic? · 2020-04-14T09:56:03.030Z · LW · GW

My name for this is Einsteins and Eddingtons.** Besides the vital testing and extension of the big ideas, the Eddington can also handle popularisation and, most important of all, the identification and nurturing of new Einsteins. This is one reason I think teaching in academia could be high-impact, despite all the notorious inefficiencies and moral mazes.


** Not totally fair to Eddington, since he was a pretty strong theorist himself.