## Posts

Adjusting probabilities for the passage of time, using Squiggle 2020-10-23T18:55:30.860Z · score: 12 (3 votes)
A prior for technological discontinuities 2020-10-13T16:51:32.572Z · score: 48 (18 votes)
NunoSempere's Shortform 2020-10-13T16:40:05.972Z · score: 4 (1 votes)
AI race considerations in a report by the U.S. House Committee on Armed Services 2020-10-04T12:11:36.129Z · score: 42 (25 votes)
What are the relative speeds of AI capabilities and AI safety? 2020-04-24T18:21:58.528Z · score: 8 (4 votes)
Some examples of technology timelines 2020-03-27T18:13:19.834Z · score: 23 (9 votes)
[Part 1] Amplifying generalist research via forecasting – Models of impact and challenges 2019-12-19T15:50:33.412Z · score: 53 (13 votes)
[Part 2] Amplifying generalist research via forecasting – results from a preliminary exploration 2019-12-19T15:49:45.901Z · score: 48 (12 votes)
What do you do when you find out you have inconsistent probabilities? 2018-12-31T18:13:51.455Z · score: 16 (6 votes)
The hunt of the Iuventa 2018-03-10T20:12:13.342Z · score: 11 (5 votes)

Comment by nunosempere on Launching Forecast, a community for crowdsourced predictions from Facebook · 2020-10-21T11:47:36.995Z · score: 2 (2 votes) · LW · GW

Forecast's midpoint brier score (measured at the midpoint between a question’s launch and resolution dates) across all closed Forecasts over the past few months is 0.204, a bit better than Good Judgement's published result of 0.227 for prediction markets

The relative difficulty of the questions is probably important here, and the comparison "a bit better than Good Judgment" is probably misleading. In particular, I'd expect Good Judgment to have questions with longer time horizons (which are harder to forecast), if only because your platform is so young.

Our first priority is to build something that’s really fun for people who want to engage in rational debate about the future

How are you defining "really fun" as distinct from "addictive"?

Since June, the Forecast community has made more than 50,000 forecasts on a few hundred questions--and they're actually reasonably accurate.

50,000 forecasts isn't that much, maybe 30x the number of forecasts I've made, but if you scale this up to Facebook scale, I'd imagine you might be able to train a halfway decent ML system. I'd be keen to see a firm and binding ethical commitment which handles this eventuality before you accumulate the data, but I don't know how that would look in the context of Facebook's corporate structure and ethics track record.

Comment by nunosempere on NunoSempere's Shortform · 2020-10-20T21:41:08.617Z · score: 0 (0 votes) · LW · GW

This is a test to see if latex works

Comment by nunosempere on NunoSempere's Shortform · 2020-10-19T19:24:40.866Z · score: 1 (1 votes) · LW · GW

Fixed, thanks

Comment by nunosempere on What are some beautiful, rationalist artworks? · 2020-10-19T11:17:54.178Z · score: 3 (2 votes) · LW · GW

You could also have a calendar which doesn't require that adjustment.

Comment by nunosempere on A prior for technological discontinuities · 2020-10-19T09:31:50.533Z · score: 1 (1 votes) · LW · GW

I'm referring to the Jalali calendar.

Comment by nunosempere on A prior for technological discontinuities · 2020-10-18T23:14:57.424Z · score: 8 (4 votes) · LW · GW

What? I feel like this comment doesn't respond to the post above at all.

tl;dr of the post: If I look at 50 technologies which to a first approximation I expect to be roughly randomly chosen, I can broadly divide them into:

• Probably with "big" discontinuities: Aviation, nuclear weapons, petroleum, printing, spaceflight, rockets, aluminium production, radar, radio, automobile, transistors, and PCR.
• Probably with "medium" discontinuities: cryptography, glass, rail transport, water supply and sanitation, diesel car, automation, television, steam engine, timekeeping devices.
• Probably with "small" discontinuities: cycling, furniture, robotics, candle making, sound recording, submarines, batteries, multitrack recording, paper, telescopes, wind power.
• Probably not discontinuous: ceramics, film, oscilloscopes, photography, artificial life, calendars, chromatography, bladed weapons, condoms, hearing aids, telephones, internal combustion engine, manufactured fuel gases, perpetual motion machines, motorcycles, nanotech, portable gas stoves, roller coasters.

Using that, I can sort of get a prior for the likelihood of "big" discontinuities; which falls between 8% (4/50) and 24% (12/50). I can also get a rough probability of a discontinuity per year (~1% if the technology ever shows one). All of this has caveats, outlined in the post.

***

Your first point, that if I paper-push hard enough I can make anything look continuous, doesn't apply, because I'm not in fact doing that. For example, throughout WW2 there were several iterations of the radar, each progressively less shitty, but progress was fast enough (due to the massive, parallel, state funding) that I'd still categorize it as a discontinuity (and note that it did get into the OODA loops of the Germans, the effectiveness of whose submarines greatly declined after the deployment of radar). Similarly, the Wright brothers also experimented with different designs, but overall their progress on heavier-than-air flight was rapid and anomalous enough that I categorized it as a discontinuity. Similarly, for transistors, there were of course many different transistors, but MOSFET transistors were substantially better on miniaturization than BJTs, even if MOSFETs were worse during their very first years. Similarly, transistors themselves were better than vacuum triodes, though I'm sure that if you squint you can also find something continuous somewhere.

Even if I were paper pushing, detecting 12/50 would still give you a lower bound for the probability of a "big" discontinuity (24% + however many I paper-pushed to the medium or small categories). Even if there wasn't a clear line between "continuous" and "discontinuous", I'd expect more continuous technologies to fall in the "medium", "small" and "probably not" buckets, and more discontinuous technologies in the "big" and "medium" buckets.

Some of your gripes could conceivably apply to some parts of AI Impacts' investigation (e.g., they don't categorize printing as a "large and robust" discontinuity), but I am not them.

Big fan of your work, though.

Comment by nunosempere on Bet On Biden · 2020-10-18T08:19:19.619Z · score: 5 (4 votes) · LW · GW

I concur with this. In my case, I set aside some amount of money I was comfortable losing, calculated the Kelly bet based on the expected win, and bet some amount on Biden. I used Betfair, which is available to Europeans.
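A minimal sketch of the Kelly calculation mentioned above, assuming a simple binary bet quoted in decimal odds. The function name and the example inputs are hypothetical illustrations, not the figures actually used for this bet.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Fraction of the bankroll to stake on a binary bet under the Kelly criterion.

    p_win: your own probability that the bet wins.
    decimal_odds: total payout per unit staked (2.0 means even odds).
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must be greater than 1")
    b = decimal_odds - 1.0         # net winnings per unit staked
    f = p_win - (1.0 - p_win) / b  # standard Kelly formula: f* = p - q / b
    return max(f, 0.0)             # stake nothing when the edge is negative


# Hypothetical inputs: 85% credence, and a market paying 1.5x the stake.
stake_fraction = kelly_fraction(0.85, 1.5)
print(f"Stake {stake_fraction:.0%} of the amount set aside")
```

In practice people often bet a fraction of the Kelly stake, since the formula is sensitive to overconfidence in `p_win`.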

Some commenters mention the EMH. As counterevidence, I present that Betfair is offering even odds that Biden will win at least one state Trump won last time. (The 538 model gives 96% to this) (This was wrong)

Comment by nunosempere on Bet On Biden · 2020-10-18T08:08:22.798Z · score: 1 (1 votes) · LW · GW

Betfair.de should work. Besides sports, there is a politics section.

Comment by nunosempere on Evaluating expertise: a clear box model · 2020-10-17T18:24:38.431Z · score: 1 (1 votes) · LW · GW

Very interesting! Your categorization into black box / clear box / social reputation seems like it's missing a level, and hence to me your names feel slightly off. I might instead think in terms of:

1. Clear box: I fact check some of the expert's claims, and estimate the accuracy of the claims I can't check based on the ones which I can. For example, [Ibn Tufail's](https://en.wikipedia.org/wiki/Ibn_Tufail) metaphysical claims might be difficult to refute, but his books also reference biological mechanisms which are easier to evaluate (e.g., men can be born from mud). Similarly, if an expert claims some broad historical thesis, I can compare it to, e.g., Spain in the last centuries to see if it checks out.
2. Black box: I know the person's track record, or that the track record is good, but not which claims/accomplishments it's based on. For example, I know that Renaissance Technologies has a good track record in making money from the stock market and cultivating startups, even if I don't know how exactly they did it. Or I might know that someone is a super-forecaster without knowing what questions they have predicted to get there.
3. Proxies box (your clear box): I look at proxies for accuracy/track record. Some can be mechanistic: like skin in the game, alignment, computational power, time. But you can't look at, say, computational power or alignment directly (yet), so you might have to look at correlational proxies for that, like prestigious university affiliations, big car, nice suit, English accent, brings up cogent and interesting points in a conversation, presentation skills, etc.
4. Deference pointer: I trust other people's assessment & status signals.

On 1., see Epistemic Spot Checks, and in particular this comment thread. On 3., see Hanson's How to pick an X.

Comment by nunosempere on Msg Len · 2020-10-17T10:53:41.263Z · score: 3 (2 votes) · LW · GW

And prediction strategies are almost optimization procedures?

Comment by nunosempere on What are some beautiful, rationalist artworks? · 2020-10-17T09:27:55.343Z · score: 1 (1 votes) · LW · GW

A panel from Watchmen which particularly resonated with me.

Comment by nunosempere on What are some beautiful, rationalist artworks? · 2020-10-17T09:23:57.553Z · score: 8 (7 votes) · LW · GW

Comment by nunosempere on What should experienced rationalists know? · 2020-10-15T21:05:14.108Z · score: 6 (2 votes) · LW · GW

See: The Rationality Quotient: Toward a Test of Rational Thinking, by Keith E. Stanovich et al.

Comment by nunosempere on A prior for technological discontinuities · 2020-10-15T18:36:28.835Z · score: 1 (1 votes) · LW · GW

Note that 8 is pretty small, so if the true base rate were 32%, you'd still expect to see 2 discontinuities (8 choose 2) * 0.32^2 * 0.68^6 = 0.283 = 28% of the time, vs (8 choose 3) * 0.32^3 * 0.68^5 = 0.267 = 27% of the time for 3 discontinuities, so I take the 3/8 as indicative that I'm in the right ballpark, rather than sweating too much about it.
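The binomial arithmetic above can be checked with a short sketch. The function name is illustrative; the inputs (8 technologies, a 32% base rate) are the ones from the comment.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_technologies = 8
base_rate = 0.32

p_two = binomial_pmf(2, n_technologies, base_rate)
p_three = binomial_pmf(3, n_technologies, base_rate)
print(f"P(2 of 8) = {p_two:.3f}, P(3 of 8) = {p_three:.3f}")
```

The two outcomes are about equally likely under a 32% base rate, which is why seeing 3/8 rather than 2/8 is weak evidence either way.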

The broad vs specific categories is an interesting hypothesis, but note that it could also be confounded with many other factors. For example, broader fields could attract more talent and funds, and it might be harder to achieve proportional improvement in a bigger domain (e.g., it might be easier to improve making candles by 1% than to improve aviation by 1%). That is, you can have the effect that you have more points of leverage (more subfields, as you point out), but that each point of leverage affects the result less (an x% discontinuity in a subfield corresponds to much less than x% in the super-field).

If I look at it in my database, categorizing:

• Broad: aviation, ceramics, cryptography, film, furniture, glass, nuclear, petroleum, photography, printing, rail transport, robotics, water supply and sanitation, artificial life, bladed weapons, aluminium, automation, perpetual motion machines, nanotechnology, timekeeping devices, wind power
• Not broad: motorcycle, multitrack recording, Oscilloscope history, paper, polymerase chain reaction, portable gas stove, roller coaster, steam engine, telescope, cycling, spaceflight, rockets, calendars, candle making, chromatography, condoms, diesel car, hearing aids, radar, radio, sound recording, submarines, television, automobile, battery, telephone, transistor, internal combustion engine, manufactured fuel gases.

Then broad categories get 24% big discontinuities, 29% medium discontinuities, 14% small discontinuities and 33% no discontinuities. In comparison, less broad categories get 24% big discontinuities, 10% medium discontinuities, 28% small discontinuities and 38% no discontinuities, i.e., no effect at the big discontinuity level but a noticeable difference at the medium level, which is somewhat consistent with your hypothesis. Data available here; the criterion used was "does this have more than one broad subcategory", combined with my own intuition as to whether the field feels "broad".

Comment by nunosempere on A prior for technological discontinuities · 2020-10-14T16:45:22.960Z · score: 5 (3 votes) · LW · GW

Charts in the paper suggest that it wasn't a discontinuity in terms of validation loss, which seems to be the inverse of perplexity.

GPT-3's full version has a capacity of 175 billion [parameters] [...] Prior to the release of GPT-3, the largest language model was Microsoft's Turing NLG, introduced in February 2020, with a capacity of 17 billion parameters or less than 10 percent compared to GPT-3.

The year before, GPT-2 had 1.5 billion parameters and XLNET had 340M. The year before that, in 2018, BERT had 340M. Here are two charts around that time:

Unclear whether there was a discontinuity roughly at the time of Nvidia's Megatron, particularly on the logarithmic scale. GPT-3 was 10x the size of Microsoft's last model, but came 4 months afterwards, which seems like it might maybe break that exponential.
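One rough way to put a number on "10x in 4 months" is the implied doubling time under exponential growth. This is a sketch under that assumption only; the function name is illustrative, and the parameter counts and the 4-month gap are the ones quoted above. Comparing against the earlier trend would need release dates that the comment does not give.

```python
from math import log2


def doubling_time_months(size_before: float, size_after: float, months_elapsed: float) -> float:
    """Doubling time implied by growing from size_before to size_after, assuming exponential growth."""
    if size_before <= 0 or size_after <= size_before or months_elapsed <= 0:
        raise ValueError("expected positive sizes, growth, and a positive time span")
    doublings = log2(size_after / size_before)
    return months_elapsed / doublings


# Turing NLG (17 billion parameters) to GPT-3 (175 billion), about 4 months apart.
gpt3_doubling = doubling_time_months(17e9, 175e9, 4)
print(f"Implied doubling time: {gpt3_doubling:.1f} months")
```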

Comment by nunosempere on A prior for technological discontinuities · 2020-10-13T21:34:24.566Z · score: 5 (3 votes) · LW · GW

I miscounted the maybes. Fixed.

Comment by nunosempere on A prior for technological discontinuities · 2020-10-13T21:31:36.616Z · score: 0 (2 votes) · LW · GW

Yes, but they spent more money and created a much larger model than other groups, sooner than I'd otherwise have expected. It also reaches some threshold for "scarily good" for me which makes me surprised.

Comment by nunosempere on NunoSempere's Shortform · 2020-10-13T16:40:06.930Z · score: 1 (1 votes) · LW · GW

While researching past technological discontinuities, I came across some interesting anecdotes. Some follow. I also looked at technology readiness levels, but this didn’t prove fruitful.

# Anecdotes and patterns

## Watt's safety concerns

As the 18th century advanced, the call was for higher pressures; this was strongly resisted by Watt who used the monopoly his patent gave him to prevent others from building high-pressure engines and using them in vehicles. He mistrusted the boiler technology of the day, the way they were constructed and the strength of the materials used. ...Oliver Evans in his turn was in favour of "strong steam" which he applied to boat engines and to stationary uses. He was a pioneer of cylindrical boilers; however Evans' boilers did suffer several serious boiler explosions, which tended to lend weight to Watt's qualms. Source

This narrative was later disputed; however, it does seem possible that awareness of the high risk of high-pressure engines, which were more susceptible to boiler explosions, delayed their development and adoption somewhat. In any case, the anecdote might be interesting for those who may seek to delay development of the capabilities of AI until safety technologies have been developed, either as a historical example or as a talking point.

## Monetary bounties to incentivize progress

In April 1900, Henri offered the Deutsch de la Meurthe prize, also simply known as the Deutsch prize, of 100,000 francs to the first machine capable of flying a round trip from the Parc Saint Cloud to the Eiffel Tower in Paris and back in less than 30 minutes. The winner of the prize needed to maintain an average ground speed of at least 22 km/h (14 mph) to cover the round trip distance of 11 km (6.8 mi) in the allotted time. The prize was to be available from May 1, 1900, to October 1, 1903.[7]

To win the prize, Alberto Santos-Dumont decided to build the Santos-Dumont No. 5, a larger airship than his earlier craft. On August 8, 1901, during one of his attempts, the dirigible began to lose hydrogen gas. It started to descend and was unable to clear the roof of the Trocadero Hotel. Santos-Dumont was left hanging in a basket from the side of the hotel. With the help of the Paris fire brigade, he climbed to the roof without injury.[8]

On October 19, 1901, after several attempts and trials, Santos-Dumont launched his Number 6 airship at 2:30 pm. After only nine minutes of flight, Santos-Dumont had rounded the Eiffel Tower, but then suffered an engine failure. To restart the engine, he had to climb back over the gondola rail without a safety harness. The attempt was successful, and he crossed the finish line in 29 minutes 30 seconds. However, a short delay arose before his mooring line was secured, and at first the adjudicating committee refused him the prize, despite de la Meurthe, who was present, declaring himself satisfied. This caused a public outcry from the crowds watching the flight, as well as comment in the press. However a face-saving compromise was reached, and Santos-Dumont was awarded the prize. In a charitable gesture, he gave half the prize to his crew and then donated the other half to the poor of Paris.[9]

The first carriage-sized automobile suitable for use on existing wagon roads in the United States was a steam-powered vehicle invented in 1871 by Dr. J.W. Carhart, a minister of the Methodist Episcopal Church, in Racine, Wisconsin.[1][11][self-published source] It induced the State of Wisconsin in 1875 to offer a \$10,000 award to the first to produce a practical substitute for the use of horses and other animals. They stipulated that the vehicle would have to maintain an average speed of more than 5 miles per hour (8 km/h) over a 200-mile (320 km) course. The offer led to the first city to city automobile race in the United States, starting on 16 July 1878 in Green Bay, Wisconsin, and ending in Madison, Wisconsin, via Appleton, Oshkosh, Waupun, Watertown, Fort Atkinson, and Janesville. While seven vehicles were registered, only two started to compete: the entries from Green Bay and Oshkosh. The vehicle from Green Bay was faster, but broke down before completing the race. The Oshkosh finished the 201-mile (323 km) course in 33 hours and 27 minutes, and posted an average speed of six miles per hour. In 1879, the legislature awarded half the prize.[12][13][14]

After the Scilly naval disaster of 1707 where four ships ran aground due to navigational mistakes, the British government offered a large prize of £20,000, equivalent to millions of pounds today, for anyone who could determine longitude accurately. The reward was eventually claimed in 1761 by Yorkshire carpenter John Harrison, who dedicated his life to improving the accuracy of his clocks. In 1735 Harrison built his first chronometer, which he steadily improved on over the next thirty years before submitting it for examination. The clock had many innovations, including the use of bearings to reduce friction, weighted balances to compensate for the ship's pitch and roll in the sea and the use of two different metals to reduce the problem of expansion from heat.

## Non-neurotypical pioneers

Pioneers show a wide range of motivations. In particular, throughout the historical record, a (lone) non-neurotypical pioneer will occasionally obsess over a technology, refine it, make some progress, or leave a record of failed approaches. This may prove a hurdle for some safety approaches.

In the 1820s British inventor George Pocock developed man-lifting kites, using his own children as guinea pigs

In 1801, the French officer André Guillaume Resnier de Goué managed a 300-metre glide by starting from the top of the city walls of Angoulême and broke only one leg on arrival

The Jesuits were another major contributor to the development of pendulum clocks in the 17th and 18th centuries, having had an "unusually keen appreciation of the importance of precision". In measuring an accurate one-second pendulum, for example, the Italian astronomer Father Giovanni Battista Riccioli persuaded nine fellow Jesuits "to count nearly 87,000 oscillations in a single day". They served a crucial role in spreading and testing the scientific ideas of the period, and collaborated with contemporary scientists, such as Huygens.

Blyth's 10 m high, cloth-sailed wind turbine was installed in the garden of his holiday cottage at Marykirk in Kincardineshire and was used to charge accumulators developed by the Frenchman Camille Alphonse Faure, to power the lighting in the cottage, thus making it the first house in the world to have its electricity supplied by wind power. Blyth offered the surplus electricity to the people of Marykirk for lighting the main street, however, they turned down the offer as they thought electricity was "the work of the devil." Although he later built a wind turbine to supply emergency power to the local Lunatic Asylum, Infirmary and Dispensary of Montrose, the invention never really caught on as the technology was not considered to be economically viable

In 1941 the world's first megawatt-size wind turbine was connected to the local electrical distribution system on the mountain known as Grandpa's Knob in Castleton, Vermont, United States. It was designed by Palmer Cosslett Putnam and manufactured by the S. Morgan Smith Company. This 1.25 MW Smith–Putnam turbine operated for 1100 hours before a blade failed at a known weak point, which had not been reinforced due to war-time material shortages. No similar-sized unit was to repeat this "bold experiment" for about forty years

In 1741 de Vaucanson was appointed by Cardinal Fleury, chief minister of Louis XV, as inspector of the manufacture of silk in France. He was charged with undertaking reforms of the silk manufacturing process. At the time, the French weaving industry had fallen behind that of England and Scotland. During this time, Vaucanson promoted wide-ranging changes for automation of the weaving process. In 1745, he created the world's first completely automated loom, drawing on the work of Basile Bouchon and Jean Falcon. Vaucanson was trying to automate the French textile industry with punch cards - a technology that, as refined by Joseph-Marie Jacquard more than a half-century later, would revolutionize weaving and, in the twentieth century, would be used to input data into computers and store information in binary form. His proposals were not well received by weavers, however, who pelted him with stones in the street and many of his revolutionary ideas were largely ignored.

David Unaipon, Australian inventor, had a lifelong fascination with perpetual motion. One of his studies on Newtonian mechanics led him to create a shearing machine in 1910 that converted curvilinear motion into straight line movement. The device is the basis of modern mechanical shears.

Shockley was upset about the device being credited to Brattain and Bardeen, who he felt had built it "behind his back" to take the glory. Matters became worse when Bell Labs lawyers found that some of Shockley's own writings on the transistor were close enough to those of an earlier 1925 patent by Julius Edgar Lilienfeld that they thought it best that his name be left off the patent application.

Due to Shockley's earlier work on FETs and the existence of the Lilienfeld patent, Bell Labs left Shockley off their patent on the point-contact design. Shockley was incensed and decided to demonstrate who was the real brains of the operation. Only a few months later he invented an entirely new type of transistor with a layer or "sandwich" structure. This new form was considerably more robust than the fragile point-contact system, and would go on to be used for the vast majority of all transistors into the 1960s. It would evolve into the bipolar junction transistor.

## Technology readiness levels

Technology readiness levels (TRLs) are an interesting way of measuring the maturity of a given technology. They originated at NASA to be used in the context of the space program, and then generalized to the point of uselessness, after which they were implemented in some projects of the European Union.

Originally I thought that trying to map out something akin to the technology readiness level of each technology in my list would be worth doing, and created the following scale:

• L1. Earliest reference.
    • L1-a: In a fictional story.
    • L1-b: Somewhere else.
• L2. Concept rigorously formulated.
• L3. Creation stage.
    • L3-a. First person working on the area towards a prototype.
    • L3-b. Proof of concept or prototype: A hobbyist has built the technology in their own laboratory, garage or basement, or as a private demonstration, or as a toy, but perhaps with no further intention of bringing it to market.
    • L3-c. Further development: The technology starts to be invested in and improved; there may be concrete plans about bringing it to market.
• L4. Technology is available.
    • If commercial: The product can be bought, even if it's expensive. This does not ask whether a sufficiently motivated billionaire could buy it, but rather whether the product is being produced in order to be sold.
    • If military: The product can be employed in action. One could not buy a nuclear weapon, but it was still available to the USA by 1945.
    • L4-a: Technology is available and cheap.
• L5. Technology is available and decent along the relevant dimensions.
    • L5-a: Technology is decent and cheap.
• L6. Technology is pretty good.
    • L6-a: Technology is pretty good and cheap.
• L7. Technology is really good.
    • L7-a: Technology is really good and cheap.
• L9. Societal influence. The invention has influenced society.

Although I did gather the information, this proved to be an unfruitful idea. The data was often not available, there were category errors, the scale asks for absolute quality when I mostly have a comparison with current levels, etc. I did get the insight that by the time a product is cheap, it's often of much better quality than early versions of the same product. Thinking about TRLs might have clarified my thinking about the evolution of some technologies, but overall I'd say it probably wasn't worth it.

Comment by nunosempere on Inaccessible finely tuned RNG in humans? · 2020-10-07T21:09:44.219Z · score: 1 (1 votes) · LW · GW

My method is to generate a random sentence and then assign a 0 to letters before m and a 1 to letters afterwards. This is pretty fast.
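The letter-to-bit mapping described above can be sketched as follows. Two assumptions are made here that the comment leaves open: the letter "m" itself is mapped to 1, and non-letter characters (spaces, punctuation, digits, accented letters) are skipped. The function name is illustrative.

```python
from typing import List


def sentence_to_bits(sentence: str) -> List[int]:
    """Map each ASCII letter to a bit: 0 for letters before 'm', 1 otherwise.

    Assumption: 'm' itself counts as 1, and anything that is not a-z is ignored.
    """
    bits = []
    for ch in sentence.lower():
        if "a" <= ch <= "z":
            bits.append(0 if ch < "m" else 1)
    return bits


print(sentence_to_bits("Hello, world"))
```

Note that letter frequencies in English are not split evenly around "m", so the resulting bits are convenient rather than unbiased.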

Comment by nunosempere on My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda · 2020-10-05T21:28:45.678Z · score: 1 (1 votes) · LW · GW

What would a corrigible but not-intent-aligned AI system look like?

Spoilers for Pokemon: The Origin of Species:

The latest chapter of TOoS has a character create a partition of their personality with the aim of achieving "Victory". At some point, that partition and the character disagree about what the terminal goals of the agent are, and how best to achieve them, presumably because the partition had an imperfect understanding of their goals. Once the character decides to delete the partition, the partition lets it happen (i.e., is corrigible).

The reverse could also have been true: the partition could have had a more perfect understanding of the terminal values and become incorrigible. For example, imagine an agent who was trying to help out a homeopathic doctor.

Comment by nunosempere on AI race considerations in a report by the U.S. House Committee on Armed Services · 2020-10-05T17:23:12.969Z · score: 2 (2 votes) · LW · GW

To be clear, the New START treaty only applies to nuclear systems.

Comment by nunosempere on Babble challenge: 50 ways of sending something to the moon · 2020-10-03T08:16:26.583Z · score: 2 (2 votes) · LW · GW

Not in particular; I'm actually most fond of #43 ;). Also, #12 is based on a real incident, and it's probably how I'd actually do it:

Comment by nunosempere on Babble challenge: 50 ways of sending something to the moon · 2020-10-02T07:22:15.038Z · score: 9 (3 votes) · LW · GW

1. Pay for it to be sent on a rocket
2. Railgun
3. Space elevator
4. Air balloon + rocket
5. Hit the moon with comet and have it impact Earth
6. Use force of nuclear explosion
7. Sneak it into someone else's moon mission
8. Convince billionaire that going to the moon is cool, then piggyback
9. Make item so common and useful that it will certainly be brought to the moon if it is ever colonized
10. Bring a 3D printer to the moon, then print it.
11. Convince government to make the moon a criminal base (like Australia). Get sent.
12. Bribe an astronaut
13. Nerdsnipe a collective at MIT to do it for you
14. Use many conventional explosives
15. Use an Alcubierre drive
16. Controlled matter-antimatter explosion
17. Laser propelled balloon
18. Really big spring.
19. Really big sling
20. If thing is small, use particle accelerator
21. Put big magnet on Moon
22. Reorient a hurricane to sling it.
23. Invent anti-gravity propulsor
24. Cancel gravitational field of Earth
25. Make Moon bigger than Earth
26. Put rockets on Moon so that it crashes into earth
27. Otherwise destabilize Moon orbit
28. Get fragment of Moon
29. Large pyramid on Earth which reaches Moon
30. Destroy Moon into smaller fragments which are easier to reach
31. Explode Earth and be in a fragment which reaches Moon.
32. Blackmail Musk
33. Take someone important to Musk hostage
34. Large Mentos+CocaCola chemical reaction
35. Position oneself strategically before volcanic eruption
36. Channel energy of earthquake into jump
37. Create powerful instantaneous earthquake
38. Make Earth bigger
39. Big spring board
40. Large catapult
41. Amplify earthquake
42. Manipulate Earth's electromagnetic field to propel you
43. Become best petrol engineer. Create rumors of oil in Moon.
44. Become astronaut
45. Impersonate astronaut
46. Take someone important to an astronaut hostage
47. Create new space race. Profit.
48. Get money to go to Mars but use it to go to the Moon instead.
49. Make rockets cheaper, then buy a ticket.
50. Use powerful sound wave.

Comment by nunosempere on Forecasting Newsletter: September 2020. · 2020-10-01T16:07:47.602Z · score: 7 (2 votes) · LW · GW

The author of the article sent it to me with the explicit request not to do that. I tried to check whether I could access it through any of my usual methods (disabling JavaScript, looking in the Internet Archive, using various extensions, etc.), but realized I couldn't.

I thought about not adding it to the newsletter at all, but realized that in this case, I actually respect their monetization model, and I liked the piece. In particular, this piece doesn't seem particularly clickbaity, a la SSC's Problems With Paywalls; instead it's a pretty good and lengthy feature article which took someone maybe a week (?) to write (the pdf version of the article is 16 pages). In contrast, other non-paywalled news media (I'm thinking of Forbes here) sometimes/usually cover forecasting questions so, so terribly.

So that's my starting point. If you or other readers prefer not to see this kind of thing, I'm all ears.

Comment by nunosempere on Has anyone written stories happening in Hanson's em world? · 2020-09-30T12:20:16.199Z · score: 7 (4 votes) · LW · GW

You're asking the wrong question. The right question is: Which story is Hanson's Age of Em probably inspired by? Link

Comment by nunosempere on The new Editor · 2020-09-24T17:57:13.469Z · score: 1 (1 votes) · LW · GW

For a hackier approach, you can also feed html (which you can get from the graphQL API) to a utility like pandoc in order to obtain the markdown.
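A minimal sketch of the pandoc step, assuming pandoc is installed and on the PATH. The helper names are illustrative, and the `html` string stands in for whatever the GraphQL API returns; the API query itself is not shown here.

```python
import subprocess
from typing import List


def build_pandoc_command() -> List[str]:
    """Command line that makes pandoc read HTML on stdin and write markdown to stdout."""
    return ["pandoc", "--from=html", "--to=markdown"]


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to markdown by piping it through pandoc."""
    result = subprocess.run(
        build_pandoc_command(),
        input=html,           # pandoc reads stdin when no input file is given
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError if pandoc fails
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder HTML; in practice this would be the post body from the API.
    print(html_to_markdown("<p>Some <em>post</em> body</p>"))
```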

Comment by nunosempere on The Four Children of the Seder as the Simulacra Levels · 2020-09-09T14:19:36.549Z · score: 1 (1 votes) · LW · GW

One can also construe the Lynyrd Skynyrd song Simple Man as talking about this kind of thing.

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-31T23:03:24.588Z · score: 7 (5 votes) · LW · GW

• It takes as a starting point datscilly's own prediction, i.e., the result of applying Laplace's rule from the Dartmouth conference. This seems like the most straightforward historical base rate / model to use, and on a meta-level I trust datscilly and I've worked with him before.
• I then subtract some probability from the beginning and move it towards the end because I think it's unlikely we'll get human parity in the next 5 years. In particular, even Daniel Kokotajlo, the most bullish among the other predictors, puts his peak somewhere around 2025.
• I then apply some smoothing.
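As a rough illustration of the first step, here is a sketch of Laplace's rule applied from the Dartmouth conference. It assumes one trial per year, 1956 as the start, and 2020 as the current year; datscilly's actual model may differ in its details, and the function name is made up for this example.

```python
def prob_agi_by(target_year, start_year=1956, current_year=2020):
    """P(first success by target_year | no success from start_year to current_year)."""
    failures = current_year - start_year   # years observed without AGI (assumed: one trial per year)
    horizon = target_year - current_year   # further years under consideration
    # Laplace's rule: P(no success in the next m trials | n failures so far) = (n + 1) / (n + m + 1)
    return 1 - (failures + 1) / (failures + horizon + 1)


for year in (2025, 2050, 2100):
    print(year, round(prob_agi_by(year), 3))
```

Under these assumptions the cumulative probability grows quickly at first and then flattens, which is the shape that the later steps reshape and smooth.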

My resulting distribution looks similar to the current aggregate (I only noticed this after building it).

Datscilly's prediction:

My prediction:

The previous aggregate:

Some things I don't like about the other predictions:

• Not long enough tails. There have been AI winters before; there could be AI winters again. Shit happens.
• Very spiky maximums. I get that specific models can provide sharp predictions, but the question seems hard enough that I'd expect there to be a large amount of model error. I'd also expect predictions which take into account multiple models to do better.
• Not updating on other predictions. Some of the other forecasters seem to have one big idea, rather than multiple uncertainties.

Things that would change my mind:

At the five minute level:

1. Getting more information about Daniel Kokotajlo's models. On a meta-level, learning that he is a superforecaster.
2. Some specific definitions of "human level".

At the longer-discussion level:

1. Object level arguments about AI architectures
2. Some information about whether experts believe that current AI methods can lead to AGI.
3. Some object level arguments about Moore's law. I.e., by which year does Moore's law predict we'll have much more computing power than the higher estimates for the human brain?

I'm also uncertain about what probability to assign to AGI after 2100.

I might revisit this as time goes on.

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T23:49:18.054Z · score: 1 (1 votes) · LW · GW

The location of the bump could be estimated by using Daniel Kokotajlo's answer as the "earliest plausible AGI."

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T23:46:26.282Z · score: 1 (1 votes) · LW · GW

Is this your inside view, or your "all things considered" forecast? I.e., how do you update, if at all, on other people disagreeing with you?

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T22:57:23.876Z · score: 2 (2 votes) · LW · GW

Is your P(AGI | no AGI before 2040) really that low?

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T22:55:35.330Z · score: 1 (1 votes) · LW · GW

That small tail at the end feels really suspicious. I.e., it implies that if we haven't reached AGI by 2080, then we probably won't reach it at all. I feel like this might be an artifact of specifying a small number of bins on elicit, though.

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T22:48:38.731Z · score: 7 (3 votes) · LW · GW

That sharp peak feels really suspicious.

Comment by nunosempere on Forecasting Thread: AI Timelines · 2020-08-30T22:43:45.775Z · score: 3 (3 votes) · LW · GW

Your prediction has the interesting property that (starting in 2021), you assign more probability to the next n seconds/ n years than to any subsequent period of n seconds/ n years.

Specifically, I think your distribution assigns too much probability to AGI in the immediate next three months/year/5 years, but I feel like we do have a bunch of information that points us away from such short timelines. If one takes that into account, then one might end up with a bump, maybe like so, where the location of the bump is debatable, and the decay afterwards is per Laplace's rule.
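To make the "more probability on the next n years than on any later n years" property concrete, here is a sketch of the per-year probability mass under a plain Laplace's rule. It assumes one trial per year and 64 years without success (1956 to 2020); these numbers are illustrative, not taken from the prediction being discussed.

```python
def prob_first_success_in_year(k, failures=64):
    """P(the first success falls exactly in the k-th future year), for k = 1, 2, ..."""
    # Survive k - 1 more years: (n + 1) / (n + k); then succeed: 1 / (n + k + 1)
    return (failures + 1) / ((failures + k) * (failures + k + 1))


masses = [prob_first_success_in_year(k) for k in range(1, 81)]

# Every year gets strictly less mass than the one before it,
# so the very next period is always the single most likely one.
assert all(earlier > later for earlier, later in zip(masses, masses[1:]))
```

A bump would amount to moving some of the mass from the first few entries of `masses` to later ones, and leaving the Laplace-style decay in place after the bump.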

Comment by nunosempere on Is there an easy way to turn a LW sequence into an epub? · 2020-08-05T09:18:54.345Z · score: 6 (2 votes) · LW · GW

Use the LW GraphQL API (https://www.lesswrong.com/posts/LJiGhpq8w4Badr5KJ/graphql-tutorial-for-lesswrong-and-effective-altruism-forum) to query for the html of the posts, and then use something like pandoc to translate said html into latex, and then to epub.

The command needed to get a particular post:

    {
      post(input: {
        selector: {
          _id: "ZyWyAJbedvEgRT2uF"
        }
      }) {
        result {
          htmlBody
        }
      }
    }
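A sketch of sending that query from Python with only the standard library. The endpoint URL is an assumption (it is the one the linked tutorial is about, but check it before relying on it), the function names are hypothetical, and the server may want extra headers such as a User-Agent.

```python
import json
import urllib.request

GRAPHQL_URL = "https://www.lesswrong.com/graphql"  # assumed endpoint

QUERY_TEMPLATE = """
{
  post(input: {selector: {_id: "%s"}}) {
    result {
      htmlBody
    }
  }
}
"""


def build_request(post_id):
    """Wrap the query for one post in a JSON POST request."""
    payload = json.dumps({"query": QUERY_TEMPLATE % post_id}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )


def fetch_html(post_id):
    """Send the request and pull htmlBody out of the response (needs network access)."""
    with urllib.request.urlopen(build_request(post_id)) as response:
        body = json.load(response)
    return body["data"]["post"]["result"]["htmlBody"]
```

The string returned by `fetch_html("ZyWyAJbedvEgRT2uF")` is what would then be handed to pandoc.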

Comment by nunosempere on Aggregating forecasts · 2020-08-04T08:42:53.604Z · score: 5 (2 votes) · LW · GW

### Geometric mean of the odds = mean of the evidences.

Suppose you have probabilities in odds form; 1: 2^a and 1:2^b, corresponding to a and b bits, respectively. Then the geometric mean of the odds is 1: sqrt(2^a * 2^b) = 1 : 2^((a+b)/2), corresponding to ((a+b)/2) bits; the midpoint in the evidences.
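A quick numerical check of that identity, as a sketch. Here evidence is measured as log2 of the odds in favour, so the sign is flipped relative to the 1 : 2^a notation above, which doesn't affect the result. The function names are made up for this example.

```python
import math


def odds(p):
    """Odds in favour, p : (1 - p), as a single number."""
    return p / (1 - p)


def bits(p):
    """Evidence in bits: log2 of the odds in favour."""
    return math.log2(odds(p))


def geometric_mean_of_odds(p, q):
    """Pool two probabilities by taking the geometric mean of their odds."""
    pooled_odds = math.sqrt(odds(p) * odds(q))
    return pooled_odds / (1 + pooled_odds)


pooled = geometric_mean_of_odds(0.05, 0.03)

# The pooled evidence is the midpoint of the two evidences.
assert math.isclose(bits(pooled), (bits(0.05) + bits(0.03)) / 2)
```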

For some more background as to why bits are the natural unit of probability, see for example this arbital article, or search Probability Theory, the Logic of Science. Bits are additive: you can just add or subtract bits as you encounter new evidence, and this is a pretty big "wink wink, nod, nod, nudge, nudge" as to why they'd be the natural unit.

In any case, if person A has seen a bits of evidence, of which a' are unique, and person B has seen b bits of evidence, of which b' are unique, and they have both seen s' bits of shared evidence, then you'd want to add them, to end up at a'+b'+s', or a + b - s'. So maybe in practice (a+b)/2 = s' + (a'+b')/2 ~ a'+b'+s', when a' + b' is small (or overestimated, which imho seems to often be the case; people overestimate the importance of their own private information; there is also some literature on this).

This corresponds to the intuition that if someone is at 5%, and someone else is at 3% for totally unrelated reasons, the aggregate should be lower than that. And this would be a justification for Tetlock's extremizing.

Anyways, in practice, you might estimate s' as the historical base rate (to which you and your forecasters have access), and take a' and b' as the deviations from that.
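A sketch of that procedure: treat the base rate as the shared evidence s', treat each forecaster's deviation from it as their unique evidence, and add everything up in bits. The 10% base rate is a made-up number for illustration, and the function names are hypothetical.

```python
import math


def to_bits(p):
    """Evidence in bits: log2 of the odds in favour."""
    return math.log2(p / (1 - p))


def from_bits(b):
    """Inverse of to_bits: turn bits of evidence back into a probability."""
    odds = 2 ** b
    return odds / (1 + odds)


def aggregate_with_base_rate(base_rate, forecasts):
    shared = to_bits(base_rate)                        # s'
    unique = [to_bits(p) - shared for p in forecasts]  # a', b', ...
    return from_bits(shared + sum(unique))             # s' + a' + b' + ...


aggregate = aggregate_with_base_rate(0.10, [0.05, 0.03])

# Two forecasters below the base rate for unrelated reasons
# end up with an aggregate below both of them.
assert aggregate < 0.03
```

If every forecaster simply reports the base rate, the deviations are all zero and the aggregate is the base rate itself, which seems like the behaviour one would want.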

Comment by nunosempere on Forecasting Newsletter: July 2020. · 2020-08-02T10:11:32.147Z · score: 1 (1 votes) · LW · GW

Thanks.

The major friction for me is that some of the formatting makes it feel overwhelming. Maybe use bold headings instead of bullet points for each new entry? Not sure.

Fair point; will consider.

Comment by nunosempere on ozziegooen's Shortform · 2020-08-01T15:35:05.989Z · score: 10 (4 votes) · LW · GW

> The name comes straight from the Latin though

From the Greek as it happens. Also, alethephobia would be a double negative, with a-letheia meaning a state of not being hidden; a more natural neologism would avoid that double negative. Also, the Greek concept of truth has some differences to our own conceptualization. Bad neologism.

Comment by nunosempere on Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns · 2020-07-23T13:48:38.921Z · score: 5 (4 votes) · LW · GW

Notes

• The field of AGI research plausibly commenced in 1956 with the Dartmouth conference. What happens if one uses Laplace's rule? Then it's a priori pretty implausible that it will happen, if it hasn't happened soon.

• How do information cascades work in this context? How many researchers would I expect to have read and recall a reward gaming list (1, 2, 3, 4)?

• Here is A list of good heuristics that the case for AI x-risk fails. I'd expect that these, being pretty good heuristics, will keep having an effect on AGI researchers that will continue keeping them away from considering x-risks.

• Rohin probably doesn't actually have enough information or enough forecasting firepower to predict that it hasn't happened at 0.1%, and be calibrated. He probably does have the expertise, though. I did some experiments a while ago, and "I'd be very surprised if I were wrong" translated for me to a 95%. YMMV.

• An argument would go: "The question looks pretty fuzzy to me, having moving parts. Long tails are good in that case, and other forecasters who have found some small piece of evidence are over-updating." Some quotes:

There is strong experimental evidence, however, that such self-insight is usually faulty. The expert perceives his or her own judgmental process, including the number of different kinds of information taken into account, as being considerably more complex than is in fact the case. Experts overestimate the importance of factors that have only a minor impact on their judgment and underestimate the extent to which their decisions are based on a few major variables. In short, people's mental models are simpler than they think, and the analyst is typically unaware not only of which variables should have the greatest influence, but also which variables actually are having the greatest influence. (Source: Psychology of Intelligence Analysis , Chapter 5)

Our judges in this study were eight individuals, carefully selected for their expertise as handicappers. Each judge was presented with a list of 88 variables culled from the past performance charts. He was asked to indicate which five variables out of the 88 he would wish to use when handicapping a race, if all he could have was five variables. He was then asked to indicate which 10, which 20, and which 40 he would use if 10, 20, or 40 were available to him.

We see that accuracy was as good with five variables as it was with 10, 20, or 40. The flat curve is an average over eight subjects and is somewhat misleading. Three of the eight actually showed a decrease in accuracy with more information, two improved, and three stayed about the same. All of the handicappers became more confident in their judgments as information increased. (Source: Behavioral Problems of Adhering to a Decision Policy)

• I'm not sure to what extent this is happening with forecasters here: finding a particularly interesting and unique nugget of information and then over-updating. I'm also not sure to what extent I actually believe that this question is fuzzy and so long tails are good.

Here is my first entry to the competition. Here is my second and last entry to the competition. My changes are that I've assigned some probability (5%; I'd personally assign 10%) that it has already happened.

• Note that this is not my actual distribution, this is my guess as to how Rohin will update
• My guess doesn't manipulate Rohin's distribution much; I expect that Rohin will not in fact change his mind a lot.
• In fact, this is not exactly my guess as to how Rohin will update. That is, I'm not maximizing expected accuracy, I'm ~maximizing the chance of getting first place (subject to spending little time on this).

• I think that the distinction between the forecaster's beliefs and Rohin's is being neglected. Some of the snapshots predict huge updates, which really don't seem likely.

Comment by nunosempere on Life at Three Tails of the Bell Curve · 2020-06-28T18:25:51.253Z · score: 4 (3 votes) · LW · GW

Thanks for this post.

Comment by nunosempere on An online prediction market with reputation points · 2020-06-14T09:28:57.394Z · score: 7 (3 votes) · LW · GW

Hey! I think this is cool. May I suggest "How many people in Kings County, NY, will be confirmed to have died from COVID-19 during September?" as a question?

I have a forecasting newsletter with ~150 subscribers; I'll make sure to mention this post when it gets sent at the end of this month.

Comment by nunosempere on What are the best tools for recording predictions? · 2020-05-25T08:26:00.171Z · score: 3 (2 votes) · LW · GW

Foretold has a public API; requests can be made to it from anything that sends requests. This would require some work.

Comment by nunosempere on What are the best tools for recording predictions? · 2020-05-25T08:19:19.992Z · score: 7 (2 votes) · LW · GW

Personally, I've used Foretold, Google Sheets, CSVs, an R script, and my own bash script (PredictResolveTally), which writes to a CSV.

Personally, I like my own setup best (it does work at the 5 second level), but I think you'd be better off just using a CSV, and then analyzing your results every so often with the programming language of your choice. For the analysis part, this is a Python library I'm looking forward to using.
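As a sketch of what the "analyze every so often" step could look like, here is a Brier score computed from a CSV of resolved predictions. The column names (`question`, `probability`, `outcome`) and the example rows are hypothetical; in practice you'd read your own file with `open(...)`.

```python
import csv
import io


def brier_score(rows):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    errors = [(float(r["probability"]) - float(r["outcome"])) ** 2 for r in rows]
    return sum(errors) / len(errors)


# Made-up file contents standing in for a real predictions.csv
EXAMPLE_CSV = """question,probability,outcome
Will it rain tomorrow?,0.8,1
Will the package arrive by Friday?,0.3,0
Will I finish the draft this week?,0.6,0
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
score = brier_score(rows)
print(f"Brier score over {len(rows)} predictions: {score:.3f}")
```

Always answering 50% scores 0.25 on this measure, so anything consistently below that is doing better than a coin flip.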

Comment by nunosempere on Assessing Kurzweil predictions about 2019: the results · 2020-05-13T07:41:57.638Z · score: 1 (1 votes) · LW · GW

Browsing Wikipedia, a similar effort was the 1985 book Tools for Thought (available here), though I haven't read it.

Comment by nunosempere on What will be the big-picture implications of the coronavirus, assuming it eventually infects >10% of the world? · 2020-05-06T11:04:46.235Z · score: 3 (2 votes) · LW · GW

As an heavy predictit user

Could you say more about this? What is your ranking in PredictIt / what is your track record? In particular, GJOpen, for example, doesn't expect Trump to win.

Comment by nunosempere on Forecasting Newsletter: April 2020 · 2020-05-06T09:09:06.160Z · score: 2 (2 votes) · LW · GW

This might be of interest: https://www.kill-the-newsletter.com/.

Comment by nunosempere on Forecasting Newsletter: April 2020 · 2020-05-01T19:27:27.747Z · score: 3 (2 votes) · LW · GW

I'll let you know. 30%-ish.

Comment by nunosempere on Are there any prediction markets for Covid-19? · 2020-04-23T13:51:42.391Z · score: 4 (2 votes) · LW · GW

Now there is a pure version of what you were looking for! Corona Information Markets

Comment by nunosempere on Poll: ask anything to an all-knowing demon edition · 2020-04-23T13:45:49.879Z · score: 2 (2 votes) · LW · GW

Consider the futures of humanity which I would, upon reflection, endorse as among the best of utopias, and consider the simplest Turing Machines which encode them. If you apply (some function which turns their states after n steps into a real number and concatenate them), would the output of such calculation belong to (this randomly chosen half of the real numbers)?

I'm sure this can be worded more carefully, but right now this may force the oracle to simulate all the futures of humanity which I would consider to be among the best of utopias.

Comment by nunosempere on The Unilateralist’s “Curse” Is Mostly Good · 2020-04-16T16:52:08.644Z · score: 3 (2 votes) · LW · GW

Machiavelli's The Prince, and various other texts.