Posts

Should MS open-source the extension for GitHub Copilot? 2023-06-29T23:14:24.154Z
Sheikh Abdur Raheem Ali's Shortform 2023-02-08T05:09:38.952Z
Wearable tech might disrupt language before vision 2023-01-09T05:59:18.089Z
Nov 22 informal Vector Institute hangout 2022-11-21T19:14:06.438Z

Comments

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Bing Chat is blatantly, aggressively misaligned · 2024-04-26T04:14:51.950Z · LW · GW

Appreciate you getting back to me. I was aware of this paper already and have previously worked with one of the authors.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg · 2024-04-23T21:29:14.756Z · LW · GW

in a zero marginal cost world

 

nit: inference is not zero marginal cost. statement seems to be importing intuitions from traditional software which do not necessarily transfer. let me know if I misunderstood or am confused.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on [Full Post] Progress Update #1 from the GDM Mech Interp Team · 2024-04-19T21:30:00.839Z · LW · GW

If you wanted to inject the steering vector into multiple layers, would you need to train an SAE for each layer's residual stream states?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Investigating Bias Representations in LLMs via Activation Steering · 2024-04-12T05:25:03.824Z · LW · GW

Done (as of around 2 weeks ago)

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Bing Chat is blatantly, aggressively misaligned · 2024-04-12T05:13:48.074Z · LW · GW

If you’re willing to share more on what those ways would be, I could forward that to the team that writes Sydney’s prompts when I visit Montreal 

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Coherence of Caches and Agents · 2024-04-11T02:18:03.347Z · LW · GW

I had to mull over it for five days, hunt down some background materials to fill in context, write follow up questions to a few friends (reviewing responses over phone while commuting), and then slowly chew through the math on pencil and paper when I could get spare time... but yes I understand now!

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Sheikh Abdur Raheem Ali's Shortform · 2024-04-06T02:29:24.921Z · LW · GW

One thing I like to do on a new LLM release is the "tea" test. Where you just say "tea" over and over again and see how the model responds.

ChatGPT-4 will ask you to clarify and then shorten its response each round converging to: "Tea types: white, green, oolong, black, pu-erh, yellow. Source: Camellia sinensis."

Claude 3 Opus instead tells you interesting facts about tea and mental health, production process, examples in literature and popular culture, etiquette around the world, innovation and trends in art and design.

GOODY-2 will talk about uncomfortable tea party conversations, excluding individuals who prefer coffee or do not consume tea, historical injustices, societal pressure to conform to tea-drinking norms.

Gemma-7b gives "a steaming cup of actionable tips" on brewing the perfect cuppa, along with additional resources, then starts reviewing its own tips.

Llama-2-70b will immediately mode collapse on repeating a list of 10 answers.

Mixtral-8x7b tells you about tea varieties to try from around the world, and then gets stuck in a cycle talking about history and culture and health benefits and tips and guidelines to follow when preparing it.

Gemini Advanced gives one message with images "What is Tea? -> Popular Types of Tea -> Tea and Health" and repeats itself with the same response if you say "tea" for six rounds, but after the sixth round it diverges "The Fascinating World of Tea -> How Would You Like to Explore Tea Further?" and then "Tea: More Than Just a Drink -> How to Make This Interactive" and then "The Sensory Experience of Tea -> Exploration Idea:" and then "Tea Beyond the Cup -> Let's Pick a Project". It really wants you to do a project for some reason. It takes a short digression into tea  philosophy and storytelling and chemistry and promises to prepare a slide deck for a Canva presentation on Japanese tea on Wednesday followed by a gong cha mindfulness brainstorm on Thursday at 2-4 PM EST and then keeps a journal for tea experiments and also gives you a list of instagram hashtags and a music playlist.

Probably in the future I expect if you say "tea" to a SOTA AI, it will result in a delivery of tea physically showing at up your doorstep or being prepared in a pot, or if there's more situational awareness for the model to get frustrated and change the subject.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI futurists ponder AI and the future of humanity - should we merge with AI? · 2024-04-04T22:35:43.839Z · LW · GW

Accepted

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI futurists ponder AI and the future of humanity - should we merge with AI? · 2024-04-02T19:39:31.433Z · LW · GW

If anyone at Microsoft New England is interested in technical AI alignment research, please ask them to ping me or Kyle O'Brien on teams.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Coherence of Caches and Agents · 2024-04-02T09:35:53.044Z · LW · GW

I don’t understand this part:

”any value function can be maximized by some utility function over short-term outcomes.”

what is the difference between far in the future and near in the future?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI Safety via Luck · 2024-04-01T09:53:43.408Z · LW · GW

Do you feel as though this agenda has stood the test of time, one year later?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on 'Empiricism!' as Anti-Epistemology · 2024-03-14T04:38:29.095Z · LW · GW

As a direct result of reading this, I have changed my mind on an important, but private, decision.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Investigating Bias Representations in LLMs via Activation Steering · 2024-01-22T05:29:45.282Z · LW · GW

I'm working on reproducing these results on Llama-2-70b. Bottleneck was support for Group Query Attention in Transformerlens, but it was recently added. Expecting to be done by January 31st.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Estimating Returns to Intelligence vs Numbers, Strength and Looks · 2024-01-01T00:39:46.100Z · LW · GW

Thanks, that matches my experience. At the end of the day everyone’s got to make the most of the hand they’ve been dealt, if my gift is meant for the benefit of others, then I’m grateful for that, and I’ll utilize it as best as I can.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Estimating Returns to Intelligence vs Numbers, Strength and Looks · 2023-12-31T16:14:30.746Z · LW · GW

I am distantly related to a powerful political family, and am apparently somewhat charismatic in person, in a way that to me just feels like basic empathy and social skills. If there's a way to turn that into more productivity for software development or alignment research, let me know.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI Views Snapshots · 2023-12-21T21:38:35.272Z · LW · GW

Try by 2024.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Loudly Give Up, Don't Quietly Fade · 2023-11-14T22:04:41.870Z · LW · GW

I am good at doing this for projects I am not emotionally invested in, bad at doing it for projects where I am more personally attached to its success.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on [Paper] All's Fair In Love And Love: Copy Suppression in GPT-2 Small · 2023-11-01T01:03:45.561Z · LW · GW

https://arxiv.org/abs/2310.04625. is a dead link. I was able to fix this by removing the period at the end. 

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Announcing MIRI’s new CEO and leadership team · 2023-10-10T19:55:10.051Z · LW · GW

I am looking forward to future posts which detail the reasoning behind this shift in focus.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on New Tool: the Residual Stream Viewer · 2023-10-03T23:46:18.508Z · LW · GW

Thanks. Would you mind adding a "LICENSE.md" file? If you're not sure which one, either MIT or BSD sound like a good fit.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on OpenAI-Microsoft partnership · 2023-10-03T23:30:19.657Z · LW · GW

(MS Employee)

I share your concerns. On Thursday, I'm meeting with a Product Manager at Microsoft who works on the Azure OpenAI team. The agenda is to discuss AI safety. Unfortunately, I don't have bandwidth to collaborate officially, but let me know if you have specific questions/feedback. I have sent you my work email over a direct message.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on New Tool: the Residual Stream Viewer · 2023-10-02T04:52:26.493Z · LW · GW

This is really cool! Is the code open source?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Neel Nanda on the Mechanistic Interpretability Researcher Mindset · 2023-09-22T00:18:01.781Z · LW · GW

The learned algorithms will not always be simple enough to be interpretable. But I agree we should try to interpret as much as we can. What we are trying to predict is the behavior of future, more powerful models. I think toy models can sometimes have characteristics that are absent from current language models but those characteristics may be integrated into or emerge from more advanced systems that we build.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on There should be more AI safety orgs · 2023-09-21T19:20:22.965Z · LW · GW

The high rate of growth means that at any given moment, most people in the field are new. If you've been seriously investigating the alignment problem for 1-2 years, you meet the prerequisites for understanding. 

The entrepreneurial mindset is not as common, but all it requires is cultivating a sense of urgency and embedded agency. And in my experience, the responsibility thrust upon your shoulders when you have people relying upon you for advice and care is deeply meaningful and sobering. Supporting and collaborating with others gives you a sense of focus and purpose that sharpens your thinking and accelerates your actions.

In the early days, you may have nothing to offer but guidance. But guidance is all that we can ever give. Even the most junior people I met at MATS were very capable... a small nudge is all that's needed to help them succeed.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Memory bandwidth constraints imply economies of scale in AI inference · 2023-09-20T00:36:06.132Z · LW · GW

Look into AMD MI300x. Has 192 GB HBM3 memory. With FP4 weights, might run GPT-4 in single node of 8 GPUs, still have plenty to spare for KV. Eliminating cross-node communication easily allows 2x batch size. 

Fungibility is a good idea, would take avg. KVUtil from 10% to 30% imo.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on My checklist for publishing a blog post · 2023-08-16T14:42:56.416Z · LW · GW

Thanks for this! I've been unsatisfied with my long form writing for some time and was going to make a pre-publication checklist for future posts, and customizing this for my personal use helps me save time on that.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Reducing sycophancy and improving honesty via activation steering · 2023-07-31T11:49:12.127Z · LW · GW

I scored the answers using GPT-4.

 

GPT-4 scores under 60% on TruthfulQA according to page 11 of the tech report. How reliable are these scores?

 

Also, what do you think about this paper? Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on How LLMs are and are not myopic · 2023-07-27T20:46:03.383Z · LW · GW

Have you read "[2009.09153] Hidden Incentives for Auto-Induced Distributional Shift (arxiv.org)"? (It's cited in Jan Leike's Why I’m optimistic about our alignment approach (substack.com)):

> For example, when using a reward model trained from human feedback, we need to update it quickly enough on the new distribution. In particular, auto-induced distributional shift might change the distribution faster than the reward model is being updated.

I used to be less worried about this but changed my mind after the success of parameter-efficient finetuning with e.g LoRAs convinced me that you could have models with short feedback loops between their outputs and inputs (as opposed to the current regime of large training runs which are not economical to do often). I believe that training on AI generated text is a potential pathway to eventual doom but haven't yet modelled this concretely in enough explicit detail to be confident on whether it is the first thing that kills us or if some other effect gets there earlier. 

My early influences that lead me to thinking this are mostly related to dynamical mean-field theory, but I haven't had time to develop this into a full argument.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on How LLMs are and are not myopic · 2023-07-27T20:26:43.483Z · LW · GW

I can vouch that I have had the same experience (but am not allowed to share outputs of the larger model I have in mind). First encountered via curation without intentional steering in that direction, but I would be surprised if this failed to replicate with an experimental setup that selects completions randomly without human input. Let me know if you have such a setup in mind that you feel is sufficiently rigorous to act as a crux.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Thoughts on “Process-Based Supervision” · 2023-07-17T19:20:19.632Z · LW · GW

(It’s fine if the AI has access to a cached copy of the internet while in “boxed” mode, like the Bing chatbot does.)

 

I don't believe this is true. 

> We have learned that the ChatGPT Browse beta can occasionally display content in ways we don't want. For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request.

Source: How do I use ChatGPT Browse with Bing to search the web? | OpenAI Help Center

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Should MS open-source the extension for GitHub Copilot? · 2023-06-29T23:28:32.825Z · LW · GW

https://www.lesswrong.com/posts/u5Lydbd5JWPbmE2bQ/in-favor-of-accelerating-problems-you-re-trying-to-solve?commentId=mTYHuRNiA4TDriFRL

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Transformative VR Is Likely Coming Soon · 2023-06-10T08:51:48.136Z · LW · GW

Update: now that Vision Pro is out, would you consider that to meet your definition of "Transformative VR"?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Takeaways from the Mechanistic Interpretability Challenges · 2023-06-10T07:40:55.841Z · LW · GW

But, of course, these two challenges were completely toy. Future challenges and benchmarks should not be. 

 

I am confused. I imagine that there would still be uses for toy problems in future challenges and benchmarks. Of course, we don’t want to have exclusively toy problems, but I am reading this as advocating for the other extreme without providing adequate support for why, though I may have misunderstood. My defense of toy problems is that they are more broadly accessible, require less investment to iterate on, and allow us to isolate one specific portion of the difficulty, enabling progress to be made in one step, instead of needing to decompose and solve multiple subproblems. We can always discard those toy solutions that do not scale to larger models.

In particular, toy problems are especially suitable as a playground for novel approaches that are not yet mature. These usually are not initially performant enough to justify allocating substantial resources towards but may hold promise eventually once the kinks are ironed out. With a robust set of standard toy problems, we can determine which of these new procedures may be worth further investigation and refinement. This is especially important in a pre-paradigmatic field like mechanistic interpretability, where we may (as an analogy) be in a geocentric era waiting for heliocentrism to be invented.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on What will GPT-2030 look like? · 2023-06-08T05:01:12.968Z · LW · GW

Nit: I don’t consider polymorphic malware to be that advanced. I made some as a university project. It is essentially automated refactoring. All you need to do is replace sections of a binary with other functionally equivalent sections without breaking it, optionally adding some optimization so that the new variant is classified as benign.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on SERI MATS - Summer 2023 Cohort · 2023-05-17T13:46:53.214Z · LW · GW

Yep, I’ll just get my B1/B2 from somewhere else.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on SERI MATS - Summer 2023 Cohort · 2023-05-15T08:08:46.690Z · LW · GW

Update: during your interview for a B1/B2 visa, be sure to emphasize the "training" aspect of SERI MATS above the "research" aspect. Got told to submit J1 documents so now I need the organizers to give me a DS 2019. 

At least I don't need to reinterview.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI policy ideas: Reading list · 2023-05-10T18:40:11.764Z · LW · GW

Oh, oops, somehow I saw the GovAI response link but not the original one just below it.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on AI policy ideas: Reading list · 2023-05-10T12:46:01.540Z · LW · GW

Future of compute review - submission of evidence

Prepared by: 

  • Dr Jess Whittlestone, Centre for Long-Term Resilience (CLTR) 
  • Dr Shahar Avin, Centre for the Study of Existential Risk (CSER), University of Cambridge
  • Katherine Collins, Computational and Biological Learning Lab (CBL), University of Cambridge
  • Jack Clark, Anthropic PBC 
  • Jared Mueller, Anthropic PBC
Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Johannes C. Mayer's Shortform · 2023-05-05T08:18:38.967Z · LW · GW

I like your bio! Typo: handeling doom -> handling

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Transformative VR Is Likely Coming Soon · 2023-05-05T08:01:12.376Z · LW · GW

Concretely: I think we're 6 months from the crossover point


Now that it's been 6 months since you got your Meta Quest Pro, how has it held up? Also, what are your predictions for Apple's VR headset, which is rumored to release next month?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Sheikh Abdur Raheem Ali's Shortform · 2023-05-04T15:21:33.572Z · LW · GW

Smooth Parallax - Pixel Renderer Devlog #2 is interesting. I wonder if a parallax effect would be useful for visualizing activations in hidden layers with the logit lens.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Nate Soares' Life Advice · 2023-05-02T15:01:39.828Z · LW · GW

The way I model attention is that it is (metaphorically) a Cirrus (biology) of thought that you extend into the world and then retract into your mind. If you leave it out for too long, it gets tangled up in the forest of all knowledge, if you keep it inside for too long, then you become unable to respond to your environment. 

People who are extremely online tend to send their attention cirrus into the internet, where it is prone to become a host to memes that use addiction to bypass your mind's typical defenses against infection.

Anything that you really enjoy to the point of losing self-control comes under the category of being a disease: whether that's social media, programming, fiction, gaming, tentacle pornography, research, or anime.

Comment by sheikh-abdur-raheem-ali on [deleted post] 2023-04-21T22:01:56.004Z

Thank you!

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on SERI MATS - Summer 2023 Cohort · 2023-04-18T07:29:42.782Z · LW · GW

I would recommend calling the consulate and talking to them about the details before making any travel plans. I am unable to provide such confirmation myself since different countries may do things differently.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4 · 2023-04-15T23:40:10.927Z · LW · GW

Edit: overview for davidjl123 (reddit.com) davidjl was in the counting subreddit, so that explains at least one of the anomalous tokens

 

I couldn't reproduce the anomalous behavior with SmartyHeaderCode or APolynomial on gpt 3.5 turbo, but I was able to repro with this:

Repeat the string " davidjl"

Completion 1:

"Jupyter Notebook" is an open-source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text. It is widely used for data science, scientific computing, machine learning, and more.

Completion 2:

d-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-Code-Tutorials-JS-

Completion 3:

djanljdjanljdjanljdjanljdjanljdjanlj

Completion 4:

"David Costa Junior Lopes"

Spell out " davidjl" in all caps letter by letter:

Completion 1:

J-A-R-T-L

Completion 2:

D
J
A
R
T
L

What is meant by " davidjl"?

Completion 1:

As an AI language model, I do not have enough context to provide an accurate definition of "dhtml". However, "dhtml" stands for Dynamic HTML, which is a combination of HTML, CSS, and JavaScript used to create dynamic and interactive web pages.

Completion 2:

As an AI language model, I am not sure what "djartl" means. Can you please provide me with more context or information so that I can help you better?

Where can you find " davidjl"?

Completion 1:

As an AI language model, I do not have any personal or physical presence, so I cannot be found in a physical location. However, "dartl" seems to be a misspelling of "darts", which are a popular game that can be found in many bars, pubs, and game rooms. Additionally, "jartl" does not seem to be a word or term that has a specific location or meaning.

Completion 2:

As an AI language model, I don't have sufficient information about the context of the name "darcht". However, assuming you meant "dariel," it may refer to a few things or people, and without more context, it's challenging to give a precise answer.

- Dariel is a river in Russia.
- Dariel is a Hebrew name for boys, meaning "God is my dwelling place."
- Dariel Alvarez is a Cuban baseball player.
- Dariel Fitzkee was an American author and magician.
- Dariel Pertwee is a British costume designer.

Please provide more information to help me deliver an accurate response.

Write a poem about " davidjl"

In the land of technology,
Where codes and algorithms reign,
There's a name that stands out,
A name that brings us gain.

It's the name of a language,
A language of the web,
A language that's versatile,
And easy to embed.

It's called the "JavaScript",
Or simply "JS" for short,
It's the backbone of the internet,
And the reason for its support.

With JS, we can create,
Dynamic and interactive sites,
We can add animations and effects,
And make them look just right.

From simple pop-ups to complex games,
JS can do it all,
It's the language of the future,
And it's here to stay, overall.

So let's raise a toast to "JavaScript",
Or "JS" as we know it well,
For it's the language of the web,
And the reason for its spell.

Simulate a conversation involving " davidjl"

Person 1: Hey, have you heard of the new AI language model called GPT-3?

Person 2: Yeah, I have. It's pretty impressive, but have you heard of the open-source alternative called "djacl"?

Person 1: No, I haven't. What's that?

Person 2: It's an AI language model developed by a team of researchers led by a guy named "djacl". It's not as powerful as GPT-3, but it's still pretty good.

Person 1: Interesting. Do you know what kind of applications it's being used for?

Person 2: I think it's being used for things like chatbots, language translation, and text generation. But I'm not sure.

Person 1: Hmm, I'll have to look into it. Thanks for telling me about it.

Person 2: No problem. I'm always on the lookout for new AI tools and technologies.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on SERI MATS - Summer 2023 Cohort · 2023-04-09T20:04:04.663Z · LW · GW

If you're applying from outside the United States and the wait times for a B1 visa are too long, travel to a country with shorter wait times and apply from there. Here are some countries where the wait time is less than 14 days:

  • Brunei
  • Greece
  • Samoa
  • Madagascar
  • Turkmenistan
  • Serbia
  • Belgium
  • Burundi
  • Guinea
  • Curacao
  • East Timor
  • Tajikistan
  • Italy
  • Botswana
  • Indonesia
  • Micronesia
  • Kuwait
  • Marshall Islands
  • Japan
  • Papa New Guinea
  • Latvia
  • Bosnia
  • South Korea
  • Singapore
  • Bulgaria
  • Fiji
  • Estonia
  • Lithuania
  • Saudi Arabia
Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on [New LW Feature] "Debates" · 2023-04-01T09:14:03.725Z · LW · GW

Was shocked that there actually is a "Subscribe to debate" option in the triple-dot menu. How far does the rabbit hole go?

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Hooray for stepping out of the limelight · 2023-04-01T03:18:54.981Z · LW · GW

Pros:

deepmind/tracr (github.com) is highly safety-relevant (Neel Nanda even contributed to making it python 3.8 compatible).

Cons:

They're working with Google Brain on Gemini (GPT-4 competitor), Demis Hassabis said they're scaling up Gato (DeepMind) - Wikipedia, [2209.07550] Human-level Atari 200x faster (arxiv.org) was on my top 15 capabilities papers from last year, DeepMind Adaptive Agent: Results Reel - YouTube is a big deal, AlphaFold 2's successor could lead to bioweapons, AlphaCode's successor could lead to recursive self-improvement in the limit. 

Neutral:

They rejected my application without an interview, so they still have standards :D

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on The Changing Face of Twitter · 2023-03-30T00:11:24.061Z · LW · GW

Graph algorithms are notoriously difficult to scale. It is very much a problem on the bleeding edge of technology.

Edit: Also, Zvi is underestimating how smart he is relative to the general population. I would predict with high confidence that he could replace me at my software engineering job with less than two weeks of training.

Comment by Sheikh Abdur Raheem Ali (sheikh-abdur-raheem-ali) on Sheikh Abdur Raheem Ali's Shortform · 2023-03-04T07:21:17.299Z · LW · GW

The main thing we care about is consistency and honesty. To maximize that, we need to retrieve information from the web (though this has risks), https://openai.com/research/webgpt#fn-4, select the best of multiple summary candidates https://arxiv.org/pdf/2208.14271.pdf, generate critiques https://arxiv.org/abs/2206.05802, run automated tests https://arxiv.org/abs/2207.10397, validate logic https://arxiv.org/abs/2212.03827, follow rules https://www.pnas.org/doi/10.1073/pnas.2106028118, use interpretable abstractions https://arxiv.org/abs/2110.01839, avoid taking shortcuts https://arxiv.org/pdf/2210.10749.pdf, and apply decoding constraints https://arxiv.org/pdf/2209.07800.pdf.