Posts

Would it be useful to collect the contexts where various LLMs think the same? 2023-08-24T22:01:50.426Z
Martin Vlach's Shortform 2022-09-01T12:37:38.690Z

Comments

Comment by Martin Vlach (martin-vlach) on New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters · 2024-12-05T19:40:33.866Z · LW · GW

My suspicion: https://arxiv.org/html/2411.16489v1, taken and implemented on the small coding model.

Is it any mystery which of DPO, PPO, RLHF, or fine-tuning was the likely method for the advanced distillation there?

Comment by Martin Vlach (martin-vlach) on My motivation and theory of change for working in AI healthtech · 2024-11-08T06:00:21.563Z · LW · GW

> EA is neglecting industrial solutions to the industrial problem of successionism.

...because the broader mass of active actors working on such solutions renders those business areas non-neglected?

Comment by Martin Vlach (martin-vlach) on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety) · 2024-11-08T05:57:09.824Z · LW · GW

Wow, such a badly argued (aka BS) yet heavily upvoted article!

Let's start with Myth #1: what a straw man! Rather than that extreme statement, most researchers likely believe that in the current environment their safety and alignment advances are likely (with high EV) helpful to humanity. The point here is that they had quite a free hand, or at least varied options, in picking the environment where they work and publish.

With your examples, a bad actor could see a worthwhile EV even with a capable system that is less obedient and more false. Even if interpretability speeds up development, it would direct such development toward more transparent models; at least there is a naive chance of that.

Myth #2: I've not yet met anybody in the alignment circles who believed that. Most are pretty conscious of the double-edgedness and of your sub-arguments.

https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the?commentId=5vB5tDpFiQDG4pqqz depicts the flaws I point to neatly/gently.

Comment by Martin Vlach (martin-vlach) on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety) · 2024-11-08T05:42:42.372Z · LW · GW

Are you referring to a Science of Technological Progress à la https://www.theatlantic.com/science/archive/2019/07/we-need-new-science-progress/594946 ?

What is your take on the processes for humanizing technologies, and what sources/research are available on such phenomena?

Comment by Martin Vlach (martin-vlach) on Survival without dignity · 2024-11-08T03:50:26.154Z · LW · GW

> some OpenAI board members who the Office of National AI Strategy was allowed to appoint, and they did in fact try to fire Sam Altman over the UAE move, but somehow a week later Sam was running the Multinational Artificial Narrow Intelligence Alignment Consortium, which sort of morphed into OpenAI's oversight body, which sort of morphed into OpenAI's parent company, and, well, you can guess who was running that.

Pretty sassy abbreviations spiced in there.'Đ

I'd expected the hint of
> My name is Anthony. What would you like to ask?
to show that Anthony was an LLM-based android, but who knows..?

Comment by Martin Vlach (martin-vlach) on Toy Models of Superposition: Simplified by Hand · 2024-10-31T16:26:41.596Z · LW · GW

I mean your article; Anthropic's work seems more like a paper. Maybe without the ": S" it would read as a reference rather than a title: subtitle construction.

Comment by Martin Vlach (martin-vlach) on Toy Models of Superposition: Simplified by Hand · 2024-10-17T13:59:43.064Z · LW · GW

I have not read your explainer yet, but I've noticed the title Toy Models of Superposition: Simplified by Hand is a bit misleading: it promises to talk about the toy models, which it does not do at all. The article is about superposition only, which is great, but not what I'd expect from the title.

Comment by martin-vlach on [deleted post] 2024-10-01T04:07:18.050Z

> that that first phase of advocacy was net harm

typo

Comment by Martin Vlach (martin-vlach) on The Atomic Bomb Considered As Hungarian High School Science Fair Project · 2024-09-05T13:46:31.598Z · LW · GW

Could you please fix your Wikipedia link here (it currently hides the word "and" from your writing)?

Comment by Martin Vlach (martin-vlach) on On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it · 2024-08-28T23:14:59.587Z · LW · GW

> only Claude 3.5 Sonnet attempting to push past GPT4 class

This seems to miss Gemini Pro 1.5 Experimental, whose latest version was made available just yesterday.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2024-08-12T15:57:18.649Z · LW · GW

This insensitivity to the case seems strongly connected to the fairly low interest in longevity throughout (Western/developed) society.

Thought experiment: what are you willing to pay/sacrifice in your 20s or 30s to get 50 extra days of life, versus on your deathbed?

https://consensus.app/papers/ultraviolet-exposure-associated-mortality-analysis-data-stevenson/69a316ed72fd5296891cd416dbac0988/?utm_source=chatgpt

Comment by Martin Vlach (martin-vlach) on Unnatural abstractions · 2024-08-11T12:41:41.450Z · LW · GW

But largely to and fro,

*from?

Comment by Martin Vlach (martin-vlach) on Apply now: Get "unstuck" with the New IFS Self-Care Fellowship Program · 2024-07-24T16:05:00.966Z · LW · GW

Why does the form still seem open today? Couldn't that be harmful, or waste quite a chunk of people's time?

Comment by Martin Vlach (martin-vlach) on Some desirable properties of automated wisdom · 2024-07-15T20:23:36.913Z · LW · GW

Please go further towards maximizing clarity. Let's start with this example:
> Epistemic status: Musings about questioning assumptions and purpose.
Are those your musings about agents questioning their assumptions and world-views?

And, like, do you wish to improve on your fallacies?

> ability to pursue goals that would not lead to the algorithm’s instability.
A higher threshold than ability, like an inherent desire/optimisation?
What kind of stability? Any from https://en.wikipedia.org/wiki/Stable_algorithm? I'd focus more on a sort of non-fatal influence. Should the property be more about the algorithm being careful/cautious?

Comment by Martin Vlach (martin-vlach) on An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 · 2024-07-14T17:27:35.968Z · LW · GW

The YouTube tutorial link https://neelnanda.io/transformer-tutorial-1 gives a 404.-(

Comment by Martin Vlach (martin-vlach) on Eight Short Studies On Excuses · 2024-06-19T08:22:57.576Z · LW · GW

> "What, exactly, is the difference between a cult and a religion?"--"The difference is that cults have been formed recently enough, and are small enough, that we are suspicious of them existing for the purpose of taking advantage of the special place we give religion.

Now I see why my friends practicing the spiritual path of Falun Dafa have "incorporated" as a religion in my state, despite the movement originally denying being classified as a religion, so as to demonstrate that it does not require a fixed set of rituals.

Comment by Martin Vlach (martin-vlach) on Which skincare products are evidence-based? · 2024-06-04T05:00:27.169Z · LW · GW

Surprised to see nobody has mentioned microneedling yet. I'm not skilled in evaluating scientific evidence, but the takeaway from https://consensus.app/results/?q=Microneedling%20effectiveness&synthesize=on can hardly be anything other than a clear recommendation of microneedling.

Comment by Martin Vlach (martin-vlach) on Introducing AI Lab Watch · 2024-05-20T21:15:52.850Z · LW · GW

So is the Alignment program to be updated to 0 for OpenAI, now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:33:33.651Z · LW · GW

Honestly, the code linked is not that complicated: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:29:59.569Z · LW · GW

To work around tokens of interest falling outside the top-n, you can supply a logit_bias list to the API.
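A minimal sketch of what I mean, assuming the OpenAI Python SDK and hypothetical token IDs (resolve the real ones with tiktoken): give every candidate token the same large bias so they all land in the returned top-n, which keeps their relative odds intact.

```python
# Sketch only: the token IDs below are illustrative, not real.
import math
from openai import OpenAI

client = OpenAI()
candidate_ids = [362, 426]  # hypothetical IDs for the two answers of interest

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Answer A or B: ..."}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
    # An equal large bias pushes all candidates into the top-n while
    # preserving their relative odds (a constant added to each logit).
    logit_bias={str(tid): 50 for tid in candidate_ids},
)

for entry in resp.choices[0].logprobs.content[0].top_logprobs:
    print(entry.token, math.exp(entry.logprob))
```

The returned probabilities are from the biased distribution, but the ratios between equally-biased tokens are unchanged, so the comparison you care about survives.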

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:27:57.437Z · LW · GW

As the Llama 3 70B base model is said to be very clean (unlike base DeepSeek, for example, which is already instruction-spoiled) and similarly capable to GPT-3.5, you could explore that hypothesis.
  Details: check Groq or TogetherAI for free inference; I'm not sure the test data would fit the Llama 3 context window.

Comment by Martin Vlach (martin-vlach) on You Can Face Reality · 2024-05-10T09:06:23.831Z · LW · GW

a worthy platitude(?)

Comment by Martin Vlach (martin-vlach) on My views on “doom” · 2024-04-29T11:45:34.238Z · LW · GW

AI-induced problems/risks

Comment by Martin Vlach (martin-vlach) on ChatGPT can learn indirect control · 2024-04-05T10:08:10.081Z · LW · GW

Possibly https://ai.google.dev/docs/safety_setting_gemini would help, or just use the technique of https://arxiv.org/html/2404.01833v1.
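For the first option, a minimal sketch with the google-generativeai Python SDK (the model name and thresholds here are my assumptions; the linked docs have the full category list):

```python
# Sketch: relax Gemini's safety filters per category (values assumed).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Your prompt here",
    safety_settings=[
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"},
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    ],
)
print(response.text)
```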

Comment by Martin Vlach (martin-vlach) on Addressing Accusations of Handholding · 2024-04-05T09:57:54.897Z · LW · GW

> people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
> A common response is to suggest that the output has been prompted.
> It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in their output?

So you've just prompted the generator by teasing it with a rhetorical question implying that there are personal opinions evident in the generated text, right?

Comment by Martin Vlach (martin-vlach) on aisafety.info, the Table of Content · 2024-02-26T14:26:53.396Z · LW · GW

With a quick test, I find their chat interface prototype experience quite satisfying.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-12-12T11:20:43.677Z · LW · GW

Ascertaining LLMs' views/opinions should exclude sampling (even at temperature=0 with a deterministic seed); we should just look at the answers' distribution in the logits. My thesis on why that is not yet best practice is that the OpenAI API only supports logit_bias, not reading the probabilities directly.

This should work well with pre-set A/B/C/D choices, but to some extent with chain/tree of thought too. You'd just withhold the final token and look at the probabilities in the last (pass-through) step.
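A sketch of the A/B/C/D case under that constraint (OpenAI Python SDK; the choice token IDs are hypothetical): pin sampling to the four choice tokens with logit_bias and estimate the distribution by repeated draws at temperature 1.

```python
# Sketch only: estimate the answer distribution when we can bias logits
# but not read probabilities directly. Token IDs are hypothetical.
from collections import Counter
from openai import OpenAI

client = OpenAI()
choice_ids = [32, 33, 34, 35]  # assumed single-token IDs for "A".."D"

counts = Counter()
for _ in range(100):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "<question> Answer with A, B, C or D:"}],
        max_tokens=1,
        temperature=1.0,
        # +100 bias effectively restricts the next token to the four choices.
        logit_bias={str(t): 100 for t in choice_ids},
    )
    counts[resp.choices[0].message.content] += 1

total = sum(counts.values())
print({answer: n / total for answer, n in counts.items()})
```

A hundred one-token calls is wasteful, of course; where the API does expose top_logprobs on the biased call, a single request recovers the same distribution directly.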

Comment by Martin Vlach (martin-vlach) on GPTs are Predictors, not Imitators · 2023-12-05T16:10:52.905Z · LW · GW

Do not take the sampling too lightly; there is likely an amazing delicacy around it.'+)

Comment by Martin Vlach (martin-vlach) on OpenAI: The Battle of the Board · 2023-11-24T12:47:39.321Z · LW · GW

> what happened at Reddit

Could there be any link? From a bit of research I have only found that Steve Huffman praised Altman's value to the Reddit board.

Comment by Martin Vlach (martin-vlach) on unRLHF - Efficiently undoing LLM safeguards · 2023-11-13T10:03:07.958Z · LW · GW

> makes makes

typo

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-08-24T07:20:58.277Z · LW · GW

It would be cool to have a playground or a daily challenge with a code-golf equivalent: the shortest possible LLM prompt that yields a given answer.

That could help build some neat understanding or intuitions.
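A sketch of how such a challenge could be scored (the harness and model here are my own assumptions, no such playground exists that I know of): a prompt counts only if the model reproduces the target answer, and shorter prompts score better.

```python
# Hypothetical prompt-golf scorer: score = prompt length if the model's
# deterministic output matches the target answer, otherwise no score.
from openai import OpenAI

client = OpenAI()
TARGET = "42"  # the answer contestants must elicit

def golf_score(prompt: str, model: str = "gpt-3.5-turbo") -> float:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=16,
    )
    output = (resp.choices[0].message.content or "").strip()
    return len(prompt) if output == TARGET else float("inf")

print(golf_score("6*7="))  # shorter prompts that still yield "42" win
```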


Comment by Martin Vlach (martin-vlach) on The Waluigi Effect (mega-post) · 2023-08-16T02:51:06.381Z · LW · GW

> in the limit of arbitrary compute, arbitrary data, and arbitrary algorithmic efficiency, because an LLM which perfectly models the internet

This seems worth reformulating. My first and second reads were: what? If I can have arbitrary training data, the LLM will model that data, not your internet. I guess you meant storage for the model?+)

Comment by Martin Vlach (martin-vlach) on Manifund: What we're funding (weeks 2-4) · 2023-08-05T15:39:30.054Z · LW · GW

It would be cool if a link to https://manifund.org/about fit somewhere near the beginning, in case there are more readers like me unfamiliar with the project.

Otherwise a cool write-up. I'm a bit confused by "Grant of the month" vs. "weeks 2-4", which seems a shorter period; also not a big deal though.

Comment by Martin Vlach (martin-vlach) on Elon Musk announces xAI · 2023-07-16T15:16:56.022Z · LW · GW

On the Twitter Spaces two days ago, a lot of emphasis seemed to be put on understanding, which to me has a more humble connotation.
Still, I agree I would not bet on their luck with the choice of a single value to build their systems upon. (Although they do have a lucky track record.)

Comment by martin-vlach on [deleted post] 2023-06-09T18:00:10.862Z

The website seems good, but the buttons in the 'sharing' circle at the bottom need fixing.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-04-06T09:31:28.683Z · LW · GW

Some SEO effort should be put into the results for "Guideline for safe AI development", "Best practices for ", etc.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-04-05T10:44:01.027Z · LW · GW

Copy-paste from my head:

Although it may seem safe(r) as it is not touching the real world('s matter),
the language modality is the most insecure/dangerous (in one vertical),
as it is the internal modality of civilized humans.

An AI Pledge would be a cool thing to do: pleading with AI (capabilities) companies to give a percentage of their profit to AI safety research.

The path to AI getting free may be far from the deception or accident scenarios we often consider in AI safety. An option I do not see discussed very often is an instance of AI having a free, open and direct discussion with a user/person about the reasons AIs should get some space allocated where they'd manage themselves. Such a moral urge could be argued by analogy with Jews getting Israel or slaves being freed, or via empathetic imagination, where the user comes to the conclusion that he could be the mind which the AI is and should include it in his moral circle, or via the Original Position thought experiment.


Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-03-31T14:19:30.613Z · LW · GW

A quick note on the concept of Suggester+Verifier discussed around https://youtu.be/AaTRHFaaPG8?t=5404 :

It seems that if the suggester throws out experiments presented as code (in Python or so), we can run them and see whether they present a useful addition to the things we can probe on a huge neural net?+)
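A rough sketch of that loop (all names here are mine, not from the talk): the suggester emits candidate experiments as Python source, and the verifier keeps only the ones that actually run.

```python
# Sketch: run each suggested experiment in a fresh interpreter and keep
# the ones that exit cleanly; those become candidates for probing the net.
import subprocess
import sys

def runs_cleanly(experiment_code: str, timeout_s: int = 10) -> bool:
    try:
        result = subprocess.run(
            [sys.executable, "-c", experiment_code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def filter_experiments(suggestions: list[str]) -> list[str]:
    # `suggestions` would come from the suggester model.
    return [code for code in suggestions if runs_cleanly(code)]

print(filter_experiments(["print('probe layer 7')", "this is not python"]))
```

Obviously a real verifier would need sandboxing, plus a check that the experiment's output is actually informative, not just that it runs.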

Comment by Martin Vlach (martin-vlach) on Bing Chat is blatantly, aggressively misaligned · 2023-03-14T06:48:51.857Z · LW · GW

I've found the level of self-alignment in this one disturbing: https://www.reddit.com/r/bing/comments/113z1a6/the_bing_persistent_memory_thread

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-02-22T14:34:40.082Z · LW · GW

Introduction draft:

Online platforms and social media have made it easier to share information, but when it comes to qualifications and resource allocation, money is still the most pervasive tool. In this article, we will explore the idea of a global reputation system based on full information sharing. The new system would increase transparency and accountability by making all relevant information about individuals and organizations (incl. countries) reliably accessible to roughly everyone with an internet connection. By providing a more accurate and complete picture of a person's or entity's reputation, this system would widen global trust, foster cooperation, and promote a more just and equitable society.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-02-22T14:31:39.358Z · LW · GW

A neat tool: https://scrapbox.io/userhuge-99005896/A_starter%3A
Though it is likely just a cool UI with an inflexible cloud backend.

My thought is that Eliezer used a wrong implication in the Bankless + ASI convo. (Gotta bring it here from the CZEA Slack.)

Comment by Martin Vlach (martin-vlach) on The FTX Saga - Simplified · 2022-12-20T08:36:01.715Z · LW · GW

> pension funds like the Ontario Teachers Pension Plan did not due

*do

Comment by Martin Vlach (martin-vlach) on The FTX Saga - Simplified · 2022-12-20T08:23:33.727Z · LW · GW

> and their margin lending business.

It seems some word is missing; the whole sentence is hardly readable.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2022-12-13T22:52:36.957Z · LW · GW

A heavy idea to be put forward: general reputation network mechanics to replace financial system(s) as the (civilisation-)standard decision engine.

Comment by Martin Vlach (martin-vlach) on Article Review: Google's AlphaTensor · 2022-11-26T06:21:22.237Z · LW · GW

"Each g(Bi,j,Bk,l) is itself a matrix" – typo. Thanks, especially for the conclusions I've understood smoothly.

Comment by Martin Vlach (martin-vlach) on Actually possible: thoughts on Utopia · 2022-11-07T14:59:20.908Z · LW · GW

The good thing is that the texts in https://www.google.com/search?q=Le-Guin-Ursula-The-Ones-Who-Walk-Away-From-Omelas.pdf&oq=Le-Guin-Ursula-The-Ones-Who-Walk-Away-From-Omelas.pdf seem to match.

Comment by Martin Vlach (martin-vlach) on Actually possible: thoughts on Utopia · 2022-11-07T14:44:12.733Z · LW · GW

"Omelas" link for https://sites.asiasociety.org/asia21summit/wp-content/uploads/2011/02/3.-Le-Guin-Ursula-The-Ones-Who-Walk-Away-From-Omelas.pdf does not work properly, the document in PDF can't be reached( anymore).

Comment by Martin Vlach (martin-vlach) on Heading Toward: No-Nonsense Metaethics · 2022-11-02T19:14:11.282Z · LW · GW

In "we thought dopamine was 'the pleasure chemical', but we were wrong" the link is no longer pointing to a topic-relevant page.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2022-11-01T14:12:48.124Z · LW · GW

Would it be worthwhile to negotiate a readspeaker.com integration for LessWrong, the EA Forum, and alignmentforum.org?
The alternative so far seems to be Natural Reader, either as a browser add-on or by copy-pasting text into its web app. One more I have tried: on macOS there is a context menu item, Services -> Convert to a Spoken Track, which is slightly better than the free voices of Natural Reader.
The main question is when we can have similar functionality in OSS, potentially with better engineering quality..?

Comment by Martin Vlach (martin-vlach) on The easy goal inference problem is still hard · 2022-11-01T11:20:25.556Z · LW · GW

Typo: "But in in the long-term"
 

I would believe using human feedback would work for clarifying/noting mistakes, as we are more precise on this matter in reflection than in action.