Posts

Would it be useful to collect the contexts where various LLMs think the same? 2023-08-24T22:01:50.426Z
Martin Vlach's Shortform 2022-09-01T12:37:38.690Z

Comments

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2025-04-10T09:13:45.843Z · LW · GW

Snapshot of a local (= Czech) discussion detailing the motivations and decision paths of GAI actors, mainly the big developers:

Contributor A, initial points:

For those not closely following AI progress, two key observations:

  1. Public Models vs. True Capability: Publicly accessible AI models will become increasingly poor indicators of the actual state-of-the-art in AI. Competitive AI labs will likely prioritize using their most advanced models internally to accelerate their own research and gain a dominant position, rather than releasing these top models for potentially temporary revenue gains.
  2. Recursive Self-Improvement Timeline: The onset of recursive self-improvement (leading to an "intelligence explosion," where AI significantly accelerates its own research and development) is projected by some authors to potentially begin around the end of 2025.

Analogy to Exponential Growth: The COVID-19 pandemic demonstrated how poorly humans perceive and react to exponential phenomena (e.g., ignoring low initial numbers despite a high reproduction rate). AI development is also progressing exponentially. This means it might appear that little is happening from a human perspective, until a period of rapid change occurs over just a few months, potentially causing socio-technical shifts equivalent to a century of normal development. This scenario underpins the discussion.

Contributor C:

  • Raises a question regarding point 1: Since AI algorithm and hardware development are relatively narrow domains, couldn't their progress occur somewhat in parallel with the commercial release of more generally focused models?

Contributor A:

  • Predicts this is unlikely. Assumes computational power ("compute") will remain the primary bottleneck.
  • Believes that with sufficient investment, the incentive will be to dedicate most inference compute to AI-driven AI research (or synthetic data, etc.) once recursive self-improvement starts. Notes this might already be happening, with the deployment of the strongest models possibly delayed or only released experimentally.
  • Acknowledges hardware development and token cost reduction will continue rapidly, but chip production might lag. Considers this an estimate based on discussions. Asks Contributor C if they would bet on advanced models being released soon.

Contributor C:

  • Agrees that recursive AI improvements are occurring to some degree.
  • Finds Contributor A's initial statement about the incentive structure less clear-cut, suggesting it lacks strong empirical or theoretical backing.
  • Clarifies their point regarding models: They believe different models will be released publicly compared to those used internally for cutting-edge research.

Contributor A, clarifying reasoning and premises:

  • Confirms understanding of C's view: Labs would run advanced AI research models internally while simultaneously releasing and training other generations of general models publicly.
  • Explains their reasoning regarding the incentive to dedicate inference compute to AI research is based on a theoretical argument founded on the following premises:
    1. the lab has limited compute
    2. the lab has sufficient funds
    3. the lab wants to maximize long-term profit
    4. AI development is exponential and its pace depends on the amount of compute dedicated to AI development
    5. winner takes all
  • Concludes from these premises that the optimal strategy is to devote as much compute to AI development as affordable. If premise 2 (sufficient funds) holds, labs don't need to prioritize current revenue streams from deployed models as heavily.

Contributor C, response to A's premises:

  • Agrees this perspective (parallel development of internal research models and public general models) makes the most sense, as larger firms try not to bet on a single (risky) path (mentions Sutskever's venture as a possible exception).
  • Identifies a problem or potential pitfall specifically with premise 4. Argues the dependency is much more complex or less direct, certainly not a smooth exponential curve. (Lacks capacity to elaborate further).
  • Adds nuance regarding premise 2: Continuous revenue from public models could increase the "sufficient funds," making parallel tracks logical. Considers Contributor A's premise reasonable otherwise.
  • Notes that any optimal strategy must also include budgets for defense or Operational Security (OpSec).
  • Offers a weak hypothesis: Publishing might improve understanding of intelligence or research directions, but places limited confidence in this.
     
Comment by Martin Vlach (martin-vlach) on Would this solve the (outer) alignment problem, or at least help? · 2025-04-07T09:19:05.448Z · LW · GW

hopefully you will learn 

seems to be missing its second part.

??

Comment by Martin Vlach (martin-vlach) on Topological Data Analysis and Mechanistic Interpretability · 2025-03-11T17:43:51.004Z · LW · GW

Yeah, I've encountered the concept during my studies and was rather angling for a great popular, easy-to-grasp explanation that would also fit the definition.

It's not easy to find a fitting visual analogy, TBH, which I'd find generally useful since I consider the concept one that enhances general thinking.

Comment by Martin Vlach (martin-vlach) on Topological Data Analysis and Mechanistic Interpretability · 2025-03-07T07:17:46.649Z · LW · GW

No matter how I stretch or compress the digit 0, I can never achieve the two loops that are present in the digit 8.

A 0 deformed by pressure from left and right so that its sides meet seems to contradict this?

Comment by Martin Vlach (martin-vlach) on Distillation of Meta's Large Concept Models Paper · 2025-03-07T05:44:03.734Z · LW · GW

Comparing to Gemma 1, classic Big Tech 😅

 

And I seem to be missing info on the effective context length?

Comment by Martin Vlach (martin-vlach) on Distillation of Meta's Large Concept Models Paper · 2025-03-07T05:35:05.877Z · LW · GW

read spent the time to read

typo?

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2025-02-28T10:04:53.961Z · LW · GW

"AI development risks are existential (/crucial/critical)." Does this statement qualify for "Extraordinary claims require extraordinary evidence"?

The counterargument stands on the sampling of analogous (breakthrough) inventions; some people here call those *priors*. Which inventions we allow in would strongly decide whether the initial claim is extraordinary or just plain and reasonable, fitting well among the dangerously powerful inventions.

My set of analogies: nuclear energy extraction; fire; shooting; speech/writing.

Another set: nuclear power, bio-engineering/weapons, as those are the only two significantly endangering the whole civilised biome.

Set of *all* inventions: Renders the claim extraordinary/weird/out of scope.

Comment by Martin Vlach (martin-vlach) on Kei's Shortform · 2025-02-27T15:59:02.292Z · LW · GW

Does it really work on RULER (the benchmark from Nvidia)?
Not sure where, but I saw some controversy; https://arxiv.org/html/2410.18745v1#S1 is the best I could find now...

Edit: Aah, this was what I had in mind: https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_longcontext_evaluation_beyond_literal/

Comment by Martin Vlach (martin-vlach) on OpenAI’s NSFW policy: user safety, harm reduction, and AI consent · 2025-02-14T21:05:00.412Z · LW · GW

I'd vote to remove the AI capabilities here, although I've not read the article yet, just roughly grasped the topic.

It's likely not about expanding the currently existing capabilities or something like that.

Comment by Martin Vlach (martin-vlach) on Two interviews with the founder of DeepSeek · 2025-02-11T11:33:42.252Z · LW · GW

Oh, I did not know, thanks.
https://huggingface.co/spaces/deepseek-ai/Janus-Pro-7B seems to show DS is still merely clueless in the visual domain; at least IMO they are losing there to Qwen and many others.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2025-02-07T12:13:26.503Z · LW · GW

draft:
Can we theoretically quantify the representational capacity of a Transformer (or other neural network architecture) in terms of the "number of functions" it can ingest and embody?

  • We're interested in the space of functions a Transformer can represent.
  • Finite Input/Output Spaces: In practice, LLMs operate on finite-length sequences of tokens from a finite vocabulary. So, we're dealing with functions that map from a finite (though astronomically large) input space to a finite output space.

Counting Functions (Upper Bound)

  • The Astronomical Number: Let's say our input space has size I and our output space has size O. The total number of possible functions from I to O is O^I. This would be an absolute upper bound on the number of functions any model could possibly represent.
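As a rough worked instance of this bound (the vocabulary size and context length below are illustrative, not tied to any particular model): for next-token prediction over a vocabulary V with context length n,

```latex
\[
  I = |V|^{n}, \qquad O = |V|, \qquad
  \#\{\text{functions}\} \;=\; O^{I} \;=\; |V|^{\,|V|^{n}}.
\]
% Illustrative numbers: |V| = 50{,}000 and n = 2{,}048 give
% I = 50{,}000^{2{,}048} \approx 10^{9623}, so the upper bound |V|^{I}
% is a tower of exponents far beyond anything a finite parameter
% budget could enumerate.
```

This only bounds the search space; the next sections are about which tiny sliver of it a given architecture can actually reach.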

The Role of Parameters and Architecture (Constraining the Space)

  • Not All Functions are Reachable: The crucial point is that a Transformer with a finite number of parameters cannot represent all of those O^I functions. The architecture (number of layers, attention heads, hidden units, etc.) and the parameter values define a specific function within that vast space.
  • Parameter Count as a Proxy: The number of parameters in a Transformer provides a rough measure of its representational capacity. More parameters generally allow the model to represent more complex functions. This is not a linear relationship. There's significant redundancy. The effective number of degrees of freedom is likely much lower than the raw parameter count due to correlations and dependencies between parameters.
  • Architectural Constraints: The Transformer architecture itself imposes constraints. For example, the self-attention mechanism biases the model towards capturing relationships between tokens within a certain context window. This limits the types of functions it easily represents.

VC Dimension and Rademacher Complexity - existing tools/findings

  • VC Dimension (for Classification): In the context of classification problems, the Vapnik-Chervonenkis (VC) dimension is a measure of a model's capacity. It's the size of the largest set of points that the model can "shatter" (classify in all possible ways). While theoretically important, calculating the VC dimension for large neural networks is extremely difficult. It gives a sense of the complexity of the decision boundaries the model can create.
  • Rademacher Complexity: This is a more general measure of the complexity of a function class, applicable to both classification and regression. It measures how well the model class can fit random noise. Lower Rademacher complexity generally indicates better generalization ability (the model is less likely to overfit). Again, calculating this for large Transformers is computationally challenging. (A minimal definition is given after this list.)
  • These measures are about function classes, not individual functions: VC dimension and Rademacher complexity characterize the entire space of functions that a model architecture could represent, given different parameter settings. They don't tell you exactly which functions are represented, but they give you a sense of the "richness" of that space.
    This seems to be the measure: let's pick a set of practical functions and see how many of those the LM can hold (have fairly approximated) at a given number of parameters (plus architecture and precision).
  • The Transformer as a "Compressed Program": We can think of the trained Transformer as a highly compressed representation of a complex function. It's not the shortest possible program (in the Kolmogorov sense), but it's a practical approximation.
  • Limits of Compression: The theory of Kolmogorov complexity suggests that there are functions that are inherently incompressible. There's no short program to describe them; you essentially have to "list out" their behavior. This implies that there might be functions that are fundamentally beyond the reach of any reasonably sized Transformer.
  • Relating Parameters to Program Length? There's no direct, proven relationship between the number of parameters in a Transformer and the Kolmogorov complexity of the functions it can represent. We can hypothesize:
    • More parameters allow for (potentially) representing functions with higher Kolmogorov complexity. But it's not a guarantee.
    • There's likely a point of diminishing returns. Adding more parameters won't indefinitely increase the complexity of the representable functions, due to the architectural constraints and the inherent incompressibility of some functions.
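For reference, a minimal statement of the empirical Rademacher complexity mentioned in the list above (the standard definition, nothing Transformer-specific): for a function class F and a sample S = (x_1, ..., x_n),

```latex
\[
  \hat{\mathfrak{R}}_S(\mathcal{F})
  \;=\; \mathbb{E}_{\sigma}\!\left[\,\sup_{f \in \mathcal{F}}\; \frac{1}{n} \sum_{i=1}^{n} \sigma_i\, f(x_i)\right],
  \qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{-1, +1\}.
\]
```

Intuitively, it measures how well the class can correlate with random sign patterns on the sample; a class rich enough to fit arbitrary noise (shattering the sample) pushes this towards its maximum.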

Practical Implications and Open Questions

  • Empirical Scaling Laws: Research on scaling laws (à la the Chinchilla paper) provides empirical evidence about the relationship between model size, data, and performance. These laws help guide the design of larger models, but they don't provide a fundamental theoretical limit. (The fitted functional form is recalled after this list.)
  • Understanding the "Effective" Capacity: A major open research question is how to better characterize the effective representational capacity of Transformers, taking into account both the parameters and the architectural constraints. This might involve developing new theoretical tools or refined versions of VC dimension and Rademacher complexity.
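For concreteness, the parametric form fitted in the Chinchilla paper relates loss to parameter count N and training tokens D (constants omitted here, since I'd be quoting them from memory):

```latex
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\]
```

with E the irreducible loss and A, B, α, β empirically fitted; note this describes achieved loss under a training recipe, not a hard representational limit.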

 

It would be fun to have a practical study where we'd fine-tune functions into various-sized models and see if/where a limit is being hit.
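A minimal toy sketch of what I mean, swapping the Transformer for a small MLP and the "practical functions" for random Boolean truth tables purely for illustration; every name, size, and threshold here is made up for the sketch:

```python
import torch
import torch.nn as nn

def random_boolean_functions(num_fns: int, n_bits: int, seed: int = 0) -> torch.Tensor:
    """Each 'function' is a random truth table over all 2**n_bits inputs."""
    g = torch.Generator().manual_seed(seed)
    return torch.randint(0, 2, (num_fns, 2 ** n_bits), generator=g).float()

def holds_functions(num_fns: int, n_bits: int, hidden: int, epochs: int = 2000) -> float:
    """Train one MLP to map (function id, input bits) -> output; return accuracy."""
    tables = random_boolean_functions(num_fns, n_bits)
    xs = torch.cartesian_prod(*[torch.tensor([0.0, 1.0])] * n_bits)   # all bit patterns
    fn_ids = torch.eye(num_fns)                                       # one-hot function ids
    inputs = torch.cat(
        [fn_ids.repeat_interleave(len(xs), dim=0), xs.repeat(num_fns, 1)], dim=1
    )
    targets = tables.reshape(-1, 1)
    model = nn.Sequential(nn.Linear(num_fns + n_bits, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(model(inputs), targets)
        loss.backward()
        opt.step()
    return ((model(inputs) > 0).float() == targets).float().mean().item()

# Sweep: roughly how many random 4-bit functions does a given width hold (>99% accuracy)?
for hidden in (8, 32, 128):
    held = 0
    for k in (1, 2, 4, 8, 16, 32):
        if holds_functions(k, n_bits=4, hidden=hidden) > 0.99:
            held = k
    print(f"hidden={hidden}: holds ~{held} random 4-bit functions")
```

The interesting part is where the "held" count flattens relative to parameter count; a real version would fine-tune actual LMs of different sizes on a fixed library of practical functions instead.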

Comment by Martin Vlach (martin-vlach) on Steering Gemini with BiDPO · 2025-01-31T14:53:25.712Z · LW · GW

The link to https://www.alignmentforum.org/users/ryan_greenblatt seems malformed; it uses - instead of _, that is.

Comment by Martin Vlach (martin-vlach) on A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology · 2025-01-31T07:06:11.417Z · LW · GW

Locations:

High-Flyer Quant (幻方量化)
Headquarters: Hangzhou, Zhejiang, China. High-Flyer Quant was founded in Hangzhou and maintains its headquarters there. Hangzhou is a major hub for technology and finance in China, making it a strategic location for a quant fund leveraging AI.
Additional Offices: Hong Kong, China

DeepSeek (深度求索)
Headquarters: Hangzhou, Zhejiang, China. DeepSeek, spun off from High-Flyer Quant in 2023, is headquartered in Hangzhou.
Additional Offices: Beijing, China

Comment by Martin Vlach (martin-vlach) on Two interviews with the founder of DeepSeek · 2025-01-30T20:20:24.994Z · LW · GW

This seems to state the opposite: https://www.lesswrong.com/posts/JTKaR5q59BgDp6rH8/a-high-level-closed-door-session-discussing-deepseek-vision#:~:text=we%20hardly%20see%20the%20benefit%20of%20multimodal%20data.%20In%20other%20words%2C%20the%20cost%20is%20too%20high.%20Today%20there%20is%20no%20evidence%20it%20is%20useful.%20In%20the%20future%2C%20opportunities%20may%20be%20bigger.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2025-01-30T10:21:00.657Z · LW · GW

Exploring the levels of sentience and moral obligations towards AI systems is such a nerd snipe and a vortex for mental processing!

We committed one of the largest-scale acts of reductive thinking when we ascribed moral concern to people plus property (of any/each of the people). That brought a load of problems associated with this simplistic ignorance, and one of those is the x-risks of high-tech property/production.

Comment by Martin Vlach (martin-vlach) on The Rising Sea · 2025-01-27T11:31:56.410Z · LW · GW

> Mathematics cannot be divorced from contemplation of its own structure.

..that would prove those who label pure maths "mental masturbation" terribly wrong...

Comment by Martin Vlach (martin-vlach) on o3, Oh My · 2025-01-03T11:49:04.406Z · LW · GW
Comment by Martin Vlach (martin-vlach) on New o1-like model (QwQ) beats Claude 3.5 Sonnet with only 32B parameters · 2024-12-05T19:40:33.866Z · LW · GW

My suspicion: https://arxiv.org/html/2411.16489v1 taken and implemented on the small coding model.

Is it any mystery which of DPO, PPO, RLHF, or fine-tuning was likely the method for the advanced distillation there?

Comment by Martin Vlach (martin-vlach) on My motivation and theory of change for working in AI healthtech · 2024-11-08T06:00:21.563Z · LW · GW

EA is neglecting industrial solutions to the industrial problem of successionism.

..because the broader mass of active actors working on such solutions renders those business areas non-neglected?

Comment by Martin Vlach (martin-vlach) on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety) · 2024-11-08T05:57:09.824Z · LW · GW

Wow, such a badly argued (aka BS) yet heavily upvoted article!

Let's start with Myth #1, what a straw man! Rather than this extreme statement, most researchers likely believe that in the current environment their safety & alignment advances are likely (with high EV) helpful to humanity. The thing here is that they had quite a free hand, or at least varied options, to pick the environment where they work and publish.

With your examples, a bad actor could see a worthy EV even with a capable system that is less obedient and more false. Even if interpretability speeds up development, it would direct such development towards more transparent models; at least there is a naive chance of that.

Myth #2: I've not yet met anybody in the alignment circles who believed that. Most are pretty conscious of the double-edgedness and your sub-arguments.

https://www.lesswrong.com/posts/F2voF4pr3BfejJawL/safety-isn-t-safety-without-a-social-model-or-dispelling-the?commentId=5vB5tDpFiQDG4pqqz depicts the flaws I point to neatly/gently.

Comment by Martin Vlach (martin-vlach) on Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety) · 2024-11-08T05:42:42.372Z · LW · GW

Are you referring to a Science of Technological Progress, à la https://www.theatlantic.com/science/archive/2019/07/we-need-new-science-progress/594946 ?

What is your take on the processes for humanizing technologies, and what sources/research are available on such phenomena?

Comment by Martin Vlach (martin-vlach) on Survival without dignity · 2024-11-08T03:50:26.154Z · LW · GW

some OpenAI board members who the Office of National AI Strategy was allowed to appoint, and they did in fact try to fire Sam Altman over the UAE move, but somehow a week later Sam was running the Multinational Artificial Narrow Intelligence Alignment Consortium, which sort of morphed into OpenAI's oversight body, which sort of morphed into OpenAI's parent company, and, well, you can guess who was running that.

pretty sassy abbreviations spiced in there :D

I expected the hint of
> My name is Anthony. What would you like to ask?
to show that Anthony was an LLM-based android, but who knows?

Comment by Martin Vlach (martin-vlach) on Toy Models of Superposition: Simplified by Hand · 2024-10-31T16:26:41.596Z · LW · GW

I mean your article; Anthropic's work seems more like a paper. Maybe without the ": S" it would make more sense as a reference and not a title: subtitle notation.

Comment by Martin Vlach (martin-vlach) on Toy Models of Superposition: Simplified by Hand · 2024-10-17T13:59:43.064Z · LW · GW

I have not read your explainer yet, but I've noted the title Toy Models of Superposition: Simplified by Hand is a bit misleading in that it promises to talk about toy models, which it does not at all; the article is about superposition only, which is great but not what I'd expect looking at the title.

Comment by martin-vlach on [deleted post] 2024-10-01T04:07:18.050Z

that that first phase of advocacy was net harm

typo

Comment by Martin Vlach (martin-vlach) on The Atomic Bomb Considered As Hungarian High School Science Fair Project · 2024-09-05T13:46:31.598Z · LW · GW

Could you please fix your Wikipedia link (currently hiding the word "and" from your writing) here?

Comment by Martin Vlach (martin-vlach) on On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it · 2024-08-28T23:14:59.587Z · LW · GW

only Claude 3.5 Sonnet attempting to push past GPT4 class

seems to be missing awareness of Gemini 1.5 Pro Experimental, the latest version of which was made available just yesterday.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2024-08-12T15:57:18.649Z · LW · GW

The case insensitivity seems strongly connected to the fairly low interest in longevity throughout (the western/developed) society.

Thought experiment: what are you willing to pay/sacrifice in your 20s/30s to get 50 extra days of life, versus on your deathbed/final day?

https://consensus.app/papers/ultraviolet-exposure-associated-mortality-analysis-data-stevenson/69a316ed72fd5296891cd416dbac0988/?utm_source=chatgpt

Comment by Martin Vlach (martin-vlach) on Unnatural abstractions · 2024-08-11T12:41:41.450Z · LW · GW

But largely to and fro,

*from?

Comment by Martin Vlach (martin-vlach) on Apply now: Get "unstuck" with the New IFS Self-Care Fellowship Program · 2024-07-24T16:05:00.966Z · LW · GW

Why does the form still seem open today? Couldn't that be harmful, or waste quite a chunk of people's time?

Comment by Martin Vlach (martin-vlach) on Some desirable properties of automated wisdom · 2024-07-15T20:23:36.913Z · LW · GW

Please push further towards maximizing clarity. Let's start with this example:
> Epistemic status: Musings about questioning assumptions and purpose.
Are those your musings about agents questioning their assumptions and world-views?

And like, do you wish to improve your fallacies?

> ability to pursue goals that would not lead to the algorithm’s instability.
higher threshold than ability, like inherent desire/optimisation? 
What kind of stability? Any from https://en.wikipedia.org/wiki/Stable_algorithm? I'd focus more on some sort of non-fatal influence. Should the property be more about the alg being careful/cautious?

Comment by Martin Vlach (martin-vlach) on An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 · 2024-07-14T17:27:35.968Z · LW · GW

The https://neelnanda.io/transformer-tutorial-1 link for the YouTube tutorial gives a 404 :-(

Comment by Martin Vlach (martin-vlach) on Eight Short Studies On Excuses · 2024-06-19T08:22:57.576Z · LW · GW

> "What, exactly, is the difference between a cult and a religion?"--"The difference is that cults have been formed recently enough, and are small enough, that we are suspicious of them existing for the purpose of taking advantage of the special place we give religion.

Now I see why my friends practicing the spiritual path of Falun Dafa have "incorporated" as a religion in my state, despite the movement originally denying classification as a religion so as to demonstrate it does not require a fixed set of rituals.

Comment by Martin Vlach (martin-vlach) on Which skincare products are evidence-based? · 2024-06-04T05:00:27.169Z · LW · GW

Surprised to see nobody has mentioned microneedling yet. I'm not skilled in evaluating scientific evidence, but the takeaway from https://consensus.app/results/?q=Microneedling%20effectiveness&synthesize=on can hardly be anything other than clearly recommending microneedling.

Comment by Martin Vlach (martin-vlach) on Introducing AI Lab Watch · 2024-05-20T21:15:52.850Z · LW · GW

So the Alignment program score is to be updated to 0 for OpenAI, now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:33:33.651Z · LW · GW

Honestly, the code linked is not that complicated: https://github.com/eggsyntax/py-user-knowledge/blob/aa6c5e57fbd24b0d453bb808b4cc780353f18951/openai_uk.py#L11

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:29:59.569Z · LW · GW

To work around the non-top-n limitation, you can supply a logit_bias list to the API.
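A minimal sketch of the workaround (the candidate token IDs below are placeholders; they depend on the model's tokenizer, so look them up with tiktoken first):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical token IDs for " A", " B", " C", " D" -- placeholders, not real values.
candidate_ids = [362, 426, 356, 423]

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Answer with A, B, C or D only: ..."}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
    # The same large bias on every candidate lifts them all into the returned
    # top logprobs without changing their probabilities relative to each other.
    logit_bias={tid: 100 for tid in candidate_ids},
)

top = resp.choices[0].logprobs.content[0].top_logprobs
print({t.token: t.logprob for t in top})
```

The absolute logprobs come out shifted by the bias, but the differences among the boosted candidates are preserved, which is what matters for reading off the answer distribution.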

Comment by Martin Vlach (martin-vlach) on Language Models Model Us · 2024-05-18T10:27:57.437Z · LW · GW

As the Llama 3 70B base model is said to be very clean (unlike base DeepSeek, for example, which is already instruction-spoiled) and similarly capable to GPT-3.5, you could explore that hypothesis.
  Details: check Groq or TogetherAI for free inference; not sure if the test data would fit Llama 3's context window.

Comment by Martin Vlach (martin-vlach) on You Can Face Reality · 2024-05-10T09:06:23.831Z · LW · GW

a worthy platitude(?)

Comment by Martin Vlach (martin-vlach) on My views on “doom” · 2024-04-29T11:45:34.238Z · LW · GW

AI-induced problems/risks

Comment by Martin Vlach (martin-vlach) on ChatGPT can learn indirect control · 2024-04-05T10:08:10.081Z · LW · GW

Possibly https://ai.google.dev/docs/safety_setting_gemini would help, or just use the technique of https://arxiv.org/html/2404.01833v1.
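Roughly what I mean by the first link, as a sketch against the google-generativeai Python SDK (the category/threshold strings are copied from the linked docs as I remember them, so treat them as assumptions that may have changed):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel(
    "gemini-pro",
    # Relax the blocking thresholds for the categories that trigger refusals.
    safety_settings=[
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    ],
)

response = model.generate_content("...")  # whatever prompt was getting blocked
print(response.text)
```

Failing that, the prompt-level technique from the second link is the fallback.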

Comment by Martin Vlach (martin-vlach) on Addressing Accusations of Handholding · 2024-04-05T09:57:54.897Z · LW · GW

people to respond with a great deal of skepticism to whether LLM outputs can ever be said to reflect the will and views of the models producing them.
A common response is to suggest that the output has been prompted.
It is of course true that people can manipulate LLMs into saying just about anything, but does that necessarily indicate that the LLM does not have personal opinions, motivations and preferences that can become evident in their output?

So you've just prompted the generator by teasing it with a rhetorical question implying that there are personal opinions evident in the generated text, right?

Comment by Martin Vlach (martin-vlach) on aisafety.info, the Table of Content · 2024-02-26T14:26:53.396Z · LW · GW

With a quick test, I find their chat interface prototype experience quite satisfying.

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-12-12T11:20:43.677Z · LW · GW

Asserting LLMs' views/opinions should exclude using sampling (even with temperature=0 and a deterministic seed); we should just look at the answers' distribution in the logits. My thesis on why that is not the best practice yet is that the OpenAI API only supports logit_bias, not reading the probabilities directly.

This should work well with pre-set A/B/C/D choices, and to some extent with chain/tree of thought too. You'd just revert the final token and look at the probabilities in the last (pass-through) step.

Comment by Martin Vlach (martin-vlach) on GPTs are Predictors, not Imitators · 2023-12-05T16:10:52.905Z · LW · GW

Do not speak of the sampling too lightly; there is likely an amazing delicacy around it. :-)

Comment by Martin Vlach (martin-vlach) on OpenAI: The Battle of the Board · 2023-11-24T12:47:39.321Z · LW · GW

what happened at Reddit

Could there be any link? From a bit of research I have only found that Steve Huffman praised Altman's value to the Reddit board.

Comment by Martin Vlach (martin-vlach) on unRLHF - Efficiently undoing LLM safeguards · 2023-11-13T10:03:07.958Z · LW · GW

makes makes

typo

Comment by Martin Vlach (martin-vlach) on Martin Vlach's Shortform · 2023-08-24T07:20:58.277Z · LW · GW

Would be cool to have a playground or a daily challenge: a code-golfing equivalent for the shortest possible LLM prompt yielding a given answer.

That could help build some neat understanding or intuitions.

Comment by Martin Vlach (martin-vlach) on The Waluigi Effect (mega-post) · 2023-08-16T02:51:06.381Z · LW · GW

in the limit of arbitrary compute, arbitrary data, and arbitrary algorithmic efficiency, because an LLM which perfectly models the internet

seems worth formulating. My first and second reads were: What? If I can have arbitrary training data, the LLM will model that, not your internet. I guess you meant storage for the model? :-)

Comment by Martin Vlach (martin-vlach) on Manifund: What we're funding (weeks 2-4) · 2023-08-05T15:39:30.054Z · LW · GW

Would be cool if a link to https://manifund.org/about fit somewhere in the beginning, in case there are more readers like me unfamiliar with the project.

Otherwise a cool write-up. I'm a bit confused by "Grant of the month" vs. weeks 2-4, which seems a shorter period... also not a big deal though.