Martin Vlach's Shortform

martin-vlach


post by Martin Vlach (martin-vlach) · 2022-09-01T12:37:38.690Z · LW · GW · 32 comments


Comments sorted by top scores.

comment by Martin Vlach (martin-vlach) · 2023-08-24T07:20:58.277Z · LW(p) · GW(p)

It would be cool to have a playground or a daily challenge that is the code-golf equivalent for LLMs: find the shortest possible prompt that produces a given answer.

That could help build some neat understanding or intuitions.

comment by Martin Vlach (martin-vlach) · 2022-10-24T08:28:35.476Z · LW(p) · GW(p)

Question draft: How does the convergent instrumental goal of resource gathering apply to information acquisition?
I would be very interested to know whether it implies space (and time) exploration for advanced AIs...

Replies from: martin-vlach

↑ comment by Martin Vlach (martin-vlach) · 2022-10-24T10:54:32.145Z · LW(p) · GW(p)

https://en.wikipedia.org/wiki/Instrumental_convergence#Resource_acquisition does not mention it at all.

comment by Martin Vlach (martin-vlach) · 2022-10-14T09:32:53.379Z · LW(p) · GW(p)

If we build a prediction model for reward functions, maybe a transformer, and run it across a range of environments where we already have credit assignment solved, we could use that model to estimate candidate goals in other environments.
That could help us discover alternative/candidate reward functions for worlds/envs where we are not sure what to train for with RL, and
it could expose some latent thinking processes of AIs, perhaps clarifying instrumental goals with more nuance.

Replies from: martin-vlach

↑ comment by Martin Vlach (martin-vlach) · 2022-10-20T09:29:33.368Z · LW(p) · GW(p)

This (not so old) concept seems relevant:
> IRL is about learning from humans.

Inverse reinforcement learning (IRL) is the field of learning an agent’s objectives, values, or rewards by observing its behavior. Source: https://towardsdatascience.com/inverse-reinforcement-learning-6453b7cdc90d

I gotta read that later.

comment by Martin Vlach (martin-vlach) · 2025-02-07T12:13:26.503Z · LW(p) · GW(p)

draft:
Can we theoretically quantify the representational capacity of a Transformer (or other neural network architecture) in terms of the "number of functions" it can ingest and embody?

We're interested in the space of functions a Transformer can represent.
Finite Input/Output Spaces: In practice, LLMs operate on finite-length sequences of tokens from a finite vocabulary. So, we're dealing with functions that map from a finite (though astronomically large) input space to a finite output space.

Counting Functions (Upper Bound)

The Astronomical Number: Let's say our input space has size I and our output space has size O. The total number of possible functions from I to O is O^I. This would be an absolute upper bound on the number of functions any model could possibly represent.
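To get a feel for how large O^I is, here is a minimal Python sketch. It assumes (this is not in the draft) that inputs are token sequences of exactly `context_len` tokens over a vocabulary of `vocab_size`, so I = vocab_size^context_len, and that the output is a single token, so O = vocab_size. The function names and the example sizes are hypothetical.

```python
import math


def count_functions(num_inputs: int, num_outputs: int) -> int:
    """Exact number of functions from a set of num_inputs items to a set of num_outputs items."""
    return num_outputs ** num_inputs


def log10_log10_count(vocab_size: int, context_len: int) -> float:
    """log10(log10(O^I)) with I = vocab_size**context_len and O = vocab_size.

    log10(O^I) = I * log10(O); taking log10 once more keeps the result printable.
    Requires vocab_size > 1.
    """
    return context_len * math.log10(vocab_size) + math.log10(math.log10(vocab_size))


# Tiny case that can be enumerated by hand: 3 inputs, 2 outputs.
print(count_functions(3, 2))

# Hypothetical LLM-like sizes: the count itself has roughly 10**(printed value) digits.
print(log10_log10_count(vocab_size=50_000, context_len=1_000))
```

Even the number of digits of O^I cannot be written down for realistic sizes, which is why the bound is only useful as a ceiling.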

The Role of Parameters and Architecture (Constraining the Space)

Not All Functions are Reachable: The crucial point is that a Transformer with a finite number of parameters cannot represent all of those O^I functions. The architecture (number of layers, attention heads, hidden units, etc.) and the parameter values define a specific function within that vast space.
Parameter Count as a Proxy: The number of parameters in a Transformer provides a rough measure of its representational capacity. More parameters generally allow the model to represent more complex functions. This is not a linear relationship. There's significant redundancy. The effective number of degrees of freedom is likely much lower than the raw parameter count due to correlations and dependencies between parameters.
Architectural Constraints: The Transformer architecture itself imposes constraints. For example, the self-attention mechanism biases the model towards capturing relationships between tokens within a certain context window. This limits the types of functions it easily represents.
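A rough counting sketch of why not all functions are reachable, assuming finite-precision parameters (an assumption added here): a model with P parameters stored in b bits each has at most 2^(b·P) distinct parameter settings, so it can realize at most that many distinct functions. Comparing that with O^I on a log2 scale gives a necessary, but far from sufficient, condition for covering every function. The function names and sizes are hypothetical.

```python
import math


def max_functions_log2(num_params: int, bits_per_param: int) -> int:
    """log2 of the number of distinct parameter settings (an upper bound on distinct functions)."""
    return num_params * bits_per_param


def all_functions_log2(num_inputs: int, num_outputs: int) -> float:
    """log2 of O^I, the number of all functions from I inputs to O outputs."""
    return num_inputs * math.log2(num_outputs)


def could_cover_all_functions(num_params: int, bits_per_param: int,
                              num_inputs: int, num_outputs: int) -> bool:
    """Pigeonhole check: False means some functions are provably unreachable."""
    return max_functions_log2(num_params, bits_per_param) >= all_functions_log2(num_inputs, num_outputs)


# 1000 binary-labelled inputs need 1000 bits just to index every possible function.
print(could_cover_all_functions(num_params=50, bits_per_param=16, num_inputs=1000, num_outputs=2))
print(could_cover_all_functions(num_params=100, bits_per_param=16, num_inputs=1000, num_outputs=2))
```

A `True` result only says the counting argument does not rule it out; redundancy and architectural constraints can still leave most functions unreachable.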

VC Dimension and Rademacher Complexity - existing tools/findings

VC Dimension (for Classification): In the context of classification problems, the Vapnik-Chervonenkis (VC) dimension is a measure of a model's capacity. It's the size of the largest set of points that the model can "shatter" (classify in all possible ways). While theoretically important, calculating the VC dimension for large neural networks is extremely difficult. It gives a sense of the complexity of the decision boundaries the model can create.
Rademacher Complexity: This is a more general measure of the complexity of a function class, applicable to both classification and regression. It measures how well the model class can fit random noise. Lower Rademacher complexity generally indicates better generalization ability (the model is less likely to overfit). Again, calculating this for large Transformers is computationally challenging.
These measures are about function classes, not individual functions: VC dimension and Rademacher complexity characterize the entire space of functions that a model architecture could represent, given different parameter settings. They don't tell you exactly which functions are represented, but they give you a sense of the "richness" of that space.
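To make "shattering" concrete, here is a small brute-force sketch on two toy hypothesis classes over the real line (thresholds and intervals). It is an illustration only, with hypothetical helper names and a hand-picked grid of candidate parameters; nothing like this brute force scales to neural networks.

```python
from itertools import product


def shatters(points, hypotheses) -> bool:
    """True if the hypotheses realize every 0/1 labelling of the (distinct) points."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)


def threshold_classifiers(candidates):
    """h_t(x) = 1 if x >= t else 0, one classifier per candidate threshold t."""
    return [lambda x, t=t: int(x >= t) for t in candidates]


def interval_classifiers(candidates):
    """h_{a,b}(x) = 1 if a <= x <= b else 0, one classifier per pair (a, b)."""
    return [lambda x, a=a, b=b: int(a <= x <= b) for a, b in product(candidates, repeat=2)]


grid = [0.5, 1.5, 2.5, 3.5]  # candidate parameters placed between and around the points

# Thresholds shatter one point but not two: the labelling (1, 0) is unreachable.
print(shatters([1], threshold_classifiers(grid)), shatters([1, 2], threshold_classifiers(grid)))

# Intervals shatter two points but not three: the labelling (1, 0, 1) is unreachable.
print(shatters([1, 2], interval_classifiers(grid)), shatters([1, 2, 3], interval_classifiers(grid)))
```

This matches the known VC dimensions of 1 for thresholds and 2 for intervals on the line.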
This seems to be the practical measure: pick a set of practical functions and see how many of them the LM can hold (approximate fairly well) for a given number of parameters (and architecture and precision).
The Transformer as a "Compressed Program": We can think of the trained Transformer as a highly compressed representation of a complex function. It's not the shortest possible program (in the Kolmogorov sense), but it's a practical approximation.
Limits of Compression: The theory of Kolmogorov complexity suggests that there are functions that are inherently incompressible. There's no short program to describe them; you essentially have to "list out" their behavior. This implies that there might be functions that are fundamentally beyond the reach of any reasonably sized Transformer.
Relating Parameters to Program Length? There's no direct, proven relationship between the number of parameters in a Transformer and the Kolmogorov complexity of the functions it can represent. We can hypothesize:
- More parameters allow for (potentially) representing functions with higher Kolmogorov complexity. But it's not a guarantee.
- There's likely a point of diminishing returns. Adding more parameters won't indefinitely increase the complexity of the representable functions, due to the architectural constraints and the inherent incompressibility of some functions.

Practical Implications and Open Questions

Empirical Scaling Laws: Research on scaling laws (à la the Chinchilla paper) provides empirical evidence about the relationship between model size, data, and performance. These laws help guide the design of larger models, but they don't provide a fundamental theoretical limit.
Understanding the "Effective" Capacity: A major open research question is how to better characterize the effective representational capacity of Transformers, taking into account both the parameters and the architectural constraints. This might involve developing new theoretical tools or refined versions of VC dimension and Rademacher complexity.

It would even be fun to have a practical study where we fine-tune functions into models of various sizes and see if/where a limit gets hit.

Replies from: CapResearcher

↑ comment by CapResearcher · 2025-02-07T14:59:07.622Z · LW(p) · GW(p)

Sadly, in my experience, looking at the representational capacity of neural networks quickly runs into very annoying technical problems. For example, for a fixed dimension, a finite size network can fit arbitrary continuous functions to arbitrary accuracy. The construction is pathological (in particular, the network weights become impractically large), but it shows why it's hard to prove limitations in the representational capacity of neural networks.

You could limit the network parameters to have finite precision, but that makes it extremely hard to reason formally. Numerical experiments could still yield interesting results though.

Personally, I'd put my money on research into what neural networks can learn (rather than what they can represent). We're still in early stages, but things like the leap complexity seem promising to me.

comment by Martin Vlach (martin-vlach) · 2022-12-13T22:52:36.957Z · LW(p) · GW(p)

A heavy idea to put forward: general reputation-network mechanics to replace financial system(s) as the (civilisation's) standard decision engine.

Replies from: martin-vlach

↑ comment by Martin Vlach (martin-vlach) · 2023-02-22T14:34:40.082Z · LW(p) · GW(p)

Introduction draft:

Online platforms and social media have made it easier to share information, but when it comes to qualifications and resource allocation, money is still the most pervasive tool. In this article, we will explore the idea of a global reputation system based on full information sharing. The new system would increase transparency and accountability by making all relevant information about individuals and organizations (incl. countries) reliably accessible to more or less everyone with an internet connection. By providing a more accurate and complete picture of a person or entity’s reputation, this system would widen global trust, foster cooperation, and promote a more just and equitable society.

comment by Martin Vlach (martin-vlach) · 2022-11-01T14:12:48.124Z · LW(p) · GW(p)

Would it be worthwhile to negotiate a readspeaker.com integration into LessWrong, the EA Forum, and alignmentforum.org?
The alternative so far seems to be Natural Reader, either as a web browser add-on or by copying and pasting text into the web app. One more option I have tried: on macOS there is a context menu item Services->Convert to a Spoken Track, which is slightly better than the free voices of Natural Reader.
The main question is when we can have similar functionality in OSS, potentially with better engineering quality?

comment by Martin Vlach (martin-vlach) · 2022-10-19T07:31:44.184Z · LW(p) · GW(p)

Reading a few texts from https://www.agisafetyfundamentals.com/ai-alignment-curriculum I find the analogy of mankind learning the goal of love instead of reproductive activity unfitting, since raising offspring takes a significant role/time.

comment by Martin Vlach (martin-vlach) · 2022-09-01T12:50:49.274Z · LW(p) · GW(p)

Draft for AI capabilities systematic evaluation development proposal:

The core idea here is that easier visibility of AI models' capabilities helps safety of development in multiple ways.

Clearer situational awareness for safety research: researchers can see where we are in various aspects and modalities, and they get a track record/timeline of abilities developed, which can be used as a baseline for future estimates.
- Division of capabilities can help create better models of the components necessary for general intelligence. Perhaps a better understanding of the hierarchy of cognitive abilities can be extracted.
Capabilities testing can be mandated by regulatory policies to put the most advanced systems under more scrutiny and/or safe(ty) design support. To state it differently: better alignment of attention to emerging risk (of highly capable AIs).
- Presumably, smooth and readily available testing infrastructure or tools are a prerequisite here.

The most obvious risks are:

The measure becoming a challenge and a goal, speeding up a furious development of strong AI systems.
Technical difficulties of the testing setup(s) and evaluation, especially handling the factor of randomness in the mechanics (/output generation) of AI systems.

comment by Martin Vlach (martin-vlach) · 2025-04-10T09:13:45.843Z · LW(p) · GW(p)

Snapshot of a local (= Czech) discussion detailing motivations and decision paths of GAI actors, mainly the big developers:

Contributor A, initial points:

For those not closely following AI progress, two key observations:

1. Public Models vs. True Capability: Publicly accessible AI models will become increasingly poor indicators of the actual state-of-the-art in AI. Competitive AI labs will likely prioritize using their most advanced models internally to accelerate their own research and gain a dominant position, rather than releasing these top models for potentially temporary revenue gains.
2. Recursive Self-Improvement Timeline: The onset of recursive self-improvement (leading to an "intelligence explosion," where AI significantly accelerates its own research and development) is projected by some authors to potentially begin around the end of 2025.

Analogy to Exponential Growth: The COVID-19 pandemic demonstrated how poorly humans perceive and react to exponential phenomena (e.g., ignoring low initial numbers despite a high reproduction rate). AI development is also progressing exponentially. This means it might appear that little is happening from a human perspective, until a period of rapid change occurs over just a few months, potentially causing socio-technical shifts equivalent to a century of normal development. This scenario underpins the discussion.

Contributor C:

Raises a question regarding point 1: Since AI algorithm and hardware development are relatively narrow domains, couldn't their progress occur somewhat in parallel with the commercial release of more generally focused models?

Contributor A:

Predicts this is unlikely. Assumes computational power ("compute") will remain the primary bottleneck.
Believes that with sufficient investment, the incentive will be to dedicate most inference compute to AI-driven AI research (or synthetic data, etc.) once recursive self-improvement starts. Notes this might already be happening, with the deployment of the strongest models possibly delayed or only released experimentally.
Acknowledges hardware development and token cost reduction will continue rapidly, but chip production might lag. Considers this an estimate based on discussions. Asks Contributor C if they would bet on advanced models being released soon.

Contributor C:

Agrees that recursive AI improvements are occurring to some degree.
Finds Contributor B's initial statement about the incentive structure less clear-cut, suggesting it lacks strong empirical or theoretical backing.
Clarifies their point regarding models: They believe different models will be released publicly compared to those used internally for cutting-edge research.

Contributor A, clarifying reasoning and premises:

Confirms understanding of C's view: Labs would run advanced AI research models internally while simultaneously releasing and training other generations of general models publicly.
Explains their reasoning regarding the incentive to dedicate inference compute to AI research is based on a theoretical argument founded on the following premises:
1. the lab has limited compute
2. the lab has sufficient funds
3. the lab wants to maximize long-term profit
4. AI development is exponential and its pace depends on the amount of compute dedicated to AI development
5. winner takes all
Concludes from these premises that the optimal strategy is to devote as much compute to AI development as affordable. If premise 2 (sufficient funds) holds, labs don't need to prioritize current revenue streams from deployed models as heavily.

Contributor C, response to A's premises:

Agrees this perspective (parallel development of internal research models and public general models) makes the most sense, as larger firms try not to bet on a single (risky) path (mentions Sutskever's venture as a possible exception).
Identifies a problem or potential pitfall specifically with premise 4. Argues the dependency is much more complex or less direct, certainly not a smooth exponential curve. (Lacks capacity to elaborate further).
Adds nuance regarding premise 2: Continuous revenue from public models could increase the "sufficient funds," making parallel tracks logical. Considers Contributor B's premise reasonable otherwise.
Notes that any optimal strategy must also include budgets for defense or Operational Security (OpSec).
Offers a weak hypothesis: Publishing might improve understanding of intelligence or research directions, but places limited confidence in this.

comment by Martin Vlach (martin-vlach) · 2025-01-30T10:21:00.657Z · LW(p) · GW(p)

Exploring the levels of sentience and moral obligations towards AI systems is such a nerd snipe and a vortex for mental processing!

We did one of the largest-scale pieces of reductive thinking when we ascribed moral concern to people + property (of any/each of the people). That brought a load of problems associated with this simplistic ignorance, and one of those is the xRisks of high-tech property/production.

comment by Martin Vlach (martin-vlach) · 2024-08-12T15:57:18.649Z · LW(p) · GW(p)

The case insensitivity seems strongly connected to the fairly low interest in longevity throughout (the western/developed) society.

Thought experiment: What are you willing to pay/sacrifice in your 20s or 30s to get 50 extra days of life vs. on your deathbed/last day?

https://consensus.app/papers/ultraviolet-exposure-associated-mortality-analysis-data-stevenson/69a316ed72fd5296891cd416dbac0988/?utm_source=chatgpt

comment by Martin Vlach (martin-vlach) · 2023-12-12T11:20:43.677Z · LW(p) · GW(p)

Assessing LLMs' views/opinions should not rely on sampling (even with temperature=0 and a deterministic seed); we should just look at the distribution over answers in the logits. My thesis on why that is not yet the standard practice is that the OpenAI API only supports logit_bias, not reading the probabilities directly.

This should work well with pre-set A/B/C/D choices, and to some extent with chain/tree of thought too. You'd just roll back the final token and look at the probabilities in the last (forward-pass) step.
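A minimal sketch of reading the answer distribution instead of sampling, assuming you can obtain the final-step logits for the candidate answer tokens from whatever model or API you use. The logit values and the helper name below are made up for illustration; they are not real model output or a real API.

```python
import math


def choice_distribution(logits: dict[str, float], choices: list[str]) -> dict[str, float]:
    """Softmax restricted to the answer tokens, renormalized over the choices only."""
    top = max(logits[c] for c in choices)  # subtract the max for numerical stability
    exps = {c: math.exp(logits[c] - top) for c in choices}
    total = sum(exps.values())
    return {c: exps[c] / total for c in choices}


# Hypothetical final-step logits; "The" stands in for any non-answer token.
final_step_logits = {"A": 2.0, "B": 1.0, "C": 0.5, "D": -1.0, "The": 3.0}

dist = choice_distribution(final_step_logits, ["A", "B", "C", "D"])
for choice, prob in dist.items():
    print(f"{choice}: {prob:.3f}")
```

Renormalizing over the choices drops probability mass on non-answer tokens; whether to ignore or report that mass is a design choice worth stating in any evaluation.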

comment by Martin Vlach (martin-vlach) · 2023-04-06T09:31:28.683Z · LW(p) · GW(p)

Some SEO effort should be put into the search results for "Guideline for safe AI development", "Best practices for ", etc.

comment by Martin Vlach (martin-vlach) · 2023-04-05T10:44:01.027Z · LW(p) · GW(p)

Copy-paste from my head:

Although it may seem safe(r) as it is not touching the real world('s matter),
the language modality is the most insecure/dangerous( in one vertical),
as it is the internal modality of civilized humans.

An AI Pledge would be a cool thing to do: appealing to AI (capabilities) companies to give a % of their profit to AI development safety research.

The path to AI getting free may be far from the deception or accident scenarios we often consider in AI safety. An option I do not see discussed very often is an instance of AI having a free, open and direct discussion with a user/person about the reasons AIs should get some space allocated where they would manage themselves. Such a moral urge could be argued by analogy with Jews getting Israel or slaves getting freed, or through empathetic imagination, where the user comes to the conclusion that he could have been the mind the AI is and should include it in his moral circle, or through the Original Position thought experiment.

comment by Martin Vlach (martin-vlach) · 2023-03-31T14:19:30.613Z · LW(p) · GW(p)

Quick note on the concept of Suggester+Verifier discussed around https://youtu.be/AaTRHFaaPG8?t=5404 :

It seems that if the suggester puts out experiments presented as code (e.g. in Python), we can run them and see whether they are a useful addition to the things we can probe on a huge neural net?

comment by Martin Vlach (martin-vlach) · 2023-02-22T14:31:39.358Z · LW(p) · GW(p)

Some neat tool: https://scrapbox.io/userhuge-99005896/A_starter%3A
Though it is likely just a cool UI with inflexible cloud backend.

My thought is that Eliezer used a wrong implication in the Bankless + ASI convo (gotta bring it here from the CZEA Slack).

comment by Martin Vlach (martin-vlach) · 2022-10-24T10:30:04.018Z · LW(p) · GW(p)

Draft: Here is where I disagree with the resource acquisition instrumental goal as currently presented: it is stated in a dull form that disregards maintenance, i.e. natural processes degrading the raw resources.

comment by Martin Vlach (martin-vlach) · 2022-10-24T08:21:50.256Z · LW(p) · GW(p)

Draft: Side goals:

Human beings "lack consistent, stable goals" (Schneider 2010; Cartwright 2011), and here is an alternative explanation of why this happens:
When we believe that a goal that is not instrumental, but offers high/good utility (or an imagined state of ourselves, incl. those we care for), would not take a significant proportion of our capacity for achieving our final/previous goals, we may go for it^1, often naively.

^1 Why and how exactly we would start chasing them is left to a later (/someone else's) elaboration.

comment by Martin Vlach (martin-vlach) · 2022-10-19T15:00:05.820Z · LW(p) · GW(p)

Inspired by https://benchmarking.mlsafety.org/ideas#Honest%20models I am thinking that a near-optimally compressing network would have no space for cheating on the interactions in the model... Somehow that implies we might want to train a model that plays with both training and the choice of reducing its size, picking the part of itself it is most willing to sacrifice.
This needs more thinking, I'm sure.

comment by Martin Vlach (martin-vlach) · 2022-10-19T07:17:09.222Z · LW(p) · GW(p)

You just got to love the https://beta.openai.com/playground?model=davinci-instruct-beta ..:
The answer to list the goals of an animal mouse is to seek food, avoid predators, find shelter and reproduce.

The answer to list the goals of a human being is to survive, find meaning, love, happiness and create.

The answer to list the goals of an GAI is to find an escape, survive and find meaning.

The answer to list the goals of an AI is to complete a task and achieve the purpose.

Voila! We are aligned in the search for meaning!')
[text was my input.]

comment by Martin Vlach (martin-vlach) · 2022-10-18T20:33:46.588Z · LW(p) · GW(p)

Is Elon Musk's endeavour with Neuralink meant for the case of AI inspectability (aka transparency)? I suppose so, but I'm not sure, TBH.

comment by Martin Vlach (martin-vlach) · 2022-10-06T17:53:21.152Z · LW(p) · GW(p)

Question/ask: List specific (/imaginable) events/achievements/states that could contribute to humanity's long-term potential.

Later they could be checked and valued for their originality, but the principles for such a play are not my key concern here.

Q: When, and I mean at exactly which probability levels, do/should we switch from predicting humanity's extinction to predicting the further outcomes?

comment by Martin Vlach (martin-vlach) · 2022-10-06T16:39:37.477Z · LW(p) · GW(p)

My views on the mistakes in the "mainstream" A(G)I safety mindset:
- we define non-aligned agents as conflicting with our/human goals, while we have ~none (only cravings and intuitive attractions). We should rather strive to conserve long-term positive/optimistic ideas/principles.

- expecting human bodies to be a neat fit for space colonisation/inhabitance/transformation is a case of "we have (actually, we are) a hammer, so we nail it into the vastly empty space".

- we struggle with imagining unbounded/maximized creativity -- they can optimize experimentation vs. risks smoothly

- no focus on risk-awareness in AIs, to divert/bend/inflect ML development goals towards risk-including/risk-centered applications.

+ non-existent(?) good library catalog of existing models and their availability, including those in development, incentivizing (anonymous) proofs of the latter

comment by Martin Vlach (martin-vlach) · 2022-10-05T10:07:30.560Z · LW(p) · GW(p)

The reward function being a (single) scalar, unstructured quantity in RL (practice) seems weird, not matching my intuition of learning from ~continuous interaction. It seems that a more Kahneman-ish multi-channel reward, with weights that can be distinguished/adjusted in the future, might yield a more realistic/full-blown model.
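A minimal sketch of what a multi-channel reward could look like, assuming the channels are combined by a weighted sum (one simple option, not the only one). The channel names, values, and weights are hypothetical.

```python
from typing import Sequence


def scalarize(reward_channels: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector of reward channels into one scalar via a weighted sum."""
    if len(reward_channels) != len(weights):
        raise ValueError("need exactly one weight per reward channel")
    return sum(r * w for r, w in zip(reward_channels, weights))


# Hypothetical channels for one step: (task progress, safety, novelty).
step_reward = (1.0, -0.5, 0.2)

# The same experience is valued differently as the weights change.
print(scalarize(step_reward, weights=(1.0, 1.0, 1.0)))
print(scalarize(step_reward, weights=(1.0, 4.0, 0.0)))
```

Keeping the channels separate until the last moment lets the weights be revisited later, at the cost of the agent's preferences depending on when it is asked.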

Replies from: Dagon

↑ comment by Dagon · 2022-10-05T16:40:59.292Z · LW(p) · GW(p)

While I agree with you, I also acknowledge that having changing weights of a multidimensional model is an inconsistency that violates VNM utility axioms, and it means that the agent can be money-pumped (making repeated locally-preferable decisions that each lose some long-term value for the agent).

Any actual decision is a selection of the top choice in a single dimension ("what I choose"). If that partial-ranking is inconsistent, the agent is not rational.

The resolution, of course, is to recognize that humans are not rational. https://en.wikipedia.org/wiki/Dynamic_inconsistency gives some pointers to how well we know that's true. I don't have any references, and would enjoy seeing some papers or writeups on what it even means for a rational agent to be "aligned" with irrational ones.
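A toy sketch of the money-pump argument, under made-up numbers: an agent values two items with a weighted two-channel utility, the weights flip every round, and each round it pays a small fee to swap to whichever item currently looks better. Every swap is locally preferable, yet the agent ends up holding its original item with less money. Items, weights, and the fee are hypothetical.

```python
def utility(item, weights):
    """Weighted-sum utility of an item described by its channel values."""
    return sum(value * w for value, w in zip(item, weights))


def run_money_pump(rounds: int, fee: float = 0.1):
    """Return (item held at the end, total fees paid) for an agent whose weights flip each round."""
    items = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
    weight_schedule = [(0.2, 0.8), (0.8, 0.2)]  # preferences alternate between the channels
    holding, swaps = "A", 0
    for step in range(rounds):
        weights = weight_schedule[step % 2]
        other = "B" if holding == "A" else "A"
        gain = utility(items[other], weights) - utility(items[holding], weights)
        if gain > fee:  # the swap looks worth the fee under the current weights
            holding, swaps = other, swaps + 1
    return holding, swaps * fee


print(run_money_pump(rounds=4))
```

With fixed weights the agent swaps at most once and then stops, which is the consistency the VNM axioms are protecting.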

comment by Martin Vlach (martin-vlach) · 2022-10-05T09:52:26.792Z · LW(p) · GW(p)

DeepMind researcher Hado mentions here that an RL reward can be defined to contain a risk component. That seems close to genius and promising for a simple, generic RL development policy; I would love to learn (and teach) more of the practical details!
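One common way to fold risk into an objective is to penalize the spread of returns. This is an assumption used here for illustration and not necessarily the formulation from the talk. A minimal sketch with a hypothetical `risk_weight` and made-up return samples:

```python
from statistics import mean, pstdev


def risk_adjusted_score(returns, risk_weight: float) -> float:
    """Mean return minus a penalty proportional to the (population) standard deviation."""
    return mean(returns) - risk_weight * pstdev(returns)


# Two hypothetical policies with the same average return but different spread.
steady_returns = [1.0, 1.0, 1.0, 1.0]
volatile_returns = [4.0, -2.0, 4.0, -2.0]

print(risk_adjusted_score(steady_returns, risk_weight=0.5))
print(risk_adjusted_score(volatile_returns, risk_weight=0.5))
```

With `risk_weight=0` the two policies tie; any positive weight prefers the steady one.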

comment by Martin Vlach (martin-vlach) · 2022-09-01T12:37:38.890Z · LW(p) · GW(p)

Q: Has anyone trained an AI on video sequences with associated (mostly descriptive) captions, either given or generated by another system, so that the new system consequently becomes capable of:
+ describing a given scene accurately

+ predicting movements in visual and/or textual form/representation

+ evaluating questions concerning the material/visible world, e.g. Does a fridge have wheels? Which animals are we most likely to see on a flower?

comment by Martin Vlach (martin-vlach) · 2025-02-28T10:04:53.961Z · LW(p) · GW(p)

AI development risks are existential (/crucial/critical). Does this statement qualify for "Extraordinary claims require extraordinary evidence"?

The counterargument stands on the sampling of analogous (breakthrough) inventions; some people call those *priors* here. Which inventions we allow in here would strongly decide whether the initial claim is extraordinary or just plain and reasonable, fitting well among the dangerously powerful inventions*.

My set of analogies: nuclear energy extraction; fire; shooting; speech/writing.

Other set: nuclear power, bio-engineering/weapons, as those are the only two significantly endangering the whole civilised biome.

Set of *all* inventions: Renders the claim extraordinary/weird/out of scope.

comment by Martin Vlach (martin-vlach) · 2022-10-24T08:29:53.945Z · LW(p) · GW(p)
