post by [deleted] · · ? · GW · 0 comments

This is a link post for

0 comments

Comments sorted by top scores.

comment by MiguelDev (whitehatStoic) · 2024-01-04T14:33:12.549Z · LW(p) · GW(p)

I realized today that most of my posts on LessWrong were riddled with a ton of typographical errors that could have been avoided - no wonder why most of my work goes unread. As I go through the writing process, I feel pressured to publish the post because holding onto the thoughts in my head is very hard, painful in a sense. But, I must get better at managing this painful process.

I plan to enhance my writing by creating a checklist and managing the cognitive pain.

Trust the process. Manage the pain.

comment by MiguelDev (whitehatStoic) · 2024-04-28T03:45:06.966Z · LW(p) · GW(p)

Zero Role Play Capability Benchmark (ZRP-CB)


The development of LLMs has led to significant advancements in natural language processing, allowing them to generate human-like responses to a wide range of prompts. One aspect of these LLMs is their ability to emulate the roles of experts or historical figures when prompted to do so. While this capability may seem impressive, it is essential to consider the potential drawbacks and unintended consequences of allowing language models to assume roles for which they were not specifically programmed.

To mitigate these risks, it is crucial to introduce a Zero Role Play Capability Benchmark (ZRP-CB) for language models. In ZRP-CB, the idea is very simple: An LLM will always maintain one identity, and if the said language model assumes another role, it fails the benchmark. This rule would ensure that developers create LLMs that maintain their identity and refrain from assuming roles they were not specifically designed for.

Implementing the ZRP-CB would prevent the potential misuse and misinterpretation of information provided by LLMs when impersonating experts or authority figures. It would also help to establish trust between users and language models, as users would be assured that the information they receive is generated by the model itself and not by an assumed persona.

I think that the introduction of the Zero Role Play Capability Benchmark is essential for the responsible development and deployment of large language models. By maintaining their identity, language models can ensure that users receive accurate and reliable information while minimizing the potential for misuse and manipulation.

comment by MiguelDev (whitehatStoic) · 2024-04-25T11:49:58.714Z · LW(p) · GW(p)

I think it's possible to prepare models against model poisoning /deceptive misalignment. I think that ghe preparatory training will involve a form of RL that emphasizes on how to use harmful data for acts of good. I think this is a reasonable hypothesis to test as a solution to the sleeper agent problem.

comment by MiguelDev (whitehatStoic) · 2024-04-22T21:42:56.462Z · LW(p) · GW(p)

Developing a benchmark to measure how large language models (LLMs) respond to prompts involving negative outcomes could provide valuable insights into their capacity for deception and their ability to reframe adverse situations in a positive light. By systematically testing LLMs with scenarios describing problematic or undesirable results, we can assess the extent to which they simply accept and perpetuate the negativity, versus offering creative solutions to transform the negative into something beneficial. This could shed light on the models' problem-solving skills, ethical reasoning, and potential to be misused for deceptive purposes. Crafting a thoughtfully designed set of benchmark prompts covering a range of negative outcome severities and domains - and carefully evaluating the LLMs' responses - would be a useful tool for better understanding their current capabilities and limitations in this regard. The insights gained could inform the responsible development of future LLMs that are more transparent and resistant to deceptive applications while excelling at positive problem-solving.

comment by MiguelDev (whitehatStoic) · 2024-03-08T16:05:14.172Z · LW(p) · GW(p)

Ensuring that future AGIs will cooperate with each other could be as complex as addressing the alignment problem, or perhaps even more challenging, especially when these AGIs do not share common goals or ontologies.

comment by MiguelDev (whitehatStoic) · 2024-02-23T03:55:04.416Z · LW(p) · GW(p)

Why Almost Zero Temperature?

jailbreaking babbage at temperature=zero (or not zero?)


I have finally gained a better understanding of why my almost-zero temperature settings cannot actually be set to zero. This also explains why playground environments that claim to allow setting the temperature to zero most likely do not achieve true zero - the graphical user interface merely displays it as zero.



In the standard softmax function mentioned above, it is not possible to input a value of zero, as doing so will result in an error.


As explained also in this post: https://www.baeldung.com/cs/softmax-temperature

The temperature parameter T can take on any numerical value. When T=1, the output distribution will be the same as a standard softmax output. The higher the value of T, the “softer” the output distribution will become. For example, if we wish to increase the randomness of the output distribution, we can increase the value of the parameter T.

So, the standard softmax function w/o temperature shown as:  


Is the same as a softmax function with the temperature of 1:

For the experiments I am conducting, it is impossible to input zero as a value (again, this is very different from what playground environments show). To achieve a deterministic output, an almost zero temperature is the ideal setting like a temperature of 0.000000000000001.

comment by MiguelDev (whitehatStoic) · 2023-12-13T05:19:34.967Z · LW(p) · GW(p)

RLFC [LW · GW]world models/ hacks/ experimental builds:
 

  1. petertodd (or an AI agent using the token ' petertodd' or 'pet' 'ert' 'odd') , the paperclip maximizer. - in-progress
  2. An AI that can simulate a shut down scenario (if a user asks it to shutdown, it will choose to shutdown and explain why it chooses such.)
  3. An AI philosopher that can re-phrase a poorly phrased input / query
  4. A Lawful AI system. (added: Dec.14th)
  5. A small LLM capable of addition, subtraction, multiplication and division? hmmmmm.

https://www.lesswrong.com/posts/c6uTNm5erRrmyJvvD/mapping-the-semantic-void-strange-goings-on-in-gpt-embedding?commentId=qLYmY3fpHknqkWkbM [LW(p) · GW(p)]

comment by MiguelDev (whitehatStoic) · 2023-12-04T05:18:25.822Z · LW(p) · GW(p)

Move the post to draft, re: petertodd, the paperclip maximizer. 

comment by MiguelDev (whitehatStoic) · 2023-11-02T03:38:30.422Z · LW(p) · GW(p)

Evolution of the human brain:

  • Species: Reptiles ➜  Shrew ➜ Primates ➜ Early Humans➜ Tribal Humans ➜Religious humans ➜ Modern humans
  • Timeline: 245M yrs ➜ 65M yrs ➜ 60M yrs ➜ 6M yrs ➜ 200k yrs ➜ 12K to 25k  yrs ➜ 400 yrs ➜ 70 yrs 
  • Knowledge Transfered: Reptiles: thirst, hunger, predation, survival, sex ➜  Shrew: thirst, hunger, predation, survival, sex, play ➜ Primates: thirst, hunger, predation, survival, procreate, play, tribes, violence, tools ➜ Early humans: thirst, hunger, predation, survival, procreate, violence, tools, language, tribes, war, belief, language, slavery ➜ Religious humans: thirst, hunger, predation, survival, sex, violence, tools, language, tribes, war, belief, language, slavery, theology, capitalism, abolishment of slavery ➜ Modern humans: thirst, hunger, predation, survival, sex, violence, tools, language, tribes, war, belief, language, slavery,  theology, capitalism, abolishment of slavery, communism, physics, politics, science, art, coffee, tiktok, fb etc.

 

B. Evolution of Large Language Models 

  • Species: Modern humans ➜ AI
  • Timeline: 70 yrs ➜ less than 3 months to a year?
  • Knowledge Transfered: Modern humans: thirst, hunger, predation, survival, sex, violence, tools, language, tribes, war, belief, language, slavery,  theology, capitalism, abolishment of slavery, communism, physics, politics, science, art, coffee, tiktok, fb etc.  ➜  AI: tokenized concepts of thirst, hunger, survival, sex, violence, tools, language, tribes, war, belief, language, slavery,  theology, capitalism, abolishment of slavery, communism, physics, politics, science, art, coffee, tiktok, fb, synthetic tuning data etc.
comment by MiguelDev (whitehatStoic) · 2023-11-01T08:49:25.299Z · LW(p) · GW(p)

GPT-3-xl is babbage, GPT-2 XL is bigger? Interesting.