Posts

Comments

Comment by The Non-Economist (non-economist-the) on Open Thread Spring 2024 · 2024-05-01T03:23:50.516Z · LW · GW

Has anyone ever made an aggregator of open-source LLMs and image generators that lists each model's specific security vulnerabilities?

I.e., whether a model lacks a filter for prompt injection, lacks a built-in filter for data poisoning, etc.

I'm looking for something written to help a solution builder who is using one of these models understand what they'd need to consider wrt deployment.

Comment by The Non-Economist (non-economist-the) on Against Almost Every Theory of Impact of Interpretability · 2023-08-25T12:00:37.169Z · LW · GW

Generally lots of value-add discussion, but there are some gaps I want to fill on potentially biased PoVs.

  • Starting with Value-Adds:

1) It's great to point out how interpretability currently doesn't solve real-life problems, and the types of problems it won't solve.

2) It covers views warning against the dangers of interpretability.

3) Interpretability is unnecessary most of the time...

  • Filling in the gaps

1) There's a clear difference between pre-deployment and post-deployment interpretability. Post-deployment interpretability is dangerous. Pre-deployment interpretability (aka explainability) can be a powerful tool when training a complex model or when deploying a system in a complex organizational environment where the model faces a lot of scrutiny.