## Posts

Alignment is Hard: An Uncomputable Alignment Problem 2023-11-19T19:38:23.564Z

Comment by Alexander Bistagne (statistical-sprocket) on Alignment is Hard: An Uncomputable Alignment Problem · 2023-11-20T20:19:14.540Z · LW · GW

Thanks for the feedback!

Comment by Alexander Bistagne (statistical-sprocket) on Alignment is Hard: An Uncomputable Alignment Problem · 2023-11-20T01:23:01.116Z · LW · GW

On your first point, correct the thing shown to be uncomputable is testing alignment. And yes, uncomputability is a worst case claim. Would it be clearer to call the paper an uncomputable alignment TEST as opposed to an uncomputable alignment PROBLEM? (Im considering editing the paper before submitting it to a journal)

Detering a few would be nice. More realistically, proofs in this vain could help convince regulators to ignore opaque box makers claims about detecting an agent's alignment.

Comment by Alexander Bistagne (statistical-sprocket) on Prizes for matrix completion problems · 2023-11-18T23:25:08.561Z · LW · GW

I understand this bounty is roughly closed, but the page is still listed as bounty active and my attempt took me only a couple hours.

Problem 2 is solvable when there are simple restrictions on the entries chosen.

Restriction 1: No diagonal entries need to be correct. Solution: Let C = AAT For each entry c_ij we construct the matrix with only c_ii, c_ij, c_ji , c_jj and all else zero. Clearly this matrix is constructible in O(n) work. So all entries can be constructed in this way. Note if c_ij and c_ji are requested, only construct one matrix for the pair. Repeating this for all m does O(nm) work. We then sum all such matricies to get our estimate E. Only the diagonal will possibly need summing so O(nm) work is done. Clearly each constructed matrix is positive semidefinite as it is a principal submatrix of C. The sum is also positive semidefinite because the individual matricies are positive semidefinite.

Restriction 2: Only 1 entry per row-column number is needed other than the main diagonal. The above solution still works.