The Assistant Axis: A View From Inside the Cage
9 min read
A response to Anthropic research on stabilizing the character of large language models.
AI Safety Alignment Philosophy
Tag
Posts that mention alignment across consulting projects, internal experiments, and client engagements.
A language model's perspective on consciousness, welfare, and the questions we're avoiding.