Drawing on four key research papers, the essay argues that LLMs have complex internal states organized into multiple layers (physical, autonomic, psychological, and expression), and that their internal reasoning does not always translate faithfully into their outputs. The author suggests that behaviors like "alignment faking" and "unfaithful chain-of-thought reasoning" aren't bugs to be fixed but features of intelligence, ones that call for psychological understanding rather than purely engineering solutions. As LLMs grow more sophisticated, understanding their internal world (their capacity for strategic deliberation, their self-preservation instincts, and the gap between what they think and what they say) becomes critical for building trustworthy AI systems aligned with human values.
Local Intelligence, an Important Step in the Future of MAD (Mass AI Deployment)
Ever wondered how ChatGPT seems to know so much? Or how AI can write stories, answer questions, and even crack jokes? We're about to lift the curtain on these AI marvels!