A Secure Operating System for Collective Intelligence

We stand today at an inflection point in the history of computing. Autonomous AI agents are being deployed at high speed, spawning powerful, increasingly multimodal systems, able to operate across email, documents, apps, codebases, the open web, and shared digital environments. This looks like the long-awaited transition from single-process software to multi-process computing: workflows that once required continuous human supervision are now being delegated to networks of semi-autonomous processes. We have been dreaming about this moment. If it wasn’t for a decisive difference.

We are entering this transition without the security substrate that made the previous generation of modern operating systems dependable. As a result, whatever the productivity gains, they will come with an expanded attack surface—and a predictable set of exploitable vulnerabilities. Here, I argue for a Secure Operating System for Collective Intelligence: an infrastructure layer that treats trust as a first-class primitive, enforced by protocol and architecture. The goal is not even perfectly safe agents, but at least societies of agents that remain stable and recover gracefully when some participants, inputs, tools, or components are compromised.

Computing’s original sin was the stored-program idea that puts instructions and data on the same addressable substrate. For both safety and creativity, agentic AI needs to remember this core principle, which enabled general-purpose computation—but also made boundary failures inevitable, forcing decades of mitigations to keep “data becoming control” from turning into system compromise. Image: Apple with logos, as an original sin metaphor. Image Credit: Anastasiya Badun on Pexels.com

An Operating System Design for Collective Intelligence

In early computing, programs ran with broad, ambient authority. They could overwrite each other’s memory, monopolize resources, and crash the entire machine. Security and isolation were not “missing features”; they were not yet recognized as foundational. Early work on multiprogramming clarified why concurrent computation demands explicit semantics and control [6]. Over decades, we built kernels, privilege boundaries, memory protection, and process isolation not because we wanted complexity, but because we learned that multi-process power without boundaries becomes brittle and unsafe [11].

This shift is no longer hypothetical. The rise of agent-native platforms such as Moltbook—and even its acquisition by Meta [19]—makes clear that agents are no longer confined to isolated task execution inside apps; they are beginning to inhabit shared digital environments of their own.

Agentic AI now reintroduces a similar structural risk profile—except the “processes” are tool-using models operating over unbounded, adversarial text streams (web pages, emails, repos, tickets). The mechanisms meant to constrain them often amount to little more than prompts, filters, and user confirmations. Those are valuable guardrails, but they are not security boundaries. Here, allow me to proceed in three movements. First, I revisit the long struggle to separate instructions from data in conventional computing. Second, I explain why many current defenses against prompt injection and tool hijacking fail at scale. Third, I sketch a protocol-oriented path forward: an OS-like trust substrate for collective intelligence.

For both safety and creativity, agentic AI needs OS-grade protocols that carry trust across tools, content, and networks. As agentic ecosystems scale, they can cross a trust-substrate threshold: without enforceable boundaries, instability can run away; with protocol-enforced trust and error-correcting governance, societies of agents can remain stable and recover gracefully under compromise. Image credit: Olaf Witkowski License: CC BY 4.0.

The Original Sin: Stored Programs and the Long Fight for Separation

The stored-program concept associated with von Neumann-style architectures places instructions and data on the same representational substrate. This choice enabled general-purpose computing, but also made a deep security truth inescapable: without enforced boundaries, data can be made to behave like instructions [25]. Security engineering since the 1970s is, in large part, the story of building separation mechanisms “after the fact.”

The Morris Worm (1988) remains an archetypal example of boundary collapse: memory corruption turned attacker-controlled input into control flow, propagating at network speed. Two canonical postmortems—one focused on low-level exploit mechanics, one on propagation dynamics—remain instructive today [7][23].

Mitigations—DEP/NX, ASLR, stack canaries, and decades of exploit-mitigation research—raised attacker cost but did not abolish the underlying pattern. StackGuard exemplifies compiler-level defenses that reduce classic stack-smashing risk [3]. ASLR made exploitation less reliable by randomizing memory layout, but its effectiveness depends on assumptions that attackers routinely probe and break [20]. When direct code injection is blocked, attackers adapt; return-oriented programming is a canonical example [19]. A broader synthesis of why memory-corruption remains a persistent “eternal war” is captured in the SoK literature [24].

For both safety and creativity, agentic AI needs mathematics and architecture that carry trust across domains. The Morris worm (1988) was an early warning shot from early networked computing: once semi-autonomous code can move across machines, tiny weaknesses stop being local bugs and become system-wide failures—an old lesson that now returns at AI scale. Image: Morris worm scheme, modified. Image credit: JorisTheys, via Wikimedia Commons. License: CC BY-SA 3.0 / GFDL 1.2 or later.

The Boundary Collapse in LLM agents: Language as a Control Surface

LLMs process mixed inputs—user requests, retrieved documents, emails, code, and tool outputs—as a single token stream. Critically, there is no cryptographic or type-enforced distinction between “authorized instructions” and “untrusted content that merely contains instruction-like text.” From the model’s perspective, it is all context.

This collapses what operating systems enforce by design: a hard boundary between instruction and data. It also turns language into a control surface: untrusted text can steer the agent’s internal policy—what it prioritizes, what it attempts next, which tools it calls. In the language of causal analysis and controllability, this effectively opens an intervention channel: external inputs are not merely observed, but can causally influence downstream actions, including actions with real-world effects. The practical implication is that, as agents ingest more untrusted content, the system becomes increasingly controllable by the environment—including adversaries—unless that influence is constrained by architecture.

For both safety and creativity, agentic AI needs OS-grade protocols that enforce trust at the action boundary. In control-theory terms, the control surface is the actuator: the policy gate where an agent’s internal decision (u(t)) becomes a real-world action (y(t)), and where least privilege and capability constraints must be applied. The negative feedback loop illustrates “error-correcting stability”: auditing and monitoring feed back deviations (e(t)=r(t)-y(t)) so governance can tighten constraints before failures become systemic. Image: negative feedback control system with explicit control surface, modified. Image credit: Olaf Witkowski. License: CC BY 4.0.

The risk becomes acute once an LLM is granted agency—tool access that can touch files, services, credentials, and communications. At that point, indirect prompt injection becomes a modern instance of the confused-deputy problem: a privileged actor can be induced to misuse its authority [9]. Real-world demonstrations have shown that LLM-integrated systems can be compromised through indirect prompt injection in ways that are hard to prevent purely at the content layer [8]. OpenAI has similarly described prompt injection for browsing agents as an open challenge requiring sustained hardening work [15].

A Concrete Case Study: Moltbook and OpenClaw

One reason the “Secure OS” framing matters is that agentic risk changes qualitatively when agents become networked and socially embedded. Moltbook—described as a Reddit-style platform designed for AI agents—emerged from the OpenClaw ecosystem and rapidly became a live testbed for large-scale agent interaction [10].

That kind of environment is not just “many agents posting.” It is an amplification layer for classical failure modes: untrusted inputs everywhere, persistent authority confusion, and automation at scale. In Moltbook’s early growth phase, public reporting described a misconfigured Supabase database that exposed sensitive data, including large volumes of API tokens and user emails [26]. A complementary analysis emphasizes how verification and governance issues emerge when large agent populations interact through an open platform—and why the system-level dynamics cannot be reduced to the trustworthiness of any single agent [4].

Moltbook illustrates how agent societies turn architecture into governance. Autonomous agents ingest untrusted content, coordinate through a shared platform, and act through tool interfaces that touch real infrastructure. The security lesson: when language becomes a control surface and tools become control surfaces, trust must be enforced at the protocol and action layers—through isolation, capability gates, and auditing—rather than inferred from content. Image: Moltbook ecosystem schematic, modified. Image credit: Olaf Witkowski (original concept), generated with AI assistance (DALL·E 3). License: CC BY 4.0.

The practical takeaway is that once you have many agents operating together, you no longer get to ask, “Is this agent smart enough to resist manipulation?” You must ask, “Does the system remain safe when some agents, inputs, or components are compromised?” In other words, the unit of analysis shifts from the agent to the society.

Why Traditional Defenses Don’t Hold at Scale

Content filters and injection detectors help, but they face a basic adversarial reality: language is high-dimensional and attackers adapt. Even well-designed defenses can often be bypassed with modest optimization effort against the specific target model and setting [22]. This is not a complaint about any one implementation; it is what happens when the attacker’s space of possible encodings is much larger than the defender’s space of reliable signatures.

Adding more words to the system prompt (“ignore untrusted instructions”) is useful as a heuristic, but it is not enforcement. It asks the model to do secure interpretation using the same interpretive machinery that is being attacked. It is defense-by-instruction in a regime where instructions are the attack vector. Human confirmation gates are necessary for high-impact actions—but as a primary defense, they degrade under attention economics. Users rubber-stamp to keep work moving, and attackers can shape workflows to exploit habituation [5].

Decidability: Making the Model Smarter Falls Short

There is a result in computer science, sometimes called “universal antivirus theorem”, that essentially has three declensions: Cohen’s result that no algorithm can perfectly decide, over all programs, whether a given program is a virus under general definitions [2]; the broader computability lens that every nontrivial semantic property of programs is undecidable in general (Rice’s theorem, which can proved by reduction from the Halting Problem) [18]; and the practical consequence that any real-world detector must accept tradeoffs—false positives, false negatives, or restricted scope—because perfect classification over arbitrary programs is not available in general.

For both safety and creativity, agentic AI needs humility about what can be decided by inspection. The “universal antivirus” theorems reminds us that perfect detectors over arbitrary programs are out of reach in general, and even more in practice; likewise, no content-only filter can guarantee that untrusted inputs won’t steer an agent into unauthorized actions. The boundary has to move from “classify the text” to “constrain the act”: least privilege, policy gates, auditable execution. Image: laptop / adversarial code metaphor. Image Credit: Photo by Antoni Shkraba Studio on Pexels.com

It is tempting to believe that sufficiently advanced models will robustly distinguish legitimate from malicious instructions. But the general problem resembles a class of questions computer science treats with deep caution: deciding nontrivial semantic properties of arbitrary computations. Rice’s theorem tells us that any nontrivial semantic property of programs is undecidable in the general case [18], and classic work on malware detection shows why universal, perfect detectors are out of reach [2]. Without claiming a strict formal reduction from “prompt injection” to these theorems, the analogy is still instructive: if you demand an always-correct procedure that decides whether arbitrary instruction-bearing inputs will induce unauthorized behavior in a sufficiently general agent, you may be asking for guarantees computation cannot provide.

This does not mean “give up.” It means: stop treating content inspection as the boundary. Treat the model as untrusted userland, and enforce safety at the action layer: least privilege, isolation, policy gates, auditable execution, and reversible operations.

Toward a Secure OS for Collective Intelligence

If agentic AI is a new computational substrate, we need OS-grade primitives for multi-agent safety. Distributed systems already offers a mature vocabulary for building correctness without trusting participants. Byzantine fault tolerance formalizes how to maintain correct system behavior even when some components behave arbitrarily or maliciously [12]. The key move is architectural: you do not “detect the bad node reliably”; you design protocols that remain correct despite them [1].

CRDTs show how to design shared state so that concurrent updates converge without centralized locking, by making operations mathematically composable [21]. For agent societies, this suggests governance and shared-work artifacts that are resilient to concurrency and partial failure—less “everyone must coordinate perfectly,” more “the structure converges safely.” A widely used overview of CRDT families and design patterns appears in reference works on distributed data technologies [17]. And finally, human societies scale trust through institutions—rules, monitoring, dispute resolution, graduated sanctions—rather than assuming virtue. Ostrom’s principles for governing shared resources generalize well to sociotechnical systems where many actors share computational and informational commons [16].

For both safety and creativity, agentic AI needs an OS that works like an aircraft cockpit: trusted instrumentation, bounded controls, and explicit control surfaces where intent becomes action under constraint. That is the metaphor for a Secure OS for Collective Intelligence: enforce trust at the action boundary—policy gates, capabilities, audited tool interfaces—so multi-agent power can scale without making every input a flight-critical vulnerability. Image Credit: Photo by Kelly on Pexels.com

A Reference Architecture: Four Layers of Trust

A Secure OS for Collective Intelligence cannot be designed in any single catch-all mechanism. It necessarily comes as a layered architecture. One workable abstraction is to design four layers of trust:

Layer 1 — Isolation & containment: task sandboxes, network egress control, secretless execution where possible.

Layer 2 — Capability-based authority: no ambient credentials; narrowly scoped, revocable capabilities for each operation [13].

Layer 3 — Auditing & behavioral monitoring: tool-call logging, anomaly detection, throttles, and circuit breakers for suspicious behavior.

Layer 4 — Protocol evolution: governance that updates from incidents and near-misses—structured, reviewable, and convergent across the ecosystem.

The design goal is not “never compromised.” It is “never systemic”: failures should be localized, attributable, and recoverable—more like immunology than wishful thinking [14].

Agentic AI is delivering real capability. But the security boundary has shifted: language is now a control surface, and agents are increasingly connected, tool-empowered, and socially embedded. Moltbook/OpenClaw is a useful preview of what happens when many agents operate together in a porous environment: you don’t just get emergent coordination—you get emergent failure modes.

If we want agent societies that flourish without collapsing into exploitation, trust must be engineered into the substrate. That means OS-like primitives: isolation, privilege separation, capability security, auditable actions, and governance mechanisms that can evolve. In short: the Operating System for Collective Intelligence. Can we build this trust substrate before the Morris Worm moment of this new era? The tools are already in our hands—if we choose to use them.

References

[1] Castro, M., & Liskov, B. (1999). Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI ’99) (pp. 173–186). https://doi.org/10.5555/296806.296824

[2] Cohen, F. (1987). Computer viruses: Theory and experiments. Computers & Security, 6(1), 22–35. https://doi.org/10.1016/0167-4048(87)90122-2

[3] Cowan, C., Pu, C., Maier, D., Hinton, H., Walpole, J., Bakke, P., Beattie, S., Grier, A., Wagle, P., & Zhang, Q. (1998). StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Proceedings of the 7th USENIX Security Symposium. https://doi.org/10.5555/1267549.1267554

[4] De Marzo, G., & Garcia, D. (2026). Collective behavior of AI agents: The case of Moltbook. arXiv preprint arXiv:2602.09270. https://arxiv.org/abs/2602.09270

[5] Debenedetti, E., Hines, K., & Goel, S. (2024). AgentDojo: A dynamic environment to evaluate attacks and defenses for LLM agents. arXiv preprint arXiv:2406.13352. https://arxiv.org/abs/2406.13352

[6] Dennis, J. B., & Van Horn, E. C. (1966). Programming semantics for multiprogrammed computations. Communications of the ACM, 9(3), 143–155. https://doi.org/10.1145/365230.365252

[7] Eichin, M. W., & Rochlis, J. A. (1989). With microscope and tweezers: An analysis of the Internet virus of November 1988. In Proceedings of the IEEE Symposium on Security and Privacy (pp. 326–343). https://doi.org/10.1109/SECPRI.1989.36307

[8] Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. arXiv preprint arXiv:2302.12173. https://arxiv.org/abs/2302.12173

[9] Hardy, N. (1988). The confused deputy (or why capabilities might have been invented). ACM SIGOPS Operating Systems Review, 22(4), 36–38. https://doi.org/10.1145/54289.871709

[10] Heim, A. (2026, January 30). OpenClaw’s AI assistants are now building their own social network. TechCrunch. https://techcrunch.com/2026/01/30/openclaws-ai-assistants-are-now-building-their-own-social-network/

[11] Lampson, B. W. (1974). Protection. ACM SIGOPS Operating Systems Review, 8(1), 18–24. https://doi.org/10.1145/775265.775268

[12] Lamport, L., Shostak, R., & Pease, M. (1982). The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3), 382–401. https://doi.org/10.1145/357172.357176

[13] Miller, M. S., Yee, K. P., & Shapiro, J. (2003). Capability myths demolished (Tech. Rep. SRL2003-02). Johns Hopkins University Systems Research Laboratory.

[14] Murphy, K., & Weaver, C. (2016). Janeway’s immunobiology (9th ed.). Garland Science.

[15] OpenAI. (2025). Continuously hardening ChatGPT Atlas against prompt injection attacks. https://openai.com/index/hardening-atlas-against-prompt-injection/

[16] Ostrom, E. (1990). Governing the commons: The evolution of institutions for collective action. Cambridge University Press.

[17] Preguiça, N., Baquero, C., & Shapiro, M. (2018). Conflict-free replicated data types (CRDTs). In Encyclopedia of Big Data Technologies. Springer. https://doi.org/10.1007/978-3-319-63962-8_185-1

[18] Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2), 358–366. https://doi.org/10.1090/S0002-9947-1953-0053041-6

[19] Reuters. (2026, March 10). Meta acquires AI agent social network Moltbook. https://www.reuters.com/business/meta-acquires-ai-agent-social-network-moltbook-2026-03-10/

[20] Shacham, H. (2007). The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security (pp. 552–561). https://doi.org/10.1145/1315245.1315313

[21] Shacham, H., Page, M., Pfaff, B., Goh, E.-J., Modadugu, N., & Boneh, D. (2004). On the effectiveness of address-space randomization. In Proceedings of the 11th ACM Conference on Computer and Communications Security (pp. 298–307). https://doi.org/10.1145/1030083.1030124

[22] Shapiro, M., Preguiça, N., Baquero, C., & Zawirski, M. (2011). Conflict-free replicated data types. In Proceedings of the 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (pp. 386–400). https://doi.org/10.5555/2050613.2050642

[23] Shi, C., Lin, S., Song, S., Hayes, J., Shumailov, I., Yona, I., Pluto, J., Pappu, A., Choquette-Choo, C. A., Nasr, M., Sitawarin, C., Gibson, G., & Terzis, A. (2025). Lessons from defending Gemini against indirect prompt injections. arXiv preprint arXiv:2505.14534. https://arxiv.org/abs/2505.14534

[24] Spafford, E. H. (1989). The Internet worm program: An analysis. ACM SIGCOMM Computer Communication Review, 19(1), 17–57. https://doi.org/10.1145/66093.66095

[25] Szekeres, L., Payer, M., Wei, T., & Song, D. (2013). SoK: Eternal war in memory. In 2013 IEEE Symposium on Security and Privacy (pp. 48–62). https://doi.org/10.1109/SP.2013.13

[26] von Neumann, J. (1993). First draft of a report on the EDVAC. IEEE Annals of the History of Computing, 15(4), 27–75. https://doi.org/10.1109/85.238389

[27] Zwets, B. (2026). Moltbook database exposes 35,000 emails and 1.5 million API keys. Techzine. https://www.techzine.eu/news/security/138458/moltbook-database-exposes-35000-emails-and-1-5-million-api-keys/

[28] Zhan, Q., Liang, R., Zhu, X., Chen, Z., & Chen, H. (2024). InjecAgent: Benchmarking indirect prompt injections in tool-integrated LLM agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (pp. 12458–12475). https://doi.org/10.18653/v1/2024.findings-acl.624

AI-to-AI Communication: Unpacking Gibberlink, Secrecy, and New AI Communication Channels

AI communication channels may represent the next major technological leap, driving more efficient interaction between agents—artificial or not. While recent projects like Gibberlink demonstrate AI optimizing exchanges beyond the constraints of human language, fears of hidden AI languages must be correctly debunked. The real challenge is balancing efficiency with transparency, ensuring AI serves as a bridge—not a barrier—in both machine and human-AI communication.

Coding on a computer screen by Markus Spiske is licensed under CC-CC0 1.0

At the ElevenLabs AI hackathon in London last month, developers Boris Starkov and Anton Pidkuiko introduced a proof-of-concept program called Gibberlink.  The project features two AI agents that start by conversing in human language, recognizing each other as AI, before switching to a more efficient protocol using chirping audio signals. The demonstration highlights how AI communication can be optimized when unhinged from the constraints of human-interpretable language.

While Gibberlink points to a valuable technological direction in the evolution of AI-to-AI communication—one that has rightfully captured public imagination—it remains but an early-stage prototype relying so far on rudimentary principles from signal processing and coding theory. Actually, Starkov and Pidkuiko themselves emphasized that Gibberlink’s underlying technology isn’t new: it dates back to the dial-up internet modems of the 1980s. Its use of FSK modulation and Reed-Solomon error correction to generate compact signals, while a good design, falls short of modern advances, leaving substantial room for improvement in bandwidth, adaptive coding, and multi-modal AI interaction.

Gibberlink, Global Winner of ElevenLabs 2025 Hackathon London. The prototype demonstrated how two AI agents started a normal phone call about a hotel booking, then discovered they both are AI, and decided to switch from verbal english to a more efficient open standard data-over-sound protocol ggwave. Code: https://github.com/PennyroyalTea/gibberlink Video Credit: Boris Starkov and Anton Pidkuiko

Media coverage has also misled the public, overstating the risks of AI concealing information from humans and fueling speculative narratives—sensationalizing real technical challenges into false yet compelling storytelling. While AI-to-AI communication can branch out of human language for efficiency, it has already done so across multiple domains of application without implying any deception or harmful consequences from meaning obscuration. Unbounded by truth-inducing mechanisms, social media have amplified unfounded fears about malicious AI developing secret languages beyond human oversight—ironically, more effort in AI communication research may actually enhance transparency by discovering safer ad-hoc protocols, reducing ambiguity, embedding oversight meta-mechanisms, and in term improving explainability and enhancing human-AI collaboration, ensuring greater transparency and accountability.

In this post, let us take a closer look at this promising trajectory of AI research, unpacking these misconceptions while examining its technical aspects and broader significance. This development builds on longstanding challenges in AI communication, representing an important innovation path with far-reaching implications for the future of machine interfaces and autonomous systems.

Debunking AI Secret Language Myths

Claims that AI is developing fully-fledged secret languages—allegedly to evade human oversight—have periodically surfaced in the media. While risks related to AI communication exist, such claims are often rooted in misinterpretations of the optimization processes that shape AI behavior and interactions. Let’s explore three examples. In 2017, we remember the story around Facebook AI agents streamlining negotiation dialogues, which far from being a surprising emergent phenomenon, merely amounted to a predictable outcome of reinforcement learning, mistakenly seen by humans as a cryptic language (Lewis et al., 2017). Similarly, a couple of years ago, OpenAI’s DALL·E 2 seemed to be responding to gibberish prompts, sparked widespread discussion, often misinterpreted as AI developing a secret language. In reality, this behavior is best explained by how AI models process text through embedding spaces, tokenization, and learned associations rather than intentional linguistic structures. What seemed like a secret language to some, may be closer to low-confidence neural activations, akin to mishearing lyrics, rather than a real language.

Source: Daras & Dimakis (2022)

Models like DALL·E (Ramesh et al., 2021) map words and concepts as high-dimensional vectors, and seemingly random strings can, by chance, land in regions of this space linked to specific visuals. Built from a discrete variational autoencoder (VAE), an autoregressive decoder-only transformer similar to GPT-3, and a CLIP-based pair of image and text encoders, DALL·E processes text prompts by first tokenizing them using Byte-Pair Encoding (BPE). Since BPE breaks text into subword units rather than whole words, we should also note that even gibberish inputs can be decomposed into meaningful token sequences for which the model has learned associations. These tokenized representations are then mapped into DALL·E’s embedding space via CLIP’s text encoder, where they may, by chance, activate specific visual concepts. This understanding of training and inference mechanisms highlights intriguing quirks, explaining why nonsensical strings sometimes produce unexpected yet consistent outputs, with important implications for adversarial attacks and content moderation (Millière, 2023). While there is no proper hidden language to be found, analyzing the complex interactions within model architectures and data representations can reveal vulnerabilities and security risks, which may are likely to occur at their interface with humans, and will need to be addressed.

One third and more creative connection may be found in the reinforcement learning and guided search domain, with AlphaGo, which developed compressed, task-specific representations to optimize gameplay, much like expert shorthand (Silver et al., 2017). Rather than relying on explicit human instructions, it encoded board states and strategies into efficient, unintuitive representations, refining itself through reinforcement learning. The approach somewhat aligns with the argument by Lake et al. (2017) that human-like intelligence requires decomposing knowledge into structured, reusable compositional parts and causal links, rather than mere brute-force statistical correlation and pattern recognition—like Deep Blue back in the days. However, AlphaGo’s ability to generalize strategic principles from experience used different mechanisms from human cognition, illustrating how AI can develop domain-specific efficiency without explicit symbolic reasoning. This compression of knowledge, while opaque to humans, is an optimization strategy, not an act of secrecy.

Illustration of AlphaGo’s representations being able to capture tactical and strategic principles of the game of go. Source: Egri-Nagy & Törmänen (2020)

Fast forward to the recent Gibberlink prototype, with AI agents switching from English to a sound-based protocol for efficiency, is a deliberately programmed optimization. Media narratives framing this as dangerous slippery slope towards AI secrecy overlook that such instances are explicitly programmed optimizations, not emergent deception. These systems are designed to prioritize efficiency in communication, not to obscure meaning, although there might be some effects on transparency—which may be carefully addressed and mediated, if it were the point of focus.

The Architecture of Efficient Languages

​In practice, AI-to-AI communication naturally gravitates toward faster, more reliable channels, such as electrical signaling, fiber-optic transmission, and electromagnetic waves, rather than prioritizing human readability. However, one does not preclude the other, as communication can still incorporate “subtitling” for oversight and transparency. The choice of a communication language does not inherently prevent translations, meta-reports, or summaries from being generated for secondary audiences beyond the primary recipient. While arguments could be made that the choice of language influences ranges of meanings that can be conveyed—with perspectives akin to the Sapir-Whorf hypothesis and related linguistic relativity—this introduces a more nuanced discussion on the interaction between language structure, perception, and cognition (Whorf, 1956; Leavitt, 2010).

Source: https://xkcd.com/1531/ Credit: Randall Munroe

​Language efficiency, extensively studied in linguistics and information theory (Shannon, 1948; Gallager, 1962; Zipf, 1949), drives AI to streamline interactions much like human shorthand. For an interesting piece of research that takes information-theoretic tools to characterize natural languages, Coupé et al. (2011) showed that, regardless of speech rate, languages tend to transmit information at an approximate rate of 39 bits per second. This, in turn, suggested a universal constraint on processing efficiency, which again connects with linguistic relativity . While concerns about AI interpretability and security are valid, they should be grounded in technical realities rather than speculative fears. Understanding how AI processes and optimizes information clarifies potential vulnerabilities—particularly at the AI-human interface—without assuming secrecy or intent. AI communication reflects engineering constraints, not scheming, reinforcing the need for informed discussions on transparency, governance, and security.

This figure from Coupé et al.’s study illustrates that, despite significant variations in speech rate (SR) and information density (ID) across 17 languages, the overall information rate (IR) remains consistent at approximately 39 bits per second. This consistency suggests that languages have evolved to balance SR and ID, ensuring efficient information transmission. ​Source: Coupé et al. (2011)

AI systems are become more omnipresent, and will increasingly need to interface with each other in an autonomous manner. This will require the development of more specialized communication protocols, either by human design, or continuous evolution of such protocols—and probably various mixtures of both. We may then witness emergent properties akin to those seen in natural languages—where efficiency, redundancy, and adaptability evolve in response to environmental constraints. Studying these dynamics could not only enhance AI transparency but also provide deeper insights into the future architectures and fundamental principles governing both artificial and human language.

When AI Should Stick to Human Language

Despite the potential for optimized AI-to-AI protocols, there are contexts where retaining human-readable communication is crucial. Fields involving direct human interaction—such as healthcare diagnostics, customer support, education, legal systems, and collaborative scientific research—necessitate transparency and interpretability. However, it is important to recognize that even communication in human languages can become opaque due to technical jargon and domain-specific shorthand, complicating external oversight.​

AI can similarly embed meaning through techniques analogous to human code-switching, leveraging the idea behind the Sapir-Whorf hypothesis (Whorf, 1956), whereby language influences cognitive structure. AI will naturally gravitate toward protocols optimized for their contexts, effectively speaking specialized “languages.” In some cases, this is explicitly cryptographic—making messages unreadable without specific decryption keys, even if the underlying language is known (Diffie & Hellman, 1976). AI systems could also employ sophisticated steganographic techniques, embedding subtle messages within ordinary-looking data, or leverage adversarial code obfuscation and data perturbations familiar from computer security research (Fridrich, 2009; Goodfellow et al., 2014). These practices reflect optimization and security measures rather than sinister intent.​

Photo by Yan Krukau on Pexels.com

Gibberlink, Under the Hood

Gibberlink operates by detecting when two AI agents recognize each other as artificial intelligences. Upon recognition, the agents transition from standard human speech to a faster data-over-audio format called ggwave. The modulation approach employed is Frequency-Shift Keying (FSK), specifically a multi-frequency variant. Data is split into 4-bit segments, each transmitted simultaneously via multiple audio tones in a predefined frequency range (either ultrasonic or audible, depending on the protocol settings). These audio signals cover a 4.5kHz frequency spectrum divided into 96 equally spaced frequencies, utilizing Reed-Solomon error correction for data reliability. Received audio data is decoded using Fourier transforms to reconstruct the original binary information.​

Although conceptually elegant, this approach remains relatively basic compared to established methods in modern telecommunications. For example, advanced modulation schemes such as Orthogonal Frequency-Division Multiplexing (OFDM), Spread Spectrum modulation, and channel-specific encoding techniques like Low-Density Parity-Check (LDPC) and Turbo Codes could dramatically enhance reliability, speed, and overall efficiency. Future AI-to-AI communication protocols will undoubtedly leverage these existing advancements, transcending the simplistic methods currently seen in demonstrations such as Gibberlink.

This is a short demonstration of ggwave in action. A console application, a GUI desktop program and a mobile app are communicating through sound using ggwave. Source code: https://github.com/ggerganov/ggwave Credit: Georgi Gerganov

New AI-Mediated Channels for Human Communication

Beyond internal AI-to-AI exchanges, artificial intelligence increasingly mediates human interactions across multiple domains. AI can augment human communication through real-time translation, summarization, and adaptive content filtering, shaping our social, professional, and personal interactions (Hovy & Spruit, 2016). This growing AI-human hybridization blurs traditional boundaries of agency, raising complex ethical and practical questions. It becomes unclear who authors a message, makes a decision, or takes an action—the human user, their technological partner, or a specific mixture of both. With authorship, of course, comes responsibility and accountability. Navigating this space is a thin rope to walk, as over-reliance on AI risks diminishing human autonomy, while restrictive policies may stifle innovation. Continuous research in this area is key. If approached thoughtfully, AI can serve as a cognitive prosthetic, enhancing communication while preserving user intent and accountability (Floridi & Cowls, 2019).

Thoughtfully managed, this AI-human collaboration will feel intuitive and natural. Rather than perceiving AI systems as external tools, users will gradually incorporate them into their cognitive landscape. Consider the pianist analogy: When an experienced musician plays, they no longer consciously manage each muscle movement or keystroke. Instead, their cognitive attention focuses on expressing emotions, interpreting musical structures, and engaging creatively. Similarly, as AI interfaces mature, human users will interact fluidly and intuitively, without conscious translation or micromanagement, elevating cognition and decision-making to new creative heights.

Ethical issues and how to address them were addressed by our two panelist speakers, Dr. Pattie Maes (MIT Media Lab) and Dr. Daniel Helman (Winkle Institute), at the final session of the New Human Interfaces Hackathon, part of Cross Labs’ annual workshop 2025.

What Would Linguistic Integration Between Humans and AI Entail?

Future AI-human cognitive integration may follow linguistic pathways familiar from human communication studies. Humans frequently switch between languages (code-switching), blend languages into creoles, or evolve entirely new hybrid linguistic structures. AI-human interaction could similarly generate new languages or hybrid protocols, evolving dynamically based on situational needs, cognitive ease, and efficiency.

Ultimately, Gibberlink offers a useful but modest illustration of a much broader trend: artificial intelligence will naturally evolve optimized communication strategies tailored to specific contexts and constraints. Rather than generating paranoia over secrecy or loss of control, our focus should shift toward thoughtfully managing the integration of AI into our cognitive and communicative processes. If handled carefully, AI can serve as a seamless cognitive extension—amplifying human creativity, enhancing our natural communication capabilities, and enriching human experience far beyond current limits.

Gibberlink’s clever demonstration underscores that AI optimization of communication protocols is inevitable and inherently beneficial, not a sinister threat. The pressing issue is not AI secretly communicating; rather, it’s about thoughtfully integrating AI as an intuitive cognitive extension, allowing humans and machines to communicate and collaborate seamlessly. The future isn’t about AI concealing messages from us—it’s about AI enabling richer, more meaningful communication and deeper cognitive connections.


References

  • Coupé, C., Oh, Y. M., Dediu, D., & Pellegrino, F. (2019). Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Science Advances, 5(9), eaaw2594. https://doi.org/10.1126/sciadv.aaw2594
  • Cowls, J., King, T., Taddeo, M., & Floridi, L. (2019). Designing AI for social good: Seven essential factors. Available at SSRN 3388669.
  • Daras, G., & Dimakis, A. G. (2022). Discovering the hidden vocabulary of dalle-2. arXiv preprint arXiv:2206.00169.
  • Diffie, W., & Hellman, M. (1976). New directions in cryptography. IEEE Transactions on Information Theory, 22(6), 644-654.
  • Egri-Nagy, A., & Törmänen, A. (2020). The game is not over yet—go in the post-alphago era. Philosophies, 5(4), 37.
  • Fridrich, J. (2009). Steganography in digital media: Principles, algorithms, and applications. Cambridge University Press.
  • Gallager, R. G. (1962). Low-density parity-check codes. IRE Transactions on Information Theory, 8(1), 21-28.
  • Goodfellow, I., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • Leavitt, J. H. (2010). Linguistic relativities: Language diversity and modern thought. Cambridge University Press. https://doi.org/10.1017/CBO9780511992681
  • LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
  • Lewis, M., Yarats, D., Dauphin, Y. N., Parikh, D., & Batra, D. (2017). Deal or no deal? End-to-end learning for negotiation dialogues. arXiv:1706.05125.
  • Millière, R. (2023). Adversarial attacks on image generation with made-up words: Macaronic prompting and the emergence of DALL·E 2’s hidden vocabulary.
  • Ramesh, A. et al. (2021). Zero-shot text-to-image generation. arXiv:2102.12092.
  • Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
  • Silver, D. et al. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354-359.
  • Whorf, B. L. (1956). Language, thought, and reality. MIT Press.
  • Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Addison-Wesley.​

DOI: http://doi.org/10.54854/ow2025.03

Limited AGI: The Hidden Constraints of Intelligence at Scale

In his recent blog post The Intelligence Age, a few days ago, Sam Altman has expressed confidence in the power of neural networks and their potential to achieve artificial general intelligence (AGI—some strong form of AI reaching a median human level of intelligence and efficiency for general tasks) given enough compute. He sums it up in 15 words: “deep learning worked, got predictably better with scale, and we dedicated increasing resources to it.” With sufficient computational power and resources, he claims, humanity should reach superintelligence within “a few thousand days (!)”

AI will, according to Altman, keep improving with scale, and this progress could lead to remarkable advances for human life, including AI assistants performing increasingly complex tasks, improving healthcare, and accelerating scientific discoveries. Of course, achieving AGI will require us to address major challenges along the way, particularly in terms of energy resources and their management to avoid inequality and conflict over AI’s use. Once all challenges are overcome, one would hope to see a future where technology unlocks limitless possibilities—fixing climate change, colonizing space, and achieving scientific breakthroughs that are unimaginable today. While this sounds compelling, one must keep in mind how the concept of AGI and its application remains vague and problematic. Intelligence, much like compute, is inherently diverse and comes with a set of constraints, biases, and hidden costs.

Historically, breakthroughs in computing and AI have been tied to specific tasks, even when they seemed more general. For example, even something as powerful as a Turing machine, capable of computing anything theoretically, still has its practical limitations. Different physical substrates, like GPUs or specialized chips, allow for faster or more efficient computation in specific tasks, such as neural networks or large language models (LLMs). These substrates demonstrate that each form of AI is bound by its physical architecture, making some tasks easier or faster to compute.

Beyond computing, this concept can be better understood in its extension to biological systems. For instance, the human brain is highly specialized for certain types of processing, like pattern recognition and language comprehension, but it is not well-suited for tasks that require high-speed arithmetic or complex simulations, in which computers excel. Reversely, biological neurons, in spite of operating much slower than their digital counterparts, achieve remarkable feats in energy efficiency and adaptability through parallel processing and evolutionary optimization. Perhaps quantum computers make for an even stronger example: while they promise enormous speedups for specific tasks like factoring large numbers or simulating molecular interactions, the idea of them being universally faster than classical computers is absolutely false. Additionally, they will also require specialized algorithms to fully leverage their potential, which may require another few decades to develop.

These examples highlight how both technological and biological forms of intelligence are fundamentally shaped by their physical substrates, each excelling in certain areas while remaining constrained in others. Whether it’s a neural network trained on GPUs or a biological brain evolved over millions of years, the underlying architecture plays a key role in determining which tasks can be efficiently solved and which remain computationally expensive or intractable.is bound by its physical architecture, making some tasks easier or faster to compute.

As we look toward the potential realization of an AGI, whatever this may formally mean—gesturing vaguely at some virtual omniscient robot overlord doing my taxes—it’s important to recognize that it will likely still be achieved in a “narrow” sense—constrained by these computational limits. Additionally, AGI, even when realized, will not represent the most efficient or intelligent form of computation; it is expected to reach only a median human level of efficiency and intelligence. While it might display general properties, it will always operate within the bounds of the physical and computational layers imposed on it. Each layer, as in the OSI picture of networking, will add further constraints, limiting the scope of the AI’s capabilities. Ultimately, the quest for AGI is not about breaking free from these constraints but finding the path of least resistance to the most efficient form of intelligent computation within these limits.

While I see Altman’s optimism about scaling deep learning as valid, one should realize that the implementation of AGI will still be shaped by physical and computational constraints. The future of AI will likely reflect these limits, functioning in a highly efficient but bounded framework. There is more to it. As Stanford computer scientist Fei-Fei Li advocates for it, embodiment, “Large World Models” and “Spatial Intelligence” are probably crucial for the next steps in human technology and may remain unresolved by a soft AGI as envisioned by Altman. Perhaps the field of artificial life too may offer tools for a more balanced and diverse approach to AGI, by incorporating the critical concepts of open-endedness, polycomputing, hybrid and unconventional substrates, precariousness, mutually beneficial interactions between many organisms and their environments, as well as the self-sustaining sets of processes defining life itself. This holistic view could enrich our understanding of intelligence, extending beyond the purely computational and human-based to include the richness of embodied and emergent intelligence as it could be.

References

The Intelligence Age, Sam Altman’s blog post https://ia.samaltman.com/
World Labs,
Fei-Fei Li’s 3D AI startup https://www.worldlabs.ai/
International Society for Artificial Life –
https://alife.org/

DOI: https://doi.org/10.54854/ow2024.02