The End of General Compute

General compute is reaching a limit because deployment cares about details that a datacenter cannot see. A cluster does not know whether it is helping a doctor, a designer, or a child learning mathematics. It sees tensors, queues, memory pressure, bandwidth, heat. That abstraction was useful. It gave AI research a general substrate: GPUs, CUDA, cloud schedulers, large clusters, and a software culture that rewarded flexibility while the science was still unstable.

Deployment changes the scale of description. When a model enters a hospital, a factory, a laboratory, or a school, the useful unit is no longer raw compute. It is a loop of perception, memory, action, liability, and human judgment.

I call the next layer domain-proximal compute. Domain-proximal says something operational: computation should move toward the constraints of the domain it serves. Near the data. Near the instruments. Near the people who can tell whether an answer is merely plausible or actually useful.


From monolithic compute to bounded cognitive units: local networks, instruments, robots, institutions, biological signals, and governed memory. Generated illustration created for olafwitkowski.com.

The chip industry is already making a hardware version of the same argument. Google’s first TPU was a custom ASIC for neural-network inference: a large matrix-multiply unit, software-managed memory, deterministic execution, and large gains against contemporary CPU and GPU baselines [1]. Google has since split its eighth-generation TPUs into separate training and inference chips [2]. NVIDIA, whose advantage has long been the programmable breadth of CUDA, has also presented Rubin CPX as a CUDA GPU built for massive-context inference [3]. Even the generalist stack is learning to specialize.

The end of general compute does not mean the disappearance of GPUs, clouds, or general-purpose systems. It means generality becomes the bootstrapping condition, not the final unit of organization. The mature system breaks into parts.

Hardware competition is the visible layer. Cognitive granularity is the deeper one. Intelligence work breaks into many kinds of local computation: retrieve this patient history under these rules; check this proof against this formal library; predict this defect from this sensor stream; summarize this meeting without leaking this document; train this technician without erasing the craft that made the training data valuable.

General compute can run these jobs. It will often be the right place to start. But it should not be the ontology. The long-term architecture of artificial collective intelligence (ACI) will be made of bounded cognitive services that communicate, defer, remember, and refuse.

Artificial Life (ALife) helps sharpen this intuition. ALife has long studied life-like organization in media beyond carbon: software worlds, robots, chemical systems, and physical materials. Recent work on emergent computation in physical substrates makes the point from another angle: computation can be discovered, shaped, and used in matter, rather than only imposed on it from above [4]. Other recent work explores self-replicating programs arising on simple computational substrates [5]. For ACI, the lesson concerns both substrate and organization. Boundaries matter. So do energy, repair, reproduction of structure, and local adaptation.

A serious ACI layer should treat computation as a living design problem. A cognitive service needs a membrane: who can call it, what it can remember, what it can change, and when it must stop. It needs metabolism: how much energy, money, data, and attention it consumes. It needs signaling: how it reports uncertainty, provenance, and conflict to neighboring services and to humans. It needs death: a way to retire a model, revoke a memory, or cut a connection that has become harmful.


A bounded cognitive unit needs membrane, metabolism, signaling, and retirement. Without these, compute becomes hard to govern once it enters real domains. Original SVG illustration created for olafwitkowski.com; no third-party image assets used.
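The four properties can be made concrete as an interface. What follows is a minimal Python sketch, assuming a hypothetical `CognitiveUnit` class that does not come from any existing library: an allowlist of callers and a bounded memory stand in for the membrane, a spending budget stands in for metabolism, a `Reply` record carrying uncertainty and provenance stands in for signaling, and the `forget`, `disconnect`, and `retire` methods stand in for retirement. All names and numbers are illustrative placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reply:
    """Signaling: every reply reports an answer (or none), uncertainty, and provenance."""
    answer: Optional[str]
    uncertainty: float          # 0.0 = confident, 1.0 = no basis for an answer
    provenance: list
    refused: bool = False
    reason: str = ""


@dataclass
class CognitiveUnit:
    """Hypothetical bounded cognitive unit; field names are illustrative only."""
    name: str
    allowed_callers: set        # membrane: who may call this unit
    budget: float               # metabolism: remaining resource units
    cost_per_call: float        # metabolism: what one answered call consumes
    memory: dict = field(default_factory=dict)  # membrane: what it may remember
    retired: bool = False       # retirement: a retired unit answers nothing

    def _refuse(self, reason: str) -> Reply:
        return Reply(answer=None, uncertainty=1.0, provenance=[self.name],
                     refused=True, reason=reason)

    def call(self, caller: str, query: str) -> Reply:
        # Retirement and membrane checks happen before any resources are spent.
        if self.retired:
            return self._refuse("retired")
        if caller not in self.allowed_callers:
            return self._refuse("caller not permitted")
        # Metabolism: refuse rather than overspend.
        if self.budget < self.cost_per_call:
            return self._refuse("budget exhausted")
        self.budget -= self.cost_per_call
        if query in self.memory:
            return Reply(answer=self.memory[query], uncertainty=0.2,
                         provenance=[f"{self.name}:memory:{query}"])
        # Signaling: defer with full uncertainty instead of guessing.
        return Reply(answer=None, uncertainty=1.0, provenance=[self.name],
                     reason="defer to human")

    def forget(self, key: str) -> None:
        """Retirement at small scale: revoke a single memory."""
        self.memory.pop(key, None)

    def disconnect(self, caller: str) -> None:
        """Retirement at small scale: cut a connection that has become harmful."""
        self.allowed_callers.discard(caller)

    def retire(self) -> None:
        """Retirement at full scale: stop answering and drop all memory."""
        self.retired = True
        self.memory.clear()


unit = CognitiveUnit(name="lab-protocol", allowed_callers={"technician"},
                     budget=2.0, cost_per_call=1.0,
                     memory={"calibration": "recalibrate after 50 runs"})
print(unit.call("technician", "calibration").answer)  # recalibrate after 50 runs
print(unit.call("visitor", "calibration").reason)     # caller not permitted
```

The choice worth noticing is that no path raises an error or invents an answer: every outcome is an explicit reply with a reason, so a unit that is out of bounds, out of budget, or out of its depth says so to its neighbors and to the humans overseeing it.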

The older lineage of augmentation fits naturally here. Engelbart treated computers as a way to increase the capacity of humans to approach complex problems, rather than remove humans from the problem [6]. Hutchins showed that real cognition often lives across crews, instruments, marks on paper, habits, and shared procedures [7]. ACI belongs in that lineage. It should make distributed cognition more capable without enclosing it inside one opaque provider, one model, or one metric.

Domain-proximal compute is a practical route toward that. In a lab, the model should know the instrument protocol and the uncertainty of the measurement. In a factory, it should learn why an operator slows a line before the alarm fires. In a city, it should respect jurisdiction, maintenance schedules, public accountability, and the right to contest a decision. These constraints are not external to the computation; they are part of it.

The central design question becomes: what is the right size of a cognitive unit?

Too large, and it becomes a black box that absorbs everything. Too small, and it cannot carry context. The interesting unit may be something like an organ: specialized enough to do real work, bounded enough to govern, connected enough to participate in a larger body.

ACI names this design problem. Generalist compute gave us foundation models. Domain-proximal compute can give us cognitive organs. Some will sit in clouds, some near instruments, some inside local institutions, some on devices, some in experimental substrates we do not yet know how to name. The goal is to place intelligence where it can act, be inspected, and remain answerable to the people whose worlds it enters.

The next test for AI is hard and concrete: can a computational system inherit skill without erasing the conditions that produced it?

References

[1] Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., et al. (2017). “In-Datacenter Performance Analysis of a Tensor Processing Unit.” Proceedings of ISCA 2017. arXiv:1704.04760.

[2] ITPro. (2026). “Google Cloud announces eighth-generation TPUs, boasting AI training and inference leaps.” Coverage of Google Cloud’s April 2026 announcement of eighth-generation TPUs split into training and inference variants.

[3] NVIDIA. (2025). “NVIDIA Unveils Rubin CPX: A New Class of GPU Designed for Massive-Context Inference.” NVIDIA Newsroom, September 9, 2025.

[4] Heiney, K., Tufte, G., & Nichele, S. (2020). “On Artificial Life and Emergent Computation in Physical Substrates.” arXiv:2009.04518.

[5] Agüera y Arcas, B., Alakuijala, J., Evans, J., Laurie, B., Mordvintsev, A., Niklasson, E., Randazzo, E., & Versari, L. (2024). “Computational Life: How Well-formed, Self-replicating Programs Emerge from Simple Interaction.” arXiv:2406.19108.

[6] Engelbart, D. C. (1962). Augmenting Human Intellect: A Conceptual Framework. Stanford Research Institute.

[7] Hutchins, E. (1995). Cognition in the Wild. MIT Press.
