Refocusing the Journey: New Insights and Directions
Initial Goals and Approaches
Eight months ago, I began developing Superposition: Quantum Cities for the Integrated Media and Design program at NYU Tandon School of Engineering. The initial goal was straightforward: create an environment where students could explore alternate versions of the Brooklyn Navy Yard using generative AI. The vision was to allow students to see how history, place, and people could be reimagined while maintaining coherence and consistency across different versions.
I started with two supporting projects: Patterns of Consistent Environment Evolution and Coherent Story Growth. I initially treated these as side components, tools that would help ensure architectural coherence and narrative consistency.
Lessons Learned at the Coal Face
As the project progressed, I quickly realized that ensuring coherence—that is, making sure every character, event, and interaction logically connected to the shared foundational reality—was far more complex than anticipated. It wasn’t enough to create alternate versions of the Brooklyn Navy Yard; these versions had to be deeply believable, with each detail adding to the richness of the world rather than introducing inconsistencies.
One of the key lessons was that I had spent too much time automatically generating the graph schema instead of preparing the source data. I initially followed an enterprise approach designed for massive corpora, an approach well suited to the great folks building tools for large-scale, enterprise use cases. My work, however, is more niche, with relatively small datasets that call for a more hands-on approach. This led me to shift my focus toward enriching the source documents with more metadata and supporting data, so that the input is better structured and more informative.
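To make that shift concrete, here is a minimal sketch in Python of what metadata-enriched source data could look like before it enters the graph. The field names and the `to_graph_nodes` helper are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class SourceDocument:
    """A source document plus hand-curated metadata for graph ingestion."""

    doc_id: str
    text: str
    era: str                 # e.g. "1940-1945"
    location: str            # e.g. "Brooklyn Navy Yard, Dry Dock 1"
    provenance: str = ""     # where the document came from
    entities: list[str] = field(default_factory=list)  # curated people, places, things


def to_graph_nodes(doc: SourceDocument) -> list[dict]:
    """Turn one enriched document into candidate graph nodes.

    The document becomes a node, and each curated entity becomes a node
    linked back to it, so the graph starts from vetted structure rather
    than whatever an automatic schema pass happens to extract.
    """
    nodes = [{"id": doc.doc_id, "label": "Document",
              "era": doc.era, "location": doc.location}]
    for entity in doc.entities:
        nodes.append({"id": f"{doc.doc_id}:{entity}", "label": "Entity",
                      "name": entity, "mentioned_in": doc.doc_id})
    return nodes
```

The point of the sketch is the order of operations: the curation happens by hand, up front, and the graph inherits that structure instead of guessing at it.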
This shift also taught me the importance of treating retrieval, verification, and generation as distinct concerns. My direction forward became a multi-agent approach: agents with long-term memory, each responsible for one of those functions.
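As a rough illustration of that division of labor (the `Agent` and `Memory` classes below are placeholder sketches of my own, not any particular framework's API):

```python
class Memory:
    """Minimal long-term memory: store observations, recall by keyword."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def remember(self, entry: str) -> None:
        self.entries.append(entry)

    def recall(self, query: str) -> list[str]:
        return [e for e in self.entries if query.lower() in e.lower()]


class Agent:
    """An agent with a single responsibility and its own long-term memory."""

    def __init__(self, role: str) -> None:
        self.role = role
        self.memory = Memory()


def answer(prompt: str, retriever: Agent, verifier: Agent, generator: Agent) -> str:
    """Retrieval -> verification -> generation, each step owned by one agent."""
    facts = retriever.memory.recall(prompt)                      # retrieval
    vetted = [f for f in facts if f in verifier.memory.entries]  # verify against shared canon
    generator.memory.remember(f"used {len(vetted)} facts for: {prompt}")
    return " ".join(vetted)                                      # stand-in for LLM generation
```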
Realizations and New Concepts
The relentless pace of innovation in areas like agentic frameworks, GraphRAG, agent tool use, agent memory, and structured outputs from LLMs reshaped my understanding of what was needed to move forward. Without the time I spent diving into these aspects, I would not have fully appreciated their critical roles in advancing this work.
In Patterns of Consistent Environment Evolution, I realized that I needed 253 agents, one for each pattern in Christopher Alexander's A Pattern Language. Each agent would hold knowledge about its specific pattern and interact with the other agents to ensure that generated content respects architectural coherence. Coherent Story Growth required a similar approach: generative agents creating new narrative elements, and consistency-checking agents ensuring that these additions fit seamlessly into the existing story world.
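A minimal sketch of what one pattern agent might look like; the two sample entries and their cross-links are illustrative placeholders, and the `review` check stands in for a real LLM call:

```python
from dataclasses import dataclass, field


@dataclass
class PatternAgent:
    """One agent per pattern from A Pattern Language (253 in total)."""

    number: int                                       # 1..253
    name: str
    related: list[int] = field(default_factory=list)  # cross-linked patterns to consult

    def review(self, proposal: str) -> bool:
        """Stand-in check; a real agent would prompt an LLM with the
        full pattern text plus the proposal and return its verdict."""
        return self.name.lower() in proposal.lower()


# Illustrative entries only; the real system would load all 253 patterns.
agents = {
    106: PatternAgent(106, "Positive Outdoor Space", related=[107]),
    107: PatternAgent(107, "Wings of Light", related=[106]),
}


def architecturally_coherent(proposal: str) -> bool:
    """A proposal passes only if every pattern agent accepts it."""
    return all(agent.review(proposal) for agent in agents.values())
```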
Moreover, I began integrating multi-modal embeddings (text, image, audio, video) as attributes of nodes and relationships within the knowledge graphs. This use of GraphRAG, extended into a multimodal form, created a more interconnected and dynamic representation of the environments and narratives, and allowed me to include supporting data that would not be feasible in text alone.
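Here is a small sketch of the idea: graph nodes that carry per-modality embedding vectors as attributes, with retrieval by cosine similarity. The `GraphNode` class and `nearest` helper are simplified stand-ins, not the project's actual graph store:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class GraphNode:
    """A knowledge-graph node whose attributes include per-modality embeddings."""

    def __init__(self, node_id: str, embeddings: dict[str, list[float]]) -> None:
        self.node_id = node_id
        self.embeddings = embeddings  # e.g. {"text": [...], "image": [...]}


def nearest(query: list[float], modality: str, nodes: list[GraphNode]) -> GraphNode:
    """Return the node whose embedding in the given modality best matches the query."""
    candidates = [n for n in nodes if modality in n.embeddings]
    return max(candidates, key=lambda n: cosine(query, n.embeddings[modality]))
```

Because the embeddings live on the nodes themselves, a query in one modality (say, an image of a dry dock) can land on the same node that text retrieval would find, keeping the different media anchored to one shared graph.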
One of the most profound realizations has been a new way of looking at Large Language Models (LLMs). Many current efforts try to make LLMs handle tasks that explicit code can easily solve, such as counting letters in a word. But LLMs are better suited as what I call “Fuzzy Functions”—functions where I specify input structure, output, and intent, but let the LLM handle the implementation details. The key here is intent—harnessing LLMs to understand user intentions and turn vague ideas into coherent actions. This is where I see the future of human-computer interaction evolving, with LLMs serving as the bridge between structured programming and human creativity.
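To show what I mean by a Fuzzy Function, here is a minimal sketch. The `call_llm` stub stands in for whatever model client you use, and the `rename_district` example at the end is purely hypothetical; the point is that the contract (inputs, output keys, intent) is explicit while the implementation is delegated to the model:

```python
import json
from typing import Callable


def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you use; returns raw model text."""
    raise NotImplementedError("wire up your model client here")


def fuzzy_function(intent: str, output_keys: list[str]) -> Callable[..., dict]:
    """Build a Fuzzy Function: explicit input/output contract, LLM-supplied behavior."""

    def run(**inputs) -> dict:
        prompt = (
            f"Intent: {intent}\n"
            f"Inputs: {json.dumps(inputs)}\n"
            f"Respond with a JSON object containing exactly these keys: {output_keys}"
        )
        result = json.loads(call_llm(prompt))  # structured output, parsed strictly
        missing = [k for k in output_keys if k not in result]
        if missing:
            raise ValueError(f"model omitted keys: {missing}")
        return result

    return run


# Hypothetical usage: the contract is fixed; the "implementation" is the model's.
rename_district = fuzzy_function(
    intent="Propose a plausible 1890s name for a Brooklyn Navy Yard district",
    output_keys=["name", "rationale"],
)
```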
Next Steps
Moving forward, my focus will be on three key areas:
- Enhancing Metadata in Source Data: Adding richer metadata directly to the foundational documents to strengthen the initial graph and support more meaningful retrieval and generation.
- Agent-Based Consistency and Generation: Developing agents capable of generating new content, verifying consistency, and using long-term memory to ensure coherent evolution of both architectural and narrative elements.
- Leveraging Multi-Modality for Richer Worlds: Emphasizing multi-modal embeddings to create immersive worlds, integrating visuals, sounds, and other media into the Quantum Cities simulation. Richer multi-modal inputs let the core system assess visual content, and richer multi-modal outputs make the experience more immersive, much like what we used to call MULTIMEDIA.
These eight months have profoundly deepened my commitment to the Quantum Cities goal and changed my understanding of what it takes to build speculative story worlds that are rich, coherent, and educational. The Superposition: Quantum Cities project remains my guiding star, but it has taken time, in a fast-moving field, to understand how my approach needed to change.
The reorientation towards agents, metadata-rich source data, multi-modal data integration, and LLMs as Fuzzy Functions isn’t a diversion from the original vision; it’s an evolution of that vision, now clearer and more ambitious than ever.
Stay tuned as we continue refining these tools, experimenting with agents that have memory, and pushing the boundaries of what’s possible in immersive, educational storytelling.