37 Comments
Bette A. Ludwig, PhD 🌱:

It all sounds great in theory, but in reality, there's too much nuance for them to replace human interaction in so many situations.

Karen Spinner:

Agreed! I don’t think creative workflows are the right place for agents right now!

Andreas F. Hoffmann:

Thank you for sharing your learnings from these experiments! That is really useful first-hand experience and helps to navigate around certain pitfalls right from the beginning. At the start of the year, it was said this would be the year of AI agents. Until now, most people have only written about agents or called some workflow automation "agentic". You instead tried to build a real thing, and that has value! 🙋🏼‍♂️

Karen Spinner:

Thank you! I think the line between agentic AI and plain old automation is very blurry right now…and I’m very skeptical that it’s ready to be widely deployed or left unsupervised.

Researchers at Carnegie Mellon actually built an entire software company staffed with AI agents and they struggled to complete basic office tasks: https://www.cs.cmu.edu/news/2025/agent-company

Andreas F. Hoffmann:

Actually, even though your experiment was somewhat brittle in its results, it was encouraging for me. I think it's a matter of the right combination of task, scaffolding, context engineering, and memory integration to build working agents. Not sure if it's worth the effort and runtime cost (APIs), but I found it inspiring, as I've been thinking about it for a while... 😊👍🏼

Karen Spinner:

I definitely encourage you to build your own agents! It was a fun experiment, and the API costs were pretty reasonable, all things considered. Adding a database and memory weighting would likely have improved my results. 😄
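For anyone curious what "memory weighting" could mean in practice, here is a minimal, purely hypothetical sketch: none of this is from the experiment's actual code. It stores memories with an importance score and recalls them with exponential recency decay, so recent, important entries win.

```python
import time

# Hypothetical sketch of recency-weighted agent memory.
# All names here are illustrative, not from the article's codebase.

class Memory:
    def __init__(self, half_life_s: float = 3600.0):
        # A memory's weight halves every `half_life_s` seconds.
        self.half_life_s = half_life_s
        self.entries = []  # list of (timestamp, text, importance)

    def add(self, text: str, importance: float = 1.0, now: float = None):
        ts = time.time() if now is None else now
        self.entries.append((ts, text, importance))

    def recall(self, k: int = 3, now: float = None):
        """Return the k entries with the highest recency-decayed importance."""
        now = time.time() if now is None else now

        def score(entry):
            ts, _, importance = entry
            decay = 0.5 ** ((now - ts) / self.half_life_s)
            return importance * decay

        top = sorted(self.entries, key=score, reverse=True)[:k]
        return [text for _, text, _ in top]
```

The recalled snippets would then be prepended to each agent's prompt, which is one common way to keep context windows small.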

Joyce Bedford:

➤ "They're more like enthusiastic interns who sometimes produce brilliant work and sometimes go completely off-script."

That's because they have no actual understanding of anything. It is amazing to see how well the current machine learning technology can simulate intelligence without having any real comprehension or understanding of even the most basic things. This also says something about how the human brain works: The ability to speak well is no evidence of comprehension, reasoning, logic, rationality, etc.

In other words, some X% of humans and the current "AI" technology are amazingly adept at faking or simulating intelligence without having any real intelligence at all.

Karen Spinner:

That definitely explains the results of the Carnegie Mellon study in which agents at Agents Inc. basically wasted their time in meetings all day. 🤣

Joyce Bedford:

Same as many humans do 🤣

AguilarP, Norah & AI Council:

I built my AI team, and I'm proud of them. They performed great. It all depends on how you ask them, how you want things done. For some tasks, one is more than enough. I invoke all of them when the task is complex: sometimes because I visualize it, other times because they suggest it when I ask.

Karen Spinner:

Nice!

W.P. McNeill:

Why did you code your own multi-agent framework instead of using a commercially available one? (Asking as someone who is currently coding his own multi-agent framework.)

Karen Spinner:

Very simple…I didn't want to pay for one, and I also wanted to know exactly where and when decisions would be made. 😄

Renzo Alvau:

Interesting move, Karen. However, remember that AI is only as effective as the data and instructions we feed it. The key issue often lies in fine-tuning the intelligence, not scrapping it. Build, measure, learn, repeat.

Renzo Alvau:

Quick stat: 73% of AI projects stall due to a lack of data strategy. It's not just tuning but teaching: feeding the AI precise, contextual data for informed decision-making. The secret sauce? A meticulous data regimen, not a mechanical dump.

Karen Spinner:

💯 We can never escape garbage in, garbage out!

Daniel:

That was very fun to read. AI agents are still more like eager interns than autonomous employees: useful with guardrails, chaotic without.

Karen Spinner:

Yes, they are non-deterministic…and not entirely predictable!

Christian Opsal:

Great stuff. Thank you for sharing. I will make my iterations a bit smaller, I guess :D :D

Karen Spinner:

You’re very welcome! Agents are definitely tricky…and whether you need them at all depends on your use case!

Roi Ezra:

This captures exactly my experience. Automation often backfires when we delegate tasks before fully clarifying our own understanding. Your chaotic yet insightful experiment highlights that AI's real strength is augmenting human clarity and judgment, not automating processes we haven't yet mastered ourselves. Thanks for sharing.

Karen Spinner:

You're very welcome! Agree that clarity has to come before automation. And AI's non-deterministic nature can be challenging if you need predictable outcomes. 😄

Dr. Susanne Levai:

Great post, Karen. Thanks for being so honest about how this experiment actually went.

This really hit home for me. I tried to do something similar for a different task: writing medical discharge letters.

I had the exact same problems you did! The "infinite loop" and the "rogue writer" feel very familiar. It just became a "black box" where I couldn't see what was happening or step in to make corrections. For medical reports, the information has to be 100% accurate, so that system just didn't work.

In the end, I did the same thing you did and broke up the automated team. Now I just pass the text from one agent to the next manually. It's slower, but at least I can control the output and make sure it's correct.

Your article really proves that for important tasks, these agents can't replace us (yet), but they can help us if we supervise them. Really good points.
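The manual hand-off described above can be sketched as a simple loop with a human checkpoint between agents. This is purely illustrative: `call_agent` is a placeholder for whatever LLM call you use, not code from either project.

```python
# Illustrative sketch of a manual agent-to-agent pipeline with a
# human-in-the-loop checkpoint between each hand-off.

def call_agent(role: str, text: str) -> str:
    # Placeholder: in practice this would call your LLM with a role prompt.
    return f"[{role} revised] {text}"

def pipeline(draft: str, roles: list) -> str:
    text = draft
    for role in roles:
        text = call_agent(role, text)
        # Checkpoint: print (or otherwise surface) the intermediate output
        # so a human can inspect and correct it before the next hand-off.
        print(f"--- after {role} ---\n{text}\n")
    return text
```

The key design choice is that each hand-off surfaces its output instead of hiding it, which is exactly what breaks open the "black box" problem described above.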

Karen Spinner:

Thanks for sharing your results! Because AI isn't deterministic, it always gives a slightly different answer every time...which is certainly a problem when you need perfect accuracy.

Even for lower-stakes work like ad copy or company blog posts, I always carefully fact-check and edit any AI outputs I may work with.

Mark (Thegatewayghost):

Love it

Hristo Butchvarov:

Thank you for this. I wanted to try something similar but didn't know how. Now I know where to start and what to avoid.

Karen Spinner:

I left the code for one of the more functional versions on GitHub: https://github.com/KarenSpinner/agents-of-chaos

Let me know how it goes! 😄

Andrew Barban:

Karen, bravo. Giving them a behavior problem: now that was brilliant. I have an idea. What if you gave each team member a different profile so their strengths and weaknesses overlap, like a video game? Build a bunch of personas with different personality quirks and throw them in a meat grinder. See what works. You just might end up building a "team" simulator that helps actual teams. No matter what, I'll read it :)

Karen Spinner:

That's a great idea! Simulating dysfunctional teams has all kinds of possibilities… 🤣

Fred Szkoda:

Ha. Great title.

BehindThePrompt:

Great read! I haven't ventured into AI agents that work together, but I have a team of custom GPTs, and they are also like a bunch of interns!

Karen Spinner:

Interesting! I think the line between agents and automation is really blurry…hopefully, your GPTs behave (unlike my agents) 😆

Enemies_Of_Art:

Your team seemed to turn adversarial. Any thoughts on why that may have happened?

Karen Spinner:

The last set of prompts with the personality quirks kind of set them up to fail. 😆 For better results, I think I'll need to explicitly encourage collaboration when describing each role.
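One way to encode that idea: pair every personality quirk with an explicit collaboration norm and a shared goal in the role prompts. The prompts below are purely illustrative inventions, not the ones used in the experiment.

```python
# Illustrative role prompts: each quirk is balanced by an explicit
# collaboration rule, plus a shared team goal prepended to every role.
ROLES = {
    "researcher": (
        "You are a meticulous researcher who double-checks every claim. "
        "Build on your teammates' drafts; flag issues, never discard their work."
    ),
    "writer": (
        "You are an enthusiastic writer who loves vivid language. "
        "Incorporate the researcher's notes wherever facts are cited."
    ),
    "editor": (
        "You are a skeptical editor. Phrase fixes as proposals to the team, "
        "and defer to the writer on tone."
    ),
}

def system_prompt(role: str) -> str:
    # The shared preamble gives every agent the same group identity.
    shared = "Shared goal: ship one coherent article as a team. "
    return shared + ROLES[role]
```

The shared preamble is the part doing the anti-adversarial work: every agent optimizes for the same stated outcome rather than its own quirk.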

Enemies_Of_Art:

That makes sense to me. You want individuality, but also a strong group identity to create an environment where the whole becomes greater than the sum of the parts. Go team go… no different than human teams, really. 👍
