It all sounds great in theory, but in reality, there's too much nuance for AI agents to replace human interaction in so many situations.
Agreed! I don't think creative workflows are the right place for agents right now!
Thank you for sharing what you learned from that experiment! That is really useful first-hand experience and helps others navigate around certain pitfalls right from the beginning. At the start of the year, it was said that this would be the year of AI agents. So far, most people only write about agents or call some workflow automation "agentic". You actually tried to build a real thing, and that has value! 🙋🏼‍♂️
Thank you! I think the line between agentic AI and plain old automation is very blurry right now…and I'm very skeptical that it's ready to be widely deployed or left unsupervised.
Researchers at Carnegie Mellon actually built an entire software company staffed with AI agents, and the agents struggled to complete basic office tasks: https://www.cs.cmu.edu/news/2025/agent-company
Actually, even though your experiment was somewhat brittle in its results, it was encouraging for me. I think building working agents is a matter of finding the right combination of task, scaffolding, context engineering, and memory integration. Not sure if it's worth the effort and the runtime cost (API fees), but I found it inspiring, as I've been thinking about this for a while... 🙏🏼
I definitely encourage you to build your own agents! It was a fun experiment, and the API costs were pretty reasonable, all things considered. Adding a database and memory weighting would likely have improved my results.
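In case it's useful, here's roughly what I mean by "memory weighting": a minimal, hypothetical sketch (not the code from my experiment) that scores each stored note by recency and importance and only feeds the top few back into an agent's context.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float  # 0.0 to 1.0, assigned when the note is stored
    created_at: float = field(default_factory=time.time)

class MemoryStore:
    """Tiny in-memory store; a real version might sit on SQLite or a vector DB."""

    def __init__(self, half_life_s: float = 3600.0):
        self.half_life_s = half_life_s
        self.memories: list[Memory] = []

    def add(self, text: str, importance: float) -> None:
        self.memories.append(Memory(text, importance))

    def _score(self, m: Memory) -> float:
        # Exponential recency decay multiplied by the stored importance.
        age = time.time() - m.created_at
        recency = 0.5 ** (age / self.half_life_s)
        return recency * m.importance

    def top_k(self, k: int = 3) -> list[str]:
        ranked = sorted(self.memories, key=self._score, reverse=True)
        return [m.text for m in ranked[:k]]

# Usage: inject only the highest-scoring notes into the next prompt.
store = MemoryStore()
store.add("Writer ignored the outline in round 2.", importance=0.9)
store.add("Editor prefers bullet lists.", importance=0.4)
context = "\n".join(store.top_k(k=2))
```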
β€ "They're more like enthusiastic interns who sometimes produce brilliant work and sometimes go completely off-script."
That's because they have no actual understanding of anything. It is amazing to see how well the current machine learning technology can simulate intelligence without having any real comprehension or understanding of even the most basic things. This also says something about how the human brain works: The ability to speak well is no evidence of comprehension, reasoning, logic, rationality, etc.
In other words, some X% of humans and the current "AI" technology are amazingly adept at faking or simulating intelligence without having any real intelligence at all.
That definitely explains the results of the Carnegie Mellon study in which agents at Agents Inc. basically wasted their time in meetings all day. 🤣
Same as many humans do 🤣
I built my AI team, and I'm proud of them. They performed great. It all depends on how you ask them and how you want things done. For some tasks, one is more than enough. I invoke all of them when the task is complex, sometimes because I visualize it, other times because they suggest it when I ask.
Nice!
Why did you code your own multi-agent framework instead of using a commercially available one? (Asking as someone who is currently coding his own multi-agent framework.)
Very simple…I didn't want to pay for one and also wanted to know exactly where and when decisions would be made.
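To give a sense of what that buys you, the core loop of a hand-rolled orchestrator can be very small. Here's a hypothetical sketch (not my actual framework): every hand-off and the stopping rule live in one visible loop, which is exactly the transparency I was after.

```python
from typing import Callable

# An "agent" here is just a callable that takes the current draft and returns a new one;
# in practice each would wrap an LLM call with its own system prompt.
Agent = Callable[[str], str]

def run_team(task: str, agents: dict[str, Agent], max_rounds: int = 3) -> str:
    draft = task
    for round_no in range(1, max_rounds + 1):
        for name, agent in agents.items():
            print(f"[round {round_no}] handing draft to {name}")  # every hand-off is visible
            draft = agent(draft)
        # Explicit stopping rule instead of letting the agents decide when they're done.
        if "APPROVED" in draft:
            break
    return draft

# Usage with stand-in agents:
agents = {
    "writer": lambda d: d + "\n[writer's pass]",
    "editor": lambda d: d + "\nAPPROVED",
}
print(run_team("Draft a product blurb.", agents))
```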
Interesting move, Karen. However, remember that AI is only as effective as the data and instructions we feed it. The key issue often lies in fine-tuning the intelligence, not scrapping it. Build, measure, learn, repeat.
Quick stat: 73% of AI projects stall due to lack of data strategy. It's not just tuning but teaching - feeding the AI precise, contextual data for informed decision-making. The secret sauce? A meticulous data regimen, not a mechanical dump.
💯 We can never escape garbage in, garbage out!
That was very fun to read. AI agents are still more like eager interns than autonomous employees: useful with guardrails, chaotic without.
Yes, they are non-deterministic…and not entirely predictable!
Great stuff. Thank you for sharing. I will keep my iterations a bit smaller, I guess :D :D
You're very welcome! Agents are definitely tricky…and whether you need them at all depends on your use case!
This captures exactly my experience. Automation often backfires when we delegate tasks before fully clarifying our own understanding. Your chaotic yet insightful experiment highlights that AI's real strength is augmenting human clarity and judgment, not automating processes we haven't yet mastered ourselves. Thanks for sharing!
You're very welcome! Agreed that clarity has to come before automation. And AI's non-deterministic nature can be challenging if you need predictable outcomes.
Great post, Karen. Thanks for being so honest about how this experiment actually went.
This really hit home for me. I tried to do something similar for a different task: writing medical discharge letters.
I had the exact same problems you did! The "infinite loop" and the "rogue writer" feel very familiar. It just became a "black box" where I couldn't see what was happening or step in to make corrections. For medical reports, the information has to be 100% accurate, so that system just didn't work.
In the end, I did the same thing you did and broke up the automated team. Now I just pass the text from one agent to the next manually. It's slower, but at least I can control the output and make sure it's correct.
Your article really proves that for important tasks, these agents can't replace us (yet), but they can help us if we supervise them. Really good points.
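In case it helps anyone else, the "manual" setup I landed on is basically a sequential pipeline with a review pause between agents. Here's a minimal sketch with made-up names (not my real code):

```python
def review(label: str, text: str) -> str:
    """Show each intermediate result and let a human edit or approve it."""
    print(f"\n--- {label} output ---\n{text}\n")
    edited = input("Press Enter to accept, or type a corrected version: ").strip()
    return edited or text

def run_pipeline(source_notes: str, steps: list) -> str:
    text = source_notes
    for label, agent in steps:      # agent: any callable str -> str, e.g. an LLM wrapper
        text = agent(text)
        text = review(label, text)  # nothing moves forward until a human signs off
    return text

# Hypothetical usage:
# steps = [("extractor", extract_facts), ("drafter", draft_letter), ("checker", check_terminology)]
# final_letter = run_pipeline(raw_notes, steps)
```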
Thanks for sharing your results! Because AI isn't deterministic, it gives a slightly different answer every time...which is certainly a problem when you need perfect accuracy.
Even for lower-stakes work like ad copy or company blog posts, I always carefully fact-check and edit any AI outputs I work with.
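If it helps, you can at least shrink that variation on the model side. Here's a minimal sketch, assuming the openai Python SDK (v1+): temperature=0 makes outputs much more stable, and seed requests best-effort reproducibility (the API doesn't guarantee it).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Rewrite this ad copy in a friendlier tone."}],
    temperature=0,        # pick the most likely tokens; far less run-to-run drift
    seed=42,              # best-effort determinism; still not a hard guarantee
)
print(response.choices[0].message.content)
```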
Love it
Thank you for this. I wanted to try something similar but didn't know how. Now I know where to start and what to avoid.
I left the code for one of the more functional versions on GitHub: https://github.com/KarenSpinner/agents-of-chaos
Let me know how it goes!
Karen. Bravo. Giving them a behavior problem. Now that was brilliant. I have an idea. What if you gave each team member a different profile so their strengths and weaknesses overlap? Like a video game. Build a bunch of personas with different personality quirks and throw them into a meat grinder. See what works. You just might end up building a "team" simulator that helps actual teams. No matter what, I'll read it :)
That's a great idea! Simulating dysfunctional teams has all kinds of possibilities… 🤣
Ha. Great title
Great read! I've not ventured into AI agents that work together, but I have a team of custom GPTs and they are also like a bunch of interns!
Interesting! I think the line between agents and automation is really blurry…hopefully, your GPTs behave (unlike my agents).
Your team seemed to turn adversarial; any thoughts on why that may have happened?
The last set of prompts with the personality quirks kind of set them up to fail. For better results, I think I'll need to explicitly encourage collaboration when describing each role.
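Something like this is what I have in mind: keep the quirks, but pair each one with an explicit collaboration rule in the role description. These prompts are purely illustrative, not the ones from my experiment:

```python
# Each persona keeps a quirk, but the prompt also spells out how to work with the others.
ROLE_PROMPTS = {
    "writer": (
        "You are the team's writer. You love vivid metaphors, but you must follow the "
        "editor's outline and flag any deviation instead of silently rewriting it."
    ),
    "editor": (
        "You are the team's editor. You are blunt, but every critique must end with one "
        "concrete suggestion the writer can act on in the next draft."
    ),
    "fact_checker": (
        "You verify claims. If something can't be verified, say so and hand it back to "
        "the writer; never delete their text yourself."
    ),
}
```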
That makes sense to me. You want individuality, but also a strong group identity to create an environment where the whole becomes greater than the sum of the parts. Go team go… no different than human teams, really.