41 Comments
Jos P's avatar

Wow this is awesome. I would love to beta test.

Karen Spinner's avatar

Thank you! 🙏 I’ll DM a link when it’s ready.

Cris Cafiero's avatar

Genuinely fascinating. Thanks for building out your process and thinking framework for us to read!

Karen Spinner's avatar

You’re very welcome! 🤗

Jenny Ouyang's avatar

Karen, what did we say? You’re already exploring beyond StackDigest, and in the most unexpected way! Like Sam said, this could be super useful for researchers and investors. It’s honestly striking how expansive your imagination is!

Karen Spinner's avatar

Thank you! 🤗 I learned a lot about ML when I set up the analytics in StackDigest…I just had to recycle/reuse! 😁

Dr Sam Illingworth's avatar

Karen, this is awesome. So great to see the lessons you learnt and shared with StackDigest being applied here. Also super interesting to see the emerging trends here, most of which are quite technical rather than e.g. social sciences or education. What would be really cool to track is how many of these preprints made it into full peer-reviewed papers as well... 🙏

Karen Spinner's avatar

Thank you! ❤️ Agree that looking at preprint success rates would be interesting. 🤔 I’m also looking at other data sources…should this evolve beyond the prototype stage, I’d love to let users find trends in less technical studies as well.

Dr Sam Illingworth's avatar

I think this whole project will be so useful to many audiences. Academics for sure, but also investors, and policymakers looking to get ahead of the curve, as the preprints are what will likely be in public use 6-18 months from now...

Karen Spinner's avatar

Will be launching a pilot newsletter featuring insights surfaced by the tool to see who responds! 🤞

Dr Sam Illingworth's avatar

Well you already have one guaranteed sub! 💪

Karen Spinner's avatar

Woo hoo! 🎉

Jenny Ouyang's avatar

This is an awesome idea Sam!

Katie Barnes's avatar

This is so cool! Thanks for sharing your journey and thought process.

Karen Spinner's avatar

You’re very welcome! 🤗

Caitlin Marie Connors's avatar

Would love to be a beta tester! Thank you so much for the transparent and thorough reporting of the build.

Karen Spinner's avatar

Just saw this, will DM you the access code! 🙏

Caitlin Marie Connors's avatar

Thanks muchly Karen.

Joshua Davis 🤝's avatar

Wow super detailed and insightful. Can't wait to see where this goes. Would love to test the beta 👋

Karen Spinner's avatar

Awesome! I’ll tag when it’s ready. 🙏

Suhrab Khan's avatar

Fascinating approach! Leveraging AI to identify meaningful patterns in research could reshape how we track emerging trends across domains.

Karen Spinner's avatar

Thank you! Looking forward to seeing where this goes!

Daria Cupareanu's avatar

100% I want to be a beta tester. I actually started working on an n8n automation to do exactly this since a lot of my work relied on research, but didn't get to finish it. So yeeees, please count me in. Awesome stuff, Karen!

Karen Spinner's avatar

Amazing! I will add you to the list! 🙏

Finn Tropy's avatar

Awesome project, Karen!

Reminds me of Andrej Karpathy's project from 2021 - see https://github.com/karpathy/arxiv-sanity-preserver. He was using scikit-learn (TF-IDF vectorizer and SVM training), but your design is using OpenAI APIs for embeddings, which is what I would use as well.

In my previous job before retirement, I built a similar system that indexed corporate PDF documents from Confluence and JIRA ticket data, utilizing Google's Vertex AI APIs in a manner similar to what you are doing here.

I can see that a lot of the lessons you learned from StackDigest are directly applicable to this new project.

Congrats again - awesome project and great status report!

Karen Spinner's avatar

Thank you for the kind words! 🤗 I tried scikit-learn while I was building semantic search for StackDigest, and its huge dependencies broke my production environment. 😆

I bet your project applying ML to corporate docs turned up some interesting insights…duplication of effort was a common theme when I worked with enterprise clients on their content plans!

So far the new project is behaving in my local environment…deploying to production next week! 🤞

Mark S. Carroll's avatar

This is seriously impressive work. Pulling insights out of 11,000 abstracts is the kind of obsessive curiosity that actually moves the field forward. The healthcare example was the moment it clicked for me.

This is not a toy. It is a research amplifier that solves a real discovery bottleneck for anyone who writes, builds, or invests in AI.

I hope you keep going with this because the demand for clear trend mapping is only getting louder. Looking forward to seeing Future Scan take shape.

Karen Spinner's avatar

Appreciate the kind words! 🤗 Will definitely be sharing my progress!

Mark S. Carroll's avatar

Looking forward to it!

Joe Mills's avatar

Here goes Karen! You go, girl!

Karen Spinner's avatar

Thank you! 🙏

Karen Brasch 🚁's avatar

This is fantastic and quite timely. I'm doing similar research on a different topic the less efficient way and am going to try this out. Will report back.

Karen Spinner's avatar

Awesome! I’ll tag you when the prototype is online.

Chris Tottman's avatar

You're brilliant! Truly

jaycee's avatar

IHS has something similar called Goldfire.

Karen Spinner's avatar

I’ll take a peek!

jaycee's avatar

Apparently it's no longer owned by IHS. https://accuristech.com/solutions/

Karo (Product with Attitude)'s avatar

Haha, it took you a week to build something new, you're such a powerhouse! Simply incredible.

Karen Spinner's avatar

I stumbled onto a new idea pretty fast! 🤣