Wow, this is awesome. I would love to beta test.
Thank you! 🙏 I’ll DM a link when it’s ready.
Genuinely fascinating. Thanks for building out your process and thinking framework for us to read!
You’re very welcome! 🤗
Karen, what did we say? You’re already exploring beyond StackDigest, and in the most unexpected way! Like Sam said, this could be super useful for researchers and investors. It’s honestly striking how expansive your imagination is!
Thank you! 🤗 I learned a lot about ML when I set up the analytics in StackDigest…I just had to recycle/reuse! 😁
Karen, this is awesome. So great to see the lessons you learnt and shared with StackDigest being applied here. Also super interesting to see the emerging trends here, most of which are quite technical rather than e.g. social sciences or education. What would be really cool to track is how many of these preprints made it into full peer-reviewed papers as well... 🙏
Thank you! ❤️ Agree that looking at preprint success rates would be interesting. 🤔 I’m also looking at other data sources…should this evolve beyond the prototype stage, I’d love to let users find trends in less technical studies as well.
I think this whole project will be so useful to many audiences. Academics for sure, but also investors, and policymakers looking to get ahead of the curve, as the preprints are what will likely be in public use 6-18 months from now...
Will be launching a pilot newsletter featuring insights surfaced by the tool to see who responds!🤞
Well you already have one guaranteed sub! 💪
Woo hoo! 🎉
This is an awesome idea, Sam!
This is so cool! Thanks for sharing your journey and thought process.
You’re very welcome! 🤗
Would love to be a beta tester! Thank you so much for the transparent and thorough reporting of the build.
Just saw this, will DM you the access code! 🙏
Thanks muchly, Karen.
Wow super detailed and insightful. Can't wait to see where this goes. Would love to test the beta 👋
Awesome! I’ll tag when it’s ready. 🙏
Fascinating approach! Leveraging AI to identify meaningful patterns in research could reshape how we track emerging trends across domains.
Thank you! Looking forward to seeing where this goes!
100% I want to be a beta tester. I actually started working on an n8n automation to do exactly this since a lot of my work relied on research, but didn't get to finish it. So yeeees, please count me in. Awesome stuff, Karen!
Amazing! I will add you to the list! 🙏
Awesome project, Karen!
Reminds me of Andrej Karpathy's project from 2021 - see https://github.com/karpathy/arxiv-sanity-preserver. He was using scikit-learn (TF-IDF vectorizer and SVM training), but your design is using OpenAI APIs for embeddings, which is what I would use as well.
In my previous job before retirement, I built a similar system that indexed corporate PDF documents from Confluence and JIRA ticket data, utilizing Google's Vertex AI APIs in a manner similar to what you are doing here.
I can see that a lot of the lessons you learned from StackDigest are directly applicable to this new project.
Congrats again - awesome project and great status report!
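For readers curious about the TF-IDF approach mentioned above, here is a minimal sketch in plain Python. It is a hypothetical illustration, not code from arxiv-sanity-preserver or from this project: it ranks abstracts by cosine similarity of TF-IDF vectors and leaves out the SVM step entirely. The function names are made up for the example, and the smoothed IDF formula is assumed to match scikit-learn's `TfidfVectorizer` default.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    TF is the raw term count; IDF is the smoothed log((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: cnt * idf[t] for t, cnt in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(docs: list[str], query_index: int) -> list[tuple[int, float]]:
    """Rank every other document by similarity to docs[query_index], best first."""
    vecs = tfidf_vectors(docs)
    q = vecs[query_index]
    scores = [(i, cosine(q, v)) for i, v in enumerate(vecs) if i != query_index]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    abstracts = [
        "transformer models for medical image segmentation",
        "medical image segmentation with convolutional networks",
        "reinforcement learning in robot control",
    ]
    for idx, score in rank_similar(abstracts, 0):
        print(idx, round(score, 3))
```

An embeddings-based design keeps the same "rank by cosine similarity" idea but swaps the sparse keyword vectors for dense vectors returned by an embedding model. That is why it can match related abstracts that share meaning but not vocabulary.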
Thank you for the kind words! 🤗 I tried scikit-learn while I was building semantic search for StackDigest, and its huge dependencies broke my production environment. 😆
I bet your project applying ML to corporate docs turned up some interesting insights…duplication of effort was a common theme when I worked with enterprise clients on their content plans!
So far the new project is behaving in my local environment…deploying to production next week! 🤞
This is seriously impressive work. Pulling insights out of 11,000 abstracts is the kind of obsessive curiosity that actually moves the field forward. The healthcare example was the moment it clicked for me.
This is not a toy. It is a research amplifier that solves a real discovery bottleneck for anyone who writes, builds, or invests in AI.
I hope you keep going with this because the demand for clear trend mapping is only getting louder. Looking forward to seeing Future Scan take shape.
Appreciate the kind words! 🤗 Will definitely be sharing my progress!
Looking forward to it!
Here goes Karen! You go, girl!
Thank you! 🙏
This is fantastic and quite timely. I'm doing similar research on a different topic the less efficient way, and I'm going to try this out. Will report back.
Awesome! I’ll tag you when the prototype is online.
You're brilliant! Truly.
Thank you!
IHS has something similar called Goldfire.
I’ll take a peek!
Apparently it's no longer owned by IHS. https://accuristech.com/solutions/
Haha, it took you a week to build something new, you're such a powerhouse! Simply incredible.
I stumbled onto a new idea pretty fast! 🤣