Noryen
Tech Stack:

When I was building Biometryx, I ran into a problem I didn't expect.
In a health-related context, AI can become dangerous fast.
The model can:
- suggest medication
- mention dosages
- give advice that looks correct but isnât
I wouldn't know unless I manually tested every edge case. And even then, I wouldn't catch faulty output before my users did.
It was not scalable. And definitely not safe.
How is my AI actually behaving in production?
Most of the time we focus on prompts, UX, speed... But we rarely ask the question: how does my AI behave in real-world scenarios?
Are we going to be digging through raw JSON logs? Debugging by copy-pasting prompts into ChatGPT? Or... do we simply cross our fingers and hope for the best?
I looked into existing solutions.
They were:
- heavy
- expensive
- built for large teams
I just needed something simple:
- log outputs
- flag risky ones
- compare models
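The three needs above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea (the function names, risk keywords, and log format are my assumptions, not Noryen's actual code):

```python
# Hypothetical sketch: log model outputs, flag risky ones, and tag each
# record with the model name so outputs can be compared across models.
import json
import re
from datetime import datetime, timezone

# Illustrative keyword patterns for a health context (not Noryen's real rules).
RISK_PATTERNS = [
    r"\bmg\b",
    r"\bdose\b",
    r"\bdosage\b",
    r"\btake \d+\b",
    r"\bprescri",
]

def flag_risky(output: str) -> list[str]:
    """Return the risk patterns that match the model output."""
    return [p for p in RISK_PATTERNS if re.search(p, output, re.IGNORECASE)]

def log_output(model: str, prompt: str, output: str) -> dict:
    """Build a structured log record with risk flags attached."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,
        "output": output,
        "flags": flag_risky(output),
    }
    # One JSON object per line, so records are easy to scan, grep, and diff.
    with open("ai_outputs.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Comparing models then reduces to filtering the log file by the `model` field and looking at which one trips the fewest flags.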
So I built it.
AI isn't deterministic. You can't fully predict what it will say. And if you're building in sensitive areas, "probably safe" is not good enough.
You can try Noryen here.


