Hello, what's the best way to automatically track prompt changes (committed via Git) and evaluate their outputs against predefined ground truths? The idea is to have a workflow that gets triggered whenever a developer updates a prompt in Git. This workflow should run evaluations comparing the new prompt outputs against a set of ground truth responses to ensure consistency or detect regressions.
(I tried hard to find it but couldn't.) There was a repo advocating for more granular diffs for strings (i.e. more granular than a line-by-line diff), and that could probably help in creating the kind of workflow you're suggesting. Regardless, git diff --word-diff may help: https://app.warp.dev/block/gKsKe5zzOHsiIzTZG3fQZT Also, architecting things so that there is a file specifically dedicated to prompts would enable flows such as "when X_FILE changes, run Y_JOB", tapping into John G.'s suggestion.
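To make the "when X_FILE changes, run Y_JOB" idea a bit more concrete, here is a rough Python sketch, not a definitive implementation. It assumes prompts live under a hypothetical prompts/ directory, that ground truths are a list of dicts with "input" and "expected" keys, and that generate is whatever function calls your model with the new prompt (also hypothetical, you would supply it). The 0.8 threshold and the plain string similarity from difflib are placeholders; a real setup might swap in a task-specific metric.

```python
import subprocess
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def changed_prompt_files(base: str = "HEAD~1", head: str = "HEAD",
                         prompt_dir: str = "prompts/") -> List[str]:
    """List files under prompt_dir that differ between two commits.

    prompt_dir is an assumed layout: one directory dedicated to prompts.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head, "--", prompt_dir],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real metric."""
    return SequenceMatcher(None, a, b).ratio()


def find_regressions(generate: Callable[[str], str],
                     cases: List[Dict[str, str]],
                     threshold: float = 0.8) -> List[Dict[str, object]]:
    """Run every ground-truth case and return those scoring below threshold.

    generate is a hypothetical callable wrapping the model call that uses
    the updated prompt; each case has "input" and "expected" keys.
    """
    failures = []
    for case in cases:
        output = generate(case["input"])
        score = similarity(output, case["expected"])
        if score < threshold:
            failures.append({"input": case["input"], "score": score})
    return failures
```

In CI you would call changed_prompt_files() first and only run find_regressions() when it returns something, then fail the job if the returned list is non-empty.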
I second this problem. I could see a good deal of value in a solution to this.
