Hi Team, has anyone faced an issue in evals where hallucination or QA correctness gets logged as NOT PARSABLE, or gets auto-changed from correct/factual to NOT PARSABLE?
It is logging at the span API level, and the complete chain steps are also missing. Would love to get traces, spans, input, and output similar to the Python LangChain instrumentation.
Hi Mikyo, thanks for sharing it, I got the gist of it. I have seen evals take more than 1 min, up to 90 seconds, depending on the traces/retrieved documents. So it would be better to record the start time on each iteration and use that last start time as the input to the next iteration, instead of a hardcoded duration of running every n minutes/seconds.
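A minimal sketch of that idea, carrying the last start time forward so no traces are skipped when an eval runs long. `fetch_traces` is a hypothetical callable standing in for whatever query API is used; the window bounds and iteration count are assumptions for illustration:

```python
import time

def poll_loop(fetch_traces, iterations=3, default_lookback=60.0):
    """Poll for new traces using the previous iteration's start time as the
    lower bound of each query window, instead of a fixed 'every n seconds'
    duration. `fetch_traces(start, end)` is a hypothetical query function."""
    last_start = time.time() - default_lookback  # initial lookback window
    results = []
    for _ in range(iterations):
        now = time.time()
        # Query everything since the last recorded start time, so a slow
        # eval (e.g. 60-90 s) never causes traces to fall between windows.
        results.extend(fetch_traces(last_start, now))
        last_start = now  # carry forward as the next window's lower bound
    return results
```

The key property is that consecutive query windows are contiguous: each window starts exactly where the previous one ended, so window boundaries adapt to however long each eval iteration actually took.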