One follow-up question: what's the best way to report the correctness/success of the experiment by these labels?
So far, I've found docs that point me to this:
import phoenix as px
client = px.Client()
# Get the current dataset version
dataset = client.get_dataset(id="...", version_id="...")
df = dataset.as_dataframe()
df.head()
# Fetch the experiment whose results I want to report on
experiment = client.get_experiment(experiment_id="...")
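
To make the question concrete, this is roughly the kind of per-label summary I'm after. It is only a sketch on made-up rows: the `example_id` and `label` field names and the `summarize_labels` helper are my own placeholders, not anything from the Phoenix schema or API, and I don't know yet how to get the real rows out of the experiment object.

```python
from collections import Counter

# Hypothetical evaluation rows, one per experiment run.
# Field names are placeholders, not the actual Phoenix schema.
runs = [
    {"example_id": "a", "label": "correct"},
    {"example_id": "b", "label": "incorrect"},
    {"example_id": "c", "label": "correct"},
    {"example_id": "d", "label": "correct"},
]


def summarize_labels(rows, label_key="label"):
    """Return {label: (count, share of all rows)} for the given rows."""
    counts = Counter(row[label_key] for row in rows)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


summary = summarize_labels(runs)
for label, (count, share) in sorted(summary.items()):
    print(f"{label}: {count} runs ({share:.0%})")
```

If the results end up in a pandas DataFrame instead, `df["label"].value_counts(normalize=True)` should give the same shares.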