Hey Mikyo, yeah, I checked the docs and saw a discrepancy. The journal I linked above says the dataframe has these columns:
- `:id.id:`: the query ID
- `:timestamp.iso_8601:`: the time at which the query was made
- `:feature.text:prompt`: the query text
- `:feature.[float].embedding:prompt`: the embedding representation of the query
- `:prediction.text:response`: the final response presented to the user
- `:feature.[str].retrieved_document_ids:prompt`: the list of IDs of the retrieved documents
- `:feature.[float].retrieved_document_scores:prompt`: the list of cosine similarities between the query and the retrieved documents

Plus these tags:
- `:tag.float:user_feedback`: approval or rejection from the user (-1 means thumbs down, +1 means thumbs up)
- `:tag.str:openai_relevance_0`: a binary classification (relevant vs. irrelevant) by GPT-4 of whether the first retrieved document is relevant to the query
- `:tag.str:openai_relevance_1`: a binary classification (relevant vs. irrelevant) by GPT-4 of whether the second retrieved document is relevant to the query
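To make the expected shapes concrete, here's a minimal sketch of one row using those column names, plus a sanity check. All values are made-up placeholders (not from the notebook), and `check_row` is just a hypothetical helper of mine. Two of the checks are my own assumptions, not something the docs state: that there's one score per retrieved document in the same order, and that the relevance labels are literally the strings "relevant" / "irrelevant".

```python
from typing import Any, Dict, List

# Column names as listed in the notebook.
EXPECTED_COLUMNS: List[str] = [
    ":id.id:",
    ":timestamp.iso_8601:",
    ":feature.text:prompt",
    ":feature.[float].embedding:prompt",
    ":prediction.text:response",
    ":feature.[str].retrieved_document_ids:prompt",
    ":feature.[float].retrieved_document_scores:prompt",
    ":tag.float:user_feedback",
    ":tag.str:openai_relevance_0",
    ":tag.str:openai_relevance_1",
]

# One made-up query row; every value here is a placeholder, not real data.
example_row: Dict[str, Any] = {
    ":id.id:": "query-0001",
    ":timestamp.iso_8601:": "2023-06-01T12:00:00Z",
    ":feature.text:prompt": "How do I reset my password?",
    ":feature.[float].embedding:prompt": [0.12, -0.03, 0.88],
    ":prediction.text:response": "Go to Settings, then Account, then Reset password.",
    ":feature.[str].retrieved_document_ids:prompt": ["doc-17", "doc-42"],
    ":feature.[float].retrieved_document_scores:prompt": [0.91, 0.77],
    ":tag.float:user_feedback": 1.0,
    ":tag.str:openai_relevance_0": "relevant",
    ":tag.str:openai_relevance_1": "irrelevant",
}


def check_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one query row (empty list means it looks OK)."""
    problems = [f"missing column {c}" for c in EXPECTED_COLUMNS if c not in row]
    if problems:
        return problems
    embedding = row[":feature.[float].embedding:prompt"]
    if not all(isinstance(x, float) for x in embedding):
        problems.append("embedding must be a list of floats")
    ids = row[":feature.[str].retrieved_document_ids:prompt"]
    scores = row[":feature.[float].retrieved_document_scores:prompt"]
    # Assumption: one cosine similarity per retrieved document, in the same order.
    if len(ids) != len(scores):
        problems.append("retrieved document IDs and scores differ in length")
    # Per the description: -1 is thumbs down, +1 is thumbs up.
    if row[":tag.float:user_feedback"] not in (-1.0, 1.0):
        problems.append("user_feedback must be -1 or +1")
    # Assumption: the exact label strings are "relevant" / "irrelevant".
    for col in (":tag.str:openai_relevance_0", ":tag.str:openai_relevance_1"):
        if row[col] not in ("relevant", "irrelevant"):
            problems.append(f"{col} must be 'relevant' or 'irrelevant'")
    return problems


print(check_row(example_row))
```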
But the docs have:
```python
class Schema(
    prediction_id_column_name: Optional[str] = None,
    timestamp_column_name: Optional[str] = None,
    feature_column_names: Optional[List[str]] = None,
    tag_column_names: Optional[List[str]] = None,
    prediction_label_column_name: Optional[str] = None,
    prediction_score_column_name: Optional[str] = None,
    actual_label_column_name: Optional[str] = None,
    actual_score_column_name: Optional[str] = None,
    prompt_column_names: Optional[EmbeddingColumnNames] = None,
    response_column_names: Optional[EmbeddingColumnNames] = None,
    embedding_feature_column_names: Optional[Dict[str, EmbeddingColumnNames]] = None,
    excluded_column_names: Optional[List[str]] = None,
)
```

The two overlap but aren't identical. So for my list of queries that I need to turn into a dataset, I just want to understand which columns must be provided and what values they should contain.
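For reference, here's my rough guess at how the notebook columns might line up with those `Schema` parameters, going purely by the name prefixes. This is an assumption on my part, not something the docs confirm. In particular: there's no response embedding, so I've parked the response text under `prediction_label_column_name`, and the retrieved document IDs/scores have no obvious dedicated parameter, so they're just plain features here. To keep it inspectable without importing Phoenix, the nested dicts stand in for `EmbeddingColumnNames(vector_column_name=..., raw_data_column_name=...)`, and `mapped_columns` is a hypothetical helper that checks every notebook column is accounted for exactly once.

```python
from typing import Any, Dict, List

# Column names as listed in the notebook.
NOTEBOOK_COLUMNS: List[str] = [
    ":id.id:",
    ":timestamp.iso_8601:",
    ":feature.text:prompt",
    ":feature.[float].embedding:prompt",
    ":prediction.text:response",
    ":feature.[str].retrieved_document_ids:prompt",
    ":feature.[float].retrieved_document_scores:prompt",
    ":tag.float:user_feedback",
    ":tag.str:openai_relevance_0",
    ":tag.str:openai_relevance_1",
]

# Guessed mapping onto the Schema keyword arguments quoted from the docs.
SCHEMA_KWARGS: Dict[str, Any] = {
    "prediction_id_column_name": ":id.id:",
    "timestamp_column_name": ":timestamp.iso_8601:",
    # Stand-in for EmbeddingColumnNames: query embedding plus its raw text.
    "prompt_column_names": {
        "vector_column_name": ":feature.[float].embedding:prompt",
        "raw_data_column_name": ":feature.text:prompt",
    },
    # Assumption: no response embedding exists, so the text goes here.
    "prediction_label_column_name": ":prediction.text:response",
    # Assumption: retrieval IDs/scores have no dedicated parameter above.
    "feature_column_names": [
        ":feature.[str].retrieved_document_ids:prompt",
        ":feature.[float].retrieved_document_scores:prompt",
    ],
    "tag_column_names": [
        ":tag.float:user_feedback",
        ":tag.str:openai_relevance_0",
        ":tag.str:openai_relevance_1",
    ],
}


def mapped_columns(kwargs: Dict[str, Any]) -> List[str]:
    """Flatten every column name referenced in the kwargs (strings, lists, nested dicts)."""
    out: List[str] = []
    for value in kwargs.values():
        if isinstance(value, str):
            out.append(value)
        elif isinstance(value, list):
            out.extend(value)
        elif isinstance(value, dict):
            out.extend(v for v in value.values() if isinstance(v, str))
    return out


unmapped = sorted(set(NOTEBOOK_COLUMNS) - set(mapped_columns(SCHEMA_KWARGS)))
print("unmapped columns:", unmapped)
```

If the guess is right, I'd swap the nested dict for a real `EmbeddingColumnNames(...)` and pass `**SCHEMA_KWARGS` to `Schema`, but I'd like to confirm the response and retrieval columns first.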