Questions About Llama Index Search and Retrieval Tutorial Steps

Nov 02, 2023 09:56 PM

Hey, I am trying to follow the llama_index_search_and_retrieval_tutorial.ipynb notebook to build a way to visualize some Q&A over my dataset. I would like to do a very simple visualization of the embeddings and see where the queries land. My understanding is that this is possible with Phoenix. Based on this notebook, though, there are a few steps that I am wondering whether I could omit:

1. Generating the centroids in step five. Is this really necessary with a smaller dataset?
2. Running the LLM-assisted evaluations
3. Computing ranking metrics

Also, there is a chunk of code where a query set is downloaded and converted into a dataframe:

import pandas as pd

# Download the tutorial's example query set and load it into a dataframe
query_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/llm/llama-index/arize-docs/query_data_complete3.parquet",
)
query_df.head()
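One way to see what that dataframe contains is to list its column names and dtypes and compare them against a dataframe built from your own data. The sketch below uses plain pandas only; the helper names `schema_of` and `missing_columns` and the sample column names are hypothetical and are not the tutorial's actual schema or part of Phoenix.

```python
import pandas as pd


def schema_of(df: pd.DataFrame) -> dict[str, str]:
    """Return a mapping of column name -> pandas dtype (as a string)."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


def missing_columns(reference: pd.DataFrame, candidate: pd.DataFrame) -> list[str]:
    """Columns present in `reference` but absent from `candidate`, sorted by name."""
    return sorted(set(reference.columns) - set(candidate.columns))


# Hypothetical stand-ins: in practice `reference` would be the downloaded query_df
# and `candidate` a dataframe built from your own data. These column names are
# illustrative only and are NOT the tutorial's real columns.
reference = pd.DataFrame(
    {"query_text": ["What is Phoenix?"], "query_vector": [[0.1, 0.2]], "score": [0.9]}
)
candidate = pd.DataFrame({"query_text": ["How do I log data?"], "score": [0.5]})

print(schema_of(reference))
print(missing_columns(reference, candidate))  # → ['query_vector']
```

With the real data, calling `schema_of(query_df)` would print the reference schema to mirror.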

Since I have my own dataset, I was wondering whether the recommended schema for the query dataframe is documented somewhere, or whether I should just copy it from the dataframe above. Thank you for your help!