Questions About Llama Index Search and Retrieval Tutorial Steps
Hey, I am trying to follow the llama_index_search_and_retrieval_tutorial.ipynb notebook to build a way to visualize some Q&A over my dataset. I would like to do a very simple visualization of the embeddings and see where the queries are landing. My understanding is that this is possible with Phoenix. Based on this notebook, though, there are a few steps that I am wondering whether I could omit:
1. Generating the centroids in step five. Is this really necessary with a smaller dataset?
2. Running the LLM-assisted evaluations
3. Computing ranking metrics
Also, there is a chunk of code where a query set is downloaded and converted into a dataframe:
import pandas as pd

query_df = pd.read_parquet(
    "http://storage.googleapis.com/arize-assets/phoenix/datasets/unstructured/llm/llama-index/arize-docs/query_data_complete3.parquet",
)
query_df.head()
Since I have my own dataset, I was wondering whether the recommended schema for this dataset is listed somewhere, or whether I should just copy it from the dataframe above. Thank you for your help!
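To make concrete what I mean by copying the schema from the example dataframe, here is a minimal sketch of how I would compare my own dataframe against it. It uses only pandas and does not touch Phoenix at all. The helper name compare_columns and the toy column names (text, text_vector, response) are my own placeholders, not necessarily the columns in the tutorial's parquet file; in practice I would pass the downloaded query_df as the reference.

```python
import pandas as pd


def compare_columns(reference_df: pd.DataFrame, own_df: pd.DataFrame) -> dict:
    """Report how the columns of own_df differ from those of reference_df."""
    ref_cols = set(reference_df.columns)
    own_cols = set(own_df.columns)
    shared = ref_cols & own_cols
    return {
        # Columns the reference has that my dataframe lacks.
        "missing": sorted(ref_cols - own_cols),
        # Columns my dataframe has that the reference does not.
        "extra": sorted(own_cols - ref_cols),
        # Shared columns whose pandas dtypes differ.
        "dtype_mismatch": sorted(
            col for col in shared
            if reference_df[col].dtype != own_df[col].dtype
        ),
    }


# Toy stand-ins: these column names are hypothetical placeholders.
reference_df = pd.DataFrame(
    {"text": ["q1"], "text_vector": [[0.1, 0.2]], "response": ["a1"]}
)
own_df = pd.DataFrame({"text": ["my question"], "text_vector": [[0.3, 0.4]]})

report = compare_columns(reference_df, own_df)
print(report)
```

This only checks column names and dtypes, so it would not tell me which columns Phoenix actually requires versus which are optional, which is the part I am unsure about.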
