New Python SDK v7.7.0: Enhanced Features and Bug Fixes Released

Nov 03, 2023 04:39 PM

✨ New Python SDK v7.7.0 release. This is a big one! This release brings big changes that streamline ingesting generative LLMs and advance RAG support:

1. Support for ingesting CORPUS/knowledge base datasets. This is key to correctly supporting RAG applications.
2. A relaxed ingestion requirement: prompt and response are no longer required for GENERATIVE_LLM models.
3. Embedding vectors are no longer required to send a prompt or a response. While prompt embeddings are fairly common, especially in RAG applications, response embeddings are not common at all. You can now choose whether or not to send embeddings for prompts and responses.

Here are some examples for the Pandas SDK:

(NEW) Example of logging a CORPUS dataset:

# Imports assumed to come from the SDK's arize.utils.types module
from arize.utils.types import (
    CorpusSchema,
    EmbeddingColumnNames,
    Environments,
    ModelTypes,
)

# Logging the Corpus dataset
response = arize_client.log(
    dataframe=corpus_df,  # Dataframe holding the corpus documents
    model_id="search-and-retrieval-with-corpus-dataset",
    model_type=ModelTypes.GENERATIVE_LLM,
    environment=Environments.CORPUS,
    schema=CorpusSchema(
        document_id_column_name='document_id',
        document_text_embedding_column_names=EmbeddingColumnNames(
            vector_column_name='text_vector',
            data_column_name='text'
        ),
        document_version_column_name='document_version'
    ),
)
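For reference, here is a minimal sketch of the rows such a corpus dataframe might hold, assuming the four column names used in the schema above. The IDs, text, and vectors are made up, and `missing_columns` is a hypothetical helper for this illustration, not part of the SDK.

```python
# Hypothetical corpus rows; IDs, text, and vectors are made up for illustration.
corpus_rows = [
    {
        "document_id": "doc-001",
        "document_version": "1.0",
        "text": "Embeddings map text to vectors.",
        "text_vector": [0.12, -0.48, 0.33],
    },
    {
        "document_id": "doc-002",
        "document_version": "1.0",
        "text": "A corpus is the knowledge base a RAG app retrieves from.",
        "text_vector": [0.05, 0.91, -0.27],
    },
]

# Column names referenced by the CorpusSchema in the example above.
SCHEMA_COLUMNS = {"document_id", "document_version", "text", "text_vector"}


def missing_columns(rows, expected=SCHEMA_COLUMNS):
    """Return, per row index, the expected columns that the row lacks."""
    problems = {}
    for index, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            problems[index] = sorted(missing)
    return problems


# An empty dict means every row carries all four columns.
print(missing_columns(corpus_rows))

# With pandas installed, the rows convert directly:
#   corpus_df = pandas.DataFrame(corpus_rows)
```

Checking the columns before logging is a design choice for this sketch only; it catches a misspelled column name locally instead of at ingestion time.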

Example sending Prompt & Response with Embeddings

# Declare prompt & response embedding columns
prompt_columns=EmbeddingColumnNames(
    vector_column_name="document_vector",
    data_column_name="document"
)
response_columns=EmbeddingColumnNames(
    vector_column_name="summary_vector",
    data_column_name="summary"
)

(NEW) Example sending Prompt & Response without Embeddings

# Declare prompt & response text columns
prompt_columns="document"
response_columns="summary"
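To make the choice between the two styles concrete, here is a small sketch that picks one per column based on what the dataset actually contains. It uses plain dictionaries as stand-ins for the SDK's `EmbeddingColumnNames`, and `describe_text_column` is a hypothetical helper, so treat it as an illustration of the decision rather than SDK code.

```python
def describe_text_column(available_columns, data_column_name, vector_column_name):
    """Describe a prompt or response column.

    Returns a plain-dict stand-in for an embedding declaration when the
    vector column exists, otherwise just the name of the text column.
    """
    if vector_column_name in available_columns:
        return {
            "vector_column_name": vector_column_name,
            "data_column_name": data_column_name,
        }
    return data_column_name


# A dataset with prompt embeddings but no response embeddings,
# the common case described in the release notes.
available = ["document", "document_vector", "summary"]

prompt_columns = describe_text_column(available, "document", "document_vector")
response_columns = describe_text_column(available, "summary", "summary_vector")

print(prompt_columns)
print(response_columns)
```

Here the prompt keeps its embedding declaration while the response falls back to the plain text column name, mirroring the mix of the two examples above.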

In addition, we sprinkled in some bug fixes to sweeten the release!

🎁 New features:

Add CORPUS support
Accept strings for prompt and response
Make prompt and response optional
Add support for list-of-strings features in single-record logging

🐛 Bug Fixes:

Don’t create a view of a Pandas dataframe
