Hey, how hard is it to add a custom embedding model to auto-embeddings? https://docs.arize.com/phoenix/concepts/generating-embeddings I am advising a medical startup, and general-purpose models do a pretty poor job for them. I have manually written a connection to BioBERT and it seems to perform better. I'm happy to do the implementation (it is a niche case); I'm just trying to understand if there is an easy place to hook into. Thanks!
Hi Ilya, do you have a Hugging Face link to BioBERT? All you need is the Hugging Face model handle.
Oh, cool, is there a doc on how to add a model if I have an HF handle? Thanks!
For instance, here is a code example:
```python
from arize.pandas.embeddings import EmbeddingGenerator, UseCases

# df is assumed to be a pandas DataFrame with "document" and "summary" text columns
generator = EmbeddingGenerator.from_use_case(
    use_case=UseCases.NLP.SUMMARIZATION,
    model_name="distilbert-base-uncased",  # swap in your Hugging Face model handle here
    tokenizer_max_length=512,
    batch_size=100,
)
df["document_vector"] = generator.generate_embeddings(text_col=df["document"])
df["summary_vector"] = generator.generate_embeddings(text_col=df["summary"])
```
You just need to change model_name to your HF handle and change the use case to match your task. You can read about the other options at the link above as well.
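Here is a rough sketch of that swap for BioBERT, under a few assumptions. It assumes BioBERT is on the Hugging Face Hub as "dmis-lab/biobert-base-cased-v1.2" (please verify the handle, since several BioBERT variants exist). It also assumes the data is plain text with no summary pair, so it uses UseCases.NLP.SEQUENCE_CLASSIFICATION instead of SUMMARIZATION. The helper name add_biobert_embeddings is hypothetical and is not part of the Arize SDK.

```python
# Assumption: BioBERT's Hugging Face Hub handle; check it on the Hub before use.
BIOBERT_HANDLE = "dmis-lab/biobert-base-cased-v1.2"


def add_biobert_embeddings(df, text_column: str):
    """Hypothetical helper: return a copy of a pandas DataFrame with an added
    "<text_column>_vector" column holding BioBERT embeddings of that column."""
    # Imported inside the function so this file loads even without the arize SDK.
    from arize.pandas.embeddings import EmbeddingGenerator, UseCases

    generator = EmbeddingGenerator.from_use_case(
        # Assumption: single text column with no summary pair.
        use_case=UseCases.NLP.SEQUENCE_CLASSIFICATION,
        model_name=BIOBERT_HANDLE,  # the HF handle is the main change
        tokenizer_max_length=512,   # BERT-base models accept at most 512 tokens
        batch_size=100,
    )
    out = df.copy()
    out[f"{text_column}_vector"] = generator.generate_embeddings(
        text_col=out[text_column]
    )
    return out
```

For example, add_biobert_embeddings(df, "document") would return a copy of df with an extra document_vector column. Everything else stays the same as in the distilbert example.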
Thanks! This solves my issue!
Awesome
