If you are self-hosting the model, caching can be implemented in the inference server itself. For instance, the vLLM production-stack GitHub repo integrates LMCache for managing the KV cache. You can also enable caching in proxies like LiteLLM, but in my experience the inference server is the better place for it.
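As a minimal sketch of what server-side caching looks like, the snippet below uses vLLM's built-in prefix caching, which LMCache and the production stack build on with additional offloading and sharing. The model name and prompts are placeholders, and this assumes vLLM is installed with a GPU available.

```python
from vllm import LLM, SamplingParams

# Enable prefix caching so requests that share a prompt prefix reuse KV blocks
# instead of recomputing them (an assumed model name; swap in your own).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_prefix_caching=True,
)

# A long shared system prompt is the typical case where KV reuse pays off.
system = "You are a support assistant for ACME Corp. Answer briefly and politely."
params = SamplingParams(max_tokens=64)

# The second prompt shares the system-prompt prefix, so its prefill can be
# served from the KV cache built by the first request.
outputs = llm.generate(
    [
        system + "\nUser: How do I reset my password?",
        system + "\nUser: Where can I download my invoices?",
    ],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```

In a production deployment you would keep this caching in the serving layer and let LMCache (as wired up in the production-stack repo) handle persistence and cross-instance sharing of the KV cache, rather than trying to approximate it in a proxy.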