If you are self-hosting the model, caching can be implemented in the inference server itself. For instance, the vLLM production-stack GitHub repo integrates LMCache for managing the KV cache. You can also enable caching in proxies like LiteLLM, but in my experience the inference server is the better place for it.
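As a minimal sketch of what server-side caching looks like, the snippet below uses vLLM's built-in prefix caching, which LMCache and the production stack build on with additional offloading and sharing. The model name and prompts are placeholders, and this assumes vLLM is installed with a GPU available.

```python
from vllm import LLM, SamplingParams

# Enable prefix caching so requests that share a prompt prefix reuse KV blocks
# instead of recomputing them (an assumed model name; swap in your own).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_prefix_caching=True,
)

# A long shared system prompt is the typical case where KV reuse pays off.
system = "You are a support assistant for ACME Corp. Answer briefly and politely."
params = SamplingParams(max_tokens=64)

# The second prompt shares the system-prompt prefix, so its prefill can be
# served from the KV cache built by the first request.
outputs = llm.generate(
    [
        system + "\nUser: How do I reset my password?",
        system + "\nUser: Where can I download my invoices?",
    ],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```

In a production deployment you would keep this caching in the serving layer and let LMCache (as wired up in the production-stack repo) handle persistence and cross-instance sharing of the KV cache, rather than trying to approximate it in a proxy.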