I am trying to run Llama 2 on an EC2 instance with a T4 GPU and I am hitting out-of-memory errors. The T4 reports about 14 GB of usable memory. What is the best way to deploy Llama 2 on AWS without running into these memory constraints?
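For context, a rough back-of-envelope estimate shows why this happens: a 7B-parameter model in fp16 already exceeds a T4's ~14 GB of usable VRAM, while 4-bit quantization fits easily. This is a sketch under stated assumptions (7B parameters, a hypothetical 1.2x overhead factor for CUDA context and buffers, and it ignores activation and KV-cache memory, which grows with sequence length):

```python
def model_vram_gb(n_params_billions: float, bytes_per_param: float,
                  overhead: float = 1.2) -> float:
    """Approximate VRAM needed to hold model weights, in GB.

    overhead is an assumed fudge factor for CUDA context and framework
    buffers; real usage also depends on batch size and sequence length.
    """
    return n_params_billions * 1e9 * bytes_per_param * overhead / 1e9

# Llama 2 7B at different precisions:
fp16_gb = model_vram_gb(7, 2.0)    # 2 bytes per weight (fp16/bf16)
int8_gb = model_vram_gb(7, 1.0)    # 1 byte per weight (8-bit quantized)
int4_gb = model_vram_gb(7, 0.5)    # 0.5 bytes per weight (4-bit quantized)

print(f"fp16:  {fp16_gb:.1f} GB")   # exceeds a T4's ~14 GB usable VRAM
print(f"int8:  {int8_gb:.1f} GB")
print(f"int4:  {int4_gb:.1f} GB")
```

So on a single T4 the usual options are 8-bit or 4-bit quantization (e.g. via bitsandbytes or a GPTQ/GGUF build), or moving to a larger-memory instance such as a g5 (A10G, 24 GB) for fp16 serving.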