If you are running the quantized version of LLaMA, it should work on an EC2 instance. However, if you are trying to run the 16-bit or 32-bit versions, you will run out of memory, because the unquantized weights need several times more RAM.
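
As a rough check, you can estimate how much memory the weights alone need at each precision. This is only a sketch: the 7-billion parameter count and the 4-bit quantization level are assumptions for illustration (the actual model size and quantization scheme aren't stated here), and real usage is higher once you add the context cache and runtime overhead.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    n_params = 7e9  # assumption: a 7B-parameter model; substitute your own
    for bits in (4, 16, 32):  # assumption: "quantized" means 4-bit here
        gib = weight_memory_gib(n_params, bits)
        print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")
```

Compare the numbers it prints against the RAM of your instance type. If the 16-bit or 32-bit figure is at or above what the instance has, that would explain the out-of-memory error.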