Advancing Long-Context LLMs
Recently, there have been many advances in Transformer architecture design aimed at improving the long-context capabilities of LLMs across all stages, from pre-training to inference. This survey gives a thorough overview of these methods, organized by the Transformer module they enhance; a small illustrative sketch follows.
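As one concrete example of the families of techniques such surveys cover, here is a minimal NumPy sketch of position interpolation for rotary position embeddings (RoPE): positions are rescaled so that a longer inference context maps back into the positional range seen during pre-training. The function names and the 4k-to-16k lengths are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """RoPE rotation angles, with optional position interpolation:
    positions are divided by `scale`, so a context `scale` times
    longer than the training length stays within the rotation
    range the model saw during pre-training."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)       # (dim/2,)
    pos = np.asarray(positions, dtype=np.float64) / scale  # (seq,)
    return np.outer(pos, inv_freq)                         # (seq, dim/2)

def apply_rope(x, angles):
    """Rotate consecutive feature pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Hypothetical setup: a model pre-trained on 4k tokens, run on 16k.
train_len, target_len = 4096, 16384
q = np.random.randn(target_len, 64)  # toy query vectors
angles = rope_angles(np.arange(target_len), dim=64,
                     scale=target_len / train_len)
q_rot = apply_rope(q, angles)
```

With scale=1.0 this is plain RoPE; setting scale to the ratio of target to training length is the simple interpolation trick, one of several extrapolative positional-embedding approaches discussed in the long-context literature.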
Read: https://arxiv.org/abs/2311.12351