I saw this thread on long context and more advanced haystack testing. There is definitely another level of testing and analysis needed to better understand the limits of the context window.
We've been thinking about this a lot ourselves. The test and its results are pretty interesting.
https://x.com/thomasahle/status/1763408041960231010?s=20