[2311.00684] Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation