[2403.08699] Implicit Regularization of Gradient Flow on One-Layer Softmax Attention