[2310.09762] Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer