[2309.08489] Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network