[2311.04923] Is one brick enough to break the wall of spoken dialogue state tracking?