In everyday life, speech is one channel in a multichannel system for conveying emotion. Understanding how it operates in that context requires suitable data: multimodal records of emotion drawn from everyday life. This paper draws on the experience of two teams active in collecting and labelling data of this type. It sets out the core reasons for pursuing a multimodal approach, reviews the issues and problems involved in developing relevant databases, and indicates how the field can move forward in both data collection and approaches to labelling.