Motor behaviour analysis is essential to biomedical research and clinical diagnostics as it provides a non-invasive strategy for identifying motor impairment and its change caused by interventions. State-of-the-art instrumented movement analysis is time- and cost-intensive, because it requires the placement of physical or virtual markers. As well as the effort required for marking the keypoints or annotations necessary for training or fine-tuning a detector, users need to know the interesting behaviour beforehand to provide meaningful keypoints. Here, we introduce unsupervised behaviour analysis and magnification (uBAM), an automatic deep learning algorithm for analysing behaviour by discovering and magnifying deviations. A central aspect is unsupervised learning of posture and behaviour representations to enable an objective comparison of movement. Besides discovering and quantifying deviations in behaviour, we also propose a generative model for visually magnifying subtle behaviour differences directly in a video without requiring a detour via keypoints or annotations. Essential for this magnification of deviations, even across different individuals, is a disentangling of appearance and behaviour. Evaluations on rodents and human patients with neurological diseases demonstrate the wide applicability of our approach. Moreover, combining optogenetic stimulation with our unsupervised behaviour analysis shows its suitability as a non-invasive diagnostic tool correlating function to brain plasticity.
Data availability
The rat data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Rats.zip. The optogenetics data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Optogenetics.zip. The mice data can be downloaded at https://hci.iwr.uni-heidelberg.de/compvis_files/Mice.zip. The human dataset cannot be publicly released because of privacy issues (please contact the authors if needed).
Code availability
The code for training and evaluating our models is publicly available on GitHub at the following address: https://github.com/utabuechler/uBAM (ref. 59).
This work was supported in part by the German Research Foundation (DFG) projects 371923335 and 421703927 to B.O., as well as by the Branco Weiss Fellowship Society in Science and the Swiss National Foundation Grant (no. 192678) to A.-S.W.
Author information
Contributions
B.B., U.B. and B.O. developed uBAM. B.B. and U.B. implemented and evaluated the framework, and M.D. and P.R. implemented and evaluated the VAE. A.-S.W., L.F. and F.H. conducted the biomedical experiments and validated the results. B.B., U.B. and B.O. prepared the figures with input from A.-S.W. All authors contributed to writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Machine Intelligence thanks Ahmet Arac, Sven Dickinson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Qualitative comparison with the state-of-the-art in motion magnification.
To compare our results with Oh et al. (ref. 30), we show five clips from different impaired subjects before and after magnification for both methods. The first row shows the healthy reference behaviour, re-synthesized with the appearance of the impaired subject so that differences in posture can be studied directly (see Methods). The second row is the impaired query sequence. The third and fourth rows show the magnified frames using the method of Oh et al. (ref. 30) and our approach, respectively. The magnified results, indicated by magenta markers, show that the method of Oh et al. corrupts the subject's appearance, whereas our method emphasises the differences in posture without altering the appearance. (Details in Supplementary).
Extended Data Fig. 2 Quantitative comparison with the state-of-the-art in motion magnification.
a: Mean-squared difference (white = 0) between the original query frame and its magnification using our method and the approach proposed by Oh et al. (ref. 30). For impaired subjects, our method modifies only the leg posture, while healthy subjects are not altered. The approach of Oh et al. (ref. 30) mostly changes the background and alters impaired and healthy subjects indiscriminately. b: Distribution of the fraction of frames with substantial deviation from healthy reference behaviour, measured for each subject and video sequence. c: Mean and standard deviation of deviation scores per cohort and approach. (Details in Supplementary).
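The two quantities named in panels a and b can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the averaging over colour channels and the threshold on per-frame deviation scores are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def mean_squared_difference(original, magnified):
    """Squared difference per pixel, averaged over colour channels (0 = unchanged)."""
    original = np.asarray(original, dtype=np.float64)
    magnified = np.asarray(magnified, dtype=np.float64)
    return ((original - magnified) ** 2).mean(axis=-1)

def fraction_deviating(frame_scores, threshold):
    """Fraction of frames whose deviation score exceeds a hypothetical threshold."""
    frame_scores = np.asarray(frame_scores, dtype=np.float64)
    return float((frame_scores > threshold).mean())

# Synthetic frame: only a small region is modified by the "magnification".
rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))
magnified = frame.copy()
magnified[2:4, 2:4] += 0.5

diff_map = mean_squared_difference(frame, magnified)  # non-zero only in the edited region
share = fraction_deviating([0.1, 0.9, 0.8, 0.2], threshold=0.5)
```

A difference map that is non-zero only around the limb corresponds to the behaviour the caption attributes to the proposed method, whereas a map with large values in the background corresponds to the behaviour it attributes to the compared approach.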
Extended Data Fig. 3 Abnormal postures before and after magnification.
We show that our magnification supports spotting abnormal postures by applying a generic classifier to our behaviour-magnified frames. This doubles the number of detected abnormal postures without introducing a substantial number of false positives. In particular, we use a one-class linear SVM on ImageNet features, trained on only one group (that is, healthy), and predict abnormalities in healthy and impaired subjects before and after magnification. The ratio of abnormalities is unaltered within the healthy cohort (~2%), whereas it doubles in the impaired cohort (5.7% to 11.7%), showing that our magnification method can detect and magnify small deviations but does not artificially introduce abnormalities. (Details in Supplementary).
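The one-class protocol described in this caption can be sketched with scikit-learn as follows. The features here are synthetic stand-ins, not ImageNet features, and the value of `nu` and the helper `abnormal_ratio` are assumptions; the sketch only shows the pattern of fitting on one group and measuring the ratio of flagged samples in another.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic, non-negative stand-ins for image features (assumption).
healthy_train = rng.normal(loc=5.0, scale=0.5, size=(200, 16))
healthy_test = rng.normal(loc=5.0, scale=0.5, size=(100, 16))
abnormal_test = rng.normal(loc=2.0, scale=0.5, size=(100, 16))

# Fit the detector on the healthy group only; nu is a hypothetical setting.
detector = OneClassSVM(kernel="linear", nu=0.05).fit(healthy_train)

def abnormal_ratio(model, features):
    """Fraction of samples the one-class model flags as outliers (label -1)."""
    return float((model.predict(features) == -1).mean())

healthy_ratio = abnormal_ratio(detector, healthy_test)
abnormal_ratio_value = abnormal_ratio(detector, abnormal_test)
```

Running the same `abnormal_ratio` on features of frames before and after magnification would give the kind of comparison reported in the caption.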
Extended Data Fig. 4 Qualitative evaluation of our posture encoding on the rat grasping dataset.
Projection of our posture encoding for 1,000 randomly chosen postures onto a 2D embedding using t-SNE. Similar postures are located close to each other, and the grasping action can be reconstructed by following the circle clockwise (best viewed by zooming in on the digital version of this figure). (Details in Supplementary).
Extended Data Fig. 5 Comparison with PCA of posture encoding.
a: A single video clip projected onto the two most important factors of variation using PCA directly on RGB input (left) and on our representation (right). Consecutive frames are connected by straight lines coloured according to the time within the video. The original frame is plotted every four frames. PCA is able to sort the frames over time automatically, showing that each cycle overlaps with the previous one. Our representation better separates different postures, as reflected by the circular shape of the embedding. b: Same as a, but including more videos. Each colour represents a different subject. In this case, PCA is strongly biased towards the subject appearance. Thus, it separates subjects and does not allow behaviour to be compared. c: We reduce the appearance bias by normalising each video with its mean appearance. The result still shows subject separation and no similarity of posture across subjects. d: Using our posture representation and applying PCA on Eπ instead of directly on video frames shows no subject bias, and only similar postures are close in the 2D space. (Details in Supplementary).
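The PCA projections in panels b and c can be sketched as below. The function `pca_embed` and the synthetic data are assumptions for illustration. In this toy example the appearance is a purely additive offset, so subtracting the per-video mean removes it completely; the caption reports that on real video frames this normalisation is not sufficient.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_embed(videos, normalise_per_video=False, n_components=2):
    """Project frames of several videos onto shared principal components.

    videos: list of arrays of shape (n_frames, n_features).
    Returns one (n_frames, n_components) embedding per video.
    """
    if normalise_per_video:
        # Subtract the mean frame of each video (panel c of the caption).
        videos = [v - v.mean(axis=0, keepdims=True) for v in videos]
    stacked = np.concatenate(videos, axis=0)
    embedded = PCA(n_components=n_components).fit_transform(stacked)
    boundaries = np.cumsum([len(v) for v in videos])[:-1]
    return np.split(embedded, boundaries, axis=0)

# Two synthetic "subjects": identical cyclic motion, different appearance offset.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
cycle = np.stack([np.cos(t), np.sin(t)], axis=1) @ rng.normal(size=(2, 10))
subject_a = cycle + rng.normal(scale=10.0, size=10)
subject_b = cycle + rng.normal(scale=10.0, size=10)

raw = pca_embed([subject_a, subject_b])                                   # subjects separate
normalised = pca_embed([subject_a, subject_b], normalise_per_video=True)  # subjects overlap
```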
Extended Data Fig. 6 Disentanglement comparison with simple baseline.
We transfer posture from a subject (row) to others with different appearance (columns). a: A baseline model that uses the average video frame as appearance. The appearance is subtracted from each frame to extract the posture. b: Disentanglement using our custom VAE for extracting posture and appearance. Checking for consistency in posture along a row and for similarity in appearance along a column shows that disentanglement is a hard problem: a pixel-based representation cannot solve the task, whereas our model produces more detailed and realistic images. (Details in Supplementary).
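The pixel-based baseline of panel a can be sketched in a few lines. The caption states only that the average frame serves as appearance and that it is subtracted to obtain posture; recombining by adding the posture residual to the target's average frame is an assumption here, as are the function names and the synthetic data.

```python
import numpy as np

def split_appearance_posture(video):
    """Baseline split: appearance is the mean frame, posture is the residual."""
    appearance = video.mean(axis=0)
    posture = video - appearance
    return appearance, posture

def transfer_posture(source_video, target_video):
    """Add the posture residual of the source to the mean frame of the target.

    Pixel values may leave the valid range, since no clipping is applied.
    """
    _, posture = split_appearance_posture(source_video)
    target_appearance, _ = split_appearance_posture(target_video)
    return target_appearance + posture

# Synthetic videos of shape (n_frames, height, width, channels).
rng = np.random.default_rng(0)
source = rng.random((12, 16, 16, 3))
target = rng.random((12, 16, 16, 3))
transferred = transfer_posture(source, target)
```

Because both factors live in pixel space, any misalignment between subjects shows up directly as ghosting in the output, which is consistent with the caption's observation that a pixel-based representation cannot solve the task.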
Extended Data Fig. 7 DeepLabCut trainset size.
We train DLC models on a growing number of training samples. The model is evaluated as described in Fig. 2 of the main manuscript. Note the limited gain in performance despite annotation increasing by more than an order of magnitude. (Details in Supplementary).
Extended Data Fig. 8 Comparison with R3D.
Besides JAABA and DLC, we also compare our method with R3D, another non-parametric model that is very popular for video classification. We extract R3D features and evaluate the representation using the same protocol as for our method. Our model is more suited to behaviour analysis. More information regarding the evaluation protocol can be found in the Methods section of the main manuscript. (Details in Supplementary).
Extended Data Fig. 9 Keypoint regression.
We show qualitative results for keypoints regressed from our posture representation and for keypoints inferred end-to-end by DLC. This experiment was computed on 14 keypoints; however, we show only 6 for clarity: the wrist (yellow), the start of the first finger (purple) and the tip of each finger. The ground-truth location is shown with a circle and the detection inferred by the model with a cross. Even though our representation was not trained on keypoint detection, for some frames we can recover keypoints as well as, or even better than, DLC, which was trained end-to-end on the task. We study the gap in performance in more detail in the Supplementary (Supplementary Figure 3).
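Regressing keypoints from a fixed representation can be sketched as follows. The caption does not state which regressor was used, so the ridge regression, the feature dimensionality and the synthetic data are all assumptions; only the count of 14 keypoints comes from the caption.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_frames, dim, n_keypoints = 300, 32, 14  # dim is a hypothetical feature size

# Synthetic posture features and (x, y) keypoints that depend on them linearly.
posture = rng.normal(size=(n_frames, dim))
true_map = rng.normal(size=(dim, n_keypoints * 2))
keypoints = posture @ true_map + 0.01 * rng.normal(size=(n_frames, n_keypoints * 2))

train, test = slice(0, 200), slice(200, None)

# The representation stays fixed; only the regressor on top of it is trained.
regressor = Ridge(alpha=1e-3).fit(posture[train], keypoints[train])

predicted = regressor.predict(posture[test]).reshape(-1, n_keypoints, 2)
target = keypoints[test].reshape(-1, n_keypoints, 2)

# Mean Euclidean distance between predicted and ground-truth keypoints.
mean_error = float(np.linalg.norm(predicted - target, axis=-1).mean())
```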
Extended Data Fig. 10 Typical high/low scoring grasps with optogenetics.
Given the classifier that produced Fig. 5b, we score all testing sequences from the same animal and show two typical sequences with high and low classification scores. A positive score indicates that the sequence was predicted as light-on, and a negative score that it was predicted as light-off. Both sequences are correctly classified, as indicated by the ground truth (‘GT’) and the classifier score (‘SVM-Score’). The sequence on the left shows a missed grasp, consistent with a light-on inhibitory behaviour, while the same animal performs a successful grasp in the sequence on the right for light-off. Note that the classifier cannot see the fibre optics, because we cropped this area out before passing the frames to the classifier. (Details in Supplementary).
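The sign convention of the classifier score can be illustrated with a linear SVM in scikit-learn. The linear kernel, the label encoding (1 for light-on, 0 for light-off) and the synthetic sequence features are assumptions for this sketch; the caption only states that an SVM score is used and how its sign is read.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic per-sequence features; label 1 = light-on, 0 = light-off (assumption).
light_on = rng.normal(loc=1.0, scale=0.5, size=(60, 8))
light_off = rng.normal(loc=-1.0, scale=0.5, size=(60, 8))
features = np.concatenate([light_on, light_off], axis=0)
labels = np.concatenate([np.ones(60, dtype=int), np.zeros(60, dtype=int)])

classifier = LinearSVC(max_iter=10000).fit(features, labels)

# Signed distance to the separating hyperplane: positive means light-on here.
scores = classifier.decision_function(features)
predictions = classifier.predict(features)

# Sequences with the most positive and most negative scores.
highest = int(np.argmax(scores))
lowest = int(np.argmin(scores))
```

Sorting test sequences by `scores` and inspecting both extremes gives the kind of high-scoring and low-scoring examples that the figure shows.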
Supplementary information
Supplementary Information
Supplementary Figs. 1–3, Tables 1–6 and Discussion.
About this article
Cite this article
Brattoli, B., Büchler, U., Dorkenwald, M. et al. Unsupervised behaviour analysis and magnification (uBAM) using deep learning. Nat Mach Intell 3, 495–506 (2021). https://doi.org/10.1038/s42256-021-00326-x
