Abstract
Data processing often transforms a complex signal into a single value by applying a set of different preprocessing algorithms followed by a final decision function. It remains challenging, however, to understand and visualize the interplay of the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as a single transformation that maps the original input signal to one value. It takes the derivative of the final decision function and transforms it back through the preceding processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process, and visualizing it can improve the understanding of that process. Often, a feasible processing chain can be constructed from affine mappings, which greatly simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
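To make the affine special case mentioned in the abstract concrete, here is a minimal sketch; the symbols \(A_l\), \(b_l\), \(w\), and \(c\) are introduced only for illustration and are not taken from the article body. If every processing step is affine, \(x^{(l+1)} = A_l x^{(l)} + b_l\), the composed decision function is again affine:
\[
  x^{\text{out}} = A_k\bigl(A_{k-1}(\cdots (A_0 x^{(0)} + b_0)\cdots ) + b_{k-1}\bigr) + b_k
  = \underbrace{A_k A_{k-1}\cdots A_0}_{=:\,w^{\top}}\, x^{(0)} + c .
\]
The row vector \(w^{\top}\) and the offset \(c\) then parameterize the complete processing chain in the original data space, which is the parameterization the affine backtransformation provides.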
Notes
Further methods are presented but they are tailored to functional magnetic resonance imaging (fMRI) data.
The respective derivatives are constant for every sample and as such do not depend on it.
The notation for the data and its components differs from the notation used in classification tasks. Here, we look at one data sample \(x^{(0)}\) with its different processing stages \(x^{(l)}\) and the respective changes in each component of the data \({\left( x^{(l)}_{gh}\right) }\). The double-index notation accounts for different axes in the data, as in time series (different sensors and time points) or images.
With \(n_{k+1}:=1\) it holds that \(\frac{\partial F_l}{\partial y^{(l)}}\in \mathbb{R}^{n_l\times n_{l+1}}\), and the dimensions of \(B_l\) follow from the recursion. The dimensions of \(B_l\) also reflect that \(B_l\) corresponds to the mapping of \(x^{(l)}\) to the scalar output \(x^{\text{out}}\).
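A minimal sketch of a backward recursion consistent with these dimensions (a reconstruction for illustration, not necessarily the article's exact formulation):
\[
  B_k = \frac{\partial F_k}{\partial y^{(k)}} \in \mathbb{R}^{n_k \times 1}, \qquad
  B_l = \frac{\partial F_l}{\partial y^{(l)}}\, B_{l+1} \in \mathbb{R}^{n_l \times 1}
  \quad \text{for } l = k-1, \ldots, 0 ,
\]
so each \(B_l\) has one row per component of \(x^{(l)}\) and a single column, matching the scalar output \(x^{\text{out}}\).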
Note that no matrix inversion is required, even though one might expect it because the goal is to find out what the original mapping does with the data, which sounds like an inverse approach.
A weighted sum of classifiers preserves linearity/differentiability. A majority vote results in a non-differentiable classifier, but when the score is the sum of the voters for the selected class, the resulting function is still locally linear/differentiable.
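As a small illustration (the linear base classifiers \(f_i(x) = w_i^{\top} x + c_i\) are chosen here only as an example), a weighted sum
\[
  f(x) = \sum_i \alpha_i f_i(x) = \Bigl(\sum_i \alpha_i w_i\Bigr)^{\!\top} x + \sum_i \alpha_i c_i
\]
is again linear, so its derivative exists everywhere; a vote-based score changes its functional form only where the set of voters for the selected class changes, which is why it remains locally linear/differentiable between those boundaries.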
Nevertheless, the resulting graphics look reasonable.
A standard extended 10–20 electrode layout has been chosen with 128 electrodes: http://www.brainproducts.com/filedownload.php?path=downloads/actiCAP-128-channel-Standard-2_1201.
Acknowledgments
The authors thank David Feess, Marc Tabie, Anett Seeland, Frank Kirchner, Su Kyoung Kim, Hendrik Wöhrle, and Bertold Bongardt for highly valuable discussions and input. This work was supported by the German Federal Ministry of Economics and Technology (BMWi, Grants FKZ 50 RA 1012 and FKZ 50 RA 1011).
Cite this article
Krell, M.M., Straube, S. Backtransformation: a new representation of data processing chains with a scalar decision function. Adv Data Anal Classif 11, 415–439 (2017). https://doi.org/10.1007/s11634-015-0229-3
Keywords
- Affine transformations
- Function composition
- Processing chain interpretation
- Processing chain visualization