Abstract
Pharmacokinetics (PK) is a branch of pharmacology dedicated to the study of the time course of drug concentrations in the body, from absorption to excretion. PK dynamic models are often based on homogeneous, multi-compartment assumptions, which make it possible to identify the PK parameters and further predict the time evolution of drug concentration for a given subject. One key characteristic of these time series is their high inter-patient variability, which may hamper correct patient stratification. In the present work, we address this variability by estimating the PK parameters while simultaneously clustering the corresponding subjects using the time series. We propose an expectation maximization algorithm that clusters subjects based on their PK drug responses, in an unsupervised way, collapsing clusters that are closer than a given threshold. Experimental results show that the proposed algorithm converges quickly and leads to meaningful results in synthetic and real scenarios.
Acknowledgements
The authors would like to express their appreciation to Paulo Mateus for many useful inputs and valuable comments. Special thanks go to the anonymous reviewers, who significantly contributed to the quality of this manuscript with their valuable and well-aimed comments. This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under contracts LAETA (UID/EMS/50022/2013) and IT (UID/EEA/50008/2013), and by projects InteleGen (PTDC/DTP-FTO/1747/2012), PERSEIDS (PTDC/EMS-SIS/0642/2014) and the internal IT project QBigData. SV acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH). This work was partially supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 633974 (SOUND project).
Appendices
Appendix 1: Experimental setup
As described in Sect. 3.1, the M-step maximizes the likelihood over the parameters \(\beta _{1\ell }\) and \(\beta _{2\ell }\), for each cluster \(\ell \), resorting to a numerical method; indeed, obtaining an analytical solution of the transcendental system of equations that yields the maxima is intractable. Next, we detail a simple optimization and the stopping criteria for the coordinate descent method used in the experiments presented in Sect. 4.
1.1 Optimization
To improve the convergence rate of the coordinate descent method, we interleave the (one-dimensional) Newton iterations for the variables \(\beta _{1\ell }\) and \(\beta _{2\ell }\), thus solving Eqs. (4) and (5) simultaneously. With this modification, the method converges significantly faster without deteriorating the results. Recall that Newton(b, h) is an iterative method that finds a root of a function h given an initial approximation b, such that in the d-th iteration of Newton’s method we have
We can recast the M-step for maximizing \(\beta _{1\ell }\) and \(\beta _{2\ell }\) as the problem of finding \(b_1\) that maximizes \(h_1(b_1,b_2)\), given \(b_2\) fixed, and \(b_2\) that maximizes \(h_2(b_1,b_2)\), given \(b_1\) fixed, simultaneously. A simple approach to address this problem is to interleave the iterations of Newton’s method as \(b_1^{(0)}, b_2^{(0)}, b_1^{(1)}, b_2^{(1)},\dots \) so that
We took this approach where
Note that k is used for the k-th iteration of the EM algorithm, whereas d (in the above case \(d=0\)) is used for the d-th iteration of Newton’s method. Thus, the previous estimates of \(\beta _{1\ell }\) and \(\beta _{2\ell }\) obtained by the EM algorithm are the initial approximations for the roots in Newton’s method. In this case, we let
with \(h^{(k)}_{1\ell }(b_1,b_2)\) defined as
and \(h^{(k)}_{2\ell }(b_1,b_2)\) defined as
Observe that the functions \(h^{(k)}_{1\ell }\) and \(h^{(k)}_{2\ell }\) as defined in Sect. 3.1 have only one argument, while here they have two, for the purpose of applying the envisaged optimization. With this optimization we start with \(b^{(0)}_1=\beta ^{(k)}_{1\ell }\) and \(b^{(0)}_2=\beta ^{(k)}_{2\ell }\), iterate Newton’s method until convergence in, say, d iterations, and finally update \(\beta _{1\ell }\) and \(\beta _{2\ell }\) as \(\beta ^{(k+1)}_{1\ell }=b^{(d)}_1\) and \(\beta ^{(k+1)}_{2\ell }=b^{(d)}_2\).
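The interleaved scheme above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and derivative arguments stand in for \(h^{(k)}_{1\ell }\), \(h^{(k)}_{2\ell }\) and their partial derivatives, and the tolerance and iteration cap are illustrative defaults.

```python
def interleaved_newton(h1, dh1, h2, dh2, b1, b2, tol=1e-8, max_iter=10_000):
    """Interleave one-dimensional Newton root-finding steps for two
    coupled functions h1(b1, b2) and h2(b1, b2).

    h1, h2   : functions whose roots are sought (one per parameter)
    dh1, dh2 : their partial derivatives w.r.t. b1 and b2, respectively
    b1, b2   : initial approximations (the previous EM estimates)
    """
    for _ in range(max_iter):
        # One Newton step in b1 with b2 held fixed ...
        b1_new = b1 - h1(b1, b2) / dh1(b1, b2)
        # ... then one Newton step in b2 using the fresh b1.
        b2_new = b2 - h2(b1_new, b2) / dh2(b1_new, b2)
        if abs(b1_new - b1) < tol and abs(b2_new - b2) < tol:
            return b1_new, b2_new
        b1, b2 = b1_new, b2_new
    raise RuntimeError("interleaved Newton did not converge")
```

Because each step in \(b_2\) already uses the freshly updated \(b_1\), the scheme behaves like a coordinate descent with exact one-dimensional Newton moves, which is what makes it converge faster than updating both variables from stale values.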
1.2 Stopping criterion
In our implementation, the \(\text {Newton}(b,h)\) method is stopped at iteration d if
Moreover, if the number of iterations exceeds \(10^4\), the method is aborted (with a warning to the user). In practice, however, this iteration limit was never reached in the experiments presented in Sect. 4.
In our particular setting, since we compute both \(\beta _{1\ell }\) and \(\beta _{2\ell }\) at each step, we stop when both approximations fulfill the previous criterion.
1.3 Knee analysis
For the clustered curves to have biological meaning (e.g., concentrations need to be positive, rates cannot become exponentially faster, etc.), we imposed that
If Newton’s method ends with \(\beta _{1\ell }>\beta _{2\ell }\) then we swap \(\beta _{1\ell }\) with \(\beta _{2\ell }\); in practice, this never happened in the experiments performed in Sect. 4. Nonetheless, if \(\beta _{1\ell }\notin (0,5)\) or \(\beta _{2\ell }\notin (0,5)\), we perform a knee analysis to elicit a value within the allowed range.
In the general setting of \(\text {Newton}(b,h)\), assume that the method converges in iteration d and that \(b^{(d)}\) lies outside some allowed range. It may well be the case that \(b^{(d)}\) is out of range while the values of h within the allowed range are very close to \(h(b^{(d)})\). In this case, the method does not converge inside the range because \(b^{(d)}>b^{(d-1)}\), yet \(h(b^{(d)})\approx h(b^{(d-1)})\). Therefore, the method should be stopped when
which is called a knee analysis.
In our particular case, we perform this knee analysis to avoid \(\beta _{1\ell } \notin (0,5)\) and \(\beta _{2\ell } \notin (0,5)\). If, even after performing this analysis, we have \(\beta _{1\ell } \notin (0,5)\), we keep the previous parameter approximation, that is, \(\beta _{1\ell }^{(k+1)}=\beta _{1\ell }^{(k)}\), and the EM algorithm proceeds. The same analysis is carried out for \(\beta _{2\ell }\).
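The knee analysis described above can be folded into a one-dimensional Newton iteration as in the following sketch. The range \((0,5)\) matches the constraint on \(\beta _{1\ell }\) and \(\beta _{2\ell }\); the flatness tolerance `knee_tol` is an assumed value, not one specified in the text.

```python
def newton_with_knee(h, dh, b, lo=0.0, hi=5.0, tol=1e-8, knee_tol=1e-6,
                     max_iter=10_000):
    """One-dimensional Newton iteration with a knee analysis: if an
    iterate escapes the allowed range (lo, hi) while the values of h
    have already flattened out, keep the last in-range approximation.
    """
    prev = b
    for _ in range(max_iter):
        nxt = prev - h(prev) / dh(prev)
        if abs(nxt - prev) < tol:  # regular convergence
            return nxt
        if not (lo < nxt < hi) and abs(h(nxt) - h(prev)) < knee_tol:
            # Knee: the iterate left the range but h barely changed,
            # so the in-range value is numerically just as good.
            return prev
        prev = nxt
    raise RuntimeError("Newton did not converge")
```

If the returned value is still outside \((0,5)\), the caller would then fall back to the previous EM estimate, as described above.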
Appendix 2: Implementation details
In this appendix, we present implementation details related to the proposed EM algorithm.
1.1 Initial number of clusters
The initial number of clusters is a user-defined parameter. However, if this value is not given, we set \(M = \frac{N}{3}\).
1.2 Stopping criterion
Usually, an EM algorithm stops when the difference between consecutive values of the parameters reaches some threshold. In our case, the impact of the parameters \(\beta _{1\ell }\) and \(\beta _{2\ell }\) overwhelms the impact of all remaining parameters in the definition of the clusters; recall that \(\beta _{1\ell }\) and \(\beta _{2\ell }\) are exponents in Eq. (2). For this reason, only \(\beta _{1\ell }\) and \(\beta _{2\ell }\) are considered in the stopping criterion of the proposed EM algorithm. Specifically, the EM algorithm stops when
for all clusters \(\ell \in \{1,\dots ,M\}\).
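The per-cluster stopping rule above can be sketched as a simple convergence check. The tolerance `eps` is an assumed value, since the threshold is not stated numerically in the text; the pairs hold \((\beta _{1\ell },\beta _{2\ell })\) for each cluster.

```python
def em_converged(beta_prev, beta_curr, eps=1e-4):
    """Check the EM stopping rule: stop when the change in beta_1 and
    beta_2 is below eps for every cluster.

    beta_prev, beta_curr: lists of (beta_1, beta_2) pairs, one pair
    per cluster, from consecutive EM iterations.
    """
    return all(
        abs(b1c - b1p) < eps and abs(b2c - b2p) < eps
        for (b1p, b2p), (b1c, b2c) in zip(beta_prev, beta_curr)
    )
```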
1.3 Cluster thresholds
The thresholds for disregarding and merging clusters were defined (empirically) as \(\overline{W}=0.025\) and \(\overline{L}=1\).
Cite this article
Tomás, E., Vinga, S. & Carvalho, A.M. Unsupervised learning of pharmacokinetic responses. Comput Stat 32, 409–428 (2017). https://doi.org/10.1007/s00180-016-0707-x