
Unsupervised learning of pharmacokinetic responses

  • Original Paper

Abstract

Pharmacokinetics (PK) is a branch of pharmacology dedicated to the study of the time course of drug concentrations, from absorption to excretion from the body. PK dynamic models are often based on homogeneous, multi-compartment assumptions, which make it possible to identify the PK parameters and, further, to predict the time evolution of drug concentration for a given subject. One key characteristic of these time series is their high variability among patients, which may hamper their correct stratification. In the present work, we address this variability by estimating the PK parameters and simultaneously clustering the corresponding subjects using the time series. We propose an expectation maximization algorithm that clusters subjects based on their PK drug responses, in an unsupervised way, collapsing clusters that are closer than a given threshold. Experimental results show that the proposed algorithm converges fast and leads to meaningful results in synthetic and real scenarios.



Acknowledgements

The authors would like to express their appreciation to Paulo Mateus for many useful inputs and valuable comments. Special thanks go to the anonymous reviewers, who significantly contributed to the quality of this manuscript with their valuable and well-aimed comments. This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under contracts LAETA (UID/EMS/50022/2013) and IT (UID/EEA/50008/2013), and by projects InteleGen (PTDC/DTP-FTO/1747/2012), PERSEIDS (PTDC/EMS-SIS/0642/2014) and internal IT project QBigData. SV acknowledges support by Program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH). This work was partially supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No. 633974 (SOUND project).

Author information


Corresponding author

Correspondence to Alexandra M. Carvalho.

Appendices

Appendix 1: Experimental setup

As described in Sect. 3.1, the M-step maximizes the likelihood over the parameters \(\beta _{1\ell }\) and \(\beta _{2\ell }\), for each cluster \(\ell \), resorting to a numerical method; indeed, obtaining an analytical solution of the transcendental system of equations that yields the maxima is intractable. Next, we detail a simple optimization and the stopping criteria for the coordinate descent method used in the experiments presented in Sect. 4.

1.1 Optimization

To improve the convergence rate of the coordinate descent method, we intercalated the (one-dimensional) Newton iterations for the variables \(\beta _{1\ell }\) and \(\beta _{2\ell }\), thus solving Eqs. (4) and (5) simultaneously. With this modification, the method converged significantly faster without deteriorating the results. Recall that \(\text {Newton}(b,h)\) is an iterative method to find a root of a function h given an initial approximation b, such that, in the d-th iteration of Newton’s method, we have

$$\begin{aligned} b^{(0)}=b\text { and }b^{(d+1)}=b^{(d)}-\frac{h(b^{(d)})}{h'(b^{(d)})}. \end{aligned}$$
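For concreteness, here is a minimal Python sketch of this iteration. The function names and the explicit derivative argument `dh` (standing for \(h'\)) are our own illustration, not the authors' code; the tolerance and iteration cap anticipate the stopping rule detailed in Sect. 1.2 below.

```python
def newton(b, h, dh, tol=1e-10, max_iter=10**4):
    """Newton's method: find a root of h starting from the initial guess b.

    h is the target function and dh its derivative h'.
    """
    for _ in range(max_iter):
        b_next = b - h(b) / dh(b)
        if abs(b_next - b) < tol:       # successive iterates indistinguishable
            return b_next
        b = b_next
    raise RuntimeError("Newton's method exceeded the iteration limit")
```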

We can recast the M-step for maximizing over \(\beta _{1\ell }\) and \(\beta _{2\ell }\) as the problem of simultaneously finding \(b_1\) such that \(h_1(b_1,b_2)=0\), for \(b_2\) fixed, and \(b_2\) such that \(h_2(b_1,b_2)=0\), for \(b_1\) fixed. A simple approach to this problem is to interleave the iterations of Newton’s method as \(b_1^{(0)}, b_2^{(0)}, b_1^{(1)}, b_2^{(1)},\dots \) so that

$$\begin{aligned} b_1^{(d+1)}=b_1^{(d)}-\frac{h_{1}(b_{1}^{(d)},b_{2}^{(d)})}{\frac{\partial h_{1}}{\partial b_1}(b_{1}^{(d)},b_{2}^{(d)})} \text { and } b_2^{(d+1)}=b_2^{(d)}-\frac{h_{2}(b_{1}^{(d)},b_{2}^{(d)})}{\frac{\partial h_{2}}{\partial b_2}(b_{1}^{(d)},b_{2}^{(d)})}. \end{aligned}$$

We took this approach where

$$\begin{aligned} b_1^{(0)}=\beta ^{(k)}_{1\ell }\text { and }b_2^{(0)}=\beta ^{(k)}_{2\ell }. \end{aligned}$$

Note that k indexes the k-th iteration of the EM algorithm, whereas d (in the above case \(d=0\)) indexes the d-th iteration of Newton’s method. Thus the previous EM estimates of \(\beta _{1\ell }\) and \(\beta _{2\ell }\) serve as the initial approximations for the roots in Newton’s method. In this case, we let

$$\begin{aligned} h_1(b_1,b_2)=h^{(k)}_{1\ell }(b_1,b_2)\text { and }h_2(b_1,b_2)=h^{(k)}_{2\ell }(b_1,b_2), \end{aligned}$$

with \(h^{(k)}_{1\ell }(b_1,b_2)\) defined as

$$\begin{aligned} \displaystyle \sum _{i=1}^N\sum _{j=1}^n\Big (-\frac{\alpha ^{(k+1)}_\ell }{v^{(k)}_\ell }X_{i\ell }^{(k)}t_je^{-b_1t_j} \left( y_{ij}-\alpha _\ell ^{(k+1)}(e^{-b_1t_j}-e^{-b_2t_j})\right) \Big ), \end{aligned}$$

and \(h^{(k)}_{2\ell }(b_1,b_2)\) defined as

$$\begin{aligned} \displaystyle \sum _{i=1}^N\sum _{j=1}^n\Big (\frac{\alpha _\ell ^{(k+1)}}{v^{(k)}_\ell }X_{i\ell }^{(k)}t_je^{-b_2t_j} \left( y_{ij}-\alpha _\ell ^{(k+1)}(e^{-b_1t_j}-e^{-b_2t_j})\right) \Big ). \end{aligned}$$

Observe that the functions \(h^{(k)}_{1\ell }\) and \(h^{(k)}_{2\ell }\) as defined in Sect. 3.1 have only one argument, while here they have two, for the purpose of applying the envisaged optimization. With this optimization we start with \(b^{(0)}_1=\beta ^{(k)}_{1\ell }\) and \(b^{(0)}_2=\beta ^{(k)}_{2\ell }\), then we iterate Newton’s method until convergence in, say, d iterations, and finally we update \(\beta _{1\ell }\) and \(\beta _{2\ell }\) as \(\beta ^{(k+1)}_{1\ell }=b^{(d)}_1\) and \(\beta ^{(k+1)}_{2\ell }=b^{(d)}_2\).

1.2 Stopping criterion

In our implementation, the \(\text {Newton}(b,h)\) method is stopped at iteration d if

$$\begin{aligned} \mid b^{(d)}-b^{(d-1)} \mid <10^{-10}. \end{aligned}$$

Moreover, if the number of iterations exceeds \(10^4\), the method is aborted (with a warning to the user). In practice, however, this iteration limit was never reached in the experiments performed in Sect. 4.

In our particular setting, since at each step we compute both \(\beta _{1\ell }\) and \(\beta _{2\ell }\), we stop only when both approximations fulfill the above criterion.
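Combining the optimization of Sect. 1.1 with this dual stopping rule, a Python sketch might look as follows. The array shapes (`y` of shape N×n, sampling times `t`, responsibilities `X`, cluster parameters `alpha` and `v`) and the central finite-difference derivatives are our assumptions; in the actual method, the closed-form partial derivatives \(\partial h_1/\partial b_1\) and \(\partial h_2/\partial b_2\) would be used instead.

```python
import numpy as np

def make_scores(y, t, X, alpha, v):
    """Vectorised score functions h1 and h2 for one cluster.

    y : (N, n) observed concentrations, t : (n,) sampling times,
    X : (N,) cluster responsibilities, alpha, v : scalar parameters.
    """
    def resid(b1, b2):   # y_ij - alpha (e^{-b1 t_j} - e^{-b2 t_j})
        return y - alpha * (np.exp(-b1 * t) - np.exp(-b2 * t))

    def h1(b1, b2):
        return np.sum(X[:, None] * (-alpha / v) * t * np.exp(-b1 * t) * resid(b1, b2))

    def h2(b1, b2):
        return np.sum(X[:, None] * (alpha / v) * t * np.exp(-b2 * t) * resid(b1, b2))

    return h1, h2

def interleaved_newton(b1, b2, h1, h2, tol=1e-10, max_iter=10**4, eps=1e-7):
    """Interleaved Newton iterations for b1 and b2, both updated
    from the d-th iterates, with finite-difference derivatives."""
    for _ in range(max_iter):
        d1 = (h1(b1 + eps, b2) - h1(b1 - eps, b2)) / (2 * eps)
        d2 = (h2(b1, b2 + eps) - h2(b1, b2 - eps)) / (2 * eps)
        b1_next = b1 - h1(b1, b2) / d1
        b2_next = b2 - h2(b1, b2) / d2
        if abs(b1_next - b1) < tol and abs(b2_next - b2) < tol:
            return b1_next, b2_next          # both criteria fulfilled
        b1, b2 = b1_next, b2_next
    raise RuntimeError("iteration limit exceeded")   # warn the user, as above
```

Warm-starting with `b1, b2` set to the k-th EM estimates reproduces the initialization \(b^{(0)}_1=\beta ^{(k)}_{1\ell }\) and \(b^{(0)}_2=\beta ^{(k)}_{2\ell }\) described above.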

1.3 Knee analysis

For the clustered curves to have biological meaning (e.g., concentrations need to be positive, rates cannot become arbitrarily fast, etc.), we imposed that

$$\begin{aligned} \beta _{1\ell }, \beta _{2\ell }\in (0,5)\text { and }\beta _{1\ell }\le \beta _{2\ell }. \end{aligned}$$

If Newton’s method ends with \(\beta _{1\ell }>\beta _{2\ell }\) then we swap \(\beta _{1\ell }\) with \(\beta _{2\ell }\); in practice, this never happened in the experiments performed in Sect. 4. Nonetheless, if \(\beta _{1\ell }\notin (0,5)\) or \(\beta _{2\ell }\notin (0,5)\), we perform a knee analysis to obtain a value within the allowed range.

In the general setting of \(\text {Newton}(b,h)\), assume that the method converges at iteration d and that \(b^{(d)}\) is outside the allowed range. It may very well be the case that \(b^{(d)}\) is out of range while the images of h within the allowed range are very close to \(h(b^{(d)})\). In this case, the method does not converge inside the range because \(b^{(d)}>b^{(d-1)}\), yet \(h(b^{(d)})\approx h(b^{(d-1)})\). Therefore, the method should instead be stopped when

$$\begin{aligned} \mid h(b^{(d)})-h(b^{(d-1)}) \mid <10^{-10}, \end{aligned}$$

which is called a knee analysis.

In our particular case, we perform this knee analysis to avoid \(\beta _{1\ell } \notin (0,5)\) and \(\beta _{2\ell } \notin (0,5)\). If, even after performing this analysis, we have \(\beta _{1\ell } \notin (0,5)\), then we keep the previous parameter estimate, that is, \(\beta _{1\ell }^{(k+1)}=\beta _{1\ell }^{(k)}\), and the EM algorithm proceeds. The same analysis is carried out for \(\beta _{2\ell }\).
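A sketch of how the knee criterion and the fallback to the previous estimate might fit together is given below; the names `knee_newton` and `constrained_update` are our own, and `dh` again denotes the derivative of h.

```python
LOWER, UPPER = 0.0, 5.0   # admissible range for beta_{1l} and beta_{2l}

def knee_newton(b, h, dh, tol=1e-10, max_iter=10**4):
    """Newton iteration that also stops when successive images of h
    become indistinguishable (the knee criterion)."""
    for _ in range(max_iter):
        b_next = b - h(b) / dh(b)
        if abs(b_next - b) < tol or abs(h(b_next) - h(b)) < tol:
            return b_next
        b = b_next
    return b

def constrained_update(beta_prev, h, dh):
    """Keep the previous EM estimate whenever even the knee analysis
    leaves the root outside (0, 5)."""
    beta_new = knee_newton(beta_prev, h, dh)
    return beta_new if LOWER < beta_new < UPPER else beta_prev
```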

Appendix 2: Implementation details

In this appendix, we present implementation details related to the proposed EM algorithm.

2.1 Initial number of clusters

The initial number of clusters is a user-defined parameter. However, if this value is not given, we set \(M = \frac{N}{3}\), where N is the number of subjects.
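In code this default is a one-liner; rounding N/3 to the nearest integer is our assumption, since the text does not specify how the quotient is rounded.

```python
def initial_num_clusters(N, M=None):
    """Return the user-supplied M, or default to roughly N/3 clusters."""
    return M if M is not None else max(1, round(N / 3))
```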

2.2 Stopping criterion

Usually, an EM algorithm stops when the difference between consecutive values of the parameters falls below some threshold. In our case, the impact of the parameters \(\beta _{1\ell }\) and \(\beta _{2\ell }\) overwhelms that of all remaining parameters in the definition of the clusters; recall that \(\beta _{1\ell }\) and \(\beta _{2\ell }\) are exponents in Eq. (2). For this reason, only \(\beta _{1\ell }\) and \(\beta _{2\ell }\) are considered in the stopping criterion of the proposed EM algorithm. Specifically, the EM algorithm stops when

$$\begin{aligned} \frac{\mid \beta _{1\ell }^{(k+1)}-\beta _{1\ell }^{(k)}\mid }{\mid \beta _{1\ell }^{(k+1)}\mid }\le 10^{-6}\text { and } \frac{\mid \beta _{2\ell }^{(k+1)}-\beta _{2\ell }^{(k)}\mid }{\mid \beta _{2\ell }^{(k+1)}\mid }\le 10^{-6}, \end{aligned}$$

for all clusters \(\ell \in \{1,\dots ,M\}\).
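A sketch of this test, operating on length-M arrays of the old and new estimates (the array names are ours):

```python
import numpy as np

def em_converged(beta1_old, beta1_new, beta2_old, beta2_new, tol=1e-6):
    """Relative-change stopping test on beta1 and beta2,
    jointly over all M clusters."""
    rel1 = np.abs(beta1_new - beta1_old) / np.abs(beta1_new)
    rel2 = np.abs(beta2_new - beta2_old) / np.abs(beta2_new)
    return bool(np.all(rel1 <= tol) and np.all(rel2 <= tol))
```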

2.3 Cluster thresholds

The thresholds for disregarding and merging clusters were defined (empirically) as \(\overline{W}=0.025\) and \(\overline{L}=1\).


Cite this article

Tomás, E., Vinga, S. & Carvalho, A.M. Unsupervised learning of pharmacokinetic responses. Comput Stat 32, 409–428 (2017). https://doi.org/10.1007/s00180-016-0707-x
