The investigation of the dynamics of national disciplinary profiles is at the forefront of quantitative investigations of science. We propose a new approach to investigate the complex interactions among scientific disciplinary profiles. The approach is based on recent pseudo-likelihood techniques introduced in the framework of machine learning and complex systems. We infer, in a Bayesian framework, the network topology and the related interdependencies among national disciplinary profiles. We analyse data extracted from the InCites database relating to the national scientific production of the most productive countries in the world, at the disciplinary level, over the period 1992–2016.

InCites is a product of Clarivate Analytics. Further information is available at https://clarivate.com/products/incites/.
The elaborations reported in this paper are based on indicators exported on 2018-02-26 from the InCites dataset updated on 2018-02-10, which includes Web of Science content indexed through 2017-12-31.
The analysed countries are: Argentina (ARG), Australia (AUS), Austria (AUT), Belgium (BEL), Brazil (BRA), Bulgaria (BGR), Canada (CAN), Chile (CHL), China Mainland (CHN), Colombia (COL), Croatia (HRV), Denmark (DNK), Egypt (EGY), Finland (FIN), France (FRA), Germany (DEU), Greece (GRC), Hong Kong (HKG), Hungary (HUN), India (IND), Iran (IRN), Ireland (IRL), Israel (ISR), Italy (ITA), Japan (JPN), Malaysia (MYS), Mexico (MEX), Netherlands (NLD), New Zealand (NZL), Norway (NOR), Pakistan (PAK), Poland (POL), Portugal (PRT), Romania (ROU), Russia (RUS), Saudi Arabia (SAU), Singapore (SGP), Slovenia (SVN), South Africa (ZAF), South Korea (KOR), Spain (ESP), Sweden (SWE), Switzerland (CHE), Taiwan (TWN), Thailand (THA), Turkey (TUR), Ukraine (UKR), United Kingdom (GBR), USA (USA).
The proportionality constant for \(Z_i(\{ J\})\) and \(<\mathbf{s }_i\cdot \mathbf{s }_j>_{i,\{J\}}\) is the same.
In this appendix we discuss the methodology used in this work to obtain the parameters of the maximum Log-Likelihood function introduced in the paper. Firstly, we discuss the general grounds of the validity of the method used. Secondly, we deal with the application to the specific case.
Given the data set \(\{\textbf{s} ^{\mu},\mu=1,2,\ldots ,M \}\), assuming that the observations are independent, and once the generative model has been defined, the Log-Likelihood function \(l(\{J\})\) becomes
where \(\mu =1,\ldots ,M\) is the label for a set of data. The inference problem consists in determining the set of parameters \(\{ J\}\) which maximizes the function in Eq. 5.
We consider here the expression of the cost function (Hamiltonian) for a multicomponent variable \(\mathbf{s }_i=(s_i^1,\ldots , s_i^{\gamma },\ldots ,s_i^D)\), \(H(\{ \mathbf{s }\}|\{ J\})\), given by
with \(J_{ij}=J_{ji}\). The symbol “\(\cdot\)” in Eq. 6 stands for a scalar product. The presence of a scalar product ensures that orthogonal or quasi-orthogonal vectors (i.e. countries with a number of publications that is arbitrarily large but in different fields) have a small weight in the cost function. The sum is extended to all pairs of nodes (i, j) with \(i \ne j\). The partition function \(Z(\{ J\})\) is
\[Z(\{ J\})=\sum _{\{ \mathbf{s }\}}e^{-H(\{ \mathbf{s }\}|\{ J\})}, \qquad (7)\]
where the sum is extended to all possible configurations in the phase space of the set of variables \(\{ \mathbf{s }\}\).
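The role of the scalar product can be illustrated with a minimal Python sketch. It assumes the cost function has the form \(H=-\frac{1}{2}\sum _{i\ne j}J_{ij}\,\mathbf{s }_i\cdot \mathbf{s }_j\), inferred from the two-discipline expression given later in this appendix; the three country profiles and the coupling matrix are hypothetical values chosen only for illustration.

```python
def dot(u, v):
    """Scalar product of two disciplinary profile vectors."""
    return sum(a * b for a, b in zip(u, v))

def hamiltonian(s, J):
    """Assumed form: H = -1/2 * sum over i != j of J[i][j] * (s[i] . s[j])."""
    n = len(s)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += J[i][j] * dot(s[i], s[j])
    return -0.5 * total

# Hypothetical profiles for three countries over D = 3 disciplines.
profiles = [
    [1.0, 0.0, 0.0],  # country 0: active only in discipline 0
    [0.0, 1.0, 0.0],  # country 1: active only in discipline 1 (orthogonal to country 0)
    [0.8, 0.1, 0.0],  # country 2: profile similar to country 0
]
# Hypothetical symmetric couplings, all equal to 1 off the diagonal.
J = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]

pair_01 = J[0][1] * dot(profiles[0], profiles[1])  # orthogonal pair: no contribution
pair_02 = J[0][2] * dot(profiles[0], profiles[2])  # similar pair: sizeable contribution
H = hamiltonian(profiles, J)
```

With equal couplings, the orthogonal pair (0, 1) contributes nothing to the cost function, while the similar pair (0, 2) dominates it.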
The calculation of the above partition function is computationally too demanding even for a small number of variables. For this reason, we resort to the pseudo-likelihood approximation (Aurell and Ekeberg 2012; Tyagi et al. 2016; Marruzzo et al. 2017). It consists in maximizing a Pseudo-Log-Likelihood function based on the local conditional Log-Likelihood function at each node (see Eq. 10) in place of the Log-Likelihood function. It is possible to show that the estimate of the parameters obtained by Pseudo-Log-Likelihood maximization is consistent with the one obtained by maximizing the Log-Likelihood function, that is, the two functions are maximized by the same set of parameters. The hypothesis under which this statement holds, i.e. the strict concavity of the Pseudo-Log-Likelihood function with respect to the elements of the set of parameters, is not too restrictive (see Hyvarinen 2006). Furthermore, it is possible to show that under this hypothesis the Pseudo-Log-Likelihood maximization is exact (i.e. equivalent to the Log-Likelihood maximization) in the case of infinite sampling (Aurell and Ekeberg 2012). An important advantage of the Pseudo-Log-Likelihood function is that it can be maximized in polynomial time.
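To give a feeling for the computational cost of the exact partition function, the following Python sketch enumerates all configurations of a toy system. It is an illustration under stated assumptions, not the procedure used in the paper: each component is restricted to a small hypothetical grid of values, and the cost function is assumed to be \(H=-\frac{1}{2}\sum _{i\ne j}J_{ij}\,\mathbf{s }_i\cdot \mathbf{s }_j\).

```python
import itertools
import math

def brute_force_partition(J, grid, D):
    """Sum exp(-H) over every configuration of N nodes with D components each.
    Restricting each component to the values in `grid` is an assumption
    made only to keep the enumeration finite."""
    n = len(J)
    node_states = list(itertools.product(grid, repeat=D))
    Z, terms = 0.0, 0
    for config in itertools.product(node_states, repeat=n):
        H = 0.0
        for i in range(n):
            for j in range(n):
                if i != j:
                    H -= 0.5 * J[i][j] * sum(a * b for a, b in zip(config[i], config[j]))
        Z += math.exp(-H)
        terms += 1
    return Z, terms

def number_of_terms(n_nodes, D, grid_size):
    """Number of configurations entering the exact partition function."""
    return grid_size ** (n_nodes * D)

# Hypothetical couplings for N = 3 nodes, D = 2 components, 3 grid values.
J = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.2],
     [0.0, 0.2, 0.0]]
Z, terms = brute_force_partition(J, grid=(-1.0, 0.0, 1.0), D=2)

# Same grid, one component per node, for the 49 countries analysed here.
terms_49 = number_of_terms(49, 1, 3)
```

The number of terms grows as `grid_size ** (n_nodes * D)`, so the enumeration is already out of reach for the 49 countries considered here, even with a single component and three grid values per node. Each single-node conditional partition function, by contrast, involves only the configurations of one node.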
According to the Pseudo-Log-Likelihood approach, we consider, instead of the likelihood of Eq. (5), the likelihood built on the local conditional probability of each variable i, one at a time. The cost function (Eq. 6) is first rewritten as
where \(\mathbf{s }_{\backslash i}\) indicates the set of all input-variables except the ith. The functions \(\mathbf{A }_i(\{J\})=\frac{1}{2} \sum _{j }^{1,N} J_{ij} \mathbf{s }_j\) and \(\mathbf{B }_{i,k}(\{J\})=\frac{1}{2} \sum _{j \ne i}^{1,N} J_{ij} \mathbf{s }_j\) have been introduced in Eq. 8. The cost functions \(H_i (\mathbf{s }_i | \{ \mathbf{s }_{\backslash i} \},\{ J\})\) and \(H_{\backslash i} (\{ \mathbf{s }_{\backslash i}\}|\{ J\})\) are implicitly defined in the same equation. Analogously we can rewrite the partition function as
The local conditional probability at the ith node is
and the local partition function is \(Z_i(\{ J\})=\sum _{\{\mathbf{s }_i\}}e^{-H_i (\mathbf{s }_i | \{ \mathbf{s }_{\backslash i} \},\{ J\})}\). By defining \(l'( \mathbf{s }_i|\{ \mathbf{s }_{\backslash i}\},\{ J\})=\log [p(\mathbf{s }_i | \{\mathbf{s }_{\backslash i} \},\{ J\})]\), the Pseudo-Log-Likelihood function is defined as
The gradient of the Pseudo-Log-Likelihood function with respect to the parameter \(J_{ij}\) is given by
where \(<\ \ >_{i,\{J\}}\) stands for the ensemble average calculated over the probability distribution \(p(\mathbf{s }_i | \{\mathbf{s }_{\backslash i} \},\{ J\})\). Looking now at the gradient of the Log-Likelihood function \(l(\{ J\})\), we observe that it is possible to rephrase the term \(\frac{1}{Z(\{ J\})}\frac{\partial }{\partial J_{ij}}Z(\{ J\})\) as
Finally we obtain
By comparing Eqs. 12 and 14 it is possible to infer that in the limit \(M \rightarrow \infty\), i) both the gradients go to zero for the set of parameters \(\{ J\}\) generating the observed data, ii) \(\frac{\partial }{\partial J_{ij}} \lambda (\{ J\}) \rightarrow \frac{\partial }{\partial J_{ij}} l(\{ J\})\). This finally establishes the consistency of the maximum Pseudo-Log-Likelihood estimator. We observe, furthermore, its coincidence with the maximum Log-Likelihood estimator in the limit \(M \rightarrow \infty\).
The gradient of the Pseudo-Log-Likelihood function can be calculated exactly, thus facilitating the computational solution of the inference problem. The explicit expression of \(\frac{\partial }{\partial J_{ij}} \lambda (\{ J\})\) is reported below.
To deal with a lower number of parameters, instead of maximizing the Pseudo-Log-Likelihood function, given by the sum of the single-node Pseudo-Log-Likelihood functions (Eq. 11), we maximize each single-node Pseudo-Log-Likelihood function separately. Since the couplings should be symmetric, the final estimate of the \(J_{ij}\) parameter is obtained by taking the average \((J_{ij}+J_{ji})/2\).
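The symmetrization step can be sketched in a few lines of Python. The function name and the numerical values are hypothetical; row i of the input is assumed to hold the couplings obtained from the single-node maximization at node i.

```python
def symmetrize(J_rows):
    """Average the two independent estimates J[i][j] and J[j][i]."""
    n = len(J_rows)
    return [[0.5 * (J_rows[i][j] + J_rows[j][i]) for j in range(n)]
            for i in range(n)]

# Hypothetical single-node estimates: row i comes from the maximization at node i,
# so J_rows[i][j] and J_rows[j][i] generally differ.
J_rows = [[0.0, 0.4, -0.2],
          [0.6, 0.0, 0.1],
          [0.0, 0.3, 0.0]]
J_sym = symmetrize(J_rows)
```

After this step `J_sym[i][j]` equals `J_sym[j][i]` for every pair, as required by the condition \(J_{ij}=J_{ji}\).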
Using a standard Pseudo-Log-Likelihood maximization, some couplings can be largely overestimated. To avoid this drawback we used an \(l_2\) regularizer (Ravikumar 2010), i.e. in place of maximizing the \(\lambda (\{ J\})\) function we maximize the function \(\lambda (\{ J\})-l_2(\sum _{i,j}J_{ij}^2)^{1/2}\), where \(l_2\) is a suitably chosen constant.
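As a minimal sketch, the regularized objective can be written as follows. The penalty follows the expression given in the text, i.e. the constant \(l_2\) times the square root of the sum of the squared couplings; the function name, the Pseudo-Log-Likelihood value and the constant are hypothetical.

```python
import math

def regularized_objective(pll_value, J, l2):
    """Return lambda({J}) - l2 * sqrt(sum_ij J_ij^2), as written in the text."""
    norm = math.sqrt(sum(x * x for row in J for x in row))
    return pll_value - l2 * norm

# Hypothetical values: a Pseudo-Log-Likelihood of -10 and couplings with norm 5.
J = [[0.0, 3.0],
     [4.0, 0.0]]
objective = regularized_objective(-10.0, J, l2=0.1)
```

Larger couplings lower the objective, so the maximization is pushed away from solutions in which a few \(J_{ij}\) grow without being supported by the data.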
The maximization of the single-node Pseudo-Log-Likelihood functions has been performed by means of the MATLAB fminunc function, selecting the trust-region optimization algorithm.
In the following we first rephrase the expression of the Log-Likelihood function by isolating the contribution of the ith node and compare it with the expression of the Pseudo-Log-Likelihood function, to gain a deeper understanding of the differences between them. We finally calculate the gradient of the Pseudo-Log-Likelihood function with respect to \(J_{ij}\). The sum \(\sum _{\{ \mathbf{s }_i\}}e^{- \mathbf{s }_i \cdot \mathbf{A }_i(\{J\})}\) in Eq. 9 has been calculated by assuming that the values of the ith input variable can vary continuously in the interval \([-\,1,1]\), obtaining
\[\sum _{\{ \mathbf{s }_i\}}e^{- \mathbf{s }_i \cdot \mathbf{A }_i(\{J\})}\propto \prod _{\alpha }^{1,D} \frac{2\ \sinh (A_i^{\alpha })}{A_i^{\alpha }}. \qquad (15)\]
The proportionality constant in Eq. 15, equal to the inverse of the total number of all possible \(\mathbf{s }_i\) configurations, does not influence the following derivations and will not be explicitly considered. Similarly, it is possible to write the function \(Z_{\backslash i}(\{ J\})=\sum _{\{ \mathbf{s }_{\backslash i}\}}e^{-H_{\backslash i} (\{ \mathbf{s }_{\backslash i}\}|\{ J\})}\), by exploiting the function \(\mathbf{B }_{i,k}(\{ J\})\) defined above, obtaining
By iterating this procedure to the remaining variables it is finally possible to write the partition function as the product
The Log-Likelihood function becomes
The Pseudo-Log-Likelihood function, defined in Eqs. 10 and 11, takes now the expression
The difference between the Log-Likelihood function and the Pseudo-Log-Likelihood function clearly appears by comparing Eqs. 18 and 19.
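The closed form behind the local partition function can be checked numerically. The sketch below assumes, as stated above, that each component varies continuously in \([-1,1]\), so that the sum over a single component becomes \(\int _{-1}^{1}e^{-sA}\,ds=2\sinh (A)/A\); the function names and the test values of \(A\) are hypothetical.

```python
import math

def closed_form(a):
    """2*sinh(a)/a, with the limit value 2 for a -> 0."""
    if abs(a) < 1e-8:
        return 2.0
    return 2.0 * math.sinh(a) / a

def midpoint_integral(a, n=20000):
    """Midpoint-rule estimate of the integral of exp(-s*a) for s in [-1, 1]."""
    h = 2.0 / n
    return h * sum(math.exp(-(-1.0 + (k + 0.5) * h) * a) for k in range(n))

def local_partition(A):
    """Product over the D components, up to the proportionality constant."""
    return math.prod(closed_form(a) for a in A)

numeric = midpoint_integral(0.7)
exact = closed_form(0.7)
Z_i = local_partition([0.7, -0.3, 1.2])  # hypothetical components of A_i
```

Since \(\sinh (A)/A\) is even in \(A\), the value of the local partition function does not depend on the sign convention chosen for \(\mathbf{A }_i\).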
We can now explicitly calculate the gradient of the Pseudo-Log-Likelihood with respect to the set of parameters \(J_{ij}\). From Eq. 12, we need to calculate the quantity \(<\mathbf{s }_i^{\mu }\cdot \mathbf{s }_j^{\mu }>_{i,\{ J\}}\). It is (for the sake of clarity the index \(\mu\) is omitted)
The expression of \(Z_i(\{ J\})\) is reported in Eq. 15. It is possible to rewrite it, for a given index \(\gamma\), as \(Z_i\propto \frac{2\ \ \sinh (A_i^{\gamma })}{A_i^{\gamma }} \prod _{\alpha \ne \gamma }^{1,D} \frac{2\ \ \sinh (A_i^{\alpha })}{A_i^{\alpha }}\). By inserting this latter expression in Eq. 20, we obtain
and finally (see footnote 4)
When we are dealing with interrelations between two different disciplines, labeled as γ and δ, in place of Eq. (6), the Hamiltonian of the system is \(H=-\frac{1}{2}\sum _{i,j} J_{ij}^{\gamma \delta }(s_i^{\gamma }s_j^{\delta }+s_i^{\delta }s_j^{\gamma })\). In this case, Eqs. (15), (21) and (22) should be changed accordingly. This, however, does not introduce any further difficulty. For example, Eq. (A11) becomes
where \(A_i^{\gamma }(\{ J^{\gamma \delta }\})=\frac{1}{2}\sum _{j}^{1,N}J_{ij}^{\gamma \delta }s_j^{\gamma }\).
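The ensemble average entering the gradient can be illustrated, component by component, with a short Python sketch. It assumes the single-component weight \(e^{-sA}\) on \([-1,1]\) used above for the local partition function, for which \(<s>=-\partial \log Z/\partial A=1/A-\coth (A)\); the sign follows that convention and would flip with the opposite one. Function names and numerical values are hypothetical.

```python
import math

def mean_component(a):
    """Conditional mean of s in [-1, 1] under the weight exp(-s*a): 1/a - coth(a)."""
    if abs(a) < 1e-4:
        return -a / 3.0  # leading term of the series expansion near a = 0
    return 1.0 / a - 1.0 / math.tanh(a)

def numeric_mean(a, n=20000):
    """Midpoint-rule estimate of the same average, used only as a check."""
    h = 2.0 / n
    num = den = 0.0
    for k in range(n):
        s = -1.0 + (k + 0.5) * h
        w = math.exp(-s * a)
        num += s * w
        den += w
    return num / den

def conditional_dot(A_i, s_j):
    """<s_i . s_j> at node i: s_j is held fixed, so only s_i is averaged."""
    return sum(mean_component(a) * x for a, x in zip(A_i, s_j))

check = abs(mean_component(0.7) - numeric_mean(0.7))
average = conditional_dot([0.7, -0.3], [0.5, 0.2])  # hypothetical A_i and s_j
```

Because the average is taken over the conditional distribution of \(\mathbf{s }_i\) alone, the profile \(\mathbf{s }_j\) acts as a constant and the result is available in closed form, which is why the gradient can be evaluated exactly.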
Daraio, C., Fabbri, F., Gavazzi, G. et al. Assessing the interdependencies between scientific disciplinary profiles. Scientometrics 116, 1785–1803 (2018). https://doi.org/10.1007/s11192-018-2816-5
