The Monte Carlo replica method
==============================

Formalism
~~~~~~~~~

The Monte Carlo replica approach (MCfit) is inspired by the NNPDF analysis of the quark and gluon substructure of protons. This method constructs a sampling of the probability distribution in the space of the experimental data, which is then translated into a sampling of the probability distribution in the space of the EFT coefficients through an optimisation procedure that determines the best-fit values of the coefficients for each replica, :math:`\boldsymbol{c}^{(k)}`.

Given an experimental measurement of a hard-scattering cross-section, denoted by :math:`\sigma_i^{\rm (exp)}`, with total uncorrelated uncertainty :math:`\delta_{i}^{\rm (stat)}` and :math:`n_{\rm sys}` correlated systematic uncertainties :math:`\delta^{\rm (sys)}_{i,\alpha}`, the :math:`N_{\rm rep}` artificial Monte Carlo (MC) replicas of the experimental data are generated as

.. math::

    \sigma_{i}^{(\mathrm{art})(k)} = \sigma_{i}^{\rm (exp)}\left( 1 + r_{i}^{(k)}\delta_{i}^{\rm (stat)} + \sum_{\alpha=1}^{n_{\rm sys}}r_{i,\alpha}^{(k)}\delta^{\rm (sys)}_{i,\alpha}\right), \quad k=1,\ldots,N_{\rm rep}

where the index :math:`i` runs from 1 to :math:`n_{\rm dat}` and :math:`r_{i}^{(k)}`, :math:`r_{i,\alpha}^{(k)}` are univariate Gaussian random numbers. Correlations between data points induced by systematic uncertainties are accounted for by ensuring that :math:`r^{(k)}_{i,\alpha}=r^{(k)}_{i',\alpha}`, that is, the same random number is used for every data point affected by the systematic source :math:`\alpha`. It can be shown that central values, variances, and covariances evaluated by averaging over the MC replicas reproduce the corresponding experimental values.

A fit to the :math:`n_{\rm op}` degrees of freedom :math:`\boldsymbol{c}/\Lambda` is then performed for each of the generated MC replicas :math:`\sigma_{i}^{(\mathrm{art})(k)}`. The best-fit values are determined from the minimisation of the cost function

.. math::

    E^{(k)}({\boldsymbol c})\equiv \frac{1}{n_{\rm dat}}\sum_{i,j=1}^{n_{\rm dat}}\left( \sigma^{(\rm th)}_i\left( {\boldsymbol c}^{(k)}\right) -\sigma^{{(\rm art)}(k)}_i\right) ({\rm cov}^{-1})_{ij} \left( \sigma^{(\rm th)}_j\left( {\boldsymbol c}^{(k)} \right)-\sigma^{{(\rm art)}(k)}_j\right)

where :math:`\sigma^{(\rm th)}_i( {\boldsymbol c}^{(k)} )` indicates the theoretical prediction for the :math:`i`-th cross-section evaluated with the :math:`k`-th set of EFT coefficients. This process results in a collection of best-fit coefficient values :math:`{\boldsymbol c}^{(k)}`, from which estimators such as expectation values, variances, and correlations are evaluated.

The overall fit quality is then evaluated using the :math:`\chi^2` definition, where the central experimental values are compared to the mean theoretical prediction computed from the resulting fit replicas.

As mentioned in :doc:`Uncertanties treatment`, various theoretical uncertainties are also included in the :math:`\chi^2` definition for some datasets. A consistent treatment of theoretical uncertainties in the fitting procedure means that these are included not only in the fit, via the covariance matrix in the :math:`\chi^2` definition, but also in the corresponding replica generation. In other words, the replicas are sampled according to a multi-Gaussian distribution defined by the total covariance matrix, which receives contributions of both experimental and theoretical origin. We therefore account for such errors in the generation of the Monte Carlo replicas :math:`\sigma_{i}^{(\mathrm{art})(k)}` :cite:`thennpdfcollaboration2019parton`.

The MCfit method offers several advantages for global EFT analyses:

* it does not require specific assumptions about the underlying probability distribution of the fit parameters, and in particular does not rely on the Gaussian approximation.
* the computational cost scales much more mildly with the number of operators :math:`n_{\rm op}` included in the fit than in Nested Sampling (NS).
* it can be used to assess the impact of new datasets on the fit *a posteriori* with the Bayesian reweighting formalism.

Optimisation
~~~~~~~~~~~~

In the top quark sector analysis of :cite:`Hartland:2019bjb`, the error function :math:`E^{(k)}` was minimised with a gradient descent method, which relies on local variations of the error function. This choice is advantageous since :math:`E^{(k)}` is at most a quartic form of the fit parameters, and therefore evaluating its gradient is computationally efficient.

In the current version of the code (see the analysis of :cite:`ethier2021combined`) we allow for a more complex parameter space, using as optimiser the trust-region algorithm ``trust-constr`` available in the ``SciPy`` package. An advantage of this method is that the optimiser can be given any combination of constraints on the coefficients, including existing bounds. This is a rather useful feature, since in many cases of interest one would like to restrict the EFT parameter space based on theoretical considerations, such as when accounting for the LEP EWPOs or in the top-philic scenario.

Initial sampling range and bounds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For each MC replica fit, the fit coefficients :math:`{\boldsymbol c}^{(k)}` are initialised at random within a pre-defined range. This sampling range, as well as the boundaries imposed on the minimisation procedure for the poorly constrained parameters, are taken to be the same as those used in the :doc:`NS` procedure. That is, the sampling ranges for the global fits are derived from a one-parameter :math:`\chi^2` scanning procedure and subsequently inflated to cover a sufficiently large parameter hyper-volume.

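
The optimisation and initialisation steps described above can be illustrated with a minimal sketch. It is not the code used in the fitting framework: the toy linear predictions, the covariance matrix, the bounds, and all variable names (``sigma_sm``, ``K``, ``error_function``, and so on) are assumptions made for illustration. Only ``scipy.optimize.minimize`` with ``method="trust-constr"`` and ``scipy.optimize.Bounds`` are taken from ``SciPy``.

```python
import numpy as np
from scipy.optimize import Bounds, minimize

rng = np.random.default_rng(0)

# Toy setup (all numbers hypothetical): 6 data points, 2 coefficients,
# linear predictions sigma_th(c) = sigma_sm + K @ c.
n_dat, n_op = 6, 2
sigma_sm = np.ones(n_dat)
K = rng.normal(size=(n_dat, n_op))
c_true = np.array([0.5, -0.3])
cov_inv = np.linalg.inv(np.diag(np.full(n_dat, 0.05**2)))
sigma_art = sigma_sm + K @ c_true  # noiseless stand-in for one replica


def error_function(c):
    """E(c) = (1 / n_dat) * r^T cov^{-1} r, with r = sigma_th(c) - sigma_art."""
    r = sigma_sm + K @ c - sigma_art
    return r @ cov_inv @ r / n_dat


def error_gradient(c):
    """Analytic gradient of E(c) for the linear toy model."""
    r = sigma_sm + K @ c - sigma_art
    return 2.0 * K.T @ cov_inv @ r / n_dat


# Assumption: the same range serves as sampling range and as fit bounds.
lower = np.array([-2.0, -2.0])
upper = np.array([2.0, 2.0])
c0 = rng.uniform(lower, upper)  # random initial point inside the range

result = minimize(
    error_function,
    c0,
    jac=error_gradient,
    method="trust-constr",
    bounds=Bounds(lower, upper),
)
c_best = result.x
```

In this toy problem the replica is noiseless, so ``c_best`` should land close to ``c_true`` while staying inside the bounds. Further restrictions on the coefficients could be passed through the ``constraints`` argument of ``minimize``.
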
Cross-validation
~~~~~~~~~~~~~~~~

Given the large dimensionality of the considered EFT parameter space, it is conceivable that the optimiser ends up fitting the statistical fluctuations of the experimental data rather than the underlying physical law. One way to prevent the minimiser from over-fitting the data is to use look-back cross-validation stopping.

In this approach, each replica dataset is randomly split with equal probability into two disjoint sets, known as the *training* and *validation* sets. Only the data points in the *training* set are used to compute the figure of merit being minimised, while the data points in the *validation* set are monitored alongside the fit. The random assignment of the data points to the *training* or *validation* sets is different for each MC replica, and the splitting only occurs for experiments that contain more than 5 bins in the distribution.

The fit is run for a fixed, large number of iterations, and the optimal stopping point is then determined as the iteration for which the figure of merit evaluated on the *validation* set, :math:`E^{(k)}_{\rm val}`, exhibits a global minimum. All in all, it is found that the risk of over-fitting is small and that MCfit results with and without cross-validation applied are reasonably similar.

Quality selection criteria
~~~~~~~~~~~~~~~~~~~~~~~~~~

One disadvantage of optimisation strategies such as MCfit is that, as the parameter space grows, the minimiser might sometimes converge on a local, rather than on the global, minimum. This is especially problematic in the quadratic EFT fits, which often display quasi-degenerate minima. For this reason, it is important to implement post-fit quality selection criteria that indicate when a fitted replica should be kept and when it should be discarded. Here, an MC replica is kept if the total error function of the replica dataset, :math:`E_{\rm tot}^{(k)}`, satisfies

.. math::

    E_{\rm tot}^{(k)}\le 3
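
The selection criterion above amounts to a simple post-fit filter. The sketch below is illustrative only: the arrays ``coefficients`` and ``e_tot`` contain made-up values standing in for the best-fit coefficients and total error functions of four replicas, and the names are not taken from the fitting code.

```python
import numpy as np

# Hypothetical post-fit results: one row of best-fit coefficients per
# replica, and the total error function E_tot^(k) of each replica.
coefficients = np.array(
    [
        [0.48, -0.31],
        [0.52, -0.28],
        [1.90, 0.75],  # replica stuck in a poor (local) minimum
        [0.50, -0.33],
    ]
)
e_tot = np.array([1.05, 0.97, 4.20, 1.12])

E_TOT_MAX = 3.0  # threshold quoted in the text

# Keep only the replicas that satisfy E_tot^(k) <= 3.
keep = e_tot <= E_TOT_MAX
selected = coefficients[keep]

# Estimators are then computed from the surviving replicas only.
mean = selected.mean(axis=0)
std = selected.std(axis=0, ddof=1)
```

With these made-up numbers the third replica is discarded, and the mean and standard deviation of each coefficient are evaluated over the remaining three.
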