.. SPDX-FileCopyrightText: © 2022 the SimWeights contributors
..
.. SPDX-License-Identifier: BSD-2-Clause

Weighting in Monte Carlo Simulation
===================================

This page provides an introduction to the principle of biased sampling in Monte Carlo
simulation and the definition of weights in IceCube.

Monte Carlo Integration
-----------------------

Monte Carlo is a method for calculating the value of definite integrals using random
numbers. Consider the multidimensional integral

.. math::

  I = \int_\Omega f(\bar{x})\, \mathrm{d}\bar{x}

where :math:`\bar{x}` is an m-dimensional vector and :math:`\Omega` is a subset of
:math:`\mathbb{R}^m` with a volume of

.. math::

  V = \int_\Omega \mathrm{d}\bar{x}

The integral can be approximated by

.. math::

  I \approx \frac{V}{N} \sum_{i=1}^N f(\bar{x}_i)

where :math:`\bar{x}_1, \ldots, \bar{x}_N` are sampled from :math:`\Omega`. If the
:math:`\bar{x}_i` are points on a grid evenly spaced across :math:`\Omega`, this is
known as a Riemann sum; if the :math:`\bar{x}_i` are points sampled at random from
:math:`\Omega`, it is known as Monte Carlo integration. In general, Riemann sums are
more efficient in one dimension and Monte Carlo is more efficient in higher dimensions.

Biased Sampling
---------------

There is no reason that the values of :math:`\bar{x}_i` need to be sampled from a
uniform distribution on :math:`\Omega`. It is often advantageous to sample from some
other probability density function, denoted :math:`p(\bar{x}_i)`. In this case the
integral becomes

.. math::

  I \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(\bar{x}_i)}{p(\bar{x}_i)}

Note that if :math:`p` is the uniform distribution, then
:math:`p(\bar{x}_i) = 1 / V` and this reduces to the statement above for :math:`I`.

If two samples were produced from two different distributions, :math:`N_1` events
drawn from :math:`p_1` and :math:`N_2` events drawn from :math:`p_2`, then the total
pdf for the combined sample :math:`\bar{x}_1, \ldots, \bar{x}_{N_1+N_2}` becomes

.. math::

  p(\bar{x}_i) = \frac{N_1 \cdot p_1(\bar{x}_i) + N_2 \cdot p_2(\bar{x}_i)}{N_1 + N_2}

so that the Monte Carlo integral becomes

.. math::

  I \approx \sum_{i=1}^{N_1+N_2} \frac{f(\bar{x}_i)}{N_1 \cdot p_1(\bar{x}_i) + N_2 \cdot p_2(\bar{x}_i)}

To make things easier to keep track of, we can introduce a quantity called the
generation bias, :math:`g(\bar{x}_i)`, such that

.. math::

  I \approx \sum_{i=1}^{N} g(\bar{x}_i) \cdot f(\bar{x}_i)

where the generation bias, generalized to :math:`M` samples, is defined as

.. math::

  g(\bar{x}_i) = \left(\sum_{j=1}^M N_j \cdot p_j(\bar{x}_i)\right)^{-1}

Note that this result holds even for samples drawn from disjoint surfaces. As long as
the pdf for the :math:`j^{th}` sample is defined such that
:math:`p_j(\bar{x}_i) = 0` for events outside of :math:`\Omega_j`, the Monte Carlo
integral will give the correct answer for integration over the union of all the
:math:`\Omega_j`.
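
The biased-sampling estimate above can be checked numerically. The sketch below uses
plain NumPy rather than the SimWeights API, and the integrand, interval, sampling
density, and sample size are chosen purely for illustration. It estimates the integral
of a steeply falling power law on :math:`[1, 10]` two ways: with points drawn
uniformly, and with points drawn from a density proportional to :math:`1/x` and
weighted by :math:`1 / (N \cdot p(\bar{x}_i))`.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(42)


    # Integrand: a steeply falling power law on [1, 10]; the exact integral is
    # known, so both Monte Carlo estimates can be checked against it.
    def f(x):
        return x**-2.7


    a, b = 1.0, 10.0
    exact = (b**-1.7 - a**-1.7) / -1.7
    N = 100_000

    # Uniform sampling: p(x) = 1/V with V = b - a, so I ~ V/N * sum(f(x_i))
    x_uni = rng.uniform(a, b, N)
    i_uni = (b - a) / N * f(x_uni).sum()

    # Biased sampling: draw from p(x) proportional to 1/x, which places more
    # samples where f is large, then weight each sample by 1/p(x_i)
    u = rng.uniform(0.0, 1.0, N)
    x_bias = a * (b / a) ** u                # inverse-CDF sampling of p(x) ~ 1/x
    p_bias = 1.0 / (x_bias * np.log(b / a))  # normalized pdf on [a, b]
    i_bias = (f(x_bias) / p_bias).sum() / N

    print(f"exact   = {exact:.6f}")
    print(f"uniform = {i_uni:.6f}")
    print(f"biased  = {i_bias:.6f}")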
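
The generation bias can be exercised in the same way: draw two samples of different
sizes from different densities, evaluate every generation density for every event, and
weight each event by :math:`g(\bar{x}_i)`. Again, the densities and sample sizes in
this sketch are arbitrary and purely illustrative.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)


    def f(x):
        return x**-2.7


    a, b = 1.0, 10.0
    exact = (b**-1.7 - a**-1.7) / -1.7

    # Sample 1: N1 events drawn uniformly on [a, b]
    N1 = 50_000
    x1 = rng.uniform(a, b, N1)

    # Sample 2: N2 events drawn from a density proportional to 1/x on [a, b]
    N2 = 200_000
    x2 = a * (b / a) ** rng.uniform(0.0, 1.0, N2)

    # Combine the samples; it no longer matters which event came from which set
    x = np.concatenate([x1, x2])

    # Evaluate every generation density for every event in the combined sample
    p1 = np.full_like(x, 1.0 / (b - a))
    p2 = 1.0 / (x * np.log(b / a))

    # Generation bias: g(x_i) = 1 / (N1*p1(x_i) + N2*p2(x_i))
    g = 1.0 / (N1 * p1 + N2 * p2)

    # The weighted sum reproduces the integral from the combined sample
    i_comb = (g * f(x)).sum()

    print(f"exact    = {exact:.6f}")
    print(f"combined = {i_comb:.6f}")

Both estimates should agree with the exact value up to the usual statistical
fluctuations; the biased and combined samples simply distribute those fluctuations
differently across the integration region.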