Weighting in Monte Carlo Simulation

This page provides an introduction to the principle of biased sampling in Monte Carlo simulation and the definition of weights in IceCube.

Monte Carlo Integration

Monte Carlo is a method for estimating the value of definite integrals using random numbers. Consider the multidimensional integral

\[I = \int_\Omega f(\bar{x}) \mathrm{d}\bar{x}\]

where \(\bar{x}\) is an \(m\)-dimensional vector and \(\Omega\) is a subset of \(\mathbb{R}^m\) with a volume of

\[V = \int_\Omega \mathrm{d}\bar{x}\]

The integral can be approximated by the sum

\[I \approx \frac{V}{N} \sum_{i=1}^N f(\bar{x}_i)\]

where \(\bar{x}_1, ..., \bar{x}_N\) are points sampled from \(\Omega\). If the \(\bar{x}_i\) are taken from a grid evenly spaced across \(\Omega\), this is known as a Riemann sum; if the \(\bar{x}_i\) are sampled at random from \(\Omega\), it is known as Monte Carlo integration. In general, Riemann sums converge faster in one dimension, while Monte Carlo becomes more efficient in higher dimensions because its error scales as \(1/\sqrt{N}\) regardless of the dimension \(m\).
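As a concrete illustration, here is a minimal sketch of this estimator in Python with NumPy, assuming the example integrand \(f(x) = \sin(x)\) on \(\Omega = [0, \pi]\) (both choices are illustrative, not from the text above); the exact integral is 2.

```python
# Minimal sketch: uniform Monte Carlo integration of sin(x) on [0, pi].
import numpy as np

rng = np.random.default_rng(42)

N = 100_000
a, b = 0.0, np.pi
V = b - a                      # volume of Omega

x = rng.uniform(a, b, size=N)  # points sampled uniformly from Omega
I = V / N * np.sum(np.sin(x))  # I ~ (V/N) * sum_i f(x_i)
print(I)                       # ~2.0, the exact value
```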

Biased Sampling

There is no reason that the values of \(\bar{x}_i\) need to be sampled from a uniform distribution on \(\Omega\). It is often advantageous to sample from some other probability density function, denoted \(p(\bar{x}_i)\). In this case the estimate becomes

\[I \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(\bar{x}_i)}{p(\bar{x}_i)}\]

Note that if \(p\) is the uniform distribution on \(\Omega\), then \(p(\bar{x}_i) = 1 / V\) and this reduces to the statement above for \(I\).
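The following sketch illustrates biased sampling, again with illustrative choices not taken from the text: the integrand \(f(x) = x^2 e^{-x}\) on \([0, \infty)\), sampled from the exponential pdf \(p(x) = e^{-x}\); the exact integral is \(\Gamma(3) = 2\).

```python
# Minimal sketch: biased (importance) sampling with p(x) = exp(-x).
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

x = rng.exponential(scale=1.0, size=N)  # points drawn from p(x) = exp(-x)
f = x**2 * np.exp(-x)                   # integrand at each sample
p = np.exp(-x)                          # pdf at each sample

I = np.mean(f / p)                      # I ~ (1/N) * sum_i f(x_i)/p(x_i)
print(I)                                # ~2.0, the exact value
```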

If two samples are produced from two different distributions, \(N_1\) events drawn from \(p_1\) and \(N_2\) events drawn from \(p_2\), then the total pdf for the combined sample \(\bar{x}_1, ..., \bar{x}_{N_1+N_2}\) becomes

\[p(\bar{x}_i) = \frac{N_1 \cdot p_1(\bar{x}_i) + N_2 \cdot p_2(\bar{x}_i)}{N_1 + N_2}\]

so that the Monte Carlo estimate of the integral becomes

\[I \approx \sum_{i=1}^{N_1+N_2} \frac{f(\bar{x}_i)}{N_1 \cdot p_1(\bar{x}_i) + N_2 \cdot p_2(\bar{x}_i)}\]

To make things easier to keep track of, we can introduce a quantity called the generation bias \(g(\bar{x}_i)\) such that

\[I \approx \sum_{i=1}^{N} g(\bar{x}_i) \cdot f(\bar{x}_i)\]

where the generation bias, generalized to \(M\) samples, is defined as

\[g(\bar{x}_i) = \left({\sum_{j=1}^M N_j \cdot p_j(\bar{x}_i)}\right)^{-1}\]
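The sketch below illustrates the generation bias with two hypothetical samples on \([0, 1]\): \(N_1\) events from the uniform pdf \(p_1(x) = 1\) and \(N_2\) events from the linear pdf \(p_2(x) = 2x\), applied to the illustrative integrand \(f(x) = x^2\), whose exact integral is \(1/3\).

```python
# Minimal sketch: combining two samples with the generation bias
# g(x) = 1 / (N1*p1(x) + N2*p2(x)).
import numpy as np

rng = np.random.default_rng(42)
N1, N2 = 50_000, 50_000

x1 = rng.uniform(0.0, 1.0, size=N1)           # drawn from p1(x) = 1
x2 = np.sqrt(rng.uniform(0.0, 1.0, size=N2))  # drawn from p2(x) = 2x via inverse CDF
x = np.concatenate([x1, x2])                  # combined sample

g = 1.0 / (N1 * 1.0 + N2 * 2.0 * x)           # generation bias per event
I = np.sum(g * x**2)                          # I ~ sum_i g(x_i) * f(x_i)
print(I)                                      # ~1/3, the exact value
```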

Note that this result holds even for samples drawn from disjoint surfaces. As long as the pdf for the \(j^{th}\) sample is defined such that \(p_j(\bar{x}_i) = 0\) for events outside of \(\Omega_j\), the Monte Carlo integral will give the correct answer for integration over the union of all the \(\Omega_j\).
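The sketch below illustrates this with two hypothetical disjoint surfaces, \(\Omega_1 = [0, 1]\) and \(\Omega_2 = [1, 3]\), each sampled uniformly and each pdf defined to vanish outside its own surface; the estimate converges to \(\int_0^3 x \, \mathrm{d}x = 9/2\).

```python
# Minimal sketch: the same generation-bias formula on disjoint surfaces.
import numpy as np

rng = np.random.default_rng(42)
N1, N2 = 50_000, 50_000

x1 = rng.uniform(0.0, 1.0, size=N1)  # uniform on Omega_1 = [0, 1]
x2 = rng.uniform(1.0, 3.0, size=N2)  # uniform on Omega_2 = [1, 3]
x = np.concatenate([x1, x2])

# Each pdf is zero outside its own surface, as required.
p1 = np.where((x >= 0.0) & (x < 1.0), 1.0, 0.0)
p2 = np.where((x >= 1.0) & (x <= 3.0), 0.5, 0.0)

g = 1.0 / (N1 * p1 + N2 * p2)  # denominator never vanishes on the union
I = np.sum(g * x)              # ~4.5 = integral of f(x) = x over [0, 3]
print(I)
```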