Global Epsilon Reference

The SAMME.R algorithm [1] uses leaf purity information to calculate per-event scores. The procedure takes the geometric mean of the purities of the leaf nodes hit by an event. A leaf whose signal or background purity is zero, or very nearly zero, can drive the entire geometric mean to zero, guaranteeing a final score of -1 (maximally background-like). To avoid this, the purity must be clipped to the open interval (0, 1) by enforcing a small floating point offset, epsilon, from exactly 0 and 1. By default, epsilon is the standard C++ std::numeric_limits<double>::epsilon(). However, it may be useful to modify this threshold globally, particularly for compatibility with classifiers imported from older versions of scikit-learn. The following functions facilitate tuning this threshold.
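
The effect of the clipping can be illustrated with a minimal Python sketch. This is not pybdt's internal implementation: the helper names clip_purity and geometric_mean and the example purities are hypothetical, and sys.float_info.epsilon is used as Python's counterpart of the C++ default. The sketch only shows that a single zero purity collapses the geometric mean, while clipped purities keep it strictly positive.

```python
import math
import sys

# Python's double-precision machine epsilon; same value as the C++
# std::numeric_limits<double>::epsilon() default described above.
EPS = sys.float_info.epsilon


def clip_purity(p, eps=EPS):
    """Clip a purity from [0, 1] to [eps, 1 - eps] (hypothetical helper)."""
    return min(max(p, eps), 1.0 - eps)


def geometric_mean(values):
    """Geometric mean computed in log space; 0.0 if any value is exactly 0."""
    if any(v == 0.0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative purities of the leaves hit by one event; one leaf is pure background.
purities = [0.9, 0.8, 0.0, 0.95]

unclipped = geometric_mean(purities)
clipped = geometric_mean([clip_purity(p) for p in purities])

print("unclipped:", unclipped)  # collapses to zero
print("clipped:  ", clipped)    # small but strictly positive
```

A larger epsilon limits how strongly a single very pure leaf can dominate the mean, which is one reason the threshold may be worth tuning.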

[1] J. Zhu, H. Zou, S. Rosset, T. Hastie. “Multi-class AdaBoost”, 2009.

Functions:

pybdt.ml.set_epsilon(eps)

Set the global epsilon value.

Parameters:

eps (float) – The value used to clip purity from [0, 1] to [epsilon, 1 - epsilon] when purity is used for training or scoring.

pybdt.ml.get_epsilon()

Get the global epsilon value.

Returns:

The value used to clip purity from [0, 1] to [epsilon, 1 - epsilon] when purity is used for training or scoring.
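
Because the epsilon is global, a change made for one classifier affects every later training or scoring call. One possible pattern, sketched below, is to override the value temporarily and restore it afterwards. The context manager temporary_epsilon is a hypothetical helper, not part of pybdt, and the value 1e-10 is arbitrary. The getter and setter here are stand-ins so the sketch runs without pybdt installed; they assume only the behavior documented above for pybdt.ml.get_epsilon and pybdt.ml.set_epsilon.

```python
from contextlib import contextmanager


@contextmanager
def temporary_epsilon(get_eps, set_eps, eps):
    """Temporarily override a global epsilon, restoring the old value on exit."""
    previous = get_eps()
    set_eps(eps)
    try:
        yield
    finally:
        # Restore the previous value even if the body raises.
        set_eps(previous)


# Stand-ins for pybdt.ml.get_epsilon / pybdt.ml.set_epsilon (assumption:
# the real functions read and write a single global float in the same way).
_state = {"eps": 2.220446049250313e-16}


def fake_get_epsilon():
    return _state["eps"]


def fake_set_epsilon(eps):
    _state["eps"] = float(eps)


with temporary_epsilon(fake_get_epsilon, fake_set_epsilon, 1e-10):
    inside = fake_get_epsilon()  # override is active here

after = fake_get_epsilon()  # original value is back
```

With pybdt available, the same helper would presumably be called as temporary_epsilon(pybdt.ml.get_epsilon, pybdt.ml.set_epsilon, 1e-10), with the training or scoring code placed inside the with block.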