.. _man_application: Classifier Application ====================== Once a classifier has been trained and validated to the your liking, you are ready to use it to classify events of otherwise unknown class. This can be done for individual samples or for whole ensembles at a time. Here we go over methods of classifier application provided by pybdt. Scoring with IceTray -------------------- For IceCube data, it may be convenient to store a classifier score as an I3Double in each Physics frame. This can be done relatively simply with :class:`icecube.pybdtmodule.PyBDTModule`. For the :ref:`ABC example `, you can calculate and store the scores like so:: from icecube.icetray import I3Tray from icecube import dataio from icecube.pybdtmodule import PyBDTModule tray = I3Tray() [...] def varsfunc (frame): a = frame['A'].value b = frame['B'].value c = frame['C'].value out = dict (a=a, b=b, c=c) return out tray.AddModule (PyBDTModule, 'bdt', BDTFilename='path/to/sample.bdt', varsfunc=varsfunc, OutputName='Score' ) [...] It is up to the user to provide ``varsfunc()``, which extracts the relevant features from the frame for use by :class:`icecube.pybdtmodule.PyBDTModule`. An alternative implemmentation would inspect each frame directly for I3Double's corresponding to each required feature, but that approach would require unnecessary pollution of the frames when features are nested deep in frame objects. Consider possible features ``zenith = SplineMPE.dir.zenith`` or ``plogl = SPEFitFitParams.logl / (SPEFitFitParams.ndof + 1.5)``. The use of ``varsfunc()`` simplifies scoring when training features are derived from, but not exactly corresponding to, variables produced by the rest of the processing chain. Scoring without IceTray ----------------------- In other contexts, it is useful to calculate scores outside of the IceTray framework. This can be done for individual events or ensembles at a time. An individual event can be scored like so:: score = bdt.score_event ({'a': 2.7, 'b': -0.31, 'c': 116}) The ``dict`` argument can of course be calculated by any method so long as the keys match the features used by the classifier and the values are real numbers. Suppose one has an ensemble of events with numpy arrays ``a``, ``b`` and ``c`` holding one value per event. Then the per-event scores can be calculated like so:: scores = bdt.score_dict (dict (a=a, b=b, c=c)) If a :class:`pybdt.ml.DataSet` object ``data`` is already available, the per-event scores can be calculated like so:: scores = bdt.score_DataSet (data) In these last two exaples, the resulting ``scores`` are a numpy array with ``dtype=float``. A convienience method :attr:`pybdt.ml.BDTModel.score()` is provided which automatically calls the correct one of the above methods, given the input. Thus the following calls all work as expected:: score = bdt.score ({'a': 2.7, 'b': -0.31, 'c': 116}) scores = bdt.score (dict (a=a, b=b, c=c)) scores = bdt.score (data) Finally, note that the leaf purity :math:`p` can be used to scale the contribution of each tree in the model (see :ref:`man_dt_intro`). This is especially common for forests in which the trees are differentiated only by randomization but not by boosting. To enable this type of scoring, change the above calls to:: score = bdt.score ({'a': 2.7, 'b': -0.31, 'c': 116}, use_purity=True) scores = bdt.score (dict (a=a, b=b, c=c), use_purity=True) scores = bdt.score (data, use_purity=True)