SuperDST

A (lossy) compression format for trigger-level data.

Rationale

Certain types of analyses may be able to benefit by trading data fidelity for higher statistics. The most extreme example of this is the cosmic-ray anisotropy analysis using DST data, where everything but the coarsely-binned result of the online analysis is discarded before transfer to the North. This project defines a (lossy) compression format for the readout data itself, so that slightly more complicated processing and analysis can still be done after the fact.

The Pole filter starts with an event record (“payload”) provided by the DAQ. The record is an encoded bytestring that consists of an event header, a list of hit records, and a list of trigger records. The event header contains the run/event number and event start and end times, while each hit records contains a timestamp and one or more digitized waveforms from a DOM-level trigger (launch). Each trigger record contains a type, source, and configuration ID, as well as a start and end time and the indices of the hits participating in the trigger. Once in IceTray, the payload is decoded into an I3EventHeader, separate I3DOMLaunchSeriesMaps for IceTop and InIce detectors, and an I3TriggerHierarchy. The I3DOMLaunches are processed further into I3RecoPulses, which for the current IceTop processing (tpx) represent the leading-edge time of the PMT pulse and its integrated charge, and for InIce (wavedeform) represent the arrival time of a group of photons at the photocathode. Each IceTop waveform is represented by a single I3RecoPulse. While the number of I3RecoPulses needed to represent an InIce waveform depends on the complexity of the waveform, the single-PE waveforms that make up the vast majority of InIce data can be represented by a single pulse.

Prior to the 2011 Pole season, the Pole filter calibrated the raw waveforms and processed them into RecoPulses, but then discarded the RecoPulses and sent the encoded DAQ payload to the North, where low-level processing was repeated. SuperDST takes a different approach, sending a compressed representation of the RecoPulses instead of the full DAQ records for those launches that are adequately represented by a small number of RecoPulses, resulting in significant satellite bandwidth savings.

Using SuperDST

I3SuperDST and I3SuperDSTTriggerSeries are effectively compactly serialized versions of I3RecoPulseSeriesMap and I3TriggerHierarchy, respectively. They can be implicitly converted to their conventional representation by I3Frame, so client code doesn’t have to know whether a frame object uses SuperDST serialization or not.

C++

C++ code (e.g. in an I3Module) can treat I3SuperDST objects as if they were I3RecoPulseSeriesMap. The following code snippet works the same whether the frame object “I3SuperDST” is an I3RecoPulseSeriesMap or I3SuperDST (or, in fact, I3RecoPulseSeriesMapMask, I3RecoPulseSeriesMapUnion, or I3RecoHitSeriesMap):

I3RecoPulseSeriesMapConstPtr pulses = frame->Get<I3RecoPulseSeriesMapConstPtr>("I3SuperDST");

I3TriggerHierarchy and I3SuperDSTTriggerSeries are similarly interchangeable:

I3TriggerHierarchyConstPtr pulses = frame->Get<I3TriggerHierarchyConstPtr>("DSTTriggers");

Note

The frame must also contain an I3DetectorStatus for I3SuperDSTTriggerSeries decoding to work automatically as shown above. The frame will contain an I3DetectorStatus if the I3 file being read has GCD frames.

Python

The same mechanism works in Python, albeit with slightly different syntax. To use the automatic decoding facility, replace:

pulses = frame["I3SuperDST"]
triggers = frame["DSTTriggers"]

with:

pulses = dataclasses.I3RecoPulseSeriesMap.from_frame(frame, "I3SuperDST")
triggers = dataclasses.I3TriggerHierarchy.from_frame(frame, "DSTTriggers")

Format details

I3SuperDST

The I3SuperDST object contains a compressed representation of an I3RecoPulseSeriesMap. In the Pole filter, this contains all pulses from the InIce and IceTop detectors, merged into a single map.

Group headers

Pulses in a single DOM are sorted into groups spaced closely in time. Each pulse is converted to a charge stamp that contains a discretized version of the pulse’s charge and leading-edge time. The groups of charge stamps are ordered by start time; the time of a charge stamp represents the offset with respect to the previous charge stamp. This allows arbitrarily long events to be encoded efficiently.

Each group is represented on disk by a 2-byte header followed by one or more 2-byte charge stamps:

Group header: 2 bytes

Field

Discretization

Bits

DOM ID

String/OM

13

Leading time

1 ns

3

The first 13 bits of the header are used to encode the string and OM number of the DOM; the remaining 3 bits extend the range of the following 8-bit stamp time. In this way, time differences between groups can be represented up to 2047 ns [1], while time differences within groups must be less than 255 ns. Pulses more than 255 ns apart are split into separate groups.

Charge stamps: InIce

InIce Charge stamp: 2 bytes

Field

Discretization

Bits

Time offset

1 ns

8

Charge

0.05 PE

6

HLC bit

bool

1

Stop

bool

1

The charge is discretized at 0.05 PE, allowing individual pulse charges to be represented up to 1.6 PE [2]. The HLC bit marks whether the underlying pulse was extracted from a hard-local-coincidence DOM launch (containing a full ATWD+FADC readout) or a soft-local-coincidence launch (containing only select bins from the FADC for InIce DOMs, or the sum over the ATWD for IceTop). Finally, the stop bit indicates whether the group contains further charge stamps. This allows the group to contain as many charge stamps as can fit in the entire container.

Charge stamps: IceTop

IceTop charge stamps are stored in a slightly different format. In the current IceTop processing scheme there can be only one pulse per DOMLaunch, but the charges of the pulses can vary by many orders of magnitude. Accordingly, the charge is encoded logarithmically in 14 bits, with a step size of 9.0/((1<<14)-1) in the base-10 logarithm of pulse charge.

IceTop Charge stamp: 4 bytes

Field

Discretization

Bits

Time offset

1 ns

8

Charge

0.001099 in log10(PE)

14

HLC bit

bool

1

Stop

bool

1

Pulse widths

Each charge stamp has an associated width, stored in 4 bits as the base-2 logarithm of the pulse width in nanoseconds. The widths are stored in a run-length-coding scheme where N widths can be stored in 1+log2(N)/4 bytes in the best case (N identical widths) or 1+N bytes in the worst case (N different widths).

For InIce pulses the width gives the binning uncertainty in the time of the pulse, and storing this information in SuperDST preserves an order-of-magnitude estimate of this uncertainty. For IceTop the width is related to the pulse shape, and the SuperDST representation of the width adds a small amount of information versus storing only the pulse time and integrated charge.

In addition to streams of group headers and charge stamps, the serialized version of an I3SuperDST must store the total number of headers, charge stamps, and widths. Each of these sizes is encoded as a variable-width integer, where values up to 247 are encoded in a single byte, and values between 248 and UINT64_MAX are encoded in 1+(log2(N)+1)/8 bytes.

This (along with some overhead) brings the typical footprint of an I3SuperDST object to 18 bytes + 5 bytes per encoded pulse.

Total on-disk size of an I3SuperDST object

Purpose

Per group?

Per stamp?

Bytes

I3FrameObject overhead

12

Group count

1–9

Stamp count

1–9

Width count

4–36

Group header

2

Charge stamp (InIce)

2+N

Charge stamp (IceTop)

4

Pulse width

1

I3SuperDSTTriggerSeries

The SuperDST family of tools also includes a compression format for trigger records. Each record contains the start time and length of the trigger (encoded variable width integers) as well as index of the TriggerKey in I3DetectorStatus::triggerStatus. The index is used during decoding to reconstruct the type, source and configuration ID of the trigger. Only “configured” triggers are stored; if needed, I3GlobalTriggerSim can be used to insert global and throughput triggers to complete the trigger hierarchy.

SuperDSTTrigger

Field

Discretization

Bits

TriggerKey index

4

Start time

1 ns

12–76

Length

1 ns

8–72