Package org.apache.beam.fn.harness.debug
Class DataSampler
- java.lang.Object
-
- org.apache.beam.fn.harness.debug.DataSampler
-
public class DataSampler extends java.lang.Object
The DataSampler is a global (per SDK Harness) object that facilitates taking samples and returning them to the Runner Harness. The class is thread-safe with respect to executing ProcessBundleDescriptors (PBDs): different threads executing different PBDs can sample simultaneously, even when computing the same logical PCollection.
-
-
Constructor Summary
Constructors
Constructor: Description
- DataSampler(): Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
- DataSampler(int maxSamples, int sampleEveryN, java.lang.Boolean onlySampleExceptions)
- DataSampler(java.lang.Boolean onlySampleExceptions): Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
-
Method Summary
Modifier and Type, Method: Description
- static DataSampler create(org.apache.beam.sdk.options.PipelineOptions options): Optionally returns a DataSampler if the experiment "enable_data_sampling" is present or "enable_always_on_exception_sampling" is present.
- org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse.Builder handleDataSampleRequest(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest request): Returns all collected samples.
- <T> OutputSampler<T> sampleOutput(java.lang.String pcollectionId, org.apache.beam.sdk.coders.Coder<T> coder): Creates and returns a class to sample the given PCollection in the given ProcessBundleDescriptor.
-
-
-
Constructor Detail
-
DataSampler
public DataSampler()
Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
-
DataSampler
public DataSampler(java.lang.Boolean onlySampleExceptions)
Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
- Parameters:
onlySampleExceptions - If true, only samples elements from exceptions.
-
DataSampler
public DataSampler(int maxSamples, int sampleEveryN, java.lang.Boolean onlySampleExceptions)
- Parameters:
maxSamples - Sets the maximum number of samples held in memory at once.
sampleEveryN - Sets how often to sample.
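To make the interplay of maxSamples and sampleEveryN concrete, here is a minimal, self-contained sketch of an "every Nth element, bounded buffer" sampler. It is not Beam's OutputSampler: the class name EveryNSamplerSketch, the methods observe and drain, the choice to take the Nth (rather than the first) element, and the overwrite-oldest policy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Beam implementation. */
final class EveryNSamplerSketch<T> {
  private final int maxSamples;    // upper bound on samples held in memory
  private final int sampleEveryN;  // keep one element out of every N observed
  private final List<T> buffer = new ArrayList<>();
  private long seen = 0;           // total elements observed so far
  private int nextSlot = 0;        // slot to overwrite once the buffer is full

  EveryNSamplerSketch(int maxSamples, int sampleEveryN) {
    if (maxSamples <= 0 || sampleEveryN <= 0) {
      throw new IllegalArgumentException("maxSamples and sampleEveryN must be positive");
    }
    this.maxSamples = maxSamples;
    this.sampleEveryN = sampleEveryN;
  }

  /** Observes one element and keeps it only if it is the Nth since the last kept one. */
  synchronized void observe(T element) {
    seen++;
    if (seen % sampleEveryN != 0) {
      return; // not a sampling point
    }
    if (buffer.size() < maxSamples) {
      buffer.add(element);
    } else {
      buffer.set(nextSlot, element); // assumed policy: overwrite the oldest sample
    }
    nextSlot = (nextSlot + 1) % maxSamples;
  }

  /** Returns the samples collected so far and clears the buffer. */
  synchronized List<T> drain() {
    List<T> out = new ArrayList<>(buffer);
    buffer.clear();
    nextSlot = 0;
    return out;
  }
}
```

Under these assumptions, a sampler built with maxSamples = 10 and sampleEveryN = 1000 that observes 5000 elements holds five samples, and memory use never exceeds ten elements however long the bundle runs.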
-
-
Method Detail
-
create
@Nullable public static DataSampler create(org.apache.beam.sdk.options.PipelineOptions options)
Optionally returns a DataSampler if the experiment "enable_data_sampling" or "enable_always_on_exception_sampling" is present. Returns null if data sampling is not enabled or the "disable_always_on_exception_sampling" experiment is given.
- Parameters:
options - the pipeline options given to this SDK Harness.
- Returns:
- the DataSampler if enabled, or null otherwise.
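The enable/disable rule can be sketched as a small predicate over the experiment strings. This follows a literal reading of the description of create; the real method's precedence rules may differ, and the class SamplingExperimentsSketch, the method shouldCreateSampler, and the use of a plain list of strings instead of PipelineOptions are hypothetical stand-ins.

```java
import java.util.List;

/** Hypothetical illustration of the experiment check; not the Beam implementation. */
final class SamplingExperimentsSketch {
  static final String ENABLE_DATA_SAMPLING = "enable_data_sampling";
  static final String ENABLE_EXCEPTION_SAMPLING = "enable_always_on_exception_sampling";
  static final String DISABLE_EXCEPTION_SAMPLING = "disable_always_on_exception_sampling";

  private SamplingExperimentsSketch() {}

  /**
   * Returns true when a sampler would be created, false when create would return null.
   * Assumption: the disable experiment wins over both enable experiments,
   * which is the literal reading of the documented behavior.
   */
  static boolean shouldCreateSampler(List<String> experiments) {
    if (experiments.contains(DISABLE_EXCEPTION_SAMPLING)) {
      return false;
    }
    return experiments.contains(ENABLE_DATA_SAMPLING)
        || experiments.contains(ENABLE_EXCEPTION_SAMPLING);
  }
}
```

Because create is annotated @Nullable, callers need a null check before using the returned DataSampler.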
-
sampleOutput
public <T> OutputSampler<T> sampleOutput(java.lang.String pcollectionId, org.apache.beam.sdk.coders.Coder<T> coder)
Creates and returns a class to sample the given PCollection in the given ProcessBundleDescriptor. Uses the given coder to encode samples as bytes when responding to a SampleDataRequest.
Invoked by multiple bundle processing threads in parallel when a new bundle processor is being instantiated.
- Type Parameters:
T - The type of element contained in the PCollection.
- Parameters:
pcollectionId - The PCollection to take intermittent samples from.
coder - The coder associated with the PCollection. Coder may be from a nested context.
- Returns:
- the OutputSampler corresponding to the unique PBD and PCollection.
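Since sampleOutput is invoked from several bundle processing threads at once, one common way to make such a lookup safe is an atomic get-or-create on a concurrent map. The sketch below shows that pattern only. It assumes a single shared sampler per PCollection id, which the description does not confirm, and SamplerRegistrySketch and its nested Sampler class are hypothetical names rather than Beam types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of a thread-safe sampler lookup; not the Beam implementation. */
final class SamplerRegistrySketch {

  /** Stand-in for OutputSampler<T>; it only remembers which PCollection it belongs to. */
  static final class Sampler {
    final String pcollectionId;

    Sampler(String pcollectionId) {
      this.pcollectionId = pcollectionId;
    }
  }

  // Assumption: samplers are keyed by PCollection id alone.
  private final Map<String, Sampler> samplers = new ConcurrentHashMap<>();

  /** Returns the sampler for the id, creating it atomically on first use. */
  Sampler sampleOutput(String pcollectionId) {
    return samplers.computeIfAbsent(pcollectionId, Sampler::new);
  }

  /** Number of distinct samplers created so far. */
  int size() {
    return samplers.size();
  }
}
```

ConcurrentHashMap.computeIfAbsent runs the factory at most once per key, so two threads asking for the same id at the same moment receive the same instance without an explicit lock.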
-
handleDataSampleRequest
public org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse.Builder handleDataSampleRequest(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest request)
Returns all collected samples. Thread-safe.
- Parameters:
request - The instruction request from the FnApi. Filters based on the given SampleDataRequest.
- Returns:
- Returns all collected samples.
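The filtering step can be pictured as selecting entries from a map of collected samples by PCollection id. The following sketch uses plain strings in place of encoded sample bytes and the FnApi messages. SampleFilterSketch and filter are hypothetical names, and treating an empty id set as "return everything" is an assumption, not something stated in the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of filtering collected samples; not the Beam implementation. */
final class SampleFilterSketch {
  private SampleFilterSketch() {}

  /**
   * Returns the samples whose PCollection id is in requestedIds.
   * Assumption: an empty requestedIds set means no filter, so all samples are returned.
   */
  static Map<String, List<String>> filter(
      Map<String, List<String>> collected, Set<String> requestedIds) {
    Map<String, List<String>> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : collected.entrySet()) {
      if (requestedIds.isEmpty() || requestedIds.contains(entry.getKey())) {
        // Copy the list so callers cannot mutate the collected samples.
        out.put(entry.getKey(), new ArrayList<>(entry.getValue()));
      }
    }
    return out;
  }
}
```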
-
-