Class DataSampler


  • public class DataSampler
    extends java.lang.Object
    The DataSampler is a global object (one per SDK Harness) that takes samples and returns them to the Runner Harness. The class is thread-safe with respect to executing ProcessBundleDescriptors (PBDs): different threads executing different PBDs can sample simultaneously, even when they compute the same logical PCollection.
    • Constructor Summary

      Constructors 
      Constructor Description
      DataSampler()
      Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
      DataSampler​(int maxSamples, int sampleEveryN, java.lang.Boolean onlySampleExceptions)  
      DataSampler​(java.lang.Boolean onlySampleExceptions)
      Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory, optionally sampling only elements from exceptions.
    • Method Summary

      Modifier and Type Method Description
      static DataSampler create​(org.apache.beam.sdk.options.PipelineOptions options)
      Optionally returns a DataSampler if the experiment "enable_data_sampling" is present or "enable_always_on_exception_sampling" is present.
      org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse.Builder handleDataSampleRequest​(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest request)
      Returns all collected samples.
      <T> OutputSampler<T> sampleOutput​(java.lang.String pcollectionId, org.apache.beam.sdk.coders.Coder<T> coder)
      Creates and returns a class to sample the given PCollection in the given ProcessBundleDescriptor.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DataSampler

        public DataSampler()
        Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory.
      • DataSampler

        public DataSampler​(java.lang.Boolean onlySampleExceptions)
        Creates a DataSampler to sample every 1000 elements while keeping a maximum of 10 in memory, optionally sampling only elements from exceptions.
        Parameters:
        onlySampleExceptions - If true, only samples elements from exceptions.
      • DataSampler

        public DataSampler​(int maxSamples,
                           int sampleEveryN,
                           java.lang.Boolean onlySampleExceptions)
        Parameters:
        maxSamples - Sets the maximum number of samples held in memory at once.
        sampleEveryN - Sets how often to sample, counted in elements.
        onlySampleExceptions - If true, only samples elements from exceptions.
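        The interaction between maxSamples and sampleEveryN can be illustrated with a minimal sketch. EveryNthSampler is a hypothetical stand-in, not Beam's OutputSampler. It assumes that every N-th observed element is kept and that, once the buffer is full, the oldest sample is overwritten. The eviction policy actually used is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the Beam implementation.
class EveryNthSampler<T> {
  private final int maxSamples;
  private final int sampleEveryN;
  private final List<T> buffer = new ArrayList<>();
  private long seen = 0;
  private int overwriteIndex = 0;

  EveryNthSampler(int maxSamples, int sampleEveryN) {
    if (maxSamples <= 0 || sampleEveryN <= 0) {
      throw new IllegalArgumentException("maxSamples and sampleEveryN must be positive");
    }
    this.maxSamples = maxSamples;
    this.sampleEveryN = sampleEveryN;
  }

  // Keeps every sampleEveryN-th element (assumption: the N-th element is the first one kept).
  synchronized void observe(T element) {
    seen++;
    if (seen % sampleEveryN != 0) {
      return;
    }
    if (buffer.size() < maxSamples) {
      buffer.add(element);
    } else {
      // Assumption: when full, overwrite the oldest slot so memory stays bounded.
      buffer.set(overwriteIndex, element);
      overwriteIndex = (overwriteIndex + 1) % maxSamples;
    }
  }

  // Returns the buffered samples and clears the buffer.
  synchronized List<T> drain() {
    List<T> out = new ArrayList<>(buffer);
    buffer.clear();
    overwriteIndex = 0;
    return out;
  }
}
```

        With maxSamples = 2 and sampleEveryN = 3, observing the integers 1 through 12 samples 3, 6, 9 and 12, and only the last two remain buffered under the assumed overwrite policy.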
    • Method Detail

      • create

        @Nullable
        public static DataSampler create​(org.apache.beam.sdk.options.PipelineOptions options)
        Returns a DataSampler if the experiment "enable_data_sampling" or "enable_always_on_exception_sampling" is present. Returns null if data sampling is not enabled or the "disable_always_on_exception_sampling" experiment is given.
        Parameters:
        options - the pipeline options given to this SDK Harness.
        Returns:
        the DataSampler if enabled, or null otherwise.
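        Because create may return null, callers need a null check before using the result. The sketch below shows that pattern with hypothetical stand-ins (SamplerStub, NullableFactorySketch) that take a plain set of experiment names instead of PipelineOptions. It models only the two enabling experiments. The "disable_always_on_exception_sampling" experiment is deliberately left out, and the rule used to derive onlySampleExceptions is an assumption.

```java
import java.util.Set;

// Hypothetical stand-in for DataSampler; illustration only.
class SamplerStub {
  final boolean onlySampleExceptions;

  SamplerStub(boolean onlySampleExceptions) {
    this.onlySampleExceptions = onlySampleExceptions;
  }
}

class NullableFactorySketch {
  // Returns null when neither enabling experiment is present.
  static SamplerStub create(Set<String> experiments) {
    boolean sampleAll = experiments.contains("enable_data_sampling");
    boolean sampleExceptions = experiments.contains("enable_always_on_exception_sampling");
    if (!sampleAll && !sampleExceptions) {
      return null;
    }
    // Assumption: exception-only mode applies when full data sampling was not requested.
    return new SamplerStub(!sampleAll);
  }

  // Caller-side pattern: guard against the null result before using the sampler.
  static String describe(Set<String> experiments) {
    SamplerStub sampler = create(experiments);
    if (sampler == null) {
      return "sampling disabled";
    }
    return sampler.onlySampleExceptions ? "sampling exceptions only" : "sampling all elements";
  }
}
```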
      • sampleOutput

        public <T> OutputSampler<T> sampleOutput​(java.lang.String pcollectionId,
                                                 org.apache.beam.sdk.coders.Coder<T> coder)
        Creates and returns a class to sample the given PCollection in the given ProcessBundleDescriptor. Uses the given coder to encode samples as bytes when responding to a SampleDataRequest.

        Invoked by multiple bundle processing threads in parallel when a new bundle processor is being instantiated.

        Type Parameters:
        T - The type of element contained in the PCollection.
        Parameters:
        pcollectionId - The PCollection to take intermittent samples from.
        coder - The coder associated with the PCollection. The coder may be from a nested context.
        Returns:
        the OutputSampler corresponding to the unique PBD and PCollection.
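        One way to support parallel registration from several bundle processing threads is a concurrent map from PCollection id to a list of per-caller samplers. The sketch below is a hypothetical illustration of that idea (SamplerRegistrySketch and Handle are invented names). It is not a description of how DataSampler stores its OutputSamplers internally.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical registry for illustration only; not the Beam implementation.
class SamplerRegistrySketch {
  // Hypothetical per-caller sampler handle.
  static final class Handle {
    final String pcollectionId;

    Handle(String pcollectionId) {
      this.pcollectionId = pcollectionId;
    }
  }

  private final Map<String, List<Handle>> byPCollection = new ConcurrentHashMap<>();

  // Each call creates a new handle, so two threads working on the same logical
  // PCollection get separate samplers and do not contend on shared sample state.
  Handle register(String pcollectionId) {
    Handle handle = new Handle(pcollectionId);
    byPCollection
        .computeIfAbsent(pcollectionId, id -> new CopyOnWriteArrayList<>())
        .add(handle);
    return handle;
  }

  // Number of handles registered for a PCollection id, or 0 if there are none.
  int handleCount(String pcollectionId) {
    List<Handle> handles = byPCollection.get(pcollectionId);
    return handles == null ? 0 : handles.size();
  }
}
```

        computeIfAbsent creates the per-PCollection list atomically, and CopyOnWriteArrayList makes concurrent add calls safe without explicit locking.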
      • handleDataSampleRequest

        public org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionResponse.Builder handleDataSampleRequest​(org.apache.beam.model.fnexecution.v1.BeamFnApi.InstructionRequest request)
        Returns all collected samples. Thread-safe.
        Parameters:
        request - The instruction request from the FnApi. Filters based on the given SampleDataRequest.
        Returns:
        all collected samples.
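        The filtering step can be sketched without the FnApi types. SampleRequestSketch below is hypothetical. It uses plain strings as samples and UTF-8 bytes as a stand-in for a Coder, and it assumes that an empty set of requested PCollection ids means "return everything". That convention is an assumption, not something this page states.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the Beam implementation.
class SampleRequestSketch {
  // Collects encoded samples per PCollection id, restricted to the requested ids.
  static Map<String, List<byte[]>> collect(
      Map<String, List<String>> samplesByPCollection, Set<String> requestedIds) {
    Map<String, List<byte[]>> response = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : samplesByPCollection.entrySet()) {
      // Assumption: an empty filter matches every PCollection.
      if (!requestedIds.isEmpty() && !requestedIds.contains(entry.getKey())) {
        continue;
      }
      List<byte[]> encoded = new ArrayList<>();
      for (String sample : entry.getValue()) {
        // Stand-in for encoding each sample with the PCollection's coder.
        encoded.add(sample.getBytes(StandardCharsets.UTF_8));
      }
      response.put(entry.getKey(), encoded);
    }
    return response;
  }
}
```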