Class SparkStream<T>

    • Constructor Detail

      • SparkStream

        public SparkStream​(MStream<T> mStream)
        Instantiates a new Spark stream.
        Parameters:
        mStream - the MStream to build the Spark stream from
      • SparkStream

        public SparkStream​(org.apache.spark.api.java.JavaRDD<T> rdd)
        Instantiates a new Spark stream.
        Parameters:
        rdd - the JavaRDD to wrap
    • Method Detail

      • toDistributedStream

        public SparkStream<T> toDistributedStream()
        Description copied from interface: MStream
        Converts this stream into a distributed stream.
        Specified by:
        toDistributedStream in interface MStream<T>
        Returns:
        A distributed version of the stream
      • cache

        public SparkStream<T> cache()
        Description copied from interface: MStream
        Caches the stream.
        Specified by:
        cache in interface MStream<T>
        Returns:
        the cached stream
      • collect

        public <R> R collect​(Collector<? super T,​?,​R> collector)
        Description copied from interface: MStream
        Performs a reduction on the stream using the given collector.
        Specified by:
        collect in interface MStream<T>
        Type Parameters:
        R - the component type of the collection after applying the collector
        Parameters:
        collector - the collector to use in reducing the stream
        Returns:
        the result of the collector
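        The following is a minimal sketch of what passing a collector looks like, using a plain java.util.stream.Stream rather than a SparkStream. It assumes the Collector in the signature above is java.util.stream.Collector; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of the documented API.
class CollectSketch {
    // Reduces the items to one comma-separated String by handing a Collector to collect(...).
    static String joinWithComma(List<String> items) {
        return items.stream().collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joinWithComma(List.of("a", "b", "c"))); // prints a,b,c
    }
}
```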
      • collect

        public List<T> collect()
        Description copied from interface: MStream
        Collects the items in the stream as a list
        Specified by:
        collect in interface MStream<T>
        Returns:
        the list of items in the stream
      • count

        public long count()
        Description copied from interface: MStream
        The number of items in the stream
        Specified by:
        count in interface MStream<T>
        Returns:
        the number of items in the stream
      • countByValue

        public Map<T,​Long> countByValue()
        Description copied from interface: MStream
        Counts the number of times each item occurs in the stream
        Specified by:
        countByValue in interface MStream<T>
        Returns:
        a map from each distinct item to the number of times it occurs
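        To illustrate the shape of the result, here is a local sketch of the same counting semantics with the standard library. It is an analogue only and says nothing about how SparkStream computes the counts; the names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; a local analogue of countByValue().
class CountByValueSketch {
    // Maps each distinct item to the number of times it occurs in the list.
    static <T> Map<T, Long> countByValue(List<T> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // "a" occurs twice and "b" once.
        System.out.println(countByValue(List.of("a", "b", "a")));
    }
}
```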
      • distinct

        public SparkStream<T> distinct()
        Description copied from interface: MStream
        Removes duplicates from the stream
        Specified by:
        distinct in interface MStream<T>
        Returns:
        the new stream without duplicates
      • filter

        public SparkStream<T> filter​(SerializablePredicate<? super T> predicate)
        Description copied from interface: MStream
        Filters the stream.
        Specified by:
        filter in interface MStream<T>
        Parameters:
        predicate - the predicate to use to determine which objects are kept
        Returns:
        the new stream
      • first

        public Optional<T> first()
        Description copied from interface: MStream
        Gets the first item in the stream
        Specified by:
        first in interface MStream<T>
        Returns:
        the optional containing the first item
      • flatMap

        public <R> SparkStream<R> flatMap​(SerializableFunction<? super T,​Stream<? extends R>> mapper)
        Description copied from interface: MStream
        Maps the objects in this stream to one or more new objects using the given function.
        Specified by:
        flatMap in interface MStream<T>
        Type Parameters:
        R - the component type of the returning stream
        Parameters:
        mapper - the function to use to map objects
        Returns:
        the new stream
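        As a rough illustration of the one-to-many mapping described above, the sketch below splits lines into words with java.util.stream.Stream.flatMap. It is a local analogue under the assumption that SparkStream.flatMap flattens the returned streams in the same way; the names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class; a local analogue of flatMap(...).
class FlatMapSketch {
    // Each line maps to a stream of its words; the word streams are flattened into one list.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(List.of("a b", "c"))); // prints [a, b, c]
    }
}
```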
      • flatMapToPair

        public <R,​U> SparkPairStream<R,​U> flatMapToPair​(SerializableFunction<? super T,​Stream<? extends Map.Entry<? extends R,​? extends U>>> function)
        Description copied from interface: MStream
        Maps the objects in this stream to one or more new key-value pairs using the given function.
        Specified by:
        flatMapToPair in interface MStream<T>
        Type Parameters:
        R - the key type parameter
        U - the value type parameter
        Parameters:
        function - the function to use to map objects
        Returns:
        the new pair stream
      • fold

        public T fold​(T zeroValue,
                      SerializableBinaryOperator<T> operator)
        Description copied from interface: MStream
        Performs a reduction on the elements of this stream using the given binary operator.
        Specified by:
        fold in interface MStream<T>
        Parameters:
        zeroValue - The initial value
        operator - the binary operator used to combine two objects
        Returns:
        the result of the reduction
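        A sequential sketch of the fold semantics may help: start from zeroValue and combine it with each item in turn. This is an assumption about the intended behaviour, not the SparkStream implementation. On a distributed stream the operator may well be applied per partition, so a zeroValue that is neutral for the operator (0 for addition, 1 for multiplication) is the safe choice.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical helper class; a sequential analogue of fold(zeroValue, operator).
class FoldSketch {
    // Starts from zeroValue and combines it with every item, left to right.
    static <T> T fold(List<T> items, T zeroValue, BinaryOperator<T> operator) {
        T result = zeroValue;
        for (T item : items) {
            result = operator.apply(result, item);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fold(List.of(1, 2, 3), 0, Integer::sum)); // prints 6
    }
}
```

        With an empty input this sketch simply returns zeroValue, which is why the return type is T rather than Optional<T>.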
      • forEach

        public void forEach​(SerializableConsumer<? super T> consumer)
        Description copied from interface: MStream
        Performs an operation on each item in the stream
        Specified by:
        forEach in interface MStream<T>
        Parameters:
        consumer - the consumer action to perform
      • forEachLocal

        public void forEachLocal​(SerializableConsumer<? super T> consumer)
        Description copied from interface: MStream
        Performs an operation on each item in the stream, ensuring that it is done locally and not distributed.
        Specified by:
        forEachLocal in interface MStream<T>
        Parameters:
        consumer - the consumer action to perform
      • getRDD

        public org.apache.spark.api.java.JavaRDD<T> getRDD()
        Gets the wrapped rdd.
        Returns:
        the rdd
      • groupBy

        public <U> SparkPairStream<U,​Iterable<T>> groupBy​(SerializableFunction<? super T,​? extends U> function)
        Description copied from interface: MStream
        Groups the items in the stream using the given function that maps objects to key values
        Specified by:
        groupBy in interface MStream<T>
        Type Parameters:
        U - the key type parameter
        Parameters:
        function - the function that determines the key of the objects in the stream
        Returns:
        the new pair stream
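        The grouping described above can be sketched locally with Collectors.groupingBy. The result here is a Map rather than a SparkPairStream, so treat it only as an illustration of which items end up under which key; the names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; a local analogue of groupBy(function).
class GroupBySketch {
    // Groups the items under the key that keyFunction computes for each of them.
    static <T, U> Map<U, List<T>> groupBy(List<T> items, Function<? super T, ? extends U> keyFunction) {
        return items.stream().collect(Collectors.groupingBy(keyFunction));
    }

    public static void main(String[] args) {
        // Groups words by their length: 1 -> [a], 2 -> [bb, cc].
        System.out.println(groupBy(List.of("a", "bb", "cc"), String::length));
    }
}
```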
      • isEmpty

        public boolean isEmpty()
        Description copied from interface: MStream
        Determines if the stream is empty or not
        Specified by:
        isEmpty in interface MStream<T>
        Returns:
        True if empty, False otherwise
      • iterator

        public Iterator<T> iterator()
        Description copied from interface: MStream
        Gets an iterator for the stream
        Specified by:
        iterator in interface Iterable<T>
        Specified by:
        iterator in interface MStream<T>
        Returns:
        the iterator of items in the stream
      • javaStream

        public Stream<T> javaStream()
        Description copied from interface: MStream
        Converts this stream into a java stream
        Specified by:
        javaStream in interface MStream<T>
        Returns:
        the java stream
      • limit

        public SparkStream<T> limit​(long number)
        Description copied from interface: MStream
        Limits the stream to the first number items.
        Specified by:
        limit in interface MStream<T>
        Parameters:
        number - the number of items desired
        Returns:
        the new stream of size number
      • map

        public <R> SparkStream<R> map​(SerializableFunction<? super T,​? extends R> function)
        Description copied from interface: MStream
        Maps the objects in the stream using the given function
        Specified by:
        map in interface MStream<T>
        Type Parameters:
        R - the component type of the returning stream
        Parameters:
        function - the function to use to map objects
        Returns:
        the new stream
      • mapPartitions

        public <R> SparkStream<R> mapPartitions​(SerializableFunction<Iterator<? super T>,​Stream<R>> function)
        Maps the objects in the stream one partition (block) at a time using the given function
        Type Parameters:
        R - the component type of the returning stream
        Parameters:
        function - the function to use to map objects
        Returns:
        the new stream
      • mapToDouble

        public com.gengoai.stream.spark.SparkDoubleStream mapToDouble​(SerializableToDoubleFunction<? super T> function)
        Description copied from interface: MStream
        Maps objects in this stream to double values
        Specified by:
        mapToDouble in interface MStream<T>
        Parameters:
        function - the function to convert objects to doubles
        Returns:
        the new double stream
      • mapToPair

        public <R,​U> SparkPairStream<R,​U> mapToPair​(SerializableFunction<? super T,​? extends Map.Entry<? extends R,​? extends U>> function)
        Description copied from interface: MStream
        Maps the objects in this stream to a key-value pair using the given function.
        Specified by:
        mapToPair in interface MStream<T>
        Type Parameters:
        R - the key type parameter
        U - the value type parameter
        Parameters:
        function - the function to use to map objects
        Returns:
        the new pair stream
      • max

        public Optional<T> max​(SerializableComparator<? super T> comparator)
        Description copied from interface: MStream
        Returns the max item in the stream using the given comparator to compare items.
        Specified by:
        max in interface MStream<T>
        Parameters:
        comparator - the comparator to use to compare values in the stream
        Returns:
        the optional containing the max value
      • min

        public Optional<T> min​(SerializableComparator<? super T> comparator)
        Description copied from interface: MStream
        Returns the min item in the stream using the given comparator to compare items.
        Specified by:
        min in interface MStream<T>
        Parameters:
        comparator - the comparator to use to compare values in the stream
        Returns:
        the optional containing the min value
      • onClose

        public MStream<T> onClose​(SerializableRunnable closeHandler)
        Description copied from interface: MStream
        Sets the handler to call when the stream is closed. Typically, this is to clean up any open resources, such as file handles.
        Specified by:
        onClose in interface MStream<T>
        Parameters:
        closeHandler - the handler to run when the stream is closed.
        Returns:
        the MStream with the close handler set
      • persist

        public MStream<T> persist​(StorageLevel storageLevel)
        Description copied from interface: MStream
        Persists the stream to the given storage level
        Specified by:
        persist in interface MStream<T>
        Parameters:
        storageLevel - the storage level
        Returns:
        the persisted MStream
      • parallel

        public SparkStream<T> parallel()
        Description copied from interface: MStream
        Ensures that the stream is parallel or distributed.
        Specified by:
        parallel in interface MStream<T>
        Returns:
        the new stream
      • partition

        public MStream<Stream<T>> partition​(long partitionSize)
        Description copied from interface: MStream
        Partitions the stream into streams, each containing at most partitionSize items.
        Specified by:
        partition in interface MStream<T>
        Parameters:
        partitionSize - the desired number of objects in each partition
        Returns:
        the new stream
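        A local sketch of the chunking idea follows. It assumes chunks are filled in encounter order and that only the last chunk may be smaller; the documentation above guarantees only the upper bound. It also uses an int size for list indexing, whereas the documented parameter is a long. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; a local analogue of partition(partitionSize).
class PartitionSketch {
    // Splits the items into consecutive chunks of at most partitionSize items each.
    static <T> List<List<T>> partition(List<T> items, int partitionSize) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += partitionSize) {
            int end = Math.min(start + partitionSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```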
      • reduce

        public Optional<T> reduce​(SerializableBinaryOperator<T> reducer)
        Description copied from interface: MStream
        Performs a reduction on the elements of this stream using the given binary operator.
        Specified by:
        reduce in interface MStream<T>
        Parameters:
        reducer - the binary operator used to combine two objects
        Returns:
        the optional describing the reduction
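        Unlike fold, reduce has no initial value, which is why the result is wrapped in an Optional. The sketch below shows the analogous behaviour of java.util.stream.Stream.reduce; that SparkStream returns an empty Optional for an empty stream is an assumption here. The names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical helper class; a local analogue of reduce(reducer).
class ReduceSketch {
    // Combines all items pairwise; yields Optional.empty() when there is nothing to combine.
    static <T> Optional<T> reduce(List<T> items, BinaryOperator<T> reducer) {
        return items.stream().reduce(reducer);
    }

    public static void main(String[] args) {
        System.out.println(reduce(List.of(1, 2, 3), Integer::sum));    // prints Optional[6]
        System.out.println(reduce(List.<Integer>of(), Integer::sum)); // prints Optional.empty
    }
}
```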
      • repartition

        public SparkStream<T> repartition​(int numPartitions)
        Description copied from interface: MStream
        Repartitions the stream to the given number of partitions. This may be a no-op for some streams, i.e. Local Streams.
        Specified by:
        repartition in interface MStream<T>
        Parameters:
        numPartitions - the number of partitions the stream should have
        Returns:
        the new stream
      • sample

        public SparkStream<T> sample​(boolean withReplacement,
                                     int number)
        Description copied from interface: MStream
        Randomly samples number items from the stream.
        Specified by:
        sample in interface MStream<T>
        Parameters:
        withReplacement - if true, a single item may appear in the sample multiple times; if false, each item can be picked at most once.
        number - the number of items desired in the sample
        Returns:
        the new stream
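        The difference between the two sampling modes can be sketched locally as follows. This is one straightforward way to sample a list, not the algorithm SparkStream uses, and the explicit Random argument is added here only to make the sketch repeatable. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class; a local analogue of sample(withReplacement, number).
class SampleSketch {
    static <T> List<T> sample(List<T> items, boolean withReplacement, int number, Random random) {
        List<T> result = new ArrayList<>();
        if (items.isEmpty() || number <= 0) {
            return result;
        }
        if (withReplacement) {
            // Every draw picks from the full list, so the same item can be chosen repeatedly.
            for (int i = 0; i < number; i++) {
                result.add(items.get(random.nextInt(items.size())));
            }
        } else {
            // Shuffle a copy and take a prefix, so each position is picked at most once.
            List<T> copy = new ArrayList<>(items);
            Collections.shuffle(copy, random);
            result.addAll(copy.subList(0, Math.min(number, copy.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        System.out.println(sample(List.of(1, 2, 3, 4, 5), false, 3, random));
        System.out.println(sample(List.of(1, 2), true, 5, random));
    }
}
```

        Note that without replacement this sketch can return at most as many items as the input holds, while with replacement the sample may be larger than the input.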
      • saveAsTextFile

        public void saveAsTextFile​(Resource location)
        Description copied from interface: MStream
        Saves the stream to a text file at the given location. Writing may result in multiple files being created.
        Specified by:
        saveAsTextFile in interface MStream<T>
        Parameters:
        location - the location to write the stream to
      • saveAsTextFile

        public void saveAsTextFile​(String location)
        Description copied from interface: MStream
        Saves the stream to a text file at the given location. Writing may result in multiple files being created.
        Specified by:
        saveAsTextFile in interface MStream<T>
        Parameters:
        location - the location to write the stream to
      • shuffle

        public SparkStream<T> shuffle()
        Description copied from interface: MStream
        Shuffles the items in the stream.
        Specified by:
        shuffle in interface MStream<T>
        Returns:
        the new stream
      • shuffle

        public SparkStream<T> shuffle​(Random random)
        Description copied from interface: MStream
        Shuffles the items in the stream using the given Random object.
        Specified by:
        shuffle in interface MStream<T>
        Parameters:
        random - the random number generator
        Returns:
        the new stream
      • skip

        public SparkStream<T> skip​(long n)
        Description copied from interface: MStream
        Skips the first n items in the stream
        Specified by:
        skip in interface MStream<T>
        Parameters:
        n - the number of items to skip
        Returns:
        the new stream
      • sortBy

        public <R extends Comparable<R>> MStream<T> sortBy​(boolean ascending,
                                                           SerializableFunction<? super T,​? extends R> keyFunction)
        Description copied from interface: MStream
        Sorts the items in the stream in ascending or descending order using the given keyFunction to determine how to compare.
        Specified by:
        sortBy in interface MStream<T>
        Type Parameters:
        R - the type of the comparable key produced by keyFunction
        Parameters:
        ascending - determines if the items should be sorted in ascending (true) or descending (false) order
        keyFunction - function to use to convert the items in the stream to something that is comparable.
        Returns:
        the new stream
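        How the ascending flag and keyFunction interact can be sketched with a plain comparator. This is a local analogue returning a List, not the SparkStream implementation; the names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; a local analogue of sortBy(ascending, keyFunction).
class SortBySketch {
    static <T, R extends Comparable<R>> List<T> sortBy(List<T> items,
                                                       boolean ascending,
                                                       Function<? super T, ? extends R> keyFunction) {
        // Compare items by the key that keyFunction extracts from each of them.
        Comparator<T> byKey = (a, b) -> keyFunction.apply(a).compareTo(keyFunction.apply(b));
        return items.stream()
                .sorted(ascending ? byKey : byKey.reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = List.of("ccc", "a", "bb");
        System.out.println(SortBySketch.<String, Integer>sortBy(words, true, String::length));  // prints [a, bb, ccc]
        System.out.println(SortBySketch.<String, Integer>sortBy(words, false, String::length)); // prints [ccc, bb, a]
    }
}
```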
      • take

        public List<T> take​(int n)
        Description copied from interface: MStream
        Takes the first n items from the stream.
        Specified by:
        take in interface MStream<T>
        Parameters:
        n - the number of items to take
        Returns:
        a list of the first n items
      • isDistributed

        public boolean isDistributed()
        Description copied from interface: MStream
        Determines whether the stream is distributed.
        Specified by:
        isDistributed in interface MStream<T>
        Returns:
        True if the stream is distributed
      • intersection

        public MStream<T> intersection​(MStream<T> other)
        Description copied from interface: MStream
        Returns a new MStream containing the intersection of elements in this stream and the argument stream.
        Specified by:
        intersection in interface MStream<T>
        Parameters:
        other - Stream to perform intersection with
        Returns:
        the new stream
      • union

        public SparkStream<T> union​(MStream<T> other)
        Description copied from interface: MStream
        Unions this stream with another.
        Specified by:
        union in interface MStream<T>
        Parameters:
        other - the other stream to add to this one.
        Returns:
        the new stream
      • zip

        public <U> SparkPairStream<T,​U> zip​(MStream<U> other)
        Description copied from interface: MStream

        Zips (combines) this stream with the given other stream, creating a pair stream. For example, if this stream contains [1,2,3] and the other stream contains [4,5,6], the result is a pair stream containing the key-value pairs [(1,4), (2,5), (3,6)]. Note that the length of the resulting stream is the minimum of the lengths of the two streams.

        Specified by:
        zip in interface MStream<T>
        Type Parameters:
        U - the component type of the second stream
        Parameters:
        other - the stream making up the value in the resulting entries
        Returns:
        a new pair stream with keys from this stream and values for the other stream
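        The pairing rule and the minimum-length behaviour described above can be sketched over two lists. This is a local analogue that returns a list of Map.Entry pairs instead of a SparkPairStream; the names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; a local analogue of zip(other).
class ZipSketch {
    // Pairs items position by position and stops as soon as either input runs out.
    static <T, U> List<Map.Entry<T, U>> zip(List<T> left, List<U> right) {
        List<Map.Entry<T, U>> pairs = new ArrayList<>();
        Iterator<T> keys = left.iterator();
        Iterator<U> values = right.iterator();
        while (keys.hasNext() && values.hasNext()) {
            pairs.add(new SimpleEntry<>(keys.next(), values.next()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(zip(List.of(1, 2, 3), List.of(4, 5, 6))); // prints [1=4, 2=5, 3=6]
        System.out.println(zip(List.of(1, 2, 3), List.of(4, 5)));    // prints [1=4, 2=5]
    }
}
```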
      • zipWithIndex

        public SparkPairStream<T,​Long> zipWithIndex()
        Description copied from interface: MStream
        Creates a pair stream where the keys are items in this stream and values are the index (starting at 0) of the item in the stream.
        Specified by:
        zipWithIndex in interface MStream<T>
        Returns:
        the new pair stream
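        A local sketch of the zero-based indexing described above, again returning Map.Entry pairs rather than a SparkPairStream. It assumes indices follow encounter order; the names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; a local analogue of zipWithIndex().
class ZipWithIndexSketch {
    // Pairs each item (the key) with its zero-based position (the value).
    static <T> List<Map.Entry<T, Long>> zipWithIndex(List<T> items) {
        List<Map.Entry<T, Long>> pairs = new ArrayList<>();
        long index = 0L;
        for (T item : items) {
            pairs.add(new SimpleEntry<>(item, Long.valueOf(index)));
            index++;
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(zipWithIndex(List.of("a", "b", "c"))); // prints [a=0, b=1, c=2]
    }
}
```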
      • updateConfig

        public void updateConfig()
        Description copied from interface: MStream
        Updates the config instance used for this stream
        Specified by:
        updateConfig in interface MStream<T>