Package com.gengoai.hermes.extraction
Class NGramExtractor
- java.lang.Object
  - com.gengoai.apollo.ml.feature.Featurizer<HString>
    - com.gengoai.hermes.extraction.FeaturizingExtractor
      - com.gengoai.hermes.extraction.MultiPhaseExtractor
        - com.gengoai.hermes.extraction.NGramExtractor
- All Implemented Interfaces:
FeatureExtractor<HString>, ObservationExtractor<HString>, Copyable<FeaturizingExtractor>, Extractor, Serializable
public class NGramExtractor extends MultiPhaseExtractor
A MultiPhaseExtractor implementation that extracts n-grams over the desired annotation types. In addition to the standard extraction methods, this extractor provides the extractStringTuples(HString) method, which returns the extractions as a list of String tuples.
- Author:
- David B. Bracewell
- See Also:
- Serialized Form
-
Nested Class Summary

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | NGramExtractor.Builder | Builder class for constructing an NGramExtractor |
-
Nested classes/interfaces inherited from class com.gengoai.hermes.extraction.MultiPhaseExtractor
MultiPhaseExtractor.MultiPhaseExtractorBuilder<T extends MultiPhaseExtractor,V extends MultiPhaseExtractor.MultiPhaseExtractorBuilder<T,V>>
-
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static NGramExtractor.Builder | bigrams() | |
| static NGramExtractor.Builder | builder() | |
| static NGramExtractor.Builder | builder(int n) | Creates a builder initialized to extract n-grams of the given order. |
| static NGramExtractor.Builder | builder(int minOrder, int maxOrder) | Creates a builder initialized to extract n-grams ranging from the given minimum to the given maximum order. |
| protected Stream<HString> | createStream(HString hString) | Creates a stream of extractions from the given input. |
| List<Tuple> | extractStringTuples(@NonNull HString hString) | Extracts n-grams as a list of String tuples. |
| NGramExtractor.Builder | toBuilder() | Converts the extractor into a builder. |
| String | toString() | |
| static NGramExtractor.Builder | trigrams() | |
-
Methods inherited from class com.gengoai.hermes.extraction.MultiPhaseExtractor
applyAsFeatures, copy, extract
-
Methods inherited from class com.gengoai.apollo.ml.feature.Featurizer
booleanFeaturizer, chain, chain, countFeaturizer, extractObservation, multiValueFeaturizer, predicateFeaturizer, realFeaturizer, valueFeaturizer, valueFeaturizer, withContext, withContext
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface com.gengoai.apollo.ml.feature.FeatureExtractor
contextualize, extractSequence
-
Method Detail
-
bigrams
public static NGramExtractor.Builder bigrams()
- Returns:
- a builder initialized for bigrams
-
builder
public static NGramExtractor.Builder builder()
- Returns:
- the builder
-
builder
public static NGramExtractor.Builder builder(int n)
Creates a builder initialized to extract n-grams of the given order.
- Parameters:
n - the n-gram order
- Returns:
- the builder
-
builder
public static NGramExtractor.Builder builder(int minOrder, int maxOrder)
Creates a builder initialized to extract n-grams ranging from the given minimum to the given maximum order.
- Parameters:
minOrder - the minimum order
maxOrder - the maximum order
- Returns:
- the builder
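To illustrate what a range of orders means, here is a minimal, library-independent sketch. It does not use Hermes and is not the NGramExtractor implementation: it assumes tokens are plain strings and that an n-gram is a contiguous window of n tokens. The class name NGramOrderSketch, the method ngrams, and the order in which results are emitted are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hermes.
final class NGramOrderSketch {

    // Returns every contiguous run of tokens whose length lies in [minOrder, maxOrder].
    static List<List<String>> ngrams(List<String> tokens, int minOrder, int maxOrder) {
        if (minOrder < 1 || maxOrder < minOrder) {
            throw new IllegalArgumentException("require 1 <= minOrder <= maxOrder");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // For each start position, emit one window per order that still fits.
            for (int n = minOrder; n <= maxOrder && start + n <= tokens.size(); n++) {
                result.add(new ArrayList<>(tokens.subList(start, start + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "quick", "brown", "fox");
        // Unigrams and bigrams (orders 1 through 2).
        // prints [[the], [the, quick], [quick], [quick, brown], [brown], [brown, fox], [fox]]
        System.out.println(ngrams(tokens, 1, 2));
    }
}
```

In this sketch, passing the same value for minOrder and maxOrder yields n-grams of a single order, which is presumably what builder(int n) configures.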
-
trigrams
public static NGramExtractor.Builder trigrams()
- Returns:
- a builder initialized for trigrams
-
createStream
protected Stream<HString> createStream(HString hString)
Description copied from class: MultiPhaseExtractor
Creates a stream of extractions from the given input.
- Specified by:
createStream in class MultiPhaseExtractor
- Parameters:
hString - the input text
- Returns:
- the stream of extractions
-
extractStringTuples
public List<Tuple> extractStringTuples(@NonNull HString hString)
Extracts n-grams as a list of String tuples.
- Parameters:
hString - the input text
- Returns:
- the list of String tuples
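The sketch below shows one way a list of String tuples might be consumed, for example to count repeated n-grams. It is an assumption-laden stand-in and does not call Hermes: an immutable List<String> plays the role of a tuple, and the names StringTupleSketch, stringTuples, and count are hypothetical. It relies only on the fact that Java lists of strings compare by content, so equal tuples collapse to one map key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; List<String> stands in for a String tuple.
final class StringTupleSketch {

    // Builds one immutable string tuple per contiguous window of n tokens.
    static List<List<String>> stringTuples(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<List<String>> tuples = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            tuples.add(List.copyOf(tokens.subList(start, start + n)));
        }
        return tuples;
    }

    // Counts how often each tuple occurs, keeping first-seen order.
    static Map<List<String>, Integer> count(List<List<String>> tuples) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (List<String> tuple : tuples) {
            counts.merge(tuple, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");
        // prints {[to, be]=2, [be, or]=1, [or, not]=1, [not, to]=1}
        System.out.println(count(stringTuples(tokens, 2)));
    }
}
```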
-
toBuilder
public NGramExtractor.Builder toBuilder()
Description copied from class: MultiPhaseExtractor
Converts the extractor into a builder.
- Specified by:
toBuilder in class MultiPhaseExtractor
- Returns:
- the builder initialized with values from this extractor
-
toString
public String toString()
- Overrides:
toString in class MultiPhaseExtractor
-
-