Uses of Interface
com.gengoai.hermes.corpus.DocumentCollection
Uses of DocumentCollection in com.gengoai.hermes.corpus
Subinterfaces of DocumentCollection in com.gengoai.hermes.corpus

interface Corpus
    A persistent collection of documents, each having a unique document ID.
interface SearchResults
Methods in com.gengoai.hermes.corpus that return DocumentCollection

default DocumentCollection DocumentCollection.annotate(@NonNull AnnotatableType... annotatableTypes)
    Annotates this corpus with the given annotation types and returns a new corpus with those annotation types present.
default DocumentCollection DocumentCollection.apply(@NonNull SerializableFunction<HString,HString> function)
default DocumentCollection DocumentCollection.apply(@NonNull TokenRegex pattern, @NonNull SerializableConsumer<TokenMatch> onMatch)
    Applies a token regular expression to the corpus, creating annotations of the given type for matches.
default DocumentCollection DocumentCollection.apply(@NonNull Lexicon lexicon, @NonNull SerializableConsumer<HString> onMatch)
    Applies a lexicon to the corpus, creating annotations of the given type for matches.
default DocumentCollection DocumentCollection.cache()
    Caches any actions performed on this collection.
static DocumentCollection DocumentCollection.create(@NonNull Document... documents)
    Creates a document collection for one or more documents.
static DocumentCollection DocumentCollection.create(@NonNull Specification specification)
    Creates a document collection from a specification detailing the document format and path of the documents.
static DocumentCollection DocumentCollection.create(@NonNull MStream<Document> documents)
    Creates a document collection for a stream of documents.
static DocumentCollection DocumentCollection.create(@NonNull Iterable<Document> documents)
    Creates a document collection for one or more documents.
static DocumentCollection DocumentCollection.create(@NonNull String specification)
    Creates a document collection from a specification detailing the document format and path of the documents.
static DocumentCollection DocumentCollection.create(@NonNull Stream<Document> documents)
    Creates a document collection for a stream of documents.
default DocumentCollection DocumentCollection.filter(@NonNull SerializablePredicate<Document> predicate)
    Filters the documents in the collection using the given predicate.
default DocumentCollection DocumentCollection.repartition(int numPartitions)
    Repartitions the corpus.
default DocumentCollection DocumentCollection.sample(int size)
    Creates a sample of this corpus using reservoir sampling.
default DocumentCollection DocumentCollection.sample(int count, @NonNull Random random)
    Creates a sample of this corpus using reservoir sampling.
default DocumentCollection DocumentCollection.update(@NonNull CaduceusProgram program)
    Updates all documents in the corpus using the given CaduceusProgram.
DocumentCollection DocumentCollection.update(String operationName, @NonNull SerializableConsumer<Document> documentProcessor)
    Updates all documents in the corpus using the given document processor.

Methods in com.gengoai.hermes.corpus with parameters of type DocumentCollection

static ProgressLogger ProgressLogger.create(@NonNull DocumentCollection owner, @NonNull String operation)
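Both sample overloads are documented as using reservoir sampling, which draws a fixed-size uniform sample in a single pass without knowing the collection size up front. The sketch below shows the classic Algorithm R over a plain Iterable. It illustrates the technique only and is not Hermes' implementation; the ReservoirSample class and its sample method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper illustrating reservoir sampling (Algorithm R); not part of Hermes. */
final class ReservoirSample {

  private ReservoirSample() {}

  /**
   * Draws up to {@code size} items uniformly at random from {@code items} in one pass,
   * without knowing the total number of items in advance.
   */
  static <T> List<T> sample(Iterable<T> items, int size, Random random) {
    if (size < 0) {
      throw new IllegalArgumentException("size must be non-negative");
    }
    List<T> reservoir = new ArrayList<>(size);
    int seen = 0;
    for (T item : items) {
      seen++;
      if (reservoir.size() < size) {
        // Fill the reservoir with the first `size` items.
        reservoir.add(item);
      } else {
        // Keep the new item with probability size / seen by replacing a random slot.
        int j = random.nextInt(seen);
        if (j < size) {
          reservoir.set(j, item);
        }
      }
    }
    return reservoir;
  }
}
```

Passing a seeded Random makes the sample reproducible, which is presumably the purpose of the sample(int count, Random random) overload.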
Uses of DocumentCollection in com.gengoai.hermes.extraction.keyword
Methods in com.gengoai.hermes.extraction.keyword with parameters of type DocumentCollection

void KeywordExtractor.fit(@NonNull DocumentCollection corpus)
    In certain cases, a keyword extractor needs to collect corpus-level statistics or construct a model of what a good keyword looks like.
void NPClusteringKeywordExtractor.fit(DocumentCollection corpus)
void RakeKeywordExtractor.fit(DocumentCollection corpus)
void TermKeywordExtractor.fit(DocumentCollection corpus)
void TextRank.fit(@NonNull DocumentCollection corpus)
void TFIDFKeywordExtractor.fit(DocumentCollection corpus)
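KeywordExtractor.fit is described as the place where an extractor collects corpus-level statistics. For a TF-IDF style extractor, the statistic that needs a whole-corpus pass is document frequency. The following is a minimal sketch assuming pre-tokenized documents (List<String>) in place of a DocumentCollection. TfIdfSketch and its methods are hypothetical, and the smoothed IDF formula log((1 + N) / (1 + df)) + 1 is one common variant chosen for this sketch, not necessarily the one TFIDFKeywordExtractor uses.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical TF-IDF keyword scorer; documents are pre-tokenized lists of terms. */
final class TfIdfSketch {
  private final Map<String, Integer> documentFrequency = new HashMap<>();
  private int numDocuments = 0;

  /** Corpus-level pass: count how many documents contain each term at least once. */
  void fit(List<List<String>> corpus) {
    documentFrequency.clear();
    numDocuments = corpus.size();
    for (List<String> doc : corpus) {
      for (String term : new HashSet<>(doc)) {
        documentFrequency.merge(term, 1, Integer::sum);
      }
    }
  }

  /** Returns the top-k terms of one document ranked by tf * idf (ties broken alphabetically). */
  List<String> keywords(List<String> document, int k) {
    Map<String, Integer> tf = new HashMap<>();
    for (String term : document) {
      tf.merge(term, 1, Integer::sum);
    }
    Map<String, Double> scores = new HashMap<>();
    for (Map.Entry<String, Integer> e : tf.entrySet()) {
      // Smoothed idf, so terms unseen during fit still get a finite score.
      int df = documentFrequency.getOrDefault(e.getKey(), 0);
      double idf = Math.log((1.0 + numDocuments) / (1.0 + df)) + 1.0;
      scores.put(e.getKey(), e.getValue() * idf);
    }
    return scores.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
            .thenComparing(Map.Entry.<String, Double>comparingByKey()))
        .limit(k)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

Splitting the work into fit and keywords mirrors the fit-then-extract split in the table above: the corpus is scanned once, and every later extraction reuses the same document-frequency table.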
Uses of DocumentCollection in com.gengoai.hermes.extraction.summarization
Methods in com.gengoai.hermes.extraction.summarization with parameters of type DocumentCollection

void Summarizer.fit(@NonNull DocumentCollection corpus)
    In certain cases, a Summarizer needs to collect corpus-level statistics or construct a model of what a good summary looks like.
void TextRankSummarizer.fit(@NonNull DocumentCollection corpus)
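TextRankSummarizer presumably follows the TextRank approach: build a graph whose nodes are sentences and whose edge weights are sentence similarities, then rank the sentences with weighted PageRank. The sketch below is an illustration under assumptions, not Hermes' code. TextRankSketch is a hypothetical class, sentences are plain strings tokenized by lowercasing, stripping punctuation, and splitting on whitespace, and similarity is the word-overlap measure normalized by log sentence lengths, computed here over distinct words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical TextRank sentence scorer for illustration; not Hermes' TextRankSummarizer. */
final class TextRankSketch {

  private TextRankSketch() {}

  /** Word-overlap similarity: shared words / (log|Si| + log|Sj|), over distinct words. */
  static double similarity(Set<String> a, Set<String> b) {
    Set<String> common = new HashSet<>(a);
    common.retainAll(b);
    double norm = Math.log(a.size()) + Math.log(b.size());
    // Two one-word sentences give norm == 0; treat them as unconnected to avoid dividing by zero.
    return (common.isEmpty() || norm <= 0) ? 0.0 : common.size() / norm;
  }

  /** Scores each sentence by running weighted PageRank over the sentence-similarity graph. */
  static double[] scores(List<String> sentences, double damping, int iterations) {
    int n = sentences.size();
    List<Set<String>> words = new ArrayList<>();
    for (String s : sentences) {
      // Assumption: naive tokenization (lowercase, drop punctuation, split on whitespace).
      String cleaned = s.toLowerCase().replaceAll("[^a-z0-9\\s]", " ").trim();
      words.add(new HashSet<>(Arrays.asList(cleaned.split("\\s+"))));
    }
    double[][] w = new double[n][n];
    double[] outWeight = new double[n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        if (i != j) {
          w[i][j] = similarity(words.get(i), words.get(j));
          outWeight[i] += w[i][j];
        }
      }
    }
    double[] score = new double[n];
    Arrays.fill(score, 1.0);
    for (int iter = 0; iter < iterations; iter++) {
      double[] next = new double[n];
      for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
          // Sentence j passes its score to i in proportion to the edge weight (j, i).
          if (w[j][i] > 0) {
            sum += w[j][i] / outWeight[j] * score[j];
          }
        }
        next[i] = (1 - damping) + damping * sum;
      }
      score = next;
    }
    return score;
  }
}
```

A summary would then be the top-scoring sentences, typically re-emitted in their original order. A damping factor of 0.85 is the value commonly used with PageRank.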
Uses of DocumentCollection in com.gengoai.hermes.format
Methods in com.gengoai.hermes.format with parameters of type DocumentCollection

void CsvFormat.write(DocumentCollection corpus, Resource outputResource)
void DocFormat.write(DocumentCollection documentCollection, Resource outputResource)
    Writes a corpus of documents in this format to the given output resource.
default void OneDocPerFileFormat.write(DocumentCollection corpus, Resource outputResource)
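The name OneDocPerFileFormat suggests that write emits each document in the collection as its own file under the output resource. A minimal sketch of that layout using java.nio follows. It assumes documents are available as a map from document ID to text and that files are named <id>.txt; OneDocPerFileSketch is a hypothetical class and does not use Hermes' Resource abstraction.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical one-document-per-file writer; not Hermes' OneDocPerFileFormat. */
final class OneDocPerFileSketch {

  private OneDocPerFileSketch() {}

  /** Writes each document to outputDir/<id>.txt, creating the directory if needed. */
  static void write(Map<String, String> documentsById, Path outputDir) throws IOException {
    Files.createDirectories(outputDir);
    for (Map.Entry<String, String> doc : documentsById.entrySet()) {
      // Assumption: document IDs are safe to use as file names.
      Path file = outputDir.resolve(doc.getKey() + ".txt");
      Files.write(file, doc.getValue().getBytes(StandardCharsets.UTF_8));
    }
  }
}
```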
Uses of DocumentCollection in com.gengoai.hermes.ml
Methods in com.gengoai.hermes.ml with parameters of type DocumentCollection

void EntityTagger.estimate(@NonNull DocumentCollection documentCollection)
default void HStringMLModel.estimate(@NonNull DocumentCollection documentCollection)
    Estimate.
default DataSet HStringMLModel.transform(@NonNull DocumentCollection documentCollection)
Uses of DocumentCollection in com.gengoai.hermes.similarity
Methods in com.gengoai.hermes.similarity with parameters of type DocumentCollection

void EmbeddingSimilarity.fit(@NonNull DocumentCollection corpus)
void ExtractorBasedSimilarity.fit(@NonNull DocumentCollection corpus)
void HStringSimilarity.fit(@NonNull DocumentCollection corpus)
    In certain cases, an HStringSimilarity needs to collect corpus-level statistics to determine similarity.
Uses of DocumentCollection in com.gengoai.hermes.workflow
Methods in com.gengoai.hermes.workflow that return DocumentCollection

DocumentCollection Action.process(DocumentCollection corpus, Context context)
    Processes the corpus.
DocumentCollection SequentialWorkflow.process(@NonNull DocumentCollection input, @NonNull Context context)
    Processes the corpus.
DocumentCollection Workflow.process(@NonNull DocumentCollection input, @NonNull Context context)
    Processes the corpus.

Methods in com.gengoai.hermes.workflow with parameters of type DocumentCollection

default State Action.loadPreviousState(DocumentCollection corpus, Context context)
    Loads from a previous processing state.
DocumentCollection Action.process(DocumentCollection corpus, Context context)
    Processes the corpus.
DocumentCollection SequentialWorkflow.process(@NonNull DocumentCollection input, @NonNull Context context)
    Processes the corpus.
DocumentCollection Workflow.process(@NonNull DocumentCollection input, @NonNull Context context)
    Processes the corpus.
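Action, Workflow, and SequentialWorkflow all expose process(corpus, context) returning a DocumentCollection. One plausible reading of a sequential workflow, sketched below under stated assumptions, is a pipeline that feeds each action's output collection into the next action while all actions share one context. Here List<String> stands in for DocumentCollection and Map<String, Object> for Context; SketchAction and SketchSequentialWorkflow are hypothetical types, and loadPreviousState is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a workflow action; List<String> plays the role of a document collection. */
interface SketchAction {
  List<String> process(List<String> corpus, Map<String, Object> context);
}

/** Hypothetical sequential workflow: runs actions in order, feeding each output to the next action. */
final class SketchSequentialWorkflow implements SketchAction {
  private final List<SketchAction> actions;

  SketchSequentialWorkflow(List<SketchAction> actions) {
    this.actions = new ArrayList<>(actions);
  }

  @Override
  public List<String> process(List<String> corpus, Map<String, Object> context) {
    List<String> current = corpus;
    for (SketchAction action : actions) {
      // Every action sees the same shared context object.
      current = action.process(current, context);
    }
    return current;
  }
}
```

Having the workflow implement the action interface is a design choice of this sketch: it lets one workflow be nested as a step inside another.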
Uses of DocumentCollection in com.gengoai.hermes.workflow.actions
Methods in com.gengoai.hermes.workflow.actions that return DocumentCollection

protected DocumentCollection TermCounts.onComplete(DocumentCollection corpus, Context context, Counter<String> counts)
    On complete corpus.
DocumentCollection Annotate.process(@NonNull DocumentCollection corpus, @NonNull Context context)
DocumentCollection ImportDocuments.process(DocumentCollection corpus, Context context)
DocumentCollection KeywordExtraction.process(DocumentCollection corpus, Context context)
DocumentCollection SpellChecker.process(@NonNull DocumentCollection corpus, @NonNull Context context)
DocumentCollection TermCounts.process(DocumentCollection corpus, Context context)

Methods in com.gengoai.hermes.workflow.actions with parameters of type DocumentCollection

protected DocumentCollection TermCounts.onComplete(DocumentCollection corpus, Context context, Counter<String> counts)
    On complete corpus.
DocumentCollection Annotate.process(@NonNull DocumentCollection corpus, @NonNull Context context)
DocumentCollection ImportDocuments.process(DocumentCollection corpus, Context context)
DocumentCollection KeywordExtraction.process(DocumentCollection corpus, Context context)
DocumentCollection SpellChecker.process(@NonNull DocumentCollection corpus, @NonNull Context context)
DocumentCollection TermCounts.process(DocumentCollection corpus, Context context)
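TermCounts pairs process with a protected onComplete(corpus, context, counts) hook. A natural reading is the template-method pattern: process does the counting, then hands the counts to an overridable completion step. The sketch below assumes that reading. TermCountsSketch is hypothetical, a Map<String, Long> stands in for Counter<String>, terms are lowercase whitespace-delimited tokens, and the "termCounts" context key is invented for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical term-counting action using a template-method hook; not Hermes' TermCounts. */
class TermCountsSketch {

  /** Counts lowercase whitespace-delimited terms across all documents, then calls onComplete. */
  public List<String> process(List<String> corpus, Map<String, Object> context) {
    Map<String, Long> counts = new HashMap<>();
    for (String document : corpus) {
      for (String term : document.toLowerCase().trim().split("\\s+")) {
        if (!term.isEmpty()) {
          counts.merge(term, 1L, Long::sum);
        }
      }
    }
    return onComplete(corpus, context, counts);
  }

  /** Subclass hook; by default stores the counts in the context and returns the corpus unchanged. */
  protected List<String> onComplete(List<String> corpus, Map<String, Object> context,
                                    Map<String, Long> counts) {
    context.put("termCounts", counts);
    return corpus;
  }
}
```

A subclass could override onComplete to, for example, write the counts elsewhere or filter the corpus based on them, without touching the counting loop.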