Interface Document
-
- All Superinterfaces:
CharSequence, Comparable<Span>, HString, Serializable, Span, StringLike
public interface Document extends HString
A document represents text content with an accompanying set of metadata (Attributes), linguistic overlays (Annotations), and relations between elements in the document. Documents are represented as an HString with additional methods for adding, removing, and accessing the annotations over their content. Every document has an id, which should be unique within a corpus.
Documents are created using a DocumentFactory, which defines the preprocessing steps (TextNormalizers, e.g. whitespace and Unicode normalization) to be performed on raw text before creating a document, and the default language in which the documents are written. Additionally, the Document interface provides a number of convenience create methods for constructing documents using the default DocumentFactory instance.
- Author:
David B. Bracewell
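The preprocessing role of a DocumentFactory described above can be pictured as a chain of text normalizers applied to raw text before a document is built. The following is a minimal sketch under stated assumptions: NormalizingFactorySketch, preprocess, and defaultChain are hypothetical names invented for illustration and are not part of Hermes, and the choice of NFKC plus whitespace collapsing is an assumption, not the library's documented default.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a factory's preprocessing chain; NOT the Hermes DocumentFactory API.
final class NormalizingFactorySketch {
    // Each normalizer maps text to cleaned text, loosely mirroring the idea of a TextNormalizer.
    private final List<UnaryOperator<String>> normalizers;

    NormalizingFactorySketch(List<UnaryOperator<String>> normalizers) {
        this.normalizers = normalizers;
    }

    // Applies every normalizer in order and returns the text a document would be built from.
    String preprocess(String rawText) {
        String text = rawText;
        for (UnaryOperator<String> normalizer : normalizers) {
            text = normalizer.apply(text);
        }
        return text;
    }

    // Assumed default chain: Unicode NFKC normalization, then whitespace collapsing and trimming.
    static NormalizingFactorySketch defaultChain() {
        UnaryOperator<String> unicode = s -> Normalizer.normalize(s, Normalizer.Form.NFKC);
        UnaryOperator<String> whitespace = s -> s.replaceAll("\\s+", " ").trim();
        return new NormalizingFactorySketch(Arrays.asList(unicode, whitespace));
    }

    public static void main(String[] args) {
        String raw = "  The\u00A0quick \t brown\n\nfox \uFB01nds it. ";
        System.out.println(defaultChain().preprocess(raw));
    }
}
```

The order of the chain matters in this sketch: Unicode normalization runs first so that compatibility characters such as the no-break space become ordinary spaces before whitespace is collapsed.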
-
-
Nested Class Summary
static class Document.AnnotationBuilder
Annotation builder for creating annotations associated with a document.
-
Method Summary
void annotate(AnnotatableType... types)
Convenience method for annotating the document with the given annotatable types.
Annotation annotation(long id)
Gets an annotation on the document by its ID.
default Document.AnnotationBuilder annotationBuilder(@NonNull AnnotationType type)
Creates an annotation builder for adding annotations to the document.
List<Annotation> annotations(AnnotationType type, Span span)
Gets annotations of the given type that overlap with the given span.
List<Annotation> annotations(AnnotationType type, Span span, Predicate<? super Annotation> filter)
Gets annotations of the given type that overlap with the given span and meet the given filter.
void attach(Annotation annotation)
Attaches the given annotation to the document.
default List<Annotation> children(@NonNull String relation)
Gets all child annotations, i.e. those annotations that have a dependency relation pointing to this HString, with the given dependency relation.
Set<AnnotatableType> completed()
Gets the set of completed AnnotatableType on this document.
boolean contains(Annotation annotation)
Determines if the given annotation is attached to this document.
static Document create(@NonNull String text)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String text, @NonNull Language language)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String text, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String id, @NonNull String text)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String id, @NonNull String text, @NonNull Language language)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String id, @NonNull String text, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String id, @NonNull String text, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
static Document create(@NonNull String text, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
Annotation createAnnotation(AnnotationType type, int start, int end, Map<AttributeType<?>,?> attributeMap)
Creates an annotation of the given type encompassing the given span and having the given attributes.
Annotation createAnnotation(AnnotationType type, int start, int end, Map<AttributeType<?>,?> attributeMap, List<Relation> relations)
Creates an annotation of the given type encompassing the given span and having the given attributes.
default Document document()
default List<Annotation> enclosedAnnotations()
static Document fromJson(@NonNull String jsonString)
Creates a document from a JSON representation (created by the write or toJson methods).
String getAnnotationProvider(AnnotatableType type)
Gets the provider for the given AnnotatableType when that type is completed on the document.
String getId()
Gets the id of the document.
default Language getLanguage()
default List<Annotation> incoming(RelationType type, boolean includeSubAnnotations)
Gets all annotations that have a relation with this HString as the target.
default List<Annotation> incoming(RelationType type, String value, boolean includeSubAnnotations)
Gets all annotations that have a relation with this HString as the target.
default List<Relation> incomingRelations(boolean includeSubAnnotations)
Gets all incoming relations to this HString.
default List<Relation> incomingRelations(RelationType relationType, boolean includeSubAnnotations)
Gets all relations of the given type targeting this HString.
boolean isCompleted(AnnotatableType type)
Checks if a given AnnotatableType is completed, i.e. has been added by an annotator.
default boolean isDocument()
Annotation next(Annotation annotation, AnnotationType type)
Determines the next annotation of the given type after the given annotation (e.g. what is the token after the current token).
int numberOfAnnotations()
default List<Annotation> outgoing(RelationType type, boolean includeSubAnnotations)
Gets all annotations with which this HString has an outgoing relation of the given type.
default List<Annotation> outgoing(RelationType type, String value, boolean includeSubAnnotations)
Gets all annotations with which this HString has an outgoing relation of the given type and value.
default List<Relation> outgoingRelations(boolean includeSubAnnotations)
Gets all outgoing relations from this HString.
default List<Relation> outgoingRelations(@NonNull RelationType relationType, boolean includeSubAnnotations)
Gets all relations of the given type originating from this HString.
Annotation previous(Annotation annotation, AnnotationType type)
Determines the previous annotation of the given type before the given annotation (e.g. what is the token before the current token).
Map<AnnotatableType,String> providers()
boolean remove(Annotation annotation)
Removes the given annotation from the document.
void removeAnnotationType(AnnotationType type)
Removes all annotations of a given type.
void setCompleted(AnnotatableType type, String provider)
Marks the given AnnotatableType as being completed by the given provider.
void setId(String id)
Sets the id of the document.
void setUncompleted(AnnotatableType type)
Marks the given AnnotatableType as not being completed.
default String toJson()
-
Methods inherited from interface java.lang.CharSequence
chars, codePoints, toString
-
Methods inherited from interface com.gengoai.hermes.HString
add, addAll, annotationGraph, annotations, annotations, annotations, annotationStream, annotationStream, asAnnotation, asAnnotation, atBeginningOfSentence, atEndOfSentence, attribute, attribute, attributeEquals, attributeIsA, attributeMap, categories, charAt, charNGrams, charNGrams, children, computeIfAbsent, context, context, dependency, dependencyGraph, dependencyGraph, dependencyIsA, embedding, embedding, enclosedAnnotations, encloses, find, find, findAll, first, firstToken, forEach, getLemma, getMorphologicalFeatures, getStemmedForm, hasAnnotation, hasAttribute, hasIncomingRelation, hasIncomingRelation, hasOutgoingRelation, hasOutgoingRelation, head, ifNotEmpty, incoming, incoming, incomingRelations, incomingRelations, incomingRelationStream, incomingRelationStream, interleaved, isA, isAnnotation, isInstance, last, lastToken, leftContext, leftContext, length, next, outgoing, outgoing, outgoingRelations, outgoingRelations, outgoingRelationStream, outgoingRelationStream, overlaps, parent, pos, previous, put, putAdd, putAll, putAll, putIfAbsent, removeAttribute, removeRelation, rightContext, rightContext, sentence, sentences, sentenceStream, setLanguage, split, startingHere, substring, toDocument, tokenAt, tokenLength, tokens, tokenStream, toPOSString, toPOSString, trim, trimLeft, trimRight, union
-
Methods inherited from interface com.gengoai.collection.tree.Span
compareTo, encloses, end, isEmpty, overlaps, start
-
Methods inherited from interface com.gengoai.string.StringLike
contains, contentEquals, contentEqualsIgnoreCase, endsWith, indexOf, indexOf, matcher, matcher, matches, replace, replaceAll, replaceFirst, startsWith, subSequence, toCharArray, toLowerCase, toUpperCase
-
-
-
-
Method Detail
-
create
static Document create(@NonNull String text)
Convenience method for creating a document using the default document factory.
- Parameters:
text - the text content making up the document
- Returns:
the document
-
create
static Document create(@NonNull String text, @NonNull Language language)
Convenience method for creating a document using the default document factory.
- Parameters:
text - the text content making up the document
language - the language of the content
- Returns:
the document
-
create
static Document create(@NonNull String text, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
- Parameters:
text - the text content making up the document
language - the language of the content
attributes - the attributes, i.e. metadata, associated with the document
- Returns:
the document
-
create
static Document create(@NonNull String text, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
- Parameters:
text - the text content making up the document
attributes - the attributes, i.e. metadata, associated with the document
- Returns:
the document
-
create
static Document create(@NonNull String id, @NonNull String text)
Convenience method for creating a document using the default document factory.
- Parameters:
id - the document id
text - the text content making up the document
- Returns:
the document
-
create
static Document create(@NonNull String id, @NonNull String text, @NonNull Language language)
Convenience method for creating a document using the default document factory.
- Parameters:
id - the document id
text - the text content making up the document
language - the language of the content
- Returns:
the document
-
create
static Document create(@NonNull String id, @NonNull String text, @NonNull Language language, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
- Parameters:
id - the document id
text - the text content making up the document
language - the language of the content
attributes - the attributes, i.e. metadata, associated with the document
- Returns:
the document
-
create
static Document create(@NonNull String id, @NonNull String text, @NonNull Map<AttributeType<?>,?> attributes)
Convenience method for creating a document using the default document factory.
- Parameters:
id - the document id
text - the text content making up the document
attributes - the attributes, i.e. metadata, associated with the document
- Returns:
the document
-
fromJson
static Document fromJson(@NonNull String jsonString)
Creates a document from a JSON representation (created by the write or toJson methods).
- Parameters:
jsonString - the JSON string
- Returns:
the document
-
annotate
void annotate(AnnotatableType... types)
Convenience method for annotating the document with the given annotatable types.
- Parameters:
types - the types to annotate
-
annotation
Annotation annotation(long id)
Gets an annotation on the document by its ID.
- Parameters:
id - the id of the annotation to retrieve
- Returns:
the annotation
-
annotationBuilder
default Document.AnnotationBuilder annotationBuilder(@NonNull AnnotationType type)
Creates an annotation builder for adding annotations to the document.
- Returns:
the annotation builder
-
annotations
List<Annotation> annotations(AnnotationType type, Span span)
Gets annotations of the given type that overlap with the given span.
- Parameters:
type - the type of annotation
span - the span to search for overlapping annotations
- Returns:
All annotations of the given type on the document that overlap with the given span.
-
annotations
List<Annotation> annotations(AnnotationType type, Span span, Predicate<? super Annotation> filter)
Gets annotations of the given type that overlap with the given span and meet the given filter.
- Parameters:
type - the type of annotation
span - the span to search for overlapping annotations
filter - the filter to use on the annotations
- Returns:
All annotations of the given type on the document that overlap with the given span and meet the given filter.
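The overlap-and-filter query can be sketched without the library. The example below is a minimal illustration under stated assumptions: OverlapQuerySketch and its Ann class are hypothetical stand-ins (not the Hermes Annotation, AnnotationType, or Span types), annotation types are plain strings, and spans are assumed to be half-open ranges with an inclusive start and exclusive end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration only; NOT the Hermes types.
final class OverlapQuerySketch {
    static final class Ann {
        final String type;
        final int start; // inclusive (assumption)
        final int end;   // exclusive (assumption)

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Two half-open ranges [s1, e1) and [s2, e2) overlap when each starts before the other ends.
    static boolean overlaps(int s1, int e1, int s2, int e2) {
        return s1 < e2 && s2 < e1;
    }

    // Returns the annotations of the given type that overlap [start, end) and pass the filter.
    static List<Ann> annotations(List<Ann> all, String type, int start, int end,
                                 Predicate<? super Ann> filter) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : all) {
            if (a.type.equals(type) && overlaps(a.start, a.end, start, end) && filter.test(a)) {
                result.add(a);
            }
        }
        return result;
    }
}
```

A linear scan keeps the sketch short. An implementation backed by an interval tree or a sorted index would avoid visiting every annotation, but whether Hermes does so is not stated here.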
-
attach
void attach(Annotation annotation)
Attaches the given annotation to the document.
- Parameters:
annotation - The annotation to attach to the document.
-
children
default List<Annotation> children(@NonNull String relation)
Description copied from interface: HString
Gets all child annotations, i.e. those annotations that have a dependency relation pointing to this HString, with the given dependency relation.
-
completed
Set<AnnotatableType> completed()
Gets the set of completed AnnotatableType on this document.
- Returns:
the set of completed AnnotatableType
-
contains
boolean contains(Annotation annotation)
Determines if the given annotation is attached to this document.
- Parameters:
annotation - The annotation to check
- Returns:
True if this annotation is attached to this document, false otherwise.
-
createAnnotation
Annotation createAnnotation(AnnotationType type, int start, int end, Map<AttributeType<?>,?> attributeMap, List<Relation> relations)
Creates an annotation of the given type encompassing the given span and having the given attributes. The annotation is added to the document and has a unique id assigned.
- Parameters:
type - the type of annotation
start - the start of the span
end - the end of the span
attributeMap - the attributes associated with the annotation
relations - the relations to add on the annotation
- Returns:
the created annotation
-
createAnnotation
Annotation createAnnotation(AnnotationType type, int start, int end, Map<AttributeType<?>,?> attributeMap)
Creates an annotation of the given type encompassing the given span and having the given attributes. The annotation is added to the document and has a unique id assigned.
- Parameters:
type - the type of annotation
start - the start of the span
end - the end of the span
attributeMap - the attributes associated with the annotation
- Returns:
the created annotation
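The contract that a created annotation is added to the document and receives a unique id, and can later be retrieved by that id, can be illustrated with a small in-memory store. This is a sketch under stated assumptions: AnnotationStoreSketch and Ann are hypothetical stand-ins, the sequential counter, the bounds check, and the null result for an unknown id are choices made for the example, and none of it describes how Hermes actually assigns ids or handles invalid input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory store for illustration only; NOT the Hermes Document implementation.
final class AnnotationStoreSketch {
    static final class Ann {
        final long id;
        final String type;
        final int start;
        final int end;

        Ann(long id, String type, int start, int end) {
            this.id = id;
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    private final String text;
    private final Map<Long, Ann> byId = new LinkedHashMap<>();
    private long nextId = 0; // assumption: ids come from a per-document counter

    AnnotationStoreSketch(String text) {
        this.text = text;
    }

    // Creates an annotation over [start, end), assigns it a fresh id, and adds it to the store.
    Ann createAnnotation(String type, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("Span out of bounds: " + start + ", " + end);
        }
        Ann annotation = new Ann(nextId++, type, start, end);
        byId.put(annotation.id, annotation);
        return annotation;
    }

    // Looks an annotation up by id; this sketch returns null when the id is unknown.
    Ann annotation(long id) {
        return byId.get(id);
    }

    boolean contains(Ann annotation) {
        return byId.get(annotation.id) == annotation;
    }

    // Returns true only if the annotation was present and has been removed.
    boolean remove(Ann annotation) {
        return byId.remove(annotation.id) != null;
    }

    int numberOfAnnotations() {
        return byId.size();
    }
}
```

Because ids are never reused in this sketch, removing an annotation and creating another cannot produce two annotations that share an id.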
-
document
default Document document()
-
enclosedAnnotations
default List<Annotation> enclosedAnnotations()
- Specified by:
enclosedAnnotations in interface HString
- Returns:
all annotations enclosed by this HString
-
getAnnotationProvider
String getAnnotationProvider(AnnotatableType type)
Gets the provider for the given AnnotatableType when that type is completed on the document.
- Parameters:
type - the annotatable type whose provider we want
- Returns:
The provider of the given annotatable type
-
getId
String getId()
Gets the id of the document.
- Returns:
The id of the document
-
getLanguage
default Language getLanguage()
- Specified by:
getLanguage in interface HString
- Specified by:
getLanguage in interface StringLike
-
incoming
default List<Annotation> incoming(RelationType type, String value, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all annotations that have a relation with this HString as the target. If includeSubAnnotations is true, then all sub-annotations are examined as potential targets.
-
incoming
default List<Annotation> incoming(RelationType type, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all annotations that have a relation with this HString as the target. If includeSubAnnotations is true, then all sub-annotations are examined as potential targets.
-
incomingRelations
default List<Relation> incomingRelations(boolean includeSubAnnotations)
Description copied from interface: HString
Gets all incoming relations to this HString.
- Specified by:
incomingRelations in interface HString
- Parameters:
includeSubAnnotations - true to include relations to sub-annotations
- Returns:
the collection of relations
-
incomingRelations
default List<Relation> incomingRelations(RelationType relationType, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all relations of the given type targeting this HString.
- Specified by:
incomingRelations in interface HString
- Parameters:
relationType - the relation type
includeSubAnnotations - true to include relations to sub-annotations
- Returns:
the relations
-
isCompleted
boolean isCompleted(AnnotatableType type)
Checks if a given AnnotatableType is completed, i.e. has been added by an annotator.
- Parameters:
type - the type to check
- Returns:
True if the type is complete, False if not
-
isDocument
default boolean isDocument()
- Specified by:
isDocument in interface HString
- Returns:
True if this HString represents a document
-
next
Annotation next(Annotation annotation, AnnotationType type)
Determines the next annotation of the given type after the given annotation (e.g. what is the token after the current token).
- Parameters:
annotation - The current annotation.
type - The type of annotation we want to find after the current annotation.
- Returns:
The annotation of the given type after the current annotation or an empty HString if there are none.
-
numberOfAnnotations
int numberOfAnnotations()
- Returns:
The number of annotations on the document
-
outgoing
default List<Annotation> outgoing(RelationType type, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all annotations with which this HString has an outgoing relation of the given type.
-
outgoing
default List<Annotation> outgoing(RelationType type, String value, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all annotations with which this HString has an outgoing relation of the given type and value.
-
outgoingRelations
default List<Relation> outgoingRelations(boolean includeSubAnnotations)
Description copied from interface: HString
Gets all outgoing relations from this HString.
- Specified by:
outgoingRelations in interface HString
- Parameters:
includeSubAnnotations - true to include relations to sub-annotations
- Returns:
the collection of relations
-
outgoingRelations
default List<Relation> outgoingRelations(@NonNull RelationType relationType, boolean includeSubAnnotations)
Description copied from interface: HString
Gets all relations of the given type originating from this HString.
- Specified by:
outgoingRelations in interface HString
- Parameters:
relationType - the relation type
includeSubAnnotations - true to include relations to sub-annotations
- Returns:
the relations
-
previous
Annotation previous(Annotation annotation, AnnotationType type)
Determines the previous annotation of the given type before the given annotation (e.g. what is the token before the current token).
- Parameters:
annotation - The current annotation.
type - The type of annotation we want to find before the current annotation.
- Returns:
The annotation of the given type before the current annotation or an empty HString if there are none.
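The next and previous lookups, including the "empty result when there is none" behaviour, can be sketched over a list of annotations sorted by start offset. Everything below is an assumption made for illustration: NeighborSketch, Ann, and the EMPTY sentinel are hypothetical stand-ins for the Hermes types, and "after" and "before" are interpreted as "starts at or after the current end" and "ends at or before the current start".

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Hermes Annotation or HString types.
final class NeighborSketch {
    static final class Ann {
        final String type;
        final int start; // inclusive
        final int end;   // exclusive

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        boolean isEmpty() {
            return start == end;
        }
    }

    // Sentinel playing the role of the "empty HString" returned when no neighbor exists.
    static final Ann EMPTY = new Ann("", 0, 0);

    // First annotation of the given type that starts at or after the end of the current one.
    // The list is assumed to be sorted by start offset.
    static Ann next(List<Ann> sortedByStart, Ann current, String type) {
        for (Ann candidate : sortedByStart) {
            if (candidate.type.equals(type) && candidate.start >= current.end) {
                return candidate;
            }
        }
        return EMPTY;
    }

    // Last annotation of the given type that ends at or before the start of the current one.
    static Ann previous(List<Ann> sortedByStart, Ann current, String type) {
        Ann best = EMPTY;
        for (Ann candidate : sortedByStart) {
            if (candidate.type.equals(type) && candidate.end <= current.start) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Returning a sentinel rather than null lets callers chain calls and test isEmpty() instead of guarding against a null reference.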
-
providers
Map<AnnotatableType,String> providers()
-
remove
boolean remove(Annotation annotation)
Removes the given annotation from the document.
- Parameters:
annotation - the annotation to remove
- Returns:
True if the annotation was successfully removed, False otherwise
-
removeAnnotationType
void removeAnnotationType(AnnotationType type)
Removes all annotations of a given type.
- Parameters:
type - the type of annotation to remove
-
setCompleted
void setCompleted(AnnotatableType type, String provider)
Marks the given AnnotatableType as being completed by the given provider.
- Parameters:
type - The AnnotatableType to mark as completed.
provider - The provider that satisfied the given AnnotatableType
-
setId
void setId(String id)
Sets the id of the document. If a null or blank id is given, a random id will be generated.
- Parameters:
id - The new id of the document
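The fallback for a null or blank id can be sketched as follows. DocumentIdSketch is a hypothetical stand-in, and the use of java.util.UUID is an assumption made for the example. The documentation only states that a random id is generated, not how.

```java
import java.util.UUID;

// Hypothetical stand-in for illustration only; NOT the Hermes Document implementation.
final class DocumentIdSketch {
    private String id;

    // Stores the given id, or a randomly generated one when the id is null or blank.
    void setId(String id) {
        if (id == null || id.trim().isEmpty()) {
            // Assumption: a random UUID is used as the generated id.
            this.id = UUID.randomUUID().toString();
        } else {
            this.id = id;
        }
    }

    String getId() {
        return id;
    }
}
```

With this behaviour a document always ends up with a non-blank id, which supports the expectation that ids are unique within a corpus.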
-
setUncompleted
void setUncompleted(AnnotatableType type)
Marks the given AnnotatableType as not being completed. Useful for reannotating for a given type.
- Parameters:
type - The AnnotatableType to mark as uncompleted.
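The completion bookkeeping exposed by setCompleted, setUncompleted, isCompleted, getAnnotationProvider, completed, and providers behaves like a map from annotatable type to provider name. The sketch below illustrates that reading under stated assumptions: CompletionSketch is a hypothetical stand-in, annotatable types are modelled as plain strings, and the provider name in the usage is invented for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; NOT the Hermes Document implementation.
final class CompletionSketch {
    // Maps each completed type (a String here, by assumption) to the provider that produced it.
    private final Map<String, String> providers = new LinkedHashMap<>();

    void setCompleted(String type, String provider) {
        providers.put(type, provider);
    }

    // Clearing the entry allows the type to be annotated again.
    void setUncompleted(String type) {
        providers.remove(type);
    }

    boolean isCompleted(String type) {
        return providers.containsKey(type);
    }

    // Returns the provider for a completed type; this sketch returns null otherwise.
    String getAnnotationProvider(String type) {
        return providers.get(type);
    }

    Set<String> completed() {
        return Collections.unmodifiableSet(providers.keySet());
    }

    Map<String, String> providers() {
        return Collections.unmodifiableMap(providers);
    }
}
```

In this model, reannotating a type amounts to calling setUncompleted for it and then running the annotator again, which records a new provider through setCompleted.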
-
toJson
default String toJson()
- Returns:
JSON representation of the document
-
-