Interface Document

  • All Superinterfaces:
    CharSequence, Comparable<Span>, HString, Serializable, Span, StringLike

    public interface Document
    extends HString

    A document represents text content with an accompanying set of metadata (Attributes), linguistic overlays (Annotations), and relations between elements in the document. A document is represented as an HString with additional methods for adding, removing, and accessing the annotations over its content. Every document has an id, which should be unique within a corpus.

    Documents are created using a DocumentFactory, which defines the preprocessing steps (TextNormalizers, e.g., whitespace and Unicode normalization) to be performed on raw text before creating a document, and the default language in which the documents are written. Additionally, the Document interface provides a number of convenience create methods for constructing documents using the default DocumentFactory instance.
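
    The preprocessing described above can be pictured as an ordered chain of text transformations applied before the document is built. The following is only a minimal sketch under stated assumptions: NormalizingFactorySketch, unicodeNfc, collapseWhitespace, and defaultPipeline are hypothetical names made up for illustration and are not the library's DocumentFactory or TextNormalizer API. It uses only the JDK.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a preprocessing pipeline; NOT the library's DocumentFactory.
class NormalizingFactorySketch {
    private final List<UnaryOperator<String>> normalizers;

    NormalizingFactorySketch(List<UnaryOperator<String>> normalizers) {
        this.normalizers = normalizers;
    }

    // Applies each normalizer in order and returns the text a document would be built from.
    String normalize(String rawText) {
        String text = rawText;
        for (UnaryOperator<String> normalizer : normalizers) {
            text = normalizer.apply(text);
        }
        return text;
    }

    // Unicode normalization to the composed form (NFC).
    static String unicodeNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Trims the ends and collapses each run of whitespace to a single space.
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Assumed default chain for this sketch: Unicode first, then whitespace.
    static NormalizingFactorySketch defaultPipeline() {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        steps.add(NormalizingFactorySketch::unicodeNfc);
        steps.add(NormalizingFactorySketch::collapseWhitespace);
        return new NormalizingFactorySketch(steps);
    }

    public static void main(String[] args) {
        // "e" followed by a combining acute accent is composed into a single character.
        System.out.println(defaultPipeline().normalize("  Cafe\u0301   au\tlait \n"));
    }
}
```

    Running the normalizers before construction means that annotation offsets always refer to the normalized text, which is one plausible reason to fix the pipeline in the factory rather than per call.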

    Author:
    David B. Bracewell
    • Method Detail

      • create

        static Document create(@NonNull String text)
        Convenience method for creating a document using the default document factory.
        Parameters:
        text - the text content making up the document
        Returns:
        the document
      • create

        static Document create(@NonNull String text,
                               @NonNull Language language)
        Convenience method for creating a document using the default document factory.
        Parameters:
        text - the text content making up the document
        language - the language of the content
        Returns:
        the document
      • create

        static Document create(@NonNull String text,
                               @NonNull Language language,
                               @NonNull Map<AttributeType<?>, ?> attributes)
        Convenience method for creating a document using the default document factory.
        Parameters:
        text - the text content making up the document
        language - the language of the content
        attributes - the attributes, i.e. metadata, associated with the document
        Returns:
        the document
      • create

        static Document create(@NonNull String text,
                               @NonNull Map<AttributeType<?>, ?> attributes)
        Convenience method for creating a document using the default document factory.
        Parameters:
        text - the text content making up the document
        attributes - the attributes, i.e. metadata, associated with the document
        Returns:
        the document
      • create

        static Document create(@NonNull String id,
                               @NonNull String text)
        Convenience method for creating a document using the default document factory.
        Parameters:
        id - the document id
        text - the text content making up the document
        Returns:
        the document
      • create

        static Document create(@NonNull String id,
                               @NonNull String text,
                               @NonNull Language language)
        Convenience method for creating a document using the default document factory.
        Parameters:
        id - the document id
        text - the text content making up the document
        language - the language of the content
        Returns:
        the document
      • create

        static Document create(@NonNull String id,
                               @NonNull String text,
                               @NonNull Language language,
                               @NonNull Map<AttributeType<?>, ?> attributes)
        Convenience method for creating a document using the default document factory.
        Parameters:
        id - the document id
        text - the text content making up the document
        language - the language of the content
        attributes - the attributes, i.e. metadata, associated with the document
        Returns:
        the document
      • create

        static Document create(@NonNull String id,
                               @NonNull String text,
                               @NonNull Map<AttributeType<?>, ?> attributes)
        Convenience method for creating a document using the default document factory.
        Parameters:
        id - the document id
        text - the text content making up the document
        attributes - the attributes, i.e. metadata, associated with the document
        Returns:
        the document
      • fromJson

        static Document fromJson(@NonNull String jsonString)
        Creates a document from a JSON representation (created by the write or toJson methods).
        Parameters:
        jsonString - the json string
        Returns:
        the document
      • annotate

        void annotate​(AnnotatableType... types)
        Convenience method for annotating the document with the given annotatable types.
        Parameters:
        types - the types to annotate
      • annotation

        Annotation annotation​(long id)
        Gets an annotation on the document by its ID.
        Parameters:
        id - the id of the annotation to retrieve
        Returns:
        the annotation
      • annotationBuilder

        default Document.AnnotationBuilder annotationBuilder(@NonNull AnnotationType type)
        Creates an annotation builder for adding annotations to the document.
        Returns:
        the annotation builder
      • annotations

        List<Annotation> annotations​(AnnotationType type,
                                     Span span)
        Gets annotations of the given type that overlap with the given span.
        Parameters:
        type - the type of annotation
        span - the span to search for overlapping annotations
        Returns:
        All annotations of the given type on the document that overlap with the given span.
      • annotations

        List<Annotation> annotations​(AnnotationType type,
                                     Span span,
                                     Predicate<? super Annotation> filter)
        Gets annotations of the given type that overlap with the given span and meet the given filter.
        Parameters:
        type - the type of annotation
        span - the span to search for overlapping annotations
        filter - the filter to use on the annotations
        Returns:
        All annotations of the given type on the document that overlap with the given span and meet the given filter.
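        The overlap-plus-filter query described by the two annotations methods can be sketched with plain JDK types. This is an illustration only: OverlapQuerySketch and Ann are hypothetical stand-ins, not the library's Annotation or Span classes, and the half-open offset convention (start inclusive, end exclusive) is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration; NOT the library's Annotation or Span types.
class OverlapQuerySketch {
    static final class Ann {
        final String type;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset (assumption)

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        // Two half-open ranges overlap when each one starts before the other ends.
        boolean overlaps(int otherStart, int otherEnd) {
            return start < otherEnd && otherStart < end;
        }
    }

    // Keeps annotations of the requested type that overlap [start, end) and pass the filter.
    static List<Ann> annotations(List<Ann> all, String type, int start, int end,
                                 Predicate<? super Ann> filter) {
        List<Ann> result = new ArrayList<>();
        for (Ann a : all) {
            if (a.type.equals(type) && a.overlaps(start, end) && filter.test(a)) {
                result.add(a);
            }
        }
        return result;
    }

    // Small fixed data set used by main and by callers experimenting with the sketch.
    static List<Ann> sample() {
        return Arrays.asList(
            new Ann("TOKEN", 0, 5),
            new Ann("TOKEN", 6, 11),
            new Ann("ENTITY", 0, 11),
            new Ann("TOKEN", 12, 17));
    }

    public static void main(String[] args) {
        // Both tokens touching [4, 8) match; the ENTITY is excluded by type.
        System.out.println(annotations(sample(), "TOKEN", 4, 8, a -> true).size());
        // The filter additionally drops the token that starts at offset 0.
        System.out.println(annotations(sample(), "TOKEN", 4, 8, a -> a.start > 0).size());
    }
}
```

        A linear scan keeps the sketch short; a real implementation would more likely use an interval index, which this example does not attempt to model.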
      • attach

        void attach​(Annotation annotation)
        Attaches the given annotation to the document.
        Parameters:
        annotation - The annotation to attach to the document.
      • children

        default List<Annotation> children(@NonNull String relation)
        Description copied from interface: HString
        Gets all child annotations, i.e., those annotations that have a dependency relation pointing to this HString, with the given dependency relation.
        Specified by:
        children in interface HString
        Parameters:
        relation - The dependency relation value
        Returns:
        the list of child annotations
      • completed

        Set<AnnotatableType> completed()
        Gets the set of completed AnnotatableType on this document.
        Returns:
        the set of completed AnnotatableType
      • contains

        boolean contains​(Annotation annotation)
        Determines if the given annotation is attached to this document.
        Parameters:
        annotation - The annotation to check
        Returns:
        True if this annotation is attached to this document, false otherwise.
      • createAnnotation

        Annotation createAnnotation​(AnnotationType type,
                                    int start,
                                    int end,
                                    Map<AttributeType<?>,​?> attributeMap,
                                    List<Relation> relations)
        Creates an annotation of the given type encompassing the given span and having the given attributes. The annotation is added to the document and has a unique id assigned.
        Parameters:
        type - the type of annotation
        start - the start of the span
        end - the end of the span
        attributeMap - the attributes associated with the annotation
        relations - the relations to add on the annotation
        Returns:
        the created annotation
      • createAnnotation

        Annotation createAnnotation​(AnnotationType type,
                                    int start,
                                    int end,
                                    Map<AttributeType<?>,​?> attributeMap)
        Creates an annotation of the given type encompassing the given span and having the given attributes. The annotation is added to the document and has a unique id assigned.
        Parameters:
        type - the type of annotation
        start - the start of the span
        end - the end of the span
        attributeMap - the attributes associated with the annotation
        Returns:
        the created annotation
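        The contract stated for createAnnotation (the annotation is added to the document and receives a unique id) together with lookup by id, contains, and remove can be modelled as below. This is a hedged sketch: AnnotationStoreSketch and Ann are hypothetical names, the counter-based id scheme is an assumption, and returning null for an unknown id is a choice made for this example rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory store; NOT the library's Document implementation.
class AnnotationStoreSketch {
    static final class Ann {
        final long id;
        final String type;
        final int start;
        final int end;

        Ann(long id, String type, int start, int end) {
            this.id = id;
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: ids come from a monotonically increasing counter.
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, Ann> byId = new HashMap<>();

    // Creates the annotation, assigns the next id, and registers it in the store.
    Ann createAnnotation(String type, int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid span: " + start + ", " + end);
        }
        Ann created = new Ann(nextId.getAndIncrement(), type, start, end);
        byId.put(created.id, created);
        return created;
    }

    // Looks an annotation up by id; returns null when the id is unknown (sketch choice).
    Ann annotation(long id) {
        return byId.get(id);
    }

    // True only when this exact annotation instance is registered.
    boolean contains(Ann annotation) {
        return byId.get(annotation.id) == annotation;
    }

    // Removes the annotation; returns false when it was not registered.
    boolean remove(Ann annotation) {
        return byId.remove(annotation.id, annotation);
    }

    int numberOfAnnotations() {
        return byId.size();
    }

    public static void main(String[] args) {
        AnnotationStoreSketch store = new AnnotationStoreSketch();
        Ann first = store.createAnnotation("TOKEN", 0, 5);
        Ann second = store.createAnnotation("TOKEN", 6, 11);
        System.out.println(first.id != second.id);       // ids are distinct
        System.out.println(store.remove(first));         // first removal succeeds
        System.out.println(store.numberOfAnnotations()); // one annotation remains
    }
}
```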
      • document

        default Document document()
        Specified by:
        document in interface HString
        Returns:
        the document that this HString is associated with
      • getAnnotationProvider

        String getAnnotationProvider​(AnnotatableType type)
        Gets the provider for the given AnnotatableType when that type is completed on the document.
        Parameters:
        type - the annotatable type whose provider we want
        Returns:
        The provider of the given annotatable type
      • getId

        String getId()
        Gets the id of the document
        Returns:
        The id of the document
      • incoming

        default List<Annotation> incoming​(RelationType type,
                                          String value,
                                          boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all annotations that have a relation with this HString as the target. If includeSubAnnotations is true, then all sub-annotations are examined as potential targets.
        Specified by:
        incoming in interface HString
        Parameters:
        type - the relation type
        value - the relation value
        includeSubAnnotations - True - this HString or any of its sub-annotations can be the target, False - only relations with this exact HString as the target.
        Returns:
        the annotations
      • incoming

        default List<Annotation> incoming​(RelationType type,
                                          boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all annotations that have a relation with this HString as the target. If includeSubAnnotations is true, then all sub-annotations are examined as potential targets.
        Specified by:
        incoming in interface HString
        Parameters:
        type - the relation type
        includeSubAnnotations - True - this HString or any of its sub-annotations can be the target, False - only relations with this exact HString as the target.
        Returns:
        the annotations
      • incomingRelations

        default List<Relation> incomingRelations​(boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all incoming relations to this HString.
        Specified by:
        incomingRelations in interface HString
        Parameters:
        includeSubAnnotations - True - include relations to sub-annotations
        Returns:
        the collection of relations
      • incomingRelations

        default List<Relation> incomingRelations​(RelationType relationType,
                                                 boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all relations of the given type targeting this HString.
        Specified by:
        incomingRelations in interface HString
        Parameters:
        relationType - the relation type
        includeSubAnnotations - True - include relations to sub-annotations
        Returns:
        the relations
      • isCompleted

        boolean isCompleted​(AnnotatableType type)
        Checks whether the given AnnotatableType is completed, i.e., has been added by an annotator.
        Parameters:
        type - the type to check
        Returns:
        True if the type is complete, False if not
      • isDocument

        default boolean isDocument()
        Specified by:
        isDocument in interface HString
        Returns:
        True if this HString represents a document
      • next

        Annotation next​(Annotation annotation,
                        AnnotationType type)
        Determines the next annotation of the given type after the given annotation (e.g. what is the token after the current token)
        Parameters:
        annotation - The current annotation.
        type - The type of annotation we want to find after the current annotation.
        Returns:
        The annotation of the given type after the current annotation or an empty HString if there are none.
      • numberOfAnnotations

        int numberOfAnnotations()
        Returns:
        The number of annotations on the document
      • outgoing

        default List<Annotation> outgoing​(RelationType type,
                                          boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all annotations with which this HString has an outgoing relation of the given type.
        Specified by:
        outgoing in interface HString
        Parameters:
        type - the relation type
        includeSubAnnotations - True - include annotations for which any of the sub-annotations has an outgoing relation.
        Returns:
        the annotations
      • outgoing

        default List<Annotation> outgoing​(RelationType type,
                                          String value,
                                          boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all annotations with which this HString has an outgoing relation of the given type and value.
        Specified by:
        outgoing in interface HString
        Parameters:
        type - the relation type
        value - the relation value
        includeSubAnnotations - True - include annotations for which any of the sub-annotations has an outgoing relation.
        Returns:
        the annotations
      • outgoingRelations

        default List<Relation> outgoingRelations​(boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all outgoing relations from this HString.
        Specified by:
        outgoingRelations in interface HString
        Parameters:
        includeSubAnnotations - True - include relations to sub-annotations
        Returns:
        the collection of relations
      • outgoingRelations

        default List<Relation> outgoingRelations(@NonNull RelationType relationType,
                                                 boolean includeSubAnnotations)
        Description copied from interface: HString
        Gets all relations of the given type originating from this HString.
        Specified by:
        outgoingRelations in interface HString
        Parameters:
        relationType - the relation type
        includeSubAnnotations - True - include relations to sub-annotations
        Returns:
        the relations
      • previous

        Annotation previous​(Annotation annotation,
                            AnnotationType type)
        Determines the previous annotation of the given type before the given annotation (e.g., what is the token before the current token).
        Parameters:
        annotation - The current annotation.
        type - The type of annotation we want to find before the current annotation.
        Returns:
        The annotation of the given type before the current annotation or an empty HString if there are none.
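        The next and previous lookups, including the "empty" result when no neighbor exists, can be sketched as follows. Everything here is an assumption made for illustration: NeighborLookupSketch, Ann, and the EMPTY sentinel are hypothetical, the input list is assumed to be sorted by start offset, and "after" is interpreted as starting at or beyond the end of the current annotation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins; NOT the library's Annotation or HString types.
class NeighborLookupSketch {
    static final class Ann {
        final String type;
        final int start; // inclusive
        final int end;   // exclusive (assumption)

        Ann(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Sentinel playing the role of the "empty HString" mentioned in the documentation.
    static final Ann EMPTY = new Ann("", 0, 0);

    // First annotation of the type that starts at or after the end of the current one.
    static Ann next(List<Ann> sortedByStart, Ann current, String type) {
        for (Ann a : sortedByStart) {
            if (a.type.equals(type) && a.start >= current.end) {
                return a;
            }
        }
        return EMPTY;
    }

    // Last annotation of the type that ends at or before the start of the current one.
    static Ann previous(List<Ann> sortedByStart, Ann current, String type) {
        Ann best = EMPTY;
        for (Ann a : sortedByStart) {
            if (a.type.equals(type) && a.end <= current.start) {
                best = a; // later matches are closer because the list is sorted
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Ann t1 = new Ann("TOKEN", 0, 3);
        Ann t2 = new Ann("TOKEN", 4, 9);
        Ann t3 = new Ann("TOKEN", 10, 15);
        List<Ann> tokens = Arrays.asList(t1, t2, t3);
        System.out.println(next(tokens, t2, "TOKEN") == t3);     // neighbor to the right
        System.out.println(previous(tokens, t2, "TOKEN") == t1); // neighbor to the left
        System.out.println(next(tokens, t3, "TOKEN") == EMPTY);  // nothing after the last token
    }
}
```

        Returning a sentinel instead of null lets callers chain calls without null checks, which matches the "empty HString" wording in the method descriptions.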
      • remove

        boolean remove​(Annotation annotation)
        Removes the given annotation from the document
        Parameters:
        annotation - the annotation to remove
        Returns:
        True if the annotation was successfully removed, False otherwise
      • removeAnnotationType

        void removeAnnotationType​(AnnotationType type)
        Removes all annotations of a given type.
        Parameters:
        type - the type of annotation to remove
      • setCompleted

        void setCompleted​(AnnotatableType type,
                          String provider)
        Marks the given AnnotatableType as being completed by the given provider.
        Parameters:
        type - The AnnotatableType to mark as completed.
        provider - The provider that satisfied the given AnnotatableType
      • setId

        void setId​(String id)
        Sets the id of the document. If a null or blank id is given, a random id will be generated.
        Parameters:
        id - The new id of the document
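        The fallback described for setId (a null or blank id yields a random one) might look like the sketch below. DocumentIdSketch is a hypothetical class, and the use of java.util.UUID is an assumption: the documentation only says the id is random, not how it is produced.

```java
import java.util.UUID;

// Hypothetical holder for a document id; NOT the library's Document implementation.
class DocumentIdSketch {
    private String id;

    // Stores the given id, or a freshly generated one when the id is null or blank.
    void setId(String id) {
        if (id == null || id.trim().isEmpty()) {
            // Assumption: a random UUID stands in for the unspecified "random id".
            this.id = UUID.randomUUID().toString();
        } else {
            this.id = id;
        }
    }

    String getId() {
        return id;
    }

    public static void main(String[] args) {
        DocumentIdSketch doc = new DocumentIdSketch();
        doc.setId("doc-1");
        System.out.println(doc.getId()); // the explicit id is kept as given
        doc.setId("   ");
        System.out.println(doc.getId().trim().isEmpty()); // false: a generated id replaced the blank one
    }
}
```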
      • setUncompleted

        void setUncompleted​(AnnotatableType type)
        Marks the given AnnotatableType as not being completed. Useful for reannotating for a given type.
        Parameters:
        type - The AnnotatableType to mark as uncompleted.
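        The completion bookkeeping spread across completed, isCompleted, getAnnotationProvider, setCompleted, and setUncompleted can be pictured as a map from annotatable type to provider. This is a minimal sketch under assumptions: CompletionTrackerSketch is hypothetical, types are modelled as plain strings, and returning null for the provider of an uncompleted type is a choice made here, not documented behavior.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping helper; NOT the library's Document implementation.
class CompletionTrackerSketch {
    // Maps each completed type to the provider that completed it.
    private final Map<String, String> providers = new HashMap<>();

    void setCompleted(String type, String provider) {
        providers.put(type, provider);
    }

    // Clearing the mark allows the type to be annotated again.
    void setUncompleted(String type) {
        providers.remove(type);
    }

    boolean isCompleted(String type) {
        return providers.containsKey(type);
    }

    // Returns null when the type is not completed (sketch choice).
    String getAnnotationProvider(String type) {
        return providers.get(type);
    }

    // Read-only view of the completed types.
    Set<String> completed() {
        return Collections.unmodifiableSet(providers.keySet());
    }

    public static void main(String[] args) {
        CompletionTrackerSketch tracker = new CompletionTrackerSketch();
        tracker.setCompleted("TOKEN", "ExampleTokenizer"); // provider name is made up
        System.out.println(tracker.isCompleted("TOKEN"));  // true
        tracker.setUncompleted("TOKEN");
        System.out.println(tracker.isCompleted("TOKEN"));  // false, so TOKEN can be re-annotated
    }
}
```

        An annotate call could consult isCompleted first and skip types that are already marked, which is why resetting a type with setUncompleted is described as useful for reannotating.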
      • toJson

        default String toJson()
        Returns:
        JSON representation of the document