Class DocumentFactory

  • All Implemented Interfaces:
    Serializable

    public final class DocumentFactory
    extends Object
    implements Serializable

    A document factory creates document objects, applying any predefined preprocessing (e.g. whitespace normalization) to the document content. A default factory can be obtained by calling getInstance(); a custom factory can be built with a DocumentFactory.DocumentFactoryBuilder obtained from builder().

    The default factory uses configuration settings to determine the default language and preprocessing normalizers. The default language is defined using the com.gengoai.hermes.DefaultLanguage configuration property and the normalizers are defined using the com.gengoai.hermes.preprocessing.normalizers configuration property.
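
The relationship between a factory's defaults, its normalizers, and the create / createRaw methods documented under Method Detail can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not the Hermes implementation: SketchDocument, SketchDocumentFactory, and normalizeWhitespace are hypothetical names, the language is modeled as a plain String, the whitespace normalizer is assumed to collapse runs of whitespace and trim, and the UUID-based id is just one possible way to auto-generate an id.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a document; this is NOT the Hermes Document class.
final class SketchDocument {
    final String id;
    final String content;
    final String language;

    SketchDocument(String id, String content, String language) {
        this.id = id;
        this.content = content;
        this.language = language;
    }
}

// Hypothetical model of a factory holding a default language and normalizers.
final class SketchDocumentFactory {
    private final String defaultLanguage;
    private final List<UnaryOperator<String>> normalizers;

    SketchDocumentFactory(String defaultLanguage, List<UnaryOperator<String>> normalizers) {
        this.defaultLanguage = defaultLanguage;
        this.normalizers = List.copyOf(normalizers);
    }

    // Assumed normalizer: collapse whitespace runs to one space, then trim.
    static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // Shorter overload: fills in an auto-generated id (UUID is an assumption)
    // and the default language, then delegates to the fullest overload.
    SketchDocument create(String content) {
        return create(UUID.randomUUID().toString(), content, defaultLanguage);
    }

    // Applies every normalizer in order before building the document.
    SketchDocument create(String id, String content, String language) {
        String normalized = content;
        for (UnaryOperator<String> normalizer : normalizers) {
            normalized = normalizer.apply(normalized);
        }
        return createRaw(id, normalized, language);
    }

    // Raw creation: the content is stored exactly as given.
    SketchDocument createRaw(String id, String content, String language) {
        return new SketchDocument(id, content, language);
    }

    public static void main(String[] args) {
        UnaryOperator<String> whitespace = SketchDocumentFactory::normalizeWhitespace;
        SketchDocumentFactory factory = new SketchDocumentFactory("ENGLISH", List.of(whitespace));

        String text = "  Hello \t world\n";
        System.out.println("[" + factory.create("d1", text, "ENGLISH").content + "]");    // [Hello world]
        System.out.println("[" + factory.createRaw("d2", "a  b", "ENGLISH").content + "]"); // [a  b]
    }
}
```

In this model the normalized path and the raw path share one construction step, so the only difference between them is whether the normalizers run.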

    Author:
    David B. Bracewell
    See Also:
    Serialized Form
    • Method Detail

      • getInstance

        public static DocumentFactory getInstance()
        Returns:
        A document factory whose preprocessors and default language are set via configuration options
      • create

        public Document create(@NonNull String content)
        Creates a document with the given content, assigning the new document an auto-generated id and setting its language to the default language.
        Parameters:
        content - the document content
        Returns:
        the document
      • create

        public Document create(@NonNull String id,
                               @NonNull String content)
        Creates a document with the given id and content, setting its language to the default language.
        Parameters:
        id - the id
        content - the content
        Returns:
        the document
      • create

        public Document create(@NonNull String content,
                               @NonNull Language language)
        Creates a document with the given content written in the given language, assigning the new document an auto-generated id.
        Parameters:
        content - the content
        language - the language
        Returns:
        the document
      • create

        public Document create(@NonNull String id,
                               @NonNull String content,
                               @NonNull Language language)
        Creates a document with the given id and content written in the given language.
        Parameters:
        id - the id
        content - the content
        language - the language
        Returns:
        the document
      • create

        public Document create(@NonNull String content,
                               @NonNull Language language,
                               @NonNull Map<AttributeType<?>, ?> attributeMap)
        Creates a document with the given content written in the given language having the given set of attributes.
        Parameters:
        content - the content
        language - the language
        attributeMap - the attribute map
        Returns:
        the document
      • create

        public Document create(@NonNull String id,
                               @NonNull String content,
                               @NonNull Language language,
                               @NonNull Map<AttributeType<?>, ?> attributeMap)
        Creates a document with the given id and content written in the given language having the given set of attributes.
        Parameters:
        id - the id
        content - the content
        language - the language
        attributeMap - the attribute map
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String id,
                                  @NonNull String content,
                                  @NonNull Language language,
                                  @NonNull Map<AttributeType<?>, ?> attributeMap)
        Creates a document with the given id and content written in the given language having the given set of attributes. This method does not apply any TextNormalizer.
        Parameters:
        id - the id
        content - the content
        language - the language
        attributeMap - the attribute map
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String content)
        Creates a document with the given content written in the default language. This method does not apply any TextNormalizer.
        Parameters:
        content - the content
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String id,
                                  @NonNull String content)
        Creates a document with the given id and content written in the default language. This method does not apply any TextNormalizer.
        Parameters:
        id - the id
        content - the content
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String content,
                                  @NonNull Language language)
        Creates a document with the given content written in the given language. This method does not apply any TextNormalizer.
        Parameters:
        content - the content
        language - the language
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String id,
                                  @NonNull String content,
                                  @NonNull Language language)
        Creates a document with the given id and content written in the given language. This method does not apply any TextNormalizer.
        Parameters:
        id - the id
        content - the content
        language - the language
        Returns:
        the document
      • createRaw

        public Document createRaw(@NonNull String content,
                                  @NonNull Language language,
                                  @NonNull Map<AttributeType<?>, ?> attributeMap)
        Creates a document with the given content written in the given language having the given set of attributes. This method does not apply any TextNormalizer.
        Parameters:
        content - the content
        language - the language
        attributeMap - the attribute map
        Returns:
        the document
      • fromTokens

        public Document fromTokens(@NonNull Language language,
                                   @NonNull String... tokens)
        Creates a document from the given tokens. The language parameter controls how the document content is built: if the language uses whitespace, tokens are joined with a single space between them; otherwise no space is inserted between tokens.
        Parameters:
        language - the language of the document
        tokens - the tokens making up the document
        Returns:
        the document with tokens provided.
      • fromTokens

        public Document fromTokens(@NonNull String... tokens)
        Creates a document from the given tokens using the default language.
        Parameters:
        tokens - the tokens
        Returns:
        the document
      • fromTokens

        public Document fromTokens(@NonNull Iterable<String> tokens)
        Creates a document from the given tokens using the default language.
        Parameters:
        tokens - the tokens
        Returns:
        the document
      • fromTokens

        public Document fromTokens(@NonNull Iterable<String> tokens,
                                   @NonNull Language language)
        Creates a document from the given tokens. The language parameter controls how the document content is built: if the language uses whitespace, tokens are joined with a single space between them; otherwise no space is inserted between tokens.
        Parameters:
        tokens - the tokens
        language - the language
        Returns:
        the document
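
The joining rule described for the language-aware fromTokens overloads can be sketched as follows. This is a minimal illustration rather than the library's code: TokenJoinSketch and joinTokens are hypothetical names, and the boolean usesWhitespace is an assumed stand-in for whatever the Language object reports about whether the language separates words with whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper modeling how document content could be built from tokens.
final class TokenJoinSketch {

    // usesWhitespace is an assumed flag, not part of the documented API.
    // Whitespace languages get a single space between tokens; others get none.
    static String joinTokens(boolean usesWhitespace, List<String> tokens) {
        return String.join(usesWhitespace ? " " : "", tokens);
    }

    // Varargs convenience overload that delegates to the List version.
    static String joinTokens(boolean usesWhitespace, String... tokens) {
        return joinTokens(usesWhitespace, Arrays.asList(tokens));
    }

    public static void main(String[] args) {
        // Whitespace-delimited language, e.g. English.
        System.out.println(joinTokens(true, "The", "cat", "sat"));  // The cat sat
        // Language written without spaces; ASCII placeholders stand in for real tokens.
        System.out.println(joinTokens(false, "A", "B", "C"));       // ABC
    }
}
```

Under this rule the token boundaries are recoverable from the content only for whitespace languages, which is presumably why the factory is handed the tokens rather than a pre-joined string.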