Class UnicodeNormalizer

  • All Implemented Interfaces:
    Serializable

    public class UnicodeNormalizer
    extends TextNormalizer

    Converts Unicode text to its canonical form and removes smart quotes.

    Author:
    David B. Bracewell
    See Also:
    Serialized Form
    • Constructor Detail

      • UnicodeNormalizer

        public UnicodeNormalizer()
    • Method Detail

      • performNormalization

        public String performNormalization(String input,
                                           Language inputLanguage)
        Description copied from class: TextNormalizer
        Performs a pre-processing operation on the input string, given the language of the input.
        Specified by:
        performNormalization in class TextNormalizer
        Parameters:
        input - The input text
        inputLanguage - The language of the input
        Returns:
        The post-processed text
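
The kind of normalization described for this class can be sketched with the JDK alone. The example below is illustrative only and is not this class's implementation: the class name `UnicodeNormalizationSketch`, the choice of NFC as the canonical form, and the replacement of curly quotes with ASCII quotes (rather than deleting them) are all assumptions. It also ignores the `Language` argument that `performNormalization` accepts.

```java
import java.text.Normalizer;

// Hypothetical stand-alone sketch; not part of the documented library.
final class UnicodeNormalizationSketch {

    private UnicodeNormalizationSketch() {
    }

    // Applies canonical composition (NFC, an assumed choice) and then maps
    // the four common curly quotes to their ASCII equivalents.
    static String normalize(String input) {
        if (input == null) {
            return null;
        }
        String canonical = Normalizer.normalize(input, Normalizer.Form.NFC);
        return canonical
                .replace('\u2018', '\'')  // U+2018 left single quotation mark
                .replace('\u2019', '\'')  // U+2019 right single quotation mark
                .replace('\u201C', '"')   // U+201C left double quotation mark
                .replace('\u201D', '"');  // U+201D right double quotation mark
    }

    public static void main(String[] args) {
        // "cafe" followed by a combining acute accent (U+0301), in curly quotes.
        String raw = "\u201Ccafe\u0301\u201D";
        // NFC composes the final "e" and the accent into the single code point U+00E9.
        System.out.println(normalize(raw));
    }
}
```

NFC is used here because it leaves text in composed form, which is usually what downstream string comparison expects; a real implementation might instead choose NFD or a compatibility form such as NFKC.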