Class ExtractorBasedSimilarity

    • Constructor Detail

      • ExtractorBasedSimilarity

        public ExtractorBasedSimilarity(@NonNull Similarity measure)
        Instantiates a new ExtractorBasedSimilarity using a TermExtractor that ignores stop words and converts each HString to its lemma form.
        Parameters:
        measure - the similarity measure to use
      • ExtractorBasedSimilarity

        public ExtractorBasedSimilarity(@NonNull Similarity measure,
                                        @NonNull Extractor termExtractor)
        Instantiates a new ExtractorBasedSimilarity with the given similarity measure and extractor.
        Parameters:
        measure - the similarity measure to use
        termExtractor - the extractor used to generate the extractions from which similarity is calculated
    • Method Detail

      • calculate

        public double calculate(@NonNull HString first,
                                @NonNull HString second)
        Description copied from interface: HStringSimilarity
        Calculates the similarity between the two given HStrings.
        Specified by:
        calculate in interface HStringSimilarity
        Parameters:
        first - the first HString
        second - the second HString
        Returns:
        the similarity between first and second
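
The general idea of an extractor-based similarity (extract terms from each input, then hand the resulting term counts to a similarity measure) can be sketched as follows. This is a minimal, library-independent illustration, not the actual implementation: the class `ExtractorSimilaritySketch`, its stop-word list, the plain-string inputs standing in for HString, and the choice of cosine as the measure are all assumptions, and lemmatization is omitted.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not part of the documented library.
final class ExtractorSimilaritySketch {
    // Assumed stop-word list; a real extractor would use a language resource.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "on");

    // Assumed extractor: lowercases, splits on non-letters, drops stop words.
    // Lemmatization is deliberately left out of this sketch.
    static Map<String, Double> extract(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Assumed measure: cosine similarity over sparse term-count vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        // Two inputs with no extracted terms are treated as dissimilar.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Mirrors the shape of calculate(first, second): extract, then measure.
    static double calculate(String first, String second) {
        return cosine(extract(first), extract(second));
    }

    public static void main(String[] args) {
        System.out.println(calculate("The cat sat on the mat", "A cat sat on a mat"));
    }
}
```

In this sketch the two example sentences differ only in stop words, so their extracted term counts are identical and the measure sees them as the same vector.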
      • fit

        public void fit(@NonNull DocumentCollection corpus)
        Description copied from interface: HStringSimilarity
        In certain cases an HStringSimilarity needs to collect corpus-level statistics to determine similarity. The fit method allows implementations to gather those statistics from a corpus before similarities are calculated.
        Specified by:
        fit in interface HStringSimilarity
        Parameters:
        corpus - the corpus to fit the similarity measure to
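
As an illustration of the kind of corpus-level statistic a fit step might collect, the sketch below counts document frequencies and derives a smoothed inverse document frequency per term. It does not describe what ExtractorBasedSimilarity.fit actually does, which the documentation above does not specify; the class `CorpusFitSketch`, its representation of a corpus as lists of terms, and the smoothing formula are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of the documented library.
final class CorpusFitSketch {
    private final Map<String, Integer> documentFrequency = new HashMap<>();
    private int documentCount = 0;

    // Assumed fit step: record how many documents contain each term.
    void fit(List<List<String>> corpus) {
        documentFrequency.clear();
        documentCount = corpus.size();
        for (List<String> document : corpus) {
            // Deduplicate so a term counts at most once per document.
            for (String term : new HashSet<>(document)) {
                documentFrequency.merge(term, 1, Integer::sum);
            }
        }
    }

    // Smoothed inverse document frequency: rarer terms get larger weights.
    double idf(String term) {
        int df = documentFrequency.getOrDefault(term, 0);
        return Math.log((1.0 + documentCount) / (1.0 + df)) + 1.0;
    }

    public static void main(String[] args) {
        CorpusFitSketch stats = new CorpusFitSketch();
        stats.fit(List.of(
                List.of("cat", "mat"),
                List.of("cat", "dog"),
                List.of("cat", "cat")));
        System.out.println("idf(cat) = " + stats.idf("cat"));
        System.out.println("idf(dog) = " + stats.idf("dog"));
    }
}
```

A similarity measure could then multiply each term count by its weight before comparing two inputs, so that terms appearing in every document contribute less than rare ones.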