Package com.gengoai.hermes.morphology
Class StopWords
- java.lang.Object
  - com.gengoai.hermes.morphology.StopWords
- All Implemented Interfaces:
- Serializable
- Direct Known Subclasses:
- ENStopWords, StopWords.NoOptStopWords
public abstract class StopWords extends Object implements Serializable
Defines a methodology for determining if an HString or String is a stopword for a given language.
- Author:
- David B. Bracewell
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class
StopWords.NoOptStopWords
StopWords implementation that treats everything as a content word.
-
Constructor Summary
Constructors
Constructor | Description
StopWords()
-
Method Summary
Modifier and Type | Method | Description
static StopWords
getStopWords(Language language)
Gets the StopWords instance for the given language.
static SerializablePredicate<HString>
hasOnlyContentWords()
static SerializablePredicate<HString>
hasStopWord()
boolean
hasStopWord(HString text)
Returns true when any token in the given HString is a stopword.
static SerializablePredicate<HString>
isContentWord()
static SerializablePredicate<HString>
isStopWord()
boolean
isStopWord(HString text)
Checks if the given text is a stopword.
abstract boolean
isStopWord(String word)
Checks if the given word is a stopword.
protected abstract boolean
isTokenStopWord(Annotation token)
Checks if the given token is a stopword.
-
-
-
Method Detail
-
getStopWords
public static StopWords getStopWords(Language language)
Gets the StopWords instance for the given language.
- Parameters:
- language - the language
- Returns:
- the StopWords instance
-
hasOnlyContentWords
public static SerializablePredicate<HString> hasOnlyContentWords()
- Returns:
- predicate returning true when all tokens in the given HString are content words (i.e. none is a stopword)
-
hasStopWord
public static SerializablePredicate<HString> hasStopWord()
- Returns:
- predicate returning true when any token in a given HString is a stopword
-
isContentWord
public static SerializablePredicate<HString> isContentWord()
- Returns:
- predicate returning true when the given HString is a content word (i.e. not a stopword)
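The relationship between these predicates can be illustrated with a minimal sketch. It does not use the Hermes classes: a `List<String>` of tokens stands in for `HString`, `java.util.function.Predicate` stands in for `SerializablePredicate`, and the word list and case-insensitive matching are assumptions made only for illustration. The point it shows is that "has only content words" is the logical negation of "has a stopword".

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Stand-in sketch; none of these names belong to the Hermes API.
class StopWordPredicateSketch {
    // Hypothetical stopword list used only for this example.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of", "and");

    // Word-level check; lower-casing is an assumption of this sketch.
    static boolean isStopWord(String word) {
        return STOPWORDS.contains(word.toLowerCase(Locale.ROOT));
    }

    // True when any token in the list is a stopword.
    static Predicate<List<String>> hasStopWord() {
        return tokens -> tokens.stream().anyMatch(StopWordPredicateSketch::isStopWord);
    }

    // True when every token is a content word, i.e. no token is a stopword.
    static Predicate<List<String>> hasOnlyContentWords() {
        return hasStopWord().negate();
    }

    public static void main(String[] args) {
        System.out.println(hasStopWord().test(List.of("the", "cat")));           // true
        System.out.println(hasOnlyContentWords().test(List.of("black", "cat"))); // true
    }
}
```

Predicates of this shape are convenient as stream filters, for example to keep only phrases that contain no stopwords.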
-
isStopWord
public static SerializablePredicate<HString> isStopWord()
- Returns:
- predicate returning true when the given HString is a stopword
-
hasStopWord
public boolean hasStopWord(HString text)
Returns true when any token in the given HString is a stopword.
- Parameters:
- text - the text
- Returns:
- true when any token in the given HString is a stopword
-
isStopWord
public boolean isStopWord(HString text)
Checks if the given text is a stopword.
- Parameters:
- text - the text
- Returns:
- true if the text is a stopword, false if it is a content word
-
isStopWord
public abstract boolean isStopWord(String word)
Checks if the given word is a stopword.
- Parameters:
- word - the word
- Returns:
- true if the word is a stopword, false if it is a content word
-
isTokenStopWord
protected abstract boolean isTokenStopWord(Annotation token)
Checks if the given token is a stopword.
- Parameters:
- token - the token
- Returns:
- true if the token is a stopword, false if it is a content word
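The division between the two abstract methods and the concrete helpers can be sketched as follows. This is a stand-in under stated assumptions, not the Hermes source: tokens are modelled as plain `String` values rather than `Annotation`, and `StopWordsSketch`, `SetStopWordsSketch`, and `NoOpStopWordsSketch` are hypothetical names. The last class mirrors the documented behaviour of `StopWords.NoOptStopWords`, which treats everything as a content word.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Stand-in for the abstract contract; not the Hermes class itself.
abstract class StopWordsSketch {
    // Word-level check that each language-specific subclass supplies.
    public abstract boolean isStopWord(String word);

    // Token-level check; a token is modelled here as a plain String.
    protected abstract boolean isTokenStopWord(String token);

    // Concrete helper: true when any token is a stopword.
    public boolean hasStopWord(List<String> tokens) {
        return tokens.stream().anyMatch(this::isTokenStopWord);
    }
}

// Set-backed implementation over a hypothetical word list.
class SetStopWordsSketch extends StopWordsSketch {
    private final Set<String> words;

    SetStopWordsSketch(Set<String> words) {
        this.words = Set.copyOf(words);
    }

    @Override
    public boolean isStopWord(String word) {
        // Case-insensitive matching is an assumption of this sketch.
        return words.contains(word.toLowerCase(Locale.ROOT));
    }

    @Override
    protected boolean isTokenStopWord(String token) {
        return isStopWord(token);
    }
}

// Treats every word and token as a content word.
class NoOpStopWordsSketch extends StopWordsSketch {
    @Override
    public boolean isStopWord(String word) {
        return false;
    }

    @Override
    protected boolean isTokenStopWord(String token) {
        return false;
    }
}

class StopWordsSketchDemo {
    public static void main(String[] args) {
        StopWordsSketch en = new SetStopWordsSketch(Set.of("the", "of"));
        System.out.println(en.hasStopWord(List.of("king", "of", "spain"))); // true
        System.out.println(new NoOpStopWordsSketch().isStopWord("the"));    // false
    }
}
```

Keeping the token-level check protected lets a subclass use token metadata internally while callers see only the word-level and text-level methods.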
-
-