Class TokenRegex
- java.lang.Object
    - com.gengoai.hermes.extraction.regex.TokenRegex
- All Implemented Interfaces:
    Extractor, Serializable

public final class TokenRegex extends Object implements Serializable, Extractor
Hermes provides a token-based regular expression engine that can match on arbitrary annotation types, relation types, and attributes, while supporting many of the operators available in standard Java regular expressions. As with Java regular expressions, a token regular expression is specified as a string and compiled into an instance of TokenRegex. The TokenRegex class offers many of the same methods as Java's regular expression API (java.util.regex.Pattern), but it returns a TokenMatcher instead of a Matcher.

The TokenMatcher class supports iterating over the matches, extracting the match or the named groups within it, retrieving the start and end offsets of the match, and converting the match into a TokenMatch object that records its current state. Token regular expressions can also act as extractors, in which case the extraction consists of the HStrings matched by the default group. The following example compiles a regular expression, creates a matcher, and iterates over the matches:

    TokenRegex regex = TokenRegex.compile(pattern);   // throws ParseException on a syntax error
    TokenMatcher matcher = regex.matcher(document);   // document is the HString to search
    while (matcher.find()) {
        System.out.println(matcher.group());          // the text of the current match
    }
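Because TokenRegex deliberately mirrors Java's regular expression API, the same compile, matcher, find, and group loop can be shown with nothing but the JDK. The sketch below is a character-level analogue built on java.util.regex. It is not Hermes code, and the class name RegexLoopDemo and the sample input are illustrative. TokenRegex runs the same loop over tokens and annotations instead of characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative JDK analogue of the TokenRegex loop; RegexLoopDemo is a hypothetical name.
final class RegexLoopDemo {

    // Compile once, then collect every non-overlapping match found by find().
    static List<String> findAll(String pattern, String text) {
        Pattern regex = Pattern.compile(pattern);   // cf. TokenRegex.compile(pattern)
        Matcher matcher = regex.matcher(text);      // cf. regex.matcher(document)
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                    // same loop shape as with TokenMatcher
            matches.add(matcher.group());           // group() with no argument = whole match
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String m : findAll("[A-Z][a-z]+", "Alice met Bob in Paris")) {
            System.out.println(m);                  // Alice, Bob, Paris on separate lines
        }
    }
}
```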
The syntax for token-based regular expressions borrows from the Lyre Expression Language where possible. Token-based regular expressions differ from Lyre in that they operate over sequences of HStrings, whereas Lyre operates on single HString units; as a result, the two syntaxes differ in places. Details on the syntax can be found in the Hermes User Guide.
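The sequence-versus-unit distinction can be illustrated with a deliberately simplified toy model. Each single-unit test is a Predicate over one token, and a token pattern is a fixed-length sequence of such predicates that must accept consecutive tokens. This is a hypothetical sketch in plain Java: the class ToyTokenPattern is invented for illustration, plain String tokens stand in for HStrings, and there are no quantifiers, alternation, or groups. It does not use the real token regex syntax or reflect how Hermes implements matching.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model: a "token pattern" is a fixed-length sequence of
// per-token predicates. Not the Hermes implementation.
final class ToyTokenPattern {
    private final List<Predicate<String>> steps;

    ToyTokenPattern(List<Predicate<String>> steps) {
        this.steps = steps;
    }

    // True if every step accepts the corresponding token, starting at index i.
    boolean matchesAt(List<String> tokens, int i) {
        if (i < 0 || i + steps.size() > tokens.size()) {
            return false;
        }
        for (int k = 0; k < steps.size(); k++) {
            if (!steps.get(k).test(tokens.get(i + k))) {
                return false;
            }
        }
        return true;
    }

    // Index of the first token where the whole sequence matches, or -1.
    int find(List<String> tokens) {
        for (int i = 0; i + steps.size() <= tokens.size(); i++) {
            if (matchesAt(tokens, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Single-unit tests, each applied to exactly one token...
        Predicate<String> isDeterminer = t -> t.equalsIgnoreCase("the") || t.equalsIgnoreCase("a");
        Predicate<String> isCapitalized = t -> !t.isEmpty() && Character.isUpperCase(t.charAt(0));
        // ...chained into a two-token sequence pattern.
        ToyTokenPattern pattern = new ToyTokenPattern(List.of(isDeterminer, isCapitalized));
        List<String> tokens = List.of("We", "visited", "the", "Louvre", "today");
        System.out.println(pattern.find(tokens)); // prints 2
    }
}
```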
- Author:
    David B. Bracewell
- See Also:
    Serialized Form
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| static TokenRegex | compile(@NonNull String pattern) | Compiles the given pattern into a TokenRegex object. |
| Extraction | extract(@NonNull HString hString) | Generates an Extraction from the given HString. |
| TokenMatcher | matcher(HString text) | Creates a TokenMatcher to match against the given text. |
| TokenMatcher | matcher(HString text, int start) | Creates a TokenMatcher to match against the given text, starting at the given token. |
| boolean | matches(HString text) | Determines if the regex matches the entire region of the given input text. |
| Optional<HString> | matchFirst(HString text) | Runs the pattern over the given input text, returning the first match if one exists. |
| String | pattern() | Returns the token regex pattern as a string. |
| String | toString() | |
Method Detail
compile
public static TokenRegex compile(@NonNull String pattern) throws ParseException
Compiles the given pattern into a TokenRegex object.
- Parameters:
    pattern - The token regex pattern
- Returns:
    A compiled TokenRegex
- Throws:
    ParseException - The given pattern has a syntax error
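compile reports a malformed pattern by throwing ParseException. The JDK's closest counterpart, Pattern.compile, throws the unchecked PatternSyntaxException instead. The sketch below shows a common compile-or-report idiom using the JDK type. The helper name tryCompile is hypothetical. With TokenRegex, the catch clause would name ParseException rather than PatternSyntaxException.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// JDK analogue of handling a pattern syntax error at compile time.
final class SafeCompileDemo {

    // Returns the compiled pattern, or empty if the pattern is malformed.
    static Optional<Pattern> tryCompile(String pattern) {
        try {
            return Optional.of(Pattern.compile(pattern));
        } catch (PatternSyntaxException e) {
            // Report where and why compilation failed, then signal failure with empty.
            System.err.println("Bad pattern at index " + e.getIndex() + ": " + e.getDescription());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("[a-z]+").isPresent()); // true
        System.out.println(tryCompile("[a-z").isPresent());   // false: unclosed character class
    }
}
```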

extract
public Extraction extract(@NonNull HString hString)
Description copied from interface: Extractor
Generates an Extraction from the given HString.
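As the class overview notes, using a TokenRegex as an Extractor yields the HStrings matched by the default group. The toy sketch below imitates that idea by collecting every matched token span. It assumes a left-to-right scan that resumes after each match, so spans do not overlap. This assumption is borrowed from java.util.regex find() behavior and is not confirmed for Hermes. ToyExtractDemo and extractAll are hypothetical names, and plain String tokens stand in for HStrings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model of extraction: collect every non-overlapping
// token span matched by a fixed-length sequence of per-token predicates.
final class ToyExtractDemo {

    static List<List<String>> extractAll(List<Predicate<String>> steps, List<String> tokens) {
        List<List<String>> spans = new ArrayList<>();
        int n = steps.size();
        if (n == 0) {
            return spans; // an empty pattern would match everywhere; this toy skips it
        }
        int i = 0;
        while (i + n <= tokens.size()) {
            boolean ok = true;
            for (int k = 0; k < n && ok; k++) {
                ok = steps.get(k).test(tokens.get(i + k));
            }
            if (ok) {
                spans.add(tokens.subList(i, i + n)); // record the matched span
                i += n;                               // resume after it (no overlaps)
            } else {
                i++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Predicate<String> isAdjective = t -> t.equals("big") || t.equals("red");
        Predicate<String> isNoun = t -> t.equals("dog") || t.equals("ball");
        List<String> tokens = List.of("the", "big", "dog", "chased", "a", "red", "ball");
        System.out.println(extractAll(List.of(isAdjective, isNoun), tokens));
        // prints [[big, dog], [red, ball]]
    }
}
```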

matchFirst
public Optional<HString> matchFirst(HString text)
Runs the pattern over the given input text, returning the first match if one exists.
- Parameters:
    text - The text to run the pattern over
- Returns:
    An Optional containing the first match, or an empty Optional if there is none
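matchFirst corresponds to running find() once and wrapping the result in an Optional. The JDK analogue below sketches those semantics on plain strings. MatchFirstDemo is a hypothetical helper and is not part of Hermes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical JDK analogue of TokenRegex.matchFirst: the first match, or empty.
final class MatchFirstDemo {

    static Optional<String> matchFirst(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        // find() scans forward for the next match; if none exists, return empty.
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        Pattern year = Pattern.compile("\\d{4}");
        System.out.println(matchFirst(year, "Founded in 1998, renamed in 2004")); // Optional[1998]
        System.out.println(matchFirst(year, "no year here"));                      // Optional.empty
    }
}
```

Returning an Optional lets callers use ifPresent or orElse instead of checking for null.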

matcher
public TokenMatcher matcher(HString text, int start)
Creates a TokenMatcher to match against the given text, starting at the given token.
- Parameters:
    text - The text to run the TokenRegex against
    start - The index of the token at which to start the TokenRegex
- Returns:
    A TokenMatcher
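The start parameter selects the token at which matching begins. The toy sketch below shows how moving the start position skips earlier matches. It treats start as an index into a list of String tokens, which is an assumption based on the parameter description. StartOffsetDemo and findFrom are hypothetical names and are not part of the Hermes API.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical toy model of matcher(text, start): the search begins at a token index.
final class StartOffsetDemo {

    // Returns the first token index >= start at which all predicates match consecutively, or -1.
    static int findFrom(List<Predicate<String>> steps, List<String> tokens, int start) {
        for (int i = Math.max(0, start); i + steps.size() <= tokens.size(); i++) {
            boolean ok = true;
            for (int k = 0; k < steps.size() && ok; k++) {
                ok = steps.get(k).test(tokens.get(i + k));
            }
            if (ok) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Predicate<String> isNumber = t -> !t.isEmpty() && t.chars().allMatch(Character::isDigit);
        List<String> tokens = List.of("3", "cats", "and", "4", "dogs");
        List<Predicate<String>> pattern = List.of(isNumber);
        System.out.println(findFrom(pattern, tokens, 0)); // prints 0
        System.out.println(findFrom(pattern, tokens, 1)); // prints 3
    }
}
```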

matcher
public TokenMatcher matcher(HString text)
Creates a TokenMatcher to match against the given text.
- Parameters:
    text - The text to run the TokenRegex against
- Returns:
    A TokenMatcher

matches
public boolean matches(HString text)
Determines if the regex matches the entire region of the given input text.
- Parameters:
    text - The text to match
- Returns:
    true if the pattern matches the entire region of the input text, false otherwise
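matches differs from a search such as matchFirst because the pattern must cover the whole input, not just part of it. java.util.regex draws the same line between Matcher.matches() and Matcher.find(), and the JDK-only sketch below contrasts the two. WholeRegionDemo is an illustrative name. TokenRegex applies the same distinction to token sequences rather than characters.

```java
import java.util.regex.Pattern;

// JDK analogue: matches() must cover the entire input, find() only needs a substring.
final class WholeRegionDemo {

    static boolean matchesWhole(Pattern p, String text) {
        return p.matcher(text).matches();
    }

    static boolean containsMatch(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        Pattern word = Pattern.compile("[a-z]+");
        System.out.println(matchesWhole(word, "hello"));        // true
        System.out.println(matchesWhole(word, "hello world"));  // false: the space is not covered
        System.out.println(containsMatch(word, "hello world")); // true
    }
}
```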

pattern
public String pattern()
- Returns:
    The token regex pattern as a string