public final class CaduceusProgram extends Object implements Serializable, Extractor
Caduceus, pronounced ca·du·ceus, is a rule-based information extraction system. A Caduceus program consists of a list of rules that extract arbitrary spans of text to define annotations (e.g. entities and events) and relations (e.g. event roles). Each rule starts with a unique name declared in square brackets, e.g. `[my_rule]`. Following the rule name is the trigger, which is a `TokenRegex` that captures the text causing the rule to fire.
Rules construct annotations and/or relations based on the matched trigger. A rule may define zero or more annotations to be constructed. Each annotation is defined using `annotation:` and requires the following options to be specified:
- `capture=(\*|GROUP_NAME)`: The text span that makes up the annotation, where `\*` represents the full trigger match and `GROUP_NAME` represents a named group from the trigger match.
- `type=ANNOTATION_TYPE`: The name of the annotation type to construct.
$ATTRIBUTE_NAME = VALUE.
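The difference between capturing the full trigger match and capturing a named group can be illustrated with a small, self-contained sketch. This is not Caduceus itself: it assumes plain `java.util.regex` as a stand-in for `TokenRegex`, and the class `CaptureSketch`, its nested `Span` type, and the `apply` method are hypothetical names introduced only for illustration. The string `"*"` is assumed here to stand for the full-match case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for one rule: java.util.regex replaces TokenRegex.
final class CaptureSketch {

    // One constructed annotation: its type and the character span it covers.
    static final class Span {
        final String type;
        final int start;
        final int end;
        final String text;

        Span(String type, int start, int end, String text) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + "]='" + text + "'";
        }
    }

    // Mirrors the two required options: capture (full match or named group) and type.
    static List<Span> apply(Pattern trigger, String capture, String type, String input) {
        List<Span> spans = new ArrayList<>();
        Matcher m = trigger.matcher(input);
        while (m.find()) {
            boolean wholeMatch = capture.equals("*");
            int start = wholeMatch ? m.start() : m.start(capture);
            int end = wholeMatch ? m.end() : m.end(capture);
            // A named group that did not take part in the match reports -1.
            if (start >= 0) {
                spans.add(new Span(type, start, end, input.substring(start, end)));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Pattern trigger = Pattern.compile("(?<title>Dr|Prof)\\. (?<name>[A-Z][a-z]+)");
        String text = "We met Dr. Smith and Prof. Jones.";
        // Full trigger match becomes the annotation span.
        System.out.println(apply(trigger, "*", "PERSON", text));
        // Only the named group "name" becomes the annotation span.
        System.out.println(apply(trigger, "name", "SURNAME", text));
    }
}
```

In this sketch the same trigger produces different spans depending only on the capture option, which is the behaviour the `capture` option describes: the trigger decides when the rule fires, and the capture decides which part of the match is annotated.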
- See Also:
- Serialized Form
Methods:
- `execute(@NonNull Document document)`: Executes the program over the given document.
- `extract(@NonNull HString hString)`
- `read(@NonNull Resource resource)`: Reads a Caduceus program from the given resource.
public static CaduceusProgram read(@NonNull Resource resource) throws IOException, ParseException
Reads a Caduceus program from the given resource.

public void execute(@NonNull Document document)
Executes the program over the given document.
- document: the document to execute the program on