Class CaduceusProgram

  • All Implemented Interfaces:
    Extractor, Serializable

    public final class CaduceusProgram
    extends Object
    implements Serializable, Extractor

    Caduceus, pronounced ca·du·ceus, is a rule-based information extraction system. Caduceus programs consist of a list of rules for extracting arbitrary spans of text to define annotations (e.g. entities and events) and relations (e.g. event roles). Each rule starts with a unique name declared in square brackets, e.g. [my_rule]. Following the rule name is the trigger, which is a TokenRegex that captures the text causing the rule to fire.
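
    To make the naming convention concrete, here is a minimal sketch that scans program text for rule headers and rejects a repeated name. It is not part of the Caduceus API: the class RuleNameSketch, the method ruleNames, and the assumption that each header sits alone on its own line are all hypothetical, and the rule bodies are left as placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleNameSketch {
    // Assumption: a rule header is a name in square brackets alone on its line, e.g. [my_rule].
    private static final Pattern HEADER = Pattern.compile("^\\s*\\[([^\\]]+)\\]\\s*$");

    /** Returns the declared rule names in order; throws if a name is declared twice. */
    public static List<String> ruleNames(String program) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : program.split("\\R")) {
            Matcher m = HEADER.matcher(line);
            // Set.add returns false when the name was already declared.
            if (m.matches() && !names.add(m.group(1))) {
                throw new IllegalArgumentException("Duplicate rule name: " + m.group(1));
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        // Rule bodies are placeholders; only the headers matter for this sketch.
        String program = "[my_rule]\n(rule body omitted)\n[other_rule]\n(rule body omitted)\n";
        System.out.println(ruleNames(program));
    }
}
```

    The real parser may enforce uniqueness differently; the sketch only illustrates what "a unique name declared in square brackets" implies for a program with several rules.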

    Rules construct annotations and/or relations based on the matched trigger. A rule may define zero or more annotations to be constructed. Each annotation is defined using annotation: and requires the following options to be specified:

     
      • `capture=(\*|GROUP_NAME)`: The text span which will make up the annotation, where `\*` represents the full trigger match and `GROUP_NAME` represents a named group from the trigger match.
      • `type=ANNOTATION_TYPE`: The name of the annotation type to construct.
     
     
    Additionally, attributes can be defined as follows: $ATTRIBUTE_NAME = VALUE.
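
    The difference between the two capture forms can be sketched in plain Java. This example uses java.util.regex as a stand-in for TokenRegex, so it matches characters rather than tokens; the class CaptureSketch, the method resolveCapture, and the sample pattern are hypothetical and only illustrate the selection logic described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureSketch {
    /**
     * Resolves a capture option against a successful match:
     * "*" selects the full trigger match, any other value names a group.
     */
    public static String resolveCapture(Matcher match, String capture) {
        return "*".equals(capture) ? match.group() : match.group(capture);
    }

    public static void main(String[] args) {
        // java.util.regex stands in for TokenRegex here; the pattern is illustrative only.
        Pattern trigger = Pattern.compile("(?<TITLE>Dr\\.|Mr\\.|Ms\\.) (?<NAME>[A-Z][a-z]+)");
        Matcher match = trigger.matcher("We met Dr. Smith yesterday.");
        if (match.find()) {
            System.out.println(resolveCapture(match, "*"));    // full match: Dr. Smith
            System.out.println(resolveCapture(match, "NAME")); // named group: Smith
        }
    }
}
```

    In this sketch, capturing the full match would yield an annotation spanning the title and the name, while capturing NAME would span the name alone.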

    See Also:
    Serialized Form
    • Method Detail

      • read

        public static CaduceusProgram read(@NonNull Resource resource)
                                    throws IOException,
                                           ParseException
        Reads a Caduceus program from the given resource.
        Parameters:
        resource - the resource containing the Caduceus program
        Returns:
        the CaduceusProgram
        Throws:
        IOException - if the resource cannot be read
        ParseException - if the Caduceus program cannot be parsed
      • execute

        public void execute(@NonNull Document document)
        Executes the program over the given document.
        Parameters:
        document - the document to execute the program on
      • extract

        public Extraction extract(@NonNull HString hString)
        Description copied from interface: Extractor
        Generate an Extraction from the given HString.
        Specified by:
        extract in interface Extractor
        Parameters:
        hString - the source text from which we will generate an Extraction
        Returns:
        the Extraction