Class PennTreebankFormat

  • All Implemented Interfaces:
    DocFormat, OneDocPerFileFormat, Serializable

    public class PennTreebankFormat
    extends WholeFileTextFormat
    implements OneDocPerFileFormat

    Format Name: ptb

    Reader for Penn Treebank .mrg files. Provides the following AnnotatableTypes:

    • TOKEN
    • SENTENCE
    • PART_OF_SPEECH
    • CONSTITUENT_PARSE, which adds NON_TERMINAL_NODE annotations and SYNTACTIC_HEAD relations

    Function tags are represented on the SYNTACTIC_HEAD relation, while the NON_TERMINAL_NODE annotations carry only the base part-of-speech. Note that this removes all -NONE- entries.
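
    The label handling described above can be illustrated with a small standalone sketch. It is not the implementation used by this class and it does not call any part of this library's API: the class PtbLabelSketch, its methods, and the sample bracketing are all hypothetical and exist only for illustration. The sketch assumes Penn Treebank style labels such as NP-SBJ-1, where the base label is followed by function tags and an optional numeric index, and it assumes empty elements are marked with the -NONE- tag. It does not show how function tags are attached to SYNTACTIC_HEAD relations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the library.
public class PtbLabelSketch {
    // Matches a preterminal such as "(NNP Vinken)" or "(-NONE- *)".
    private static final Pattern LEAF =
        Pattern.compile("\\(([^\\s()]+)\\s+([^\\s()]+)\\)");

    // Returns the label without function tags or indices, e.g. "NP-SBJ-1" -> "NP".
    // Labels that start with '-' (such as -NONE- or -LRB-) are returned unchanged.
    static String baseLabel(String label) {
        if (label.startsWith("-")) {
            return label;
        }
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') {
                return label.substring(0, i);
            }
        }
        return label;
    }

    // Returns the function tags of a label, e.g. "PP-LOC-CLR" -> [LOC, CLR].
    // Numeric co-indexing suffixes such as the "1" in "NP-SBJ-1" are skipped.
    static List<String> functionTags(String label) {
        List<String> tags = new ArrayList<>();
        if (label.startsWith("-")) {
            return tags;
        }
        String[] parts = label.split("[-=]");
        for (int i = 1; i < parts.length; i++) {
            if (!parts[i].isEmpty() && !Character.isDigit(parts[i].charAt(0))) {
                tags.add(parts[i]);
            }
        }
        return tags;
    }

    // Collects the terminal words of a bracketed tree, dropping -NONE- leaves.
    static List<String> tokens(String mrg) {
        List<String> out = new ArrayList<>();
        Matcher m = LEAF.matcher(mrg);
        while (m.find()) {
            if (!"-NONE-".equals(m.group(1))) {
                out.add(m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up bracketing in the style of a .mrg file.
        String mrg = "( (S (NP-SBJ (NNP Vinken)) (VP (VBD left) (NP (-NONE- *))) (. .)) )";
        System.out.println(tokens(mrg));                                   // [Vinken, left, .]
        System.out.println(baseLabel("NP-SBJ") + " " + functionTags("NP-SBJ")); // NP [SBJ]
    }
}
```

    In this sketch the empty element (-NONE- *) contributes no token, and NP-SBJ is split into the base label NP and the function tag SBJ, mirroring the separation between NON_TERMINAL_NODE labels and function tags described above.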

    See Also:
    Serialized Form