Class Xml


  • public final class Xml
    extends Object
    Common methods for parsing and handling XML files
    Author:
    David B. Bracewell
    • Field Detail

      • WHITESPACE_FILTER

        public static final EventFilter WHITESPACE_FILTER
        An EventFilter that skips character events consisting only of whitespace
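
        To illustrate what such a filter does, here is a minimal sketch that uses only the JDK StAX API. The class name WhitespaceFilterSketch, the constant SKIP_WHITESPACE, and the helper countWhitespaceEvents are hypothetical stand-ins; the library's actual filter may be implemented differently.

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class WhitespaceFilterSketch {
    // Hypothetical stand-in for a whitespace filter: accept every event
    // except character events that contain only whitespace.
    static final EventFilter SKIP_WHITESPACE =
            event -> !(event.isCharacters() && event.asCharacters().isWhiteSpace());

    // Counts whitespace-only character events seen by the reader,
    // with or without the filter applied.
    static int countWhitespaceEvents(String xml, boolean filtered) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            if (filtered) {
                reader = factory.createFilteredReader(reader, SKIP_WHITESPACE);
            }
            int count = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters() && event.asCharacters().isWhiteSpace()) {
                    count++;
                }
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }
}
```

        With the filter applied, the indentation between elements never reaches the consumer, so countWhitespaceEvents(xml, true) should report zero for pretty-printed input.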
    • Method Detail

      • createXPath

        public static XPath createXPath()
        Creates an XPath instance
        Returns:
        An XPath instance
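
        A returned XPath instance can be used like any javax.xml.xpath.XPath. The sketch below assumes the method is roughly equivalent to asking the default XPathFactory for a new instance, which this page does not confirm; XPathSketch, newXPath, and evaluate are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Assumed equivalent of createXPath(): a fresh XPath from the default factory.
    static XPath newXPath() {
        return XPathFactory.newInstance().newXPath();
    }

    // Evaluates an expression against an XML string and returns the string result.
    static String evaluate(String expression, String xml) {
        try {
            return newXPath().evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath expression or XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book title='Dune'/></library>";
        System.out.println(evaluate("/library/book/@title", xml)); // prints "Dune"
    }
}
```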
      • getAttributeValue

        public static String getAttributeValue​(Node n,
                                               String atrName,
                                               BiPredicate<String,​String> matcher)
        Gets the value of an attribute for a node
        Parameters:
        n - Node to get attribute value for
        atrName - Name of attribute whose value is desired
        matcher - BiPredicate used to match attribute names
        Returns:
        The attribute value or null
      • getAttributeValue

        public static String getAttributeValue​(Node n,
                                               String atrName)
        Gets the value of an attribute for a node using a case-insensitive string matcher
        Parameters:
        n - Node to get attribute value for
        atrName - Name of attribute whose value is desired
        Returns:
        The attribute value or null
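
        The two overloads above can be pictured as a loop over the node's attributes. This is a sketch under assumptions: the order in which the two strings are passed to the matcher is a guess, and AttributeSketch, attributeValue, and parse are hypothetical names, not the library's.

```java
import java.io.StringReader;
import java.util.function.BiPredicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class AttributeSketch {
    // Returns the value of the first attribute whose name the matcher accepts, or null.
    // Assumption: the matcher receives (actual attribute name, requested name).
    static String attributeValue(Node n, String name, BiPredicate<String, String> matcher) {
        if (n == null || n.getAttributes() == null) {
            return null; // only element nodes carry attributes
        }
        NamedNodeMap attrs = n.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (matcher.test(attr.getNodeName(), name)) {
                return attr.getNodeValue();
            }
        }
        return null;
    }

    // Case-insensitive convenience overload.
    static String attributeValue(Node n, String name) {
        return attributeValue(n, name, String::equalsIgnoreCase);
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        For an element such as <item ID='42'/>, the case-insensitive overload finds "42" when asked for "id", while an exact matcher such as String::equals returns null.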
      • getTextContent

        public static String getTextContent​(Node n)

        Gathers the text from the child TEXT_NODE of the given node

        Parameters:
        n - Node to get text from
        Returns:
        String with text from the child TEXT_NODE
      • getTextContent

        public static String getTextContent​(Collection<Node> nodes)

        Gathers the text from the child TEXT_NODE from each of the nodes in the given collection.

        A line separator is added between the content of each node

        Parameters:
        nodes - the nodes to get text from
        Returns:
        String with text from the child TEXT_NODE
      • getTextContent

        public static String getTextContent​(NodeList nodes)

        Gathers the text from the child TEXT_NODE from each of the nodes in the given NodeList.

        A line separator is added between the content of each node

        Parameters:
        nodes - the nodes to get text from
        Returns:
        String with text from the child TEXT_NODE
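
        The non-recursive variants read only the text nodes that are direct children of each node. A minimal JDK-only sketch of that idea follows; DirectTextSketch, directText, and parse are hypothetical names, and details such as trimming or null handling in the library are not known from this page.

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DirectTextSketch {
    // Concatenates the values of the direct TEXT_NODE children of n.
    static String directText(Node n) {
        StringBuilder sb = new StringBuilder();
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Applies directText to each node and joins the results with a line separator.
    static String directText(NodeList nodes) {
        StringJoiner joiner = new StringJoiner(System.lineSeparator());
        for (int i = 0; i < nodes.getLength(); i++) {
            joiner.add(directText(nodes.item(i)));
        }
        return joiner.toString();
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        For <p>one<b>bold</b></p>, the sketch yields "one": the text inside the nested <b> element belongs to a grandchild and is skipped.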
      • getTextContentRecursive

        public static String getTextContentRecursive​(Node n)
        Gets all text from the node and its child nodes
        Parameters:
        n - Node to get text from
        Returns:
        String with all text from the node and its children or null
      • getTextContentRecursive

        public static String getTextContentRecursive​(NodeList nl)
        Gets all text from all the nodes and their children in a list of nodes
        Parameters:
        nl - the NodeList of nodes to get text from
        Returns:
        String with all text from all the nodes and their children in a list of nodes or null
      • getTextContentRecursive

        public static String getTextContentRecursive​(Collection<Node> nl)
        Gets all text from all the nodes and their children in a collection of nodes
        Parameters:
        nl - the Collection of nodes to get text from
        Returns:
        String with all text from all the nodes and their children in a collection of nodes or null
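
        The recursive variants differ from getTextContent in that they descend into every child. One way to picture this is a depth-first walk that appends each text node it meets. This is a sketch with hypothetical names (RecursiveTextSketch, textRecursive, parse), not the library's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RecursiveTextSketch {
    // Returns all text under n in document order, or null for a null node.
    static String textRecursive(Node n) {
        if (n == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        collect(n, sb);
        return sb.toString();
    }

    // Depth-first walk: append text nodes, then recurse into children.
    private static void collect(Node n, StringBuilder sb) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            sb.append(n.getNodeValue());
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, sb);
        }
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        For <p>one<b>bold</b><!-- note -->two</p>, the sketch returns "oneboldtwo": nested text is included and the comment is ignored.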
      • getTextContentWithComments

        public static String getTextContentWithComments​(Node n)

        Gathers the text from the child TEXT_NODE and COMMENT_NODE of the given node

        Parameters:
        n - Node to get text from
        Returns:
        String with text from the child TEXT_NODE and COMMENT_NODE
      • getTextContentWithComments

        public static String getTextContentWithComments​(Collection<Node> nodes)

        Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given collection.

        A line separator is added between the content of each node

        Parameters:
        nodes - the nodes to get text from
        Returns:
        String with text from the child TEXT_NODE and COMMENT_NODE
      • getTextContentWithComments

        public static String getTextContentWithComments​(NodeList nodes)

        Gathers the text from the child TEXT_NODE and COMMENT_NODE from each of the nodes in the given NodeList.

        A line separator is added between the content of each node

        Parameters:
        nodes - the nodes to get text from
        Returns:
        String with text from the child TEXT_NODE and COMMENT_NODE
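
        The WithComments variants additionally keep the content of comment nodes. A minimal sketch of the single-node case, assuming comments are appended in document order alongside text; CommentTextSketch, textWithComments, and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CommentTextSketch {
    // Concatenates the values of direct TEXT_NODE and COMMENT_NODE children of n.
    static String textWithComments(Node n) {
        StringBuilder sb = new StringBuilder();
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.COMMENT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        For <p>one<!--note-->two<b>bold</b></p>, the sketch returns "onenotetwo": the comment body is kept, and the nested element's text is not.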
      • getTextContentWithCommentsRecursive

        public static String getTextContentWithCommentsRecursive​(NodeList nl)
        Gets all text, including comment text, from all the nodes and their children in a list of nodes
        Parameters:
        nl - the NodeList of nodes to get text from
        Returns:
        String with all text and comments from all the nodes and their children in a list of nodes or null
      • getTextContentWithCommentsRecursive

        public static String getTextContentWithCommentsRecursive​(Collection<Node> nl)
        Gets all text, including comment text, from all the nodes and their children in a collection of nodes
        Parameters:
        nl - the Collection of nodes to get text from
        Returns:
        String with all text and comments from all the nodes and their children in a collection of nodes or null
      • getTextContentWithCommentsRecursive

        public static String getTextContentWithCommentsRecursive​(Node n)
        Gets all text, including comment text, from the node and its child nodes
        Parameters:
        n - Node to get text from
        Returns:
        String with all text and comments from the node and its children or null
      • parse

        public static Iterable<Document> parse​(Resource xmlResource,
                                               String tag)
                                        throws IOException,
                                               XMLStreamException
        Parses the given XML resource creating sub-documents (DOM) from the elements with the given tag name. An example of usage for this method is constructing DOM documents from each "page" element in the Wikipedia dump.
        Parameters:
        xmlResource - the XML resource to parse
        tag - the tag to create documents on
        Returns:
        an Iterable over the DOM documents created from the matching elements
        Throws:
        IOException - if the resource cannot be read
        XMLStreamException - if the XML stream cannot be parsed
      • parse

        public static Iterable<Document> parse​(Resource xmlResource,
                                               String tag,
                                               EventFilter eventFilter)
                                        throws IOException,
                                               XMLStreamException
        Parses the given XML resource creating sub-documents (DOM) from the elements with the given tag name and filtering out events using the given event filter. An example of usage for this method is constructing DOM documents from each "page" element in the Wikipedia dump.
        Parameters:
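        xmlResource - the XML resource to parse
        tag - the tag to create documents on
        eventFilter - the event filter used to discard unwanted events
        Returns:
        an Iterable over the DOM documents created from the matching elements
        Throws:
        IOException - if the resource cannot be read
        XMLStreamException - if the XML stream cannot be parsed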
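
        The idea of cutting one large XML stream into a DOM document per matching element (for example, one per "page" element) can be sketched with the JDK StAX and DOM APIs. This is only an illustration under assumptions. It reads from a String instead of a Resource, builds an eager List rather than an Iterable, ignores namespaces, and treats nested elements with the same tag as part of the enclosing sub-document. SubDocumentSketch and split are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SubDocumentSketch {
    // Builds one DOM Document per top-most element named `tag`.
    static List<Document> split(String xml, String tag) {
        List<Document> docs = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            Document current = null;                 // document being built, if any
            Deque<Node> stack = new ArrayDeque<>();  // open nodes, innermost on top
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    String name = start.getName().getLocalPart();
                    if (current == null) {
                        if (!name.equals(tag)) {
                            continue; // outside any matching element
                        }
                        current = DocumentBuilderFactory.newInstance()
                                .newDocumentBuilder().newDocument();
                        stack.push(current);
                    }
                    Element element = current.createElement(name);
                    Iterator<?> attrs = start.getAttributes();
                    while (attrs.hasNext()) {
                        Attribute a = (Attribute) attrs.next();
                        element.setAttribute(a.getName().getLocalPart(), a.getValue());
                    }
                    stack.peek().appendChild(element);
                    stack.push(element);
                } else if (current != null && event.isCharacters()) {
                    stack.peek().appendChild(
                            current.createTextNode(event.asCharacters().getData()));
                } else if (current != null && event.isEndElement()) {
                    stack.pop();
                    if (stack.size() == 1) { // only the Document is left: sub-document done
                        docs.add(current);
                        current = null;
                        stack.clear();
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not split XML", e);
        }
        return docs;
    }
}
```

        An event filter would slot in by wrapping the reader with XMLInputFactory.createFilteredReader before the loop. Because only one sub-document is held in memory at a time during construction, a lazy variant of this approach suits very large inputs.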
      • removeNodeAndChildren

        public static void removeNodeAndChildren​(Node n)

        Removes a node by first recursively removing all of its children and their children

        Parameters:
        n - Node to remove
      • removeNodes

        public static void removeNodes​(NodeList nl)

        Removes a set of nodes in a NodeList from the document.

        Parameters:
        nl - The nodes to remove
      • removeNodes

        public static void removeNodes​(Collection<? extends Node> nl)

        Removes a set of nodes in a Collection from the document.

        Parameters:
        nl - The nodes to remove
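
        When removing nodes that come from a DOM NodeList, note that lists returned by methods such as getElementsByTagName are live and shrink as nodes are detached. The sketch below copies the list before removing anything; whether the library does the same is not stated here. RemoveNodesSketch, removeAll, and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RemoveNodesSketch {
    // Snapshots the (possibly live) NodeList, then removes each node.
    static void removeAll(NodeList nl) {
        List<Node> snapshot = new ArrayList<>();
        for (int i = 0; i < nl.getLength(); i++) {
            snapshot.add(nl.item(i));
        }
        removeAll(snapshot);
    }

    // Detaches each node from its parent; nodes without a parent are skipped.
    static void removeAll(Collection<? extends Node> nodes) {
        for (Node n : nodes) {
            Node parent = n.getParentNode();
            if (parent != null) {
                parent.removeChild(n);
            }
        }
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```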
      • selectAllNodes

        public static List<Node> selectAllNodes​(Node n)
        Selects all nodes under and including the given node
        Parameters:
        n - Node to start at
        Returns:
        A List of Node that has the given node and all the nodes under it
      • selectChildNodeBreadthFirst

        public static Node selectChildNodeBreadthFirst​(Node n,
                                                       Predicate<Node> predicate)

        Selects the first node passing a given predicate using a breadth first search

        Parameters:
        n - Node to start at
        predicate - The Predicate to use for evaluation
        Returns:
        The first Node to match the predicate or null
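
        A breadth-first search visits all nodes at one depth before going deeper, so the match closest to the start node wins. A minimal sketch using a queue follows. It also tests the start node itself, which is an assumption, and BreadthFirstSketch, firstBreadthFirst, and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class BreadthFirstSketch {
    // Returns the first node in breadth-first order that satisfies the predicate, or null.
    static Node firstBreadthFirst(Node start, Predicate<Node> predicate) {
        if (start == null) {
            return null;
        }
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            Node n = queue.remove(); // oldest entry first gives level-by-level order
            if (predicate.test(n)) {
                return n;
            }
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                queue.add(c);
            }
        }
        return null;
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        For <r><a><t id='deep'/></a><t id='shallow'/></r>, searching for elements named "t" returns the one with id "shallow", even though the deeper one appears first in document order.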
      • selectChildNodes

        public static List<Node> selectChildNodes​(Node n,
                                                  Predicate<Node> predicate)
        Selects nodes under and including the given node that match the given predicate
        Parameters:
        n - Node to start at
        predicate - The Predicate to use for evaluation
        Returns:
        A List of Node that match the given predicate
      • selectNodes

        public static List<Node> selectNodes​(Node n,
                                             Predicate<Node> predicate)

        Selects nodes under and including the given node that match the given predicate

        Parameters:
        n - Node to start at
        predicate - The Predicate to use for evaluation
        Returns:
        A List of Node that match the given predicate
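
        Selecting "under and including" a node can be pictured as a depth-first walk that tests every node it visits, starting with the given one. This sketch assumes results are returned in document order, which the page does not specify; SelectNodesSketch, select, and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SelectNodesSketch {
    // Collects n and every descendant of n that satisfies the predicate.
    static List<Node> select(Node n, Predicate<Node> predicate) {
        List<Node> out = new ArrayList<>();
        collect(n, predicate, out);
        return out;
    }

    // Depth-first walk: test the node itself, then recurse into its children.
    private static void collect(Node n, Predicate<Node> predicate, List<Node> out) {
        if (n == null) {
            return;
        }
        if (predicate.test(n)) {
            out.add(n);
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, predicate, out);
        }
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        With a predicate that always returns true, the same walk produces every node under and including the start node.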
      • tagMatchPredicate

        public static Predicate<Node> tagMatchPredicate​(String tagName)

        Creates a Predicate that evaluates to true if the input node's name is the same as the given tag name. String matches are case insensitive

        Parameters:
        tagName - The name to match
        Returns:
        The Predicate
      • typeMatchPredicate

        public static Predicate<Node> typeMatchPredicate​(short nodeType)

        Creates a Predicate that evaluates to true if the input node's type is the same as the given node type

        Parameters:
        nodeType - The Node type to match
        Returns:
        The Predicate
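
        Both predicate factories can be approximated with short lambdas over org.w3c.dom.Node. The sketch below is an assumption about their behavior based on the descriptions above, including the choice to return false for a null node; NodePredicateSketch, tagMatches, typeMatches, and parse are hypothetical names.

```java
import java.io.StringReader;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodePredicateSketch {
    // True when the node's name equals tagName, ignoring case.
    static Predicate<Node> tagMatches(String tagName) {
        return n -> n != null && n.getNodeName().equalsIgnoreCase(tagName);
    }

    // True when the node's type equals nodeType, e.g. Node.ELEMENT_NODE.
    static Predicate<Node> typeMatches(short nodeType) {
        return n -> n != null && n.getNodeType() == nodeType;
    }

    // Test helper: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

        Predicates like these compose with the standard Predicate.and, Predicate.or, and Predicate.negate methods, for example tagMatches("page").and(typeMatches(Node.ELEMENT_NODE)).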