public class XMLTagExtractor extends DefaultHandler2
Helper class for developing PepperModules. The
XMLTagExtractor generates a dictionary of the xml vocabulary used in a source
file. The dictionary consists of xml tag names, xml namespaces and attribute
names. From it, the extractor generates a java interface and a java class. The
interface contains the xml namespace declarations and the xml element and
attribute names as fields (public static final Strings). The generated java
class implements that interface and extends the
DefaultHandler2 class, to read an xml file following the generated xml
dictionary. This is useful when implementing PepperImporter or
PepperExporter classes that consume or produce xml formats. In that
case, a sample xml file (containing most, or better all, of the elements) can
be used to extract all element names as keys for the implementation. For example, the xml file
<sentence xml:lang="en"> <token pos="VBZ">Is</token> <token pos="DT" lemma="this">this</token> <token>example</token> </sentence>
results in the following interface:
public interface INTERFACE_NAME {
    public static final String TAG_TOKEN = "token";
    public static final String TAG_SENTENCE = "sentence";
    public static final String ATT_LEMMA = "lemma";
    public static final String ATT_XML_LANG = "xml:lang";
    public static final String ATT_POS = "pos";
}
where INTERFACE_NAME is the name of the xml file. The generated class looks as follows:
public class INTERFACE_NAMEReader extends DefaultHandler2 implements INTERFACE_NAME {
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (TAG_TOKEN.equals(qName)) {
        } else if (TAG_SENTENCE.equals(qName)) {
        }
    }
}
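To illustrate how a reader of this shape might be filled in, here is a minimal sketch. It assumes a hand-written stand-in for the generated interface (`SampleVocabulary`) and a reader (`SampleReader`) that records one line per matched element; both names, and the wrapper class `ReaderSketch`, are hypothetical and not produced by XMLTagExtractor. Only standard JAXP/SAX calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

class ReaderSketch {
    // Hypothetical stand-in for the generated interface (INTERFACE_NAME).
    interface SampleVocabulary {
        String TAG_TOKEN = "token";
        String TAG_SENTENCE = "sentence";
        String ATT_LEMMA = "lemma";
        String ATT_XML_LANG = "xml:lang";
        String ATT_POS = "pos";
    }

    // Hypothetical stand-in for the generated reader, with the empty
    // branches filled in to record what was seen.
    static class SampleReader extends DefaultHandler2 implements SampleVocabulary {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
                throws SAXException {
            if (TAG_TOKEN.equals(qName)) {
                // getValue returns null when the attribute is absent
                events.add("token pos=" + attributes.getValue(ATT_POS));
            } else if (TAG_SENTENCE.equals(qName)) {
                events.add("sentence lang=" + attributes.getValue(ATT_XML_LANG));
            }
        }
    }

    // Parses the given xml string with the default (not namespace-aware)
    // SAX parser and returns the recorded events.
    static List<String> read(String xml) throws Exception {
        SampleReader reader = new SampleReader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), reader);
        return reader.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence xml:lang=\"en\"> <token pos=\"VBZ\">Is</token> "
                + "<token pos=\"DT\" lemma=\"this\">this</token> <token>example</token> </sentence>";
        for (String event : read(xml)) {
            System.out.println(event);
        }
    }
}
```

Because the reader compares against the interface constants rather than string literals, a renamed tag in the xml format only requires regenerating the interface.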
XMLTagExtractor extractor = new XMLTagExtractor();
extractor.setXmlResource(input);
extractor.setJavaResource(output);
extractor.extract();
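The extraction idea itself can be sketched with a small SAX handler. This is not the XMLTagExtractor implementation, only an illustration under stated assumptions: the class `VocabularySketch` and its methods are hypothetical, the prefixes `TAG_` and `ATT_` are taken from the example interface above, constant names are derived by upper-casing and replacing non-alphanumeric characters with `_`, and names are emitted in sorted order (the real tool may order them differently). Namespace handling is omitted.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: collects element and attribute names from an xml
// document and renders them as a java interface of String constants.
class VocabularySketch extends DefaultHandler2 {
    private final Set<String> tags = new TreeSet<>();
    private final Set<String> attributeNames = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.add(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            attributeNames.add(atts.getQName(i));
        }
    }

    // Assumed naming rule: "xml:lang" with prefix "ATT_" becomes "ATT_XML_LANG".
    static String constantName(String prefix, String name) {
        return prefix + name.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "_");
    }

    private static String field(String prefix, String name) {
        return "    public static final String " + constantName(prefix, name)
                + " = \"" + name + "\";\n";
    }

    String toInterface(String interfaceName) {
        StringBuilder sb = new StringBuilder("public interface " + interfaceName + " {\n");
        for (String tag : tags) {
            sb.append(field("TAG_", tag));
        }
        for (String attribute : attributeNames) {
            sb.append(field("ATT_", attribute));
        }
        return sb.append("}\n").toString();
    }

    // Parses the xml string and returns the source of the generated interface.
    static String generate(String xml, String interfaceName) throws Exception {
        VocabularySketch handler = new VocabularySketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.toInterface(interfaceName);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence xml:lang=\"en\"> <token pos=\"VBZ\">Is</token> "
                + "<token pos=\"DT\" lemma=\"this\">this</token> <token>example</token> </sentence>";
        System.out.print(generate(xml, "INTERFACE_NAME"));
    }
}
```

Using sets means each name appears once no matter how often the element or attribute occurs, which is why a sample file only needs to contain every element at least once.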
| Modifier and Type | Field and Description |
|---|---|
| static String | ARG_INPUT: command-line argument that determines the input file. |
| static String | ARG_OUTPUT: command-line argument that determines the output file. |
| static String | PREFIX_ATTRIBUTE: name prefix for xml attributes. |
| static String | PREFIX_ELEMENT: name prefix for xml tags. |
| static String | PREFIX_NAMESPACE: name prefix for xml namespace prefixes. |
| static String | PREFIX_NAMESPACE_VALUE: name prefix for xml namespaces. |
| Constructor and Description |
|---|
| XMLTagExtractor() |
| Modifier and Type | Method and Description |
|---|---|
| void | extract() |
| URI | getJavaResource(): returns the java file to be generated. |
| URI | getXmlResource(): returns the xml file to be parsed. |
| static void | main(String[] args): java XMLTagExtractor.class -i XML_FILE -o OUTPUT_PATH |
| void | setJavaResource(URI resource): sets the java file to be generated. |
| void | setXmlResource(URI resource): sets the xml file to be parsed. |
| void | startElement(String uri, String localName, String qName, Attributes attributes) |
Methods inherited from class org.xml.sax.ext.DefaultHandler2: attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, getExternalSubset, internalEntityDecl, resolveEntity, resolveEntity, startCDATA, startDTD, startEntity
Methods inherited from class org.xml.sax.helpers.DefaultHandler: characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
public static final String PREFIX_NAMESPACE
public static final String PREFIX_NAMESPACE_VALUE
public static final String PREFIX_ELEMENT
public static final String PREFIX_ATTRIBUTE
public static final String ARG_INPUT
public static final String ARG_OUTPUT
public void setXmlResource(URI resource) throws FileNotFoundException
Throws: FileNotFoundException
public URI getXmlResource()
public void setJavaResource(URI resource) throws FileNotFoundException
Throws: FileNotFoundException
public URI getJavaResource()
public void extract()
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class org.xml.sax.helpers.DefaultHandler
Throws: SAXException
public static void main(String[] args)
Parameters: args - -i XML_FILE -o OUTPUT_PATH
Copyright © 2009–2019 Humboldt-Universität zu Berlin, INRIA. All rights reserved.