Package org.corpus_tools.pepper.impl
Class PepperImporterImpl
- java.lang.Object
-
- org.corpus_tools.pepper.impl.PepperModuleImpl
-
- org.corpus_tools.pepper.impl.PepperImporterImpl
-
- All Implemented Interfaces:
Thread.UncaughtExceptionHandler, PepperImporter, PepperModule
- Direct Known Subclasses:
DoNothingImporter, SaltXMLImporter, TextImporter
public abstract class PepperImporterImpl extends PepperModuleImpl implements PepperImporter
An importer in Pepper reads data from a format A and maps its data to a Salt model. An importer must implement the interface PepperImporter and can extend this class. We strongly recommend extending this class, since it contains many helpful functions and methods controlling the workflow.
- Author:
- Florian Zipser
- See Also:
PepperImporter
-
-
Field Summary
Fields
Modifier and Type, Field, Description
protected CorpusDesc corpusDesc
TODO make docu
-
Fields inherited from class org.corpus_tools.pepper.impl.PepperModuleImpl
isMultithreaded, logger, moduleController, resources, saltProject, sCorpusGraph, symbolicName, temproraries
-
Fields inherited from interface org.corpus_tools.pepper.modules.PepperImporter
NEGATIVE_FILE_EXTENSION_MARKER
-
Fields inherited from interface org.corpus_tools.pepper.modules.PepperModule
ENDING_ALL_FILES, ENDING_FOLDER, ENDING_LEAF_FOLDER, ENDING_TAB, ENDING_TXT, ENDING_XML
-
-
Constructor Summary
Constructors
Modifier, Constructor, Description
protected PepperImporterImpl()
Creates a PepperModule of type MODULE_TYPE.IMPORTER.
protected PepperImporterImpl(String name)
Creates a PepperModule of type MODULE_TYPE.IMPORTER and sets its name to the passed one.
-
Method Summary
Modifier and Type, Method, Description
FormatDesc addSupportedFormat(String formatName, String formatVersion, org.eclipse.emf.common.util.URI formatReference)
{@inheritDoc PepperImporter#addSupportedFormat(String, String, URI)}
CorpusDesc getCorpusDesc()
{@inheritDoc PepperImporter#getCorpusDefinition()}
Collection<String> getCorpusEndings()
{@inheritDoc PepperImporter#getCorpusEndings()}
Collection<String> getDocumentEndings()
{@inheritDoc PepperImporter#getDocumentEndings()}
Map<org.corpus_tools.salt.graph.Identifier,org.eclipse.emf.common.util.URI> getIdentifier2ResourceTable()
{@inheritDoc PepperImporter#getIdentifier2ResourceTable()}
Collection<String> getIgnoreEndings()
Returns a collection of filenames, not to be imported.
List<FormatDesc> getSupportedFormats()
{@inheritDoc PepperImporter#getSupportedFormats()}
void importCorpusStructure(org.corpus_tools.salt.common.SCorpusGraph corpusGraph)
{@inheritDoc PepperImporter#importCorpusStructure(SCorpusGraph)}
protected Boolean importCorpusStructureRec(org.eclipse.emf.common.util.URI currURI, org.corpus_tools.salt.common.SCorpus parent)
Top down traversal in file given structure.
Double isImportable(org.eclipse.emf.common.util.URI corpusPath)
{@inheritDoc PepperImporter#isImportable(URI)}
protected void readXMLResource(DefaultHandler2 contentHandler, org.eclipse.emf.common.util.URI documentLocation)
Helper method to read an xml file with a DefaultHandler2 implementation given as contentHandler.
protected Collection<String> sampleFileContent(org.eclipse.emf.common.util.URI corpusPath, String... fileEndings)
Returns lines of a sampled set of files having the ending specified by fileEndings recursively from specified corpus path.
void setCorpusDesc(CorpusDesc newCorpusDefinition)
{@inheritDoc PepperImporter#setCorpusDefinition(CorpusDefinition)}
void setCorpusPathResolver(CorpusPathResolver corpusPathResolver)
Sets a CorpusPathResolver which is used by isImportable(URI).
org.corpus_tools.salt.SALT_TYPE setTypeOfResource(org.eclipse.emf.common.util.URI resource)
{@inheritDoc PepperImporter#setTypeOfResource(URI)}
void start()
Overrides the method PepperModuleImpl.start() to add the following, before PepperModuleImpl.start() is called.
-
Methods inherited from class org.corpus_tools.pepper.impl.PepperModuleImpl
activate, createPepperMapper, done, done, end, getComponentContext, getCorpusGraph, getDesc, getDocumentId2DC, getFingerprint, getMapperControllers, getMapperThreadGroup, getModuleController, getModuleType, getName, getProgress, getProgress, getProperties, getResources, getSaltProject, getSelfTestDesc, getStartProblems, getSupplierContact, getSupplierHomepage, getSymbolicName, getTemproraries, getVersion, isMultithreaded, isReadyToStart, proposeImportOrder, setCorpusGraph, setDesc, setIsMultithreaded, setMapperThreadGroup, setName, setPepperModuleController, setPepperModuleController_basic, setProperties, setResources, setSaltProject, setSupplierContact, setSupplierHomepage, setSymbolicName, setTemproraries, setVersion, start, toString, uncaughtException
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.corpus_tools.pepper.modules.PepperModule
createPepperMapper, done, done, end, getComponentContext, getCorpusGraph, getDesc, getFingerprint, getModuleController, getModuleType, getName, getProgress, getProgress, getProperties, getResources, getSaltProject, getSelfTestDesc, getStartProblems, getSupplierContact, getSupplierHomepage, getSymbolicName, getTemproraries, getVersion, isMultithreaded, isReadyToStart, proposeImportOrder, setCorpusGraph, setDesc, setIsMultithreaded, setPepperModuleController, setPepperModuleController_basic, setProperties, setResources, setSaltProject, setSupplierContact, setSupplierHomepage, setSymbolicName, setTemproraries, setVersion, start
-
-
-
-
Field Detail
-
corpusDesc
protected CorpusDesc corpusDesc
TODO make docu
-
-
Constructor Detail
-
PepperImporterImpl
protected PepperImporterImpl()
Creates a PepperModule of type MODULE_TYPE.IMPORTER. The name is set to "MyImporter".
We recommend using the constructor PepperImporterImpl(String) and passing a proper name.
-
PepperImporterImpl
protected PepperImporterImpl(String name)
Creates a PepperModule of type MODULE_TYPE.IMPORTER and sets its name to the passed one.
-
-
Method Detail
-
getSupportedFormats
public List<FormatDesc> getSupportedFormats()
{@inheritDoc PepperImporter#getSupportedFormats()}
- Specified by:
getSupportedFormats in interface PepperImporter
- Returns:
-
addSupportedFormat
public FormatDesc addSupportedFormat(String formatName, String formatVersion, org.eclipse.emf.common.util.URI formatReference)
{@inheritDoc PepperImporter#addSupportedFormat(String, String, URI)}
- Specified by:
addSupportedFormat in interface PepperImporter
-
getCorpusDesc
public CorpusDesc getCorpusDesc()
{@inheritDoc PepperImporter#getCorpusDefinition()}
- Specified by:
getCorpusDesc in interface PepperImporter
- Returns:
-
setCorpusDesc
public void setCorpusDesc(CorpusDesc newCorpusDefinition)
{@inheritDoc PepperImporter#setCorpusDefinition(CorpusDefinition)}
- Specified by:
setCorpusDesc in interface PepperImporter
-
getIdentifier2ResourceTable
public Map<org.corpus_tools.salt.graph.Identifier,org.eclipse.emf.common.util.URI> getIdentifier2ResourceTable()
{@inheritDoc PepperImporter#getIdentifier2ResourceTable()}
- Specified by:
getIdentifier2ResourceTable in interface PepperImporter
-
importCorpusStructure
public void importCorpusStructure(org.corpus_tools.salt.common.SCorpusGraph corpusGraph) throws PepperModuleException
{@inheritDoc PepperImporter#importCorpusStructure(SCorpusGraph)}
- Specified by:
importCorpusStructure in interface PepperImporter
- Parameters:
corpusGraph - an empty graph given by Pepper, which shall contain the corpus structure
- Throws:
PepperModuleException
-
importCorpusStructureRec
protected Boolean importCorpusStructureRec(org.eclipse.emf.common.util.URI currURI, org.corpus_tools.salt.common.SCorpus parent)
Top-down traversal of the given file structure. This method is called by importCorpusStructure(SCorpusGraph) and creates the corpus structure via a top-down traversal of the file structure. For each resource found (file or folder), the method setTypeOfResource(URI) is called to determine the type of the resource. If the type is SALT_TYPE.SDOCUMENT, an SDocument object is created for the resource; if the type is SALT_TYPE.SCORPUS, an SCorpus object is created; if the type is null, the resource is ignored.
- Parameters:
currURI -
parentsID -
endings -
- Returns:
- true if the path contains documents, false otherwise
- Throws:
IOException
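The traversal described for importCorpusStructureRec can be illustrated without any Pepper dependency. The following is a minimal sketch under stated assumptions, not the actual implementation: CorpusTraversalSketch, ResourceType, Structure and typeOf are hypothetical names (typeOf plays the role of setTypeOfResource(URI)), and the rule that folders become corpora and .txt files become documents is chosen only for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Stand-alone sketch; none of these names are part of the Pepper API. */
final class CorpusTraversalSketch {

    /** Hypothetical stand-in for SALT_TYPE, reduced to the two values used here. */
    enum ResourceType { SCORPUS, SDOCUMENT }

    /** Collects the names of created corpora and documents in visiting order. */
    static final class Structure {
        final List<String> corpora = new ArrayList<>();
        final List<String> documents = new ArrayList<>();
    }

    /** Assumed rule: folders are corpora, ".txt" files are documents, anything else is ignored (null). */
    static ResourceType typeOf(Path resource) {
        if (Files.isDirectory(resource)) {
            return ResourceType.SCORPUS;
        }
        return resource.getFileName().toString().endsWith(".txt") ? ResourceType.SDOCUMENT : null;
    }

    /** Returns true if the path contains at least one document, false otherwise. */
    static boolean traverse(Path current, Structure out) throws IOException {
        ResourceType type = typeOf(current);
        if (type == null) {
            return false; // type null: the resource is ignored
        }
        if (type == ResourceType.SDOCUMENT) {
            out.documents.add(current.getFileName().toString());
            return true;
        }
        out.corpora.add(String.valueOf(current.getFileName()));
        boolean containsDocuments = false;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
            for (Path child : children) {
                // |= always evaluates the right-hand side, so every child is visited.
                containsDocuments |= traverse(child, out);
            }
        }
        return containsDocuments;
    }
}
```

In this sketch a folder is recorded as a corpus even when it turns out to contain no documents; whether the real method behaves the same way is not stated in this documentation.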
-
start
public void start() throws PepperModuleException
Overrides the method PepperModuleImpl.start() to add the following before PepperModuleImpl.start() is called:
- a check whether the corpus path exists
- Specified by:
start in interface PepperModule
- Overrides:
start in class PepperModuleImpl
- Throws:
PepperModuleException
-
getDocumentEndings
public Collection<String> getDocumentEndings()
{@inheritDoc PepperImporter#getDocumentEndings()}
- Specified by:
getDocumentEndings in interface PepperImporter
- Returns:
- a collection of endings
-
getCorpusEndings
public Collection<String> getCorpusEndings()
{@inheritDoc PepperImporter#getCorpusEndings()}
- Specified by:
getCorpusEndings in interface PepperImporter
- Returns:
- a collection of endings
-
setTypeOfResource
public org.corpus_tools.salt.SALT_TYPE setTypeOfResource(org.eclipse.emf.common.util.URI resource)
{@inheritDoc PepperImporter#setTypeOfResource(URI)}
- Specified by:
setTypeOfResource in interface PepperImporter
- Parameters:
resource - URI resource to be specified
- Returns:
SALT_TYPE.SCORPUS if the resource represents an SCorpus object, SALT_TYPE.SDOCUMENT if the resource represents an SDocument object, or null if it shall be ignored.
-
getIgnoreEndings
public Collection<String> getIgnoreEndings()
Returns a collection of filenames, not to be imported. {@inheritDoc #importIgnoreList}.
- Specified by:
getIgnoreEndings in interface PepperImporter
- Returns:
-
readXMLResource
protected void readXMLResource(DefaultHandler2 contentHandler, org.eclipse.emf.common.util.URI documentLocation)
Helper method to read an XML file with a DefaultHandler2 implementation given as contentHandler. It is assumed that the file is encoded in UTF-8.
- Parameters:
contentHandler - DefaultHandler2 implementation
documentLocation - location of the XML file
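How a DefaultHandler2 can be fed from a UTF-8 encoded file is shown below using only the JDK's SAX classes. This is a sketch of the general technique and not the code of readXMLResource: XmlReadSketch, ElementCounter and readXml are hypothetical names, and a java.nio.file.Path is used in place of the EMF URI so that the example has no external dependencies.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Stand-alone sketch; none of these names are part of the Pepper API. */
final class XmlReadSketch {

    /** Example handler that only counts element start tags. */
    static final class ElementCounter extends DefaultHandler2 {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elements++;
        }
    }

    /** Parses the file at documentLocation as UTF-8 and reports all SAX events to contentHandler. */
    static void readXml(DefaultHandler2 contentHandler, Path documentLocation)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        XMLReader xmlReader = parser.getXMLReader();
        xmlReader.setContentHandler(contentHandler);
        // DefaultHandler2 also implements LexicalHandler (comments, CDATA sections, DTD events).
        xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler", contentHandler);
        // Decode the bytes as UTF-8 explicitly instead of relying on the platform default.
        try (Reader in = Files.newBufferedReader(documentLocation, StandardCharsets.UTF_8)) {
            InputSource source = new InputSource(in);
            source.setEncoding("UTF-8");
            xmlReader.parse(source);
        }
    }
}
```

A caller would pass its own DefaultHandler2 subclass, for example new XmlReadSketch.ElementCounter(), and inspect the state the handler collected after readXml returns.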
-
isImportable
public Double isImportable(org.eclipse.emf.common.util.URI corpusPath)
{@inheritDoc PepperImporter#isImportable(URI)}
- Specified by:
isImportable in interface PepperImporter
- Returns:
- 1 if the corpus is importable, 0 if the corpus is not importable, a value X with 0 < X < 1 if no definitive answer is possible, null if the method is not overridden
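The return contract above can be illustrated with a small scoring function. The sketch below is hypothetical and not taken from Pepper: ImportableScoreSketch, score and the tab-separated line pattern are invented for the example, and returning null for an empty sample is an assumption that reuses the "no statement" value.

```java
import java.util.Collection;
import java.util.regex.Pattern;

/** Stand-alone sketch; none of these names are part of the Pepper API. */
final class ImportableScoreSketch {

    /** Hypothetical format test: a line consists of exactly two tab-separated fields. */
    static final Pattern TAB_SEPARATED = Pattern.compile("[^\\t]+\\t[^\\t]+");

    /**
     * Returns 1.0 if all non-blank sampled lines match the format, 0.0 if none match,
     * a value in between otherwise, and null if there was nothing to judge (assumption).
     */
    static Double score(Collection<String> sampledLines) {
        int considered = 0;
        int matching = 0;
        for (String line : sampledLines) {
            if (line == null || line.trim().isEmpty()) {
                continue; // blank lines carry no evidence either way
            }
            considered++;
            if (TAB_SEPARATED.matcher(line).matches()) {
                matching++;
            }
        }
        if (considered == 0) {
            return null;
        }
        return (double) matching / considered;
    }
}
```

An importer overriding isImportable(URI) could obtain the lines from sampleFileContent and return such a ratio; the concrete heuristic depends entirely on the format.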
-
setCorpusPathResolver
public void setCorpusPathResolver(CorpusPathResolver corpusPathResolver)
Sets a CorpusPathResolver which is used by isImportable(URI). With a CorpusPathResolver, multiple importers can share the lines already read from files. This saves the time otherwise spent on retrieving the content of the corpus path and reading the first x lines of the files again.
- Parameters:
corpusPathResolver -
-
sampleFileContent
protected Collection<String> sampleFileContent(org.eclipse.emf.common.util.URI corpusPath, String... fileEndings)
Returns lines of a sampled set of files having one of the endings specified by fileEndings, searched recursively from the specified corpus path.
This method only delegates to IsImportableUtil#sampleFileContent(URI, int, int, String...). The class IsImportableUtil also contains further helper methods, in case this method is too imprecise.
- Parameters:
corpusPath - directory to be searched in
fileEndings - endings to be considered. If no endings are specified, all files are considered.
- Returns:
numberOfLines lines of numberOfSampledFiles files
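The idea of sampling a limited number of lines from a limited number of files can be sketched with java.nio alone. This is not the code of IsImportableUtil: SampleLinesSketch, sample and hasEnding are hypothetical names, the parameter order is made up, endings are assumed to be given without a leading dot, and the "sample" here is simply the first files in sorted order, which may differ from how the real helper selects files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Stand-alone sketch; none of these names are part of the Pepper API. */
final class SampleLinesSketch {

    /** Reads up to numberOfLines lines from each of up to numberOfSampledFiles matching files. */
    static Collection<String> sample(Path corpusPath, int numberOfSampledFiles, int numberOfLines,
            String... fileEndings) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(corpusPath)) { // recursive directory walk
            files = walk.filter(Files::isRegularFile)
                    .filter(p -> hasEnding(p, fileEndings))
                    .sorted() // deterministic choice; an assumption of this sketch
                    .limit(numberOfSampledFiles)
                    .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                for (int i = 0; i < numberOfLines && (line = reader.readLine()) != null; i++) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    /** With no endings given, every file is considered. */
    static boolean hasEnding(Path file, String... fileEndings) {
        if (fileEndings == null || fileEndings.length == 0) {
            return true;
        }
        String name = file.getFileName().toString();
        for (String ending : fileEndings) {
            if (name.endsWith("." + ending)) {
                return true;
            }
        }
        return false;
    }
}
```

Capping both the number of files and the number of lines per file keeps the cost of a format check bounded even for large corpora.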
-
-