Class PepperImporterImpl

    • Field Detail

      • corpusDesc

        protected CorpusDesc corpusDesc
        TODO make docu
    • Method Detail

      • addSupportedFormat

        public FormatDesc addSupportedFormat(String formatName,
                                             String formatVersion,
                                             org.eclipse.emf.common.util.URI formatReference)
        {@inheritDoc PepperImporter#addSupportedFormat(String, String, URI)}
        Specified by:
        addSupportedFormat in interface PepperImporter
      • setCorpusDesc

        public void setCorpusDesc(CorpusDesc newCorpusDefinition)
        {@inheritDoc PepperImporter#setCorpusDesc(CorpusDesc)}
        Specified by:
        setCorpusDesc in interface PepperImporter
      • getIdentifier2ResourceTable

        public Map<org.corpus_tools.salt.graph.Identifier, org.eclipse.emf.common.util.URI> getIdentifier2ResourceTable()
        {@inheritDoc PepperImporter#getIdentifier2ResourceTable()}
        Specified by:
        getIdentifier2ResourceTable in interface PepperImporter
      • importCorpusStructure

        public void importCorpusStructure(org.corpus_tools.salt.common.SCorpusGraph corpusGraph)
                                   throws PepperModuleException
        {@inheritDoc PepperImporter#importCorpusStructure(SCorpusGraph)}
        Specified by:
        importCorpusStructure in interface PepperImporter
        Parameters:
        corpusGraph - an empty graph given by Pepper, which shall contain the corpus structure
        Throws:
        PepperModuleException
      • importCorpusStructureRec

        protected Boolean importCorpusStructureRec(org.eclipse.emf.common.util.URI currURI,
                                                   org.corpus_tools.salt.common.SCorpus parent)
        Top-down traversal of the given file structure. This method is called by importCorpusStructure(SCorpusGraph) and creates the corpus structure by traversing the file structure top-down. For each resource found (file or folder), setTypeOfResource(URI) is called to determine the type of the resource. If the type is SALT_TYPE.SDOCUMENT, an SDocument object is created for the resource; if the type is SALT_TYPE.SCORPUS, an SCorpus object is created; if the type is null, the resource is ignored.
        Parameters:
        currURI -
        parentsID -
        endings -
        Returns:
        true if the path contains documents, false otherwise
        Throws:
        IOException
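
        The traversal described for importCorpusStructureRec can be sketched as follows. This is a minimal, self-contained illustration and not the Pepper implementation: ResourceType is a hypothetical stand-in for SALT_TYPE, plain strings stand in for SCorpus and SDocument objects, the classifier function plays the role of setTypeOfResource(URI), and skipping the children of an ignored folder is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for SALT_TYPE; only the two values used here. */
enum ResourceType { CORPUS, DOCUMENT }

class CorpusStructureSketch {
    /** Collected entries such as "corpus:root" or "document:root/a". */
    final List<String> entries = new ArrayList<>();
    /** Plays the role of setTypeOfResource(URI); returns null to ignore a resource. */
    private final Function<Path, ResourceType> classifier;

    CorpusStructureSketch(Function<Path, ResourceType> classifier) {
        this.classifier = classifier;
    }

    /**
     * Visits 'current' and then its children (top-down).
     * Returns true if 'current' is a document or contains documents.
     */
    boolean traverse(Path current, String parentName) {
        ResourceType type = classifier.apply(current);
        if (type == null) {
            return false; // ignored resource; assumption: children are not visited
        }
        String name = stripExtension(current.getFileName().toString());
        String qualified = parentName == null ? name : parentName + "/" + name;
        if (type == ResourceType.DOCUMENT) {
            entries.add("document:" + qualified);
            return true;
        }
        entries.add("corpus:" + qualified);
        boolean containsDocuments = false;
        if (Files.isDirectory(current)) {
            for (Path child : sortedChildren(current)) {
                // non-short-circuit OR: every child is visited
                containsDocuments |= traverse(child, qualified);
            }
        }
        return containsDocuments;
    }

    private static List<Path> sortedChildren(Path dir) {
        try (Stream<Path> children = Files.list(dir)) {
            return children.sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```

        A classifier that maps folders to CORPUS, files ending in .txt to DOCUMENT, and everything else to null would yield one corpus entry per folder and one document entry per .txt file.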
      • setTypeOfResource

        public org.corpus_tools.salt.SALT_TYPE setTypeOfResource(org.eclipse.emf.common.util.URI resource)
        {@inheritDoc PepperImporter#setTypeOfResource(URI)}
        Specified by:
        setTypeOfResource in interface PepperImporter
        Parameters:
        resource - URI resource to be specified
        Returns:
        SALT_TYPE.SCORPUS if resource represents an SCorpus object, SALT_TYPE.SDOCUMENT if resource represents an SDocument object, or null if it shall be ignored.
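
        The three-way return contract of setTypeOfResource can be illustrated with a small decision function. This is only a sketch under assumptions: ResourceKind is a hypothetical stand-in for SALT_TYPE, and the rule "folders are corpora, files with a known ending are documents" is one plausible policy, not necessarily the one Pepper applies.

```java
import java.util.Set;

/** Hypothetical stand-in for SALT_TYPE. */
enum ResourceKind { CORPUS, DOCUMENT }

class ResourceClassifierSketch {
    /**
     * Returns CORPUS for folders, DOCUMENT for files whose ending is listed
     * in documentEndings, and null for everything else (to be ignored).
     */
    static ResourceKind classify(String fileName, boolean isDirectory, Set<String> documentEndings) {
        if (isDirectory) {
            return ResourceKind.CORPUS;
        }
        int dot = fileName.lastIndexOf('.');
        String ending = dot >= 0 ? fileName.substring(dot + 1) : "";
        return documentEndings.contains(ending) ? ResourceKind.DOCUMENT : null;
    }
}
```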
      • readXMLResource

        protected void readXMLResource(DefaultHandler2 contentHandler,
                                       org.eclipse.emf.common.util.URI documentLocation)
        Helper method to read an XML file with a DefaultHandler2 implementation given as contentHandler. The file encoding is assumed to be UTF-8.
        Parameters:
        contentHandler - DefaultHandler2 implementation
        documentLocation - location of the XML file
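
        Reading an XML file with a DefaultHandler2 and a fixed UTF-8 encoding can be done with the JDK's SAX parser roughly as shown below. This sketch is not the body of readXMLResource: the class names XmlReadSketch and ElementCounter are hypothetical, a java.nio.file.Path is used instead of an EMF URI, and wrapping failures in an IllegalStateException is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

class XmlReadSketch {
    /** Parses the file at 'location' as UTF-8 and feeds SAX events to 'handler'. */
    static void read(DefaultHandler2 handler, Path location) {
        try (InputStream in = Files.newInputStream(location);
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            InputSource source = new InputSource(reader);
            source.setEncoding("UTF-8");
            // DefaultHandler2 extends DefaultHandler, so it can be passed directly
            parser.parse(source, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Cannot read XML file " + location, e);
        }
    }
}

/** Example handler: counts the start tags it sees. */
class ElementCounter extends DefaultHandler2 {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        elements++;
    }
}
```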
      • isImportable

        public Double isImportable(org.eclipse.emf.common.util.URI corpusPath)
        {@inheritDoc PepperImporter#isImportable(URI)}
        Specified by:
        isImportable in interface PepperImporter
        Returns:
        1 if the corpus is importable, 0 if the corpus is not importable, a value X with 0 < X < 1 if no definitive answer is possible, null if the method is not overridden
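
        One way an importer might derive such a value is to check sampled lines against a format-specific test and return the matching fraction. The sketch below is purely illustrative: ImportabilitySketch and score are hypothetical names, "contains a tab" is an arbitrary test for a tab-separated format, and returning null for an empty sample is an assumption.

```java
import java.util.Collection;

class ImportabilitySketch {
    /**
     * Returns 1.0 if every sampled line looks tab-separated, 0.0 if none does,
     * a fraction in between otherwise, and null if there is nothing to judge.
     */
    static Double score(Collection<String> sampledLines) {
        if (sampledLines == null || sampledLines.isEmpty()) {
            return null; // assumption: no sample means no statement
        }
        long matching = sampledLines.stream()
                .filter(line -> line.contains("\t"))
                .count();
        return (double) matching / sampledLines.size();
    }
}
```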
      • setCorpusPathResolver

        public void setCorpusPathResolver(CorpusPathResolver corpusPathResolver)
        Sets a CorpusPathResolver, which is used by isImportable(URI). A CorpusPathResolver makes it possible to share the lines read from files between multiple importers. This saves the time otherwise spent retrieving the content of the corpus path and reading the first x lines of the files.
        Parameters:
        corpusPathResolver -
      • sampleFileContent

        protected Collection<String> sampleFileContent(org.eclipse.emf.common.util.URI corpusPath,
                                                       String... fileEndings)
        Returns lines from a sampled set of files that have one of the endings specified by fileEndings, searched recursively below the specified corpus path.

        This method only delegates to IsImportableUtil#sampleFileContent(URI, int, int, String...). The class IsImportableUtil also contains further helper methods in case this method is too imprecise.

        Parameters:
        corpusPath - directory to be searched in
        fileEndings - endings to be considered. If no endings are specified, all files are considered
        Returns:
        numberOfLines lines of numberOfSampledFiles files
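
        The sampling idea (a limited number of lines from a limited number of matching files) can be sketched with java.nio as follows. This is not IsImportableUtil: SampleSketch and its parameter order are hypothetical, a Path replaces the EMF URI, and files are taken in sorted order for reproducibility, whereas the real utility may choose its sample differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SampleSketch {
    /**
     * Collects up to numberOfLines leading lines from up to numberOfSampledFiles
     * files below corpusPath. If fileEndings is empty, all files are considered.
     */
    static List<String> sample(Path corpusPath, int numberOfSampledFiles,
                               int numberOfLines, String... fileEndings) {
        List<String> endings = Arrays.asList(fileEndings);
        List<String> lines = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(corpusPath)) {
            List<Path> files = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> endings.isEmpty() || endings.stream()
                            .anyMatch(ending -> p.getFileName().toString().endsWith("." + ending)))
                    .sorted()
                    .limit(numberOfSampledFiles)
                    .collect(Collectors.toList());
            for (Path file : files) {
                // files are assumed to be UTF-8 encoded text
                try (Stream<String> content = Files.lines(file, StandardCharsets.UTF_8)) {
                    content.limit(numberOfLines).forEach(lines::add);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```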