Class CorpusPathResolver


  • public class CorpusPathResolver
    extends Object
    • Field Detail

      • NUMBER_OF_SAMPLED_FILES

        public static final int NUMBER_OF_SAMPLED_FILES
        The number of files which are read for sampling when invoking #findAppropriateImporters(URI).
        See Also:
        Constant Field Values
      • NUMBER_OF_SAMPLED_LINES

        public static final int NUMBER_OF_SAMPLED_LINES
        The number of lines in a file which are read for sampling when invoking #findAppropriateImporters(URI).
        See Also:
        Constant Field Values
      • unreadFilesGroupedByExtension

        protected com.google.common.collect.Multimap<String,​File> unreadFilesGroupedByExtension
      • readFilesGroupedByExtension

        protected com.google.common.collect.Multimap<String,​org.corpus_tools.pepper.impl.CorpusPathResolver.FileContent> readFilesGroupedByExtension
    • Method Detail

      • sampleFileContent

        public Collection<String> sampleFileContent​(String... fileEndings)
        Returns 10 lines from each file in a sampled set of 20 files whose ending matches one of fileEndings, collected recursively from the specified corpus path.
        Parameters:
        fileEndings - endings to be considered. If no endings are specified, all files are considered
        Returns:
        the first 10 lines of 20 files
      • sampleFileContent

        public Collection<String> sampleFileContent​(int numberOfSampledFiles,
                                                    int numberOfSampledLines,
                                                    String... fileEndings)
        Returns numberOfSampledLines lines from each file in a sampled set of numberOfSampledFiles files whose ending matches one of fileEndings, collected recursively from the specified corpus path.
        Parameters:
        numberOfSampledFiles - number of files to be read
        numberOfSampledLines - number of lines to be read
        fileEndings - endings to be considered. If no endings are specified, all files are considered
        Returns:
        the first numberOfSampledLines lines of numberOfSampledFiles files
      • groupFilesByEnding

        protected com.google.common.collect.Multimap<String,​File> groupFilesByEnding​(org.eclipse.emf.common.util.URI corpusPath)
                                                                                    throws FileNotFoundException
        Groups files by their file ending into a multimap. The key is the ending.
        Parameters:
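        corpusPath - path of the corpus whose files are grouped
        Returns:
        a multimap with the file ending as key and the files having that ending as values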
        Throws:
        FileNotFoundException
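        The grouping described for groupFilesByEnding can be sketched as follows. This is a minimal illustration and not the class's actual implementation: it assumes a plain java.io.File root instead of an EMF URI, uses a java.util.Map of lists in place of a Guava Multimap, and treats the text after the last dot of the file name as the ending. The class name GroupByEndingSketch is hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
final class GroupByEndingSketch {

    /** Recursively collects all files below root, keyed by the text after the last dot in the file name. */
    static Map<String, List<File>> groupFilesByEnding(File root) throws FileNotFoundException {
        if (!root.exists()) {
            throw new FileNotFoundException("No such path: " + root);
        }
        Map<String, List<File>> grouped = new HashMap<>();
        collect(root, grouped);
        return grouped;
    }

    private static void collect(File current, Map<String, List<File>> grouped) {
        if (current.isDirectory()) {
            File[] children = current.listFiles(); // null if the directory cannot be listed
            if (children != null) {
                for (File child : children) {
                    collect(child, grouped);
                }
            }
            return;
        }
        String name = current.getName();
        int dot = name.lastIndexOf('.');
        // Assumption: files without a dot are grouped under the empty string.
        String ending = dot >= 0 ? name.substring(dot + 1) : "";
        grouped.computeIfAbsent(ending, key -> new ArrayList<>()).add(current);
    }
}
```

        With a corpus directory containing a.xml, b.xml and sub/c.txt, the resulting map would hold two files under "xml" and one under "txt".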
      • getXFilesWithExtension

        protected Collection<org.corpus_tools.pepper.impl.CorpusPathResolver.FileContent> getXFilesWithExtension​(int numOfFiles,
                                                                                                                 int numOfLinesToRead,
                                                                                                                 String fileEnding)
      • sampleFiles

        protected Collection<File> sampleFiles​(Collection<File> files,
                                               int numberOfSampledFiles)
        Creates a sampled set of numberOfSampledFiles files recursively from directory dir with specified endings.
        Parameters:
        dir - the directory to be traversed recursively
        numberOfSampledFiles - number of files to be sampled
        fileEndings - endings of files to be sampled
        Returns:
        a collection of files having one of the endings in endings in directory dir
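        One way to draw a sample matching the signature sampleFiles(Collection<File>, int) is sketched below. The documentation does not say how files are selected, so the random shuffle here is an assumption, as is the choice to return all files when fewer than numberOfSampledFiles are available. The class name SampleFilesSketch is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class SampleFilesSketch {

    /** Returns up to numberOfSampledFiles files chosen at random from files. */
    static Collection<File> sampleFiles(Collection<File> files, int numberOfSampledFiles) {
        // Copy first so the caller's collection is not reordered.
        List<File> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled);
        // Clamp the sample size to the range [0, number of available files].
        int count = Math.max(0, Math.min(numberOfSampledFiles, shuffled.size()));
        return new ArrayList<>(shuffled.subList(0, count));
    }
}
```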
      • readFirstLines

        protected String readFirstLines​(File file,
                                        int numOfLinesToRead)
        Reads the first numOfLinesToRead lines of the passed file and returns them as a String.
        Parameters:
        file - the file to read
        numOfLinesToRead - number of lines to read
        Returns:
        the first numOfLinesToRead lines of the file
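        Reading only the leading lines of a file, as readFirstLines describes, might look like the sketch below. It is not the class's actual implementation: the UTF-8 encoding, the '\n' line separator in the returned String, and the declared IOException are all assumptions, and the class name FirstLinesSketch is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper, not part of the documented API.
final class FirstLinesSketch {

    /** Reads at most numOfLinesToRead lines from file and joins them, each terminated by '\n'. */
    static String readFirstLines(File file, int numOfLinesToRead) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            String line;
            int linesRead = 0;
            // Check the counter first so no extra line is consumed once the limit is reached.
            while (linesRead < numOfLinesToRead && (line = reader.readLine()) != null) {
                content.append(line).append('\n');
                linesRead++;
            }
        }
        return content.toString();
    }
}
```

        Stopping after a fixed number of lines keeps sampling cheap even for very large corpus files, since the remainder of the file is never read.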