Class ClusterByTimestamp

Object
org.anchoranalysis.bean.AnchorBean<CopyFilesNaming<T>>
org.anchoranalysis.plugin.io.bean.file.copy.naming.CopyFilesNaming<ClusterMembership>
org.anchoranalysis.plugin.io.bean.file.copy.naming.cluster.ClusterByTimestamp

public class ClusterByTimestamp
extends CopyFilesNaming<ClusterMembership>
Associates a particular timestamp with each file, and clusters the files by these timestamps.

The timestamp is chosen, in this order of priority:

  • A date / time string extracted from the filename, if the filename matches one of the timestamp patterns.
  • Original photo-taken time from EXIF metadata if available, and the file has a jpg or jpeg extension.
  • File creation time.

Timestamps are assumed to be in the current time-zone, unless otherwise indicated.

File modification time is not considered.
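The priority order above can be pictured as a chain of optional sources, where the first source that yields a value wins. The following is only a minimal sketch: the class `TimestampPriority` and its methods are hypothetical names for illustration and are not part of this library, and the case-insensitive extension check is an assumption.

```java
import java.time.LocalDateTime;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Supplier;

public class TimestampPriority {

    /** Returns the first timestamp that is present, trying each source in the order given. */
    @SafeVarargs
    static Optional<LocalDateTime> firstAvailable(Supplier<Optional<LocalDateTime>>... sources) {
        for (Supplier<Optional<LocalDateTime>> source : sources) {
            Optional<LocalDateTime> candidate = source.get();
            if (candidate.isPresent()) {
                return candidate;
            }
        }
        return Optional.empty();
    }

    /** Whether EXIF metadata would be consulted (assumption: the check ignores case). */
    static boolean hasJpegExtension(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        return lower.endsWith(".jpg") || lower.endsWith(".jpeg");
    }
}
```

In such a sketch, the three suppliers would correspond to the filename, the EXIF metadata (consulted only when `hasJpegExtension` is true) and the file creation time, which the standard library exposes through `BasicFileAttributes.creationTime()`.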

The clusters are named 01, 02, 03 etc. depending on the number of clusters.

The DBSCAN algorithm is used for clustering.

A special cluster OUTLIER_CLUSTER_IDENTIFIER may also be created, for files that are not density-reachable from any other file, and therefore belong to no cluster.
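To illustrate how DBSCAN behaves on one-dimensional timestamps, here is a minimal textbook-style sketch. It is not this class's implementation, and which DBSCAN library the class relies on is not stated here. The class `TimestampDbscan` is a hypothetical name, and the sketch assumes that thresholdHours plays the role of the DBSCAN neighbourhood radius and minimumPerCluster the role of the minimum number of neighbours (counting the file itself).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TimestampDbscan {

    static final int OUTLIER = -1;
    private static final int UNVISITED = 0;

    /** Labels each timestamp (expressed in hours) with a cluster number 1, 2, ... or OUTLIER. */
    static int[] cluster(double[] hours, double thresholdHours, int minimumPerCluster) {
        int[] labels = new int[hours.length]; // every element starts as UNVISITED
        int currentCluster = 0;
        for (int i = 0; i < hours.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = neighbors(hours, i, thresholdHours);
            if (neighbors.size() < minimumPerCluster) {
                labels[i] = OUTLIER; // may still be claimed later as a border point
                continue;
            }
            currentCluster++;
            labels[i] = currentCluster;
            Deque<Integer> queue = new ArrayDeque<>(neighbors);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == OUTLIER) {
                    labels[j] = currentCluster; // border point reached from a core point
                }
                if (labels[j] != UNVISITED) {
                    continue;
                }
                labels[j] = currentCluster;
                List<Integer> expanded = neighbors(hours, j, thresholdHours);
                if (expanded.size() >= minimumPerCluster) {
                    queue.addAll(expanded); // j is itself a core point, so keep expanding
                }
            }
        }
        return labels;
    }

    /** Indices of all timestamps within the threshold of the given one (including itself). */
    private static List<Integer> neighbors(double[] hours, int index, double thresholdHours) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < hours.length; k++) {
            if (Math.abs(hours[k] - hours[index]) <= thresholdHours) {
                out.add(k);
            }
        }
        return out;
    }
}
```

With timestamps at hours 0, 1, 2, 50, 100, 101 and 102, a threshold of 3 hours and a minimum of 3 files, this sketch yields two clusters and marks the file at hour 50 as an outlier.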

The relative paths of files are preserved, being added relative to the cluster subdirectory.

The default-patterns for matching filenames are:

  • yyyy-mm-dd hh:mm:ss
  • yyyymmdd_hhmmss
  • yyyymmdd hhmmss
Author:
Owen Feehan
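
As an illustration of the default filename patterns listed above, the sketch below searches a filename for each of the three layouts and converts the first match to a date-time. It is a minimal sketch only: `FilenameTimestamp` is a hypothetical class, the regular expressions are assumptions, and the library's own TimestampPattern beans may match differently.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilenameTimestamp {

    // One expression per default layout, each with six groups: year, month, day, hour, minute, second.
    private static final List<Pattern> PATTERNS =
            List.of(
                    Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})"),
                    Pattern.compile("(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2})"),
                    Pattern.compile("(\\d{4})(\\d{2})(\\d{2}) (\\d{2})(\\d{2})(\\d{2})"));

    /** Extracts a date-time from anywhere in the filename, or returns empty if no pattern fits. */
    static Optional<LocalDateTime> extract(String filename) {
        for (Pattern pattern : PATTERNS) {
            Matcher matcher = pattern.matcher(filename);
            if (!matcher.find()) {
                continue;
            }
            try {
                return Optional.of(
                        LocalDateTime.of(
                                Integer.parseInt(matcher.group(1)),
                                Integer.parseInt(matcher.group(2)),
                                Integer.parseInt(matcher.group(3)),
                                Integer.parseInt(matcher.group(4)),
                                Integer.parseInt(matcher.group(5)),
                                Integer.parseInt(matcher.group(6))));
            } catch (DateTimeException e) {
                // The digits matched but do not form a valid date-time, so try the next pattern.
            }
        }
        return Optional.empty();
    }
}
```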
  • Constructor Details

  • Method Details

    • beforeCopying

      public ClusterMembership beforeCopying​(Path destinationDirectory, List<org.anchoranalysis.io.input.file.FileWithDirectoryInput> inputs) throws org.anchoranalysis.core.exception.OperationFailedException
      Description copied from class: CopyFilesNaming
      Specified by:
      beforeCopying in class CopyFilesNaming<ClusterMembership>
      Parameters:
      destinationDirectory - the directory to which files are copied.
      inputs - the files that will be copied.
      Throws:
      org.anchoranalysis.core.exception.OperationFailedException
    • destinationPathRelative

      public Optional<Path> destinationPathRelative​(File file, org.anchoranalysis.io.output.path.prefixer.DirectoryWithPrefix outputTarget, int index, CopyContext<ClusterMembership> context) throws org.anchoranalysis.io.output.error.OutputWriteFailedException
      Description copied from class: CopyFilesNaming
      Calculates the relative output path (to be appended to the destination directory).
      Specified by:
      destinationPathRelative in class CopyFilesNaming<ClusterMembership>
      Parameters:
      file - file to be copied
      outputTarget - the directory and prefix associated with the file for outputting
      index - an increasing sequence of numbers for each file beginning at 0
      context - the context for the copying
      Returns:
      the relative-path. If empty, the file should be skipped.
      Throws:
      org.anchoranalysis.io.output.error.OutputWriteFailedException
    • getThresholdHours

      public double getThresholdHours()
      Files whose timestamps differ by <= this parameter are joined into the same cluster.

      This is the principal parameter for affecting the sensitivity of the clustering. It is specified as the number of hours between the date-times of two files.

      A larger value encourages a smaller total number of clusters (or larger cluster-size). A smaller value encourages the opposite.

    • setThresholdHours

      public void setThresholdHours​(double thresholdHours)
      Files whose timestamps differ by <= this parameter are joined into the same cluster.

      This is the principal parameter for affecting the sensitivity of the clustering. It is specified as the number of hours between the date-times of two files.

      A larger value encourages a smaller total number of clusters (or larger cluster-size). A smaller value encourages the opposite.

    • getMinimumPerCluster

      public int getMinimumPerCluster()
      The minimum number of files that must exist for a cluster.
    • setMinimumPerCluster

      public void setMinimumPerCluster​(int minimumPerCluster)
      The minimum number of files that must exist for a cluster.
    • isPreserveSubdirectories

      public boolean isPreserveSubdirectories()
      If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
    • setPreserveSubdirectories

      public void setPreserveSubdirectories​(boolean preserveSubdirectories)
      If true, the entire relative-path is used when copying files into the cluster directory. If false, only the file-name is used.
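
The effect of preserveSubdirectories on the output path can be sketched as follows. This is an illustration under assumptions rather than the class's own logic: `ClusterDestination` and `relativeDestination` are hypothetical names, and the cluster name is assumed to be used directly as the subdirectory name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClusterDestination {

    /**
     * Builds the path of a file relative to the destination directory.
     *
     * @param clusterName the name of the cluster subdirectory, e.g. "01"
     * @param relativeSource the file's path relative to the input directory
     * @param preserveSubdirectories if true, the whole relative path is kept; otherwise only the file-name
     */
    static Path relativeDestination(
            String clusterName, Path relativeSource, boolean preserveSubdirectories) {
        Path withinCluster =
                preserveSubdirectories ? relativeSource : relativeSource.getFileName();
        return Paths.get(clusterName).resolve(withinCluster);
    }
}
```

For a source file at trip/day1/a.jpg in cluster 01, the sketch gives 01/trip/day1/a.jpg when the flag is true and 01/a.jpg when it is false. Note that with the flag set to false, two files with the same name in different subdirectories would map to the same destination.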
    • getTimestampPatterns

      public List<TimestampPattern> getTimestampPatterns()
      The patterns which can be used to extract a date-time from a filename.
    • setTimestampPatterns

      public void setTimestampPatterns​(List<TimestampPattern> timestampPatterns)
      The patterns which can be used to extract a date-time from a filename.
    • getTimeZoneOffset

      public int getTimeZoneOffset()
      If >= 0, sets a specific time-offset in hours. If == -1, then the offset is taken from the current system time-zone settings.
    • setTimeZoneOffset

      public void setTimeZoneOffset​(int timeZoneOffset)
      If >= 0, sets a specific time-offset in hours. If == -1, then the offset is taken from the current system time-zone settings.
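
One way to read the timeZoneOffset rule is sketched below using the standard java.time classes. `OffsetChoice` and `zoneFor` are hypothetical names, and rejecting negative values other than -1 is an assumption, since the documentation above does not describe them.

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

public class OffsetChoice {

    /** Chooses the zone used to interpret timestamps, following the rule described above. */
    static ZoneId zoneFor(int timeZoneOffset) {
        if (timeZoneOffset >= 0) {
            // A fixed offset of whole hours ahead of UTC.
            return ZoneOffset.ofHours(timeZoneOffset);
        }
        if (timeZoneOffset == -1) {
            // Take the offset from the current system time-zone settings.
            return ZoneId.systemDefault();
        }
        // Assumption: any other negative value is treated as a configuration error.
        throw new IllegalArgumentException("Unsupported timeZoneOffset: " + timeZoneOffset);
    }
}
```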