| Class | Description |
|---|---|
| FileStorageHelper | A collection of utility routines used by the file storage system. |
| FileStorageImpl | The default implementation of FileStorage. |
| FileStorageImplWrapper | A thin wrapper around the existing FileStorageImpl. |

| Exception | Description |
|---|---|
| InvalidCharacterException | Indicates that an object ID contains an invalid character. |
| InvalidPathException | Indicates a PairTree path ("ppath" or "relative path") that is not correctly formed, and cannot be converted to an object ID. |

The code in this package implements the Vitro file-storage system.
The system incorporates a number of ideas from the PairTree specification.
A typical structure would look like this:
+ basedir
|
+--+ file_storage_namespaces.properties
|
+--+ file_storage_root
The file_storage_root directory contains the subdirectories
that implement the encoded IDs, and the final directory for each ID will
contain a single file that corresponds to that ID.
To reduce the length of the file paths, the system can be initialized to recognize certain sets of characters (loosely termed "namespaces") and to replace them with a given prefix and separator character during ID encoding.
For example, the system might be initialized with a "namespace" of "http://vivo.mydomain.edu/file/". If that is the only namespace, it will be internally assigned a prefix of "a", so a URI like this:
http://vivo.mydomain.edu/file/n3424/myPhoto.jpg
would be converted to this:
a~n3424/myPhoto.jpg
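The prefix substitution described above can be sketched in a few lines of Java. This is a minimal illustration, not the code in FileStorageHelper: the class name `NamespacePrefixSketch`, the method `applyPrefix`, and the use of a `Map<Character, String>` to hold the prefixes are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamespacePrefixSketch {
    /** Separator between the prefix and the rest of the ID, as in "a~n3424/myPhoto.jpg". */
    static final char SEPARATOR = '~';

    /**
     * Replaces a recognized namespace at the start of the ID with its prefix
     * and the separator. An ID that matches no namespace is returned unchanged.
     */
    static String applyPrefix(String id, Map<Character, String> namespaces) {
        for (Map.Entry<Character, String> entry : namespaces.entrySet()) {
            String namespace = entry.getValue();
            if (id.startsWith(namespace)) {
                return String.valueOf(entry.getKey()) + SEPARATOR
                        + id.substring(namespace.length());
            }
        }
        return id;
    }

    public static void main(String[] args) {
        // Hypothetical namespace table with a single entry.
        Map<Character, String> namespaces = new LinkedHashMap<>();
        namespaces.put('a', "http://vivo.mydomain.edu/file/");
        System.out.println(applyPrefix(
                "http://vivo.mydomain.edu/file/n3424/myPhoto.jpg", namespaces));
    }
}
```

The sketch only covers the substitution itself; the remaining encoding steps are applied to the part of the ID that follows the namespace.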
The namespaces and their assigned prefixes are stored in a properties file when the structure is initialized. When the structure is re-opened, the file is read to find the correct prefixes. The file might look like this:
a = http://the.first.namespace/
b = http://the.second.namespace/
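As a rough sketch of how such a file could be read back into a prefix table, the standard `java.util.Properties` class parses this format directly. The class name `NamespaceFileSketch` and the method `parse` are hypothetical, and the check that each prefix is a single character is an assumption based on the examples in this description.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class NamespaceFileSketch {
    /** Parses properties-file text into a map from prefix character to namespace. */
    static Map<Character, String> parse(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        Map<Character, String> namespaces = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.length() != 1) {
                throw new IllegalArgumentException(
                        "Prefix must be a single character: " + key);
            }
            namespaces.put(key.charAt(0), props.getProperty(key));
        }
        return namespaces;
    }

    public static void main(String[] args) {
        String contents = "a = http://the.first.namespace/\n"
                + "b = http://the.second.namespace/\n";
        System.out.println(parse(contents));
    }
}
```

In a real installation the text would come from file_storage_namespaces.properties rather than from a string literal.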
This is a multi-step process:
" * + , < = > ? ^ | \ ~
The hexadecimal encoding consists of a caret followed by 2 hex digits, e.g.: ^7C
ark:/13030/xt12t3 becomes ark/+=1/303/0=x/t12/t3
http://n2t.info/urn:nbn:se:kb:repos-1 becomes htt/p+=/=n2/t,i/nfo/=ur/n+n/bn+/se+/kb+/rep/os-/1
what-the-*@?#!^!~? becomes wha/t-t/he-/^2a/@^3/f#!/^5e/!^7/e^3/f
http://vivo.myDomain.edu/file/n3424 with namespace http://vivo.myDomain.edu/file/ and prefix a becomes a~n/342/4
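The four conversions above can be reproduced with a short Java sketch. It is inferred from the examples rather than taken from FileStorageHelper, so treat the details as assumptions: the class and method names are hypothetical, the hex digits are written in lower case to match the worked examples, the substitutions of "/" with "=", ":" with "+" and "." with "," are read off the example output, and only the special characters listed above are hex-encoded.

```java
import java.util.Collections;
import java.util.Map;

final class IdToPathSketch {
    /** The special characters listed above, which are hex-encoded. */
    static final String RARE = "\"*+,<=>?^|\\~";

    /** Hex-encodes the special characters and substitutes the common ones. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (RARE.indexOf(c) >= 0) {
                out.append(String.format("^%02x", (int) c));
            } else if (c == '/') {
                out.append('=');
            } else if (c == ':') {
                out.append('+');
            } else if (c == '.') {
                out.append(',');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Breaks the encoded ID into directory names of at most 3 characters. */
    static String split(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i += 3) {
            if (i > 0) {
                out.append('/');
            }
            out.append(s, i, Math.min(i + 3, s.length()));
        }
        return out.toString();
    }

    /** Converts an ID to a relative path, using a namespace prefix if one matches. */
    static String idToPath(String id, Map<Character, String> namespaces) {
        for (Map.Entry<Character, String> e : namespaces.entrySet()) {
            if (id.startsWith(e.getValue())) {
                String local = id.substring(e.getValue().length());
                // The prefix and "~" are added after encoding, so "~" is not hex-encoded.
                return split(e.getKey() + "~" + encode(local));
            }
        }
        return split(encode(id));
    }

    public static void main(String[] args) {
        Map<Character, String> none = Collections.emptyMap();
        System.out.println(idToPath("ark:/13030/xt12t3", none));
        System.out.println(idToPath("http://vivo.myDomain.edu/file/n3424",
                Collections.singletonMap('a', "http://vivo.myDomain.edu/file/")));
    }
}
```

Encoding in a single pass avoids re-encoding the "+", "=" and "," characters that the substitution step itself introduces.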
The name of the file is encoded as needed to guard against illegal characters for the filesystem, but in practice we expect little encoding to be required, since few files are named with the special characters.
The encoding process is the same as the "rare character encoding" and "common character encoding" steps used for ID encoding, except that periods are not encoded.
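A filename encoder along these lines might look like the following sketch. As with ID encoding, the character substitutions and the lower-case hex digits are inferred from the ID examples, and the names `FilenameEncodingSketch` and `encodeFilename` are hypothetical.

```java
final class FilenameEncodingSketch {
    /** Special characters that are replaced by a caret and 2 hex digits. */
    static final String RARE = "\"*+,<=>?^|\\~";

    /** Encodes a filename like an ID, except that periods are left alone. */
    static String encodeFilename(String filename) {
        StringBuilder out = new StringBuilder();
        for (char c : filename.toCharArray()) {
            if (RARE.indexOf(c) >= 0) {
                out.append(String.format("^%02x", (int) c));
            } else if (c == '/') {
                out.append('=');
            } else if (c == ':') {
                out.append('+');
            } else {
                // Periods fall through here, so "lily1.jpg" keeps its extension.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeFilename("lily1.jpg"));
        System.out.println(encodeFilename("my:photo?.jpg"));
    }
}
```

Leaving periods untouched keeps the file extension readable, so a typical name such as lily1.jpg passes through unchanged.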
The uploaded image files are identified by a combination of URI and filename. The URI is used as the principal identifier so we don't need to worry about collisions if two people each upload an image named "image.jpg". The filename is retained so the user can use their browser to download their image from the system and it will be named as they expect it to be.
We wanted a way to store thousands of image files so they would not all be in the same directory. We took our inspiration from the PairTree folks, and modified their algorithm to suit our needs. The general idea is to store files in a multi-layer directory structure based on the URI assigned to the file.
Let's consider a file with this information:
URI = http://vivo.mydomain.edu/individual/n3156
Filename = lily1.jpg
We want to turn the URI into the directory path, but the URI contains prohibited characters. Using a PairTree-like character substitution, we might store it at this path:
/usr/local/vivo/uploads/file_storage_root/http+==vivo.mydomain.edu=individual=n3156/lily1.jpg
Using that scheme would mean that each file sits in its own directory under the storage root. At a large institution, there might be hundreds of thousands of directories under that root.
By breaking this into PairTree-like groupings, we ensure that the files do not all go into the same directory. Limiting directory names to 3 characters will ensure a maximum of about 30,000 files per directory. In practice, the number will be considerably smaller. So then it would look like this:
/usr/local/vivo/uploads/file_storage_root/htt/p+=/=vi/vo./myd/oma/in./edu/=in/div/idu/al=/n31/56/lily1.jpg
But almost all of our URIs will start with the same namespace, so the namespace just adds unnecessary and unhelpful depth to the directory tree. We assign a single-character prefix to that namespace, using the file_storage_namespaces.properties file in the uploads directory, like this:
a = http://vivo.mydomain.edu/individual/
And our URI now looks like this:
a~n3156
Which translates to:
/usr/local/vivo/uploads/file_storage_root/a~n/315/6/lily1.jpg
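Putting the pieces together, the final location is the uploads directory, then file_storage_root, then the path derived from the ID, then the filename. The sketch below assembles that location with `java.nio.file`; the class name `StoredFileLocationSketch` and the method `locate` are hypothetical, and the ID path and filename are assumed to be already encoded.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StoredFileLocationSketch {
    /** Name of the directory that holds the encoded ID directories. */
    static final String ROOT_DIR_NAME = "file_storage_root";

    /**
     * Builds the location of a stored file from the base directory, the
     * relative path derived from the ID (e.g. "a~n/315/6"), and the
     * already-encoded filename.
     */
    static Path locate(String baseDir, String idPath, String encodedFilename) {
        Path location = Paths.get(baseDir, ROOT_DIR_NAME);
        for (String segment : idPath.split("/")) {
            location = location.resolve(segment);
        }
        return location.resolve(encodedFilename);
    }

    public static void main(String[] args) {
        System.out.println(locate("/usr/local/vivo/uploads", "a~n/315/6", "lily1.jpg"));
    }
}
```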
So what we hope we have implemented is a system where:
By the way, almost all of this is implemented in edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelper and illustrated in edu.cornell.mannlib.vitro.webapp.filestorage.impl.FileStorageHelperTest.
Copyright © 2016. All rights reserved.