org.dspace.discovery
Class SolrServiceImpl

java.lang.Object
  extended by org.dspace.discovery.SolrServiceImpl
All Implemented Interfaces:
IndexingService, SearchService

@Service
public class SolrServiceImpl
extends Object
implements SearchService, IndexingService

SolrIndexer contains the methods that index Items and their metadata, collections, communities, etc. It is meant to be invoked either from the command line (see dspace/bin/index-all) or via the indexContent() methods within DSpace.

The administrator can choose to run SolrIndexer in a cron job that repeats regularly; a failed attempt to index from the UI will then be caught up on by that cron job. SolrServiceImpl is registered as a Service in the ServiceManager via a Spring configuration file located under classpath://spring/spring-dspace-applicationContext.xml. Its configuration is autowired by the ApplicationContext.

Author:
Kevin Van de Velde (kevin at atmire dot com), Mark Diggory (markd at atmire dot com), Ben Bosman (ben at atmire dot com)

Field Summary
static String AUTHORITY_SEPARATOR
           
static String FILTER_SEPARATOR
           
protected static String LAST_INDEXED_FIELD
           
static String STORE_SEPARATOR
           
static String VARIANTS_STORE_SEPARATOR
           
 
Constructor Summary
SolrServiceImpl()
           
 
Method Summary
protected  void addContainerMetadataField(org.apache.solr.common.SolrInputDocument doc, List<String> highlightedMetadataFields, String metadataField, String value)
          Add the metadata value of the community/collection to the Solr document; highlighting is added if needed.
protected  void buildDocument(Context context, Collection collection)
          Build a solr document for a DSpace Collection.
protected  void buildDocument(Context context, Community community)
          Build a solr document for a DSpace Community.
protected  void buildDocument(Context context, Item item)
          Build a Solr document for a DSpace Item and write it to the index.
protected  org.apache.solr.common.SolrInputDocument buildDocument(int type, int id, String handle, List<String> locations)
          Create a Solr document with all the shared fields initialized.
 void cleanIndex(boolean force)
          Iterates over all documents in the index and verifies that they are in the database; any that are not are removed.
 void commit()
           
 void createIndex(Context c)
          Create the full index, wiping the old index.
protected  void emailException(Exception exception)
           
protected static DSpaceObject findDSpaceObject(Context context, org.apache.solr.common.SolrDocument doc)
           
protected  List<String> getCollectionLocations(Collection target)
           
protected  List<String> getItemLocations(Item myitem)
           
 List<Item> getRelatedItems(Context context, Item item, DiscoveryMoreLikeThisConfiguration mltConfig)
           
protected  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer getSolr()
           
 void indexContent(Context context, DSpaceObject dso)
          If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp that is newer than the document in the index, then the document is updated; otherwise a new document is added.
 void indexContent(Context context, DSpaceObject dso, boolean force)
          If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp that is newer than the document in the index, then the document is updated; otherwise a new document is added.
 void indexContent(Context context, DSpaceObject dso, boolean force, boolean commit)
           
static String locationToName(Context context, String field, String value)
           
 void optimize()
          Maintenance to keep a SOLR index efficient.
 void reIndexContent(Context context, DSpaceObject dso)
          reIndexContent removes something from the index, then re-indexes it
protected  boolean requiresIndexing(String handle, Date lastModified)
          Checks the lastModified timestamp in the database and in the index to determine whether the index is stale.
protected  org.apache.solr.client.solrj.SolrQuery resolveToSolrQuery(Context context, DiscoverQuery discoveryQuery, boolean includeWithdrawn)
           
protected  DiscoverResult retrieveResult(Context context, DiscoverQuery query, org.apache.solr.client.solrj.response.QueryResponse solrQueryResponse)
           
 DiscoverResult search(Context context, DiscoverQuery query)
          Convenience method to call search(Context, DSpaceObject, DiscoverQuery) with a null DSpace Object as scope (i.e. the whole repository).
 DiscoverResult search(Context context, DiscoverQuery discoveryQuery, boolean includeWithdrawn)
           
 DiscoverResult search(Context context, DSpaceObject dso, DiscoverQuery query)
          Convenience method to call search(Context, DSpaceObject, DiscoverQuery, boolean) with includeWithdrawn=false.
 DiscoverResult search(Context context, DSpaceObject dso, DiscoverQuery discoveryQuery, boolean includeWithdrawn)
           
 List<DSpaceObject> search(Context context, String query, int offset, int max, String... filterquery)
           
 List<DSpaceObject> search(Context context, String query, String orderfield, boolean ascending, int offset, int max, String... filterquery)
           
 InputStream searchAsInputStream(DiscoverQuery query)
          Simple means to return the search result as an InputStream
 InputStream searchJSON(Context context, DiscoverQuery query, DSpaceObject dso, String jsonIdentifier)
           
 InputStream searchJSON(Context context, DiscoverQuery discoveryQuery, String jsonIdentifier)
           
static Date toDate(String t)
          Helper function to retrieve a date using a best guess of the potential date encodings on a field
 DiscoverFilterQuery toFilterQuery(Context context, String field, String operator, String value)
          Transforms the given string field and value into a filter query
 String toSortFieldIndex(String metadataField, String type)
          Transforms the metadata field of the given sort configuration into the indexed field which we can then use in our solr queries
protected  String transformAuthorityValue(Context context, String field, String value)
           
protected  String transformDisplayedValue(Context context, String field, String value)
           
protected  String transformFacetField(DiscoverFacetField facetFieldConfig, String field, boolean removePostfix)
           
protected  String transformSortValue(Context context, String field, String value)
           
 void unIndexContent(Context context, DSpaceObject dso)
          unIndex removes an Item, Collection, or Community
 void unIndexContent(Context context, DSpaceObject dso, boolean commit)
          unIndex removes an Item, Collection, or Community
 void unIndexContent(Context context, String handle)
          Unindex a Document in the Lucene index.
 void unIndexContent(Context context, String handle, boolean commit)
          Unindex a Document in the Lucene Index.
 void updateIndex(Context context)
          Iterates over all Items, Collections and Communities.
 void updateIndex(Context context, boolean force)
          Iterates over all Items, Collections and Communities.
protected  void writeDocument(org.apache.solr.common.SolrInputDocument doc)
          Write the document to the index under the appropriate handle.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LAST_INDEXED_FIELD

protected static final String LAST_INDEXED_FIELD
See Also:
Constant Field Values

FILTER_SEPARATOR

public static final String FILTER_SEPARATOR
See Also:
Constant Field Values

AUTHORITY_SEPARATOR

public static final String AUTHORITY_SEPARATOR
See Also:
Constant Field Values

STORE_SEPARATOR

public static final String STORE_SEPARATOR
See Also:
Constant Field Values

VARIANTS_STORE_SEPARATOR

public static final String VARIANTS_STORE_SEPARATOR
See Also:
Constant Field Values
Constructor Detail

SolrServiceImpl

public SolrServiceImpl()
Method Detail

getSolr

protected org.apache.solr.client.solrj.impl.CommonsHttpSolrServer getSolr()
                                                                   throws MalformedURLException,
                                                                          org.apache.solr.client.solrj.SolrServerException
Throws:
MalformedURLException
org.apache.solr.client.solrj.SolrServerException

indexContent

public void indexContent(Context context,
                         DSpaceObject dso)
                  throws SQLException
If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp that is newer than the document in the index, then the document is updated; otherwise a new document is added.

Specified by:
indexContent in interface IndexingService
Parameters:
context - Users Context
dso - DSpace Object (Item, Collection or Community)
Throws:
SQLException
IOException

indexContent

public void indexContent(Context context,
                         DSpaceObject dso,
                         boolean force)
                  throws SQLException
If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp that is newer than the document in the index, then the document is updated; otherwise a new document is added.

Specified by:
indexContent in interface IndexingService
Parameters:
context - Users Context
dso - DSpace Object (Item, Collection or Community)
force - Force update even if not stale.
Throws:
SQLException
IOException

unIndexContent

public void unIndexContent(Context context,
                           DSpaceObject dso)
                    throws SQLException,
                           IOException
unIndex removes an Item, Collection, or Community

Specified by:
unIndexContent in interface IndexingService
Parameters:
context - the DSpace context
dso - DSpace Object, can be Community, Item, or Collection
Throws:
SQLException
IOException

unIndexContent

public void unIndexContent(Context context,
                           DSpaceObject dso,
                           boolean commit)
                    throws SQLException,
                           IOException
unIndex removes an Item, Collection, or Community

Specified by:
unIndexContent in interface IndexingService
Parameters:
context - the DSpace context
dso - DSpace Object, can be Community, Item, or Collection
commit - if true force an immediate commit on SOLR
Throws:
SQLException
IOException
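
The effect of the commit flag can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not DSpace or SolrJ code: the class CommitFlagSketch and its two sets are hypothetical stand-ins for the Solr server, and the model assumes that a delete only becomes visible to searches once it has been committed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of deferred commits; not DSpace or SolrJ code.
class CommitFlagSketch {
    // Handles currently visible to searches.
    final Set<String> visible = new HashSet<>();
    // Deletes that were requested but not yet committed.
    final Set<String> pendingDeletes = new HashSet<>();

    // Queues the delete; applies it immediately only when commit is true.
    void unIndexContent(String handle, boolean commit) {
        pendingDeletes.add(handle);
        if (commit) {
            commit();
        }
    }

    // Applies all pending deletes at once.
    void commit() {
        visible.removeAll(pendingDeletes);
        pendingDeletes.clear();
    }
}
```

In this model, a batch job would pass commit=false for every object except the last, or call commit() once at the end, so that the cost of committing is paid once per batch rather than once per object.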

unIndexContent

public void unIndexContent(Context context,
                           String handle)
                    throws IOException,
                           SQLException
Unindex a Document in the Lucene index.

Specified by:
unIndexContent in interface IndexingService
Parameters:
context - the dspace context
handle - the handle of the object to be deleted
Throws:
IOException
SQLException

unIndexContent

public void unIndexContent(Context context,
                           String handle,
                           boolean commit)
                    throws SQLException,
                           IOException
Unindex a Document in the Lucene Index.

Specified by:
unIndexContent in interface IndexingService
Parameters:
context - the dspace context
handle - the handle of the object to be deleted
Throws:
SQLException
IOException

reIndexContent

public void reIndexContent(Context context,
                           DSpaceObject dso)
                    throws SQLException,
                           IOException
reIndexContent removes something from the index, then re-indexes it

Specified by:
reIndexContent in interface IndexingService
Parameters:
context - context object
dso - object to re-index
Throws:
SQLException
IOException

createIndex

public void createIndex(Context c)
                 throws SQLException,
                        IOException
Create the full index, wiping the old index.

Specified by:
createIndex in interface IndexingService
Parameters:
c - context to use
Throws:
SQLException
IOException

updateIndex

public void updateIndex(Context context)
Iterates over all Items, Collections and Communities and updates them in the index. Uses decaching to control the memory footprint. Uses indexContent and isStale to check the state of each item in the index.

Specified by:
updateIndex in interface IndexingService
Parameters:
context - the dspace context

updateIndex

public void updateIndex(Context context,
                        boolean force)
Iterates over all Items, Collections and Communities and updates them in the index. Uses decaching to control the memory footprint. Uses indexContent and isStale to check the state of each item in the index.

At first it may appear counterintuitive to have an IndexWriter/Reader opened and closed on each DSO. However, this allows the UI processes to step in, obtain a lock and write to the index even if other processes/JVMs are running a reindex.

Specified by:
updateIndex in interface IndexingService
Parameters:
context - the dspace context
force - whether or not to force the reindexing
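
The interaction between the full pass and the force flag can be sketched as follows. This is a simplified model, not the DSpace implementation: UpdateIndexSketch and its two maps are hypothetical stand-ins for the repository database and the Solr index, and the staleness test is reduced to a plain timestamp comparison.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a full update pass; not the DSpace implementation.
class UpdateIndexSketch {
    // Assumed model of the repository: handle -> lastModified.
    final Map<String, Date> database = new LinkedHashMap<>();
    // Assumed model of the index: handle -> lastModified at indexing time.
    final Map<String, Date> index = new LinkedHashMap<>();

    // Visits every object and returns how many documents were (re)written.
    int updateIndex(boolean force) {
        int written = 0;
        for (Map.Entry<String, Date> entry : database.entrySet()) {
            Date indexed = index.get(entry.getKey());
            boolean stale = indexed == null || entry.getValue().after(indexed);
            // With force=true the staleness result is ignored.
            if (force || stale) {
                index.put(entry.getKey(), entry.getValue());
                written++;
            }
        }
        return written;
    }
}
```

Without force, a second pass over an unchanged repository writes nothing, which is what makes a regularly scheduled update cheap in this model.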

cleanIndex

public void cleanIndex(boolean force)
                throws IOException,
                       SQLException,
                       SearchServiceException
Iterates over all documents in the index and verifies that they are in the database; any that are not are removed.

Specified by:
cleanIndex in interface IndexingService
Parameters:
force - whether or not to force a clean index
Throws:
IOException - IO exception
SQLException - sql exception
SearchServiceException - occurs when something went wrong with querying the solr server
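
The reconciliation described above amounts to a set difference between the index and the database. The following is a minimal sketch, not the DSpace implementation: CleanIndexSketch is a hypothetical helper that models both sides as sets of handles, and it does not model the force parameter.

```java
import java.util.Iterator;
import java.util.Set;

// Hypothetical model of index cleaning; not the DSpace implementation.
class CleanIndexSketch {
    // Removes every indexed handle that no longer exists in the database
    // and returns the number of handles removed.
    static int cleanIndex(Set<String> indexedHandles, Set<String> databaseHandles) {
        int removed = 0;
        Iterator<String> it = indexedHandles.iterator();
        while (it.hasNext()) {
            if (!databaseHandles.contains(it.next())) {
                it.remove(); // orphaned document: drop it from the index
                removed++;
            }
        }
        return removed;
    }
}
```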

optimize

public void optimize()
Maintenance to keep a SOLR index efficient. Note: This might take a long time.

Specified by:
optimize in interface IndexingService

emailException

protected void emailException(Exception exception)

requiresIndexing

protected boolean requiresIndexing(String handle,
                                   Date lastModified)
                            throws SQLException,
                                   IOException,
                                   SearchServiceException
Checks the lastModified timestamp in the database and in the index to determine whether the index is stale.

Parameters:
handle - the handle of the dso
lastModified - the last modified date of the DSpace object
Returns:
true if the dso should be re-indexed
Throws:
SQLException - sql exception
IOException - io exception
SearchServiceException - if something went wrong with querying the solr server
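
The staleness check can be sketched as a timestamp comparison. This is an illustration under stated assumptions, not the DSpace implementation: StaleCheckSketch is hypothetical, and its in-memory map stands in for the query that the real method sends to the Solr server.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the staleness check; not the DSpace implementation.
class StaleCheckSketch {
    // Assumed model of the index: handle -> time the object was last indexed.
    private final Map<String, Date> lastIndexed = new HashMap<>();

    void recordIndexed(String handle, Date when) {
        lastIndexed.put(handle, when);
    }

    // Returns true when the object is missing from the index or the
    // database copy was modified after the indexed copy was written.
    boolean requiresIndexing(String handle, Date lastModified) {
        Date indexedAt = lastIndexed.get(handle);
        if (indexedAt == null) {
            return true; // never indexed
        }
        return lastModified.after(indexedAt);
    }
}
```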

getItemLocations

protected List<String> getItemLocations(Item myitem)
                                 throws SQLException
Parameters:
myitem - the item for which our locations are to be retrieved
Returns:
a list containing the identifiers of the communities & collections
Throws:
SQLException - sql exception

getCollectionLocations

protected List<String> getCollectionLocations(Collection target)
                                       throws SQLException
Throws:
SQLException

writeDocument

protected void writeDocument(org.apache.solr.common.SolrInputDocument doc)
                      throws IOException
Write the document to the index under the appropriate handle.

Parameters:
doc - the solr document to be written to the server
Throws:
IOException - IO exception

buildDocument

protected void buildDocument(Context context,
                             Community community)
                      throws SQLException,
                             IOException
Build a solr document for a DSpace Community.

Parameters:
community - Community to be indexed
Throws:
SQLException
IOException

buildDocument

protected void buildDocument(Context context,
                             Collection collection)
                      throws SQLException,
                             IOException
Build a solr document for a DSpace Collection.

Parameters:
collection - Collection to be indexed
Throws:
SQLException - sql exception
IOException - IO exception

addContainerMetadataField

protected void addContainerMetadataField(org.apache.solr.common.SolrInputDocument doc,
                                         List<String> highlightedMetadataFields,
                                         String metadataField,
                                         String value)
Add the metadata value of the community/collection to the Solr document; highlighting is added if needed.

Parameters:
doc - the solr document
highlightedMetadataFields - the list of metadata fields that CAN be highlighted
metadataField - the metadata field added
value - the value (can be NULL !)

buildDocument

protected void buildDocument(Context context,
                             Item item)
                      throws SQLException,
                             IOException
Build a Solr document for a DSpace Item and write it to the index.

Parameters:
context - Users Context
item - The DSpace Item to be indexed
Throws:
SQLException
IOException

buildDocument

protected org.apache.solr.common.SolrInputDocument buildDocument(int type,
                                                                 int id,
                                                                 String handle,
                                                                 List<String> locations)
Create a Solr document with all the shared fields initialized.

Parameters:
type - Type of DSpace Object
id -
handle -
locations -

toDate

public static Date toDate(String t)
Helper function to retrieve a date using a best guess of the potential date encodings on a field

Parameters:
t - the string to be transformed to a date
Returns:
a date if the formatting was successful, null if not able to transform to a date
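
A "best guess" parser of this kind is commonly written as a loop over candidate patterns. The sketch below is not the DSpace implementation: ToDateSketch is hypothetical, and both the pattern list and the choice of UTC are assumptions made for the example. It keeps the documented contract of returning null when no pattern fits.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical best-guess date parser; not the DSpace implementation.
class ToDateSketch {
    // Candidate patterns, most specific first. This list is an assumption.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "yyyy-MM-dd",
        "yyyy-MM",
        "yyyy"
    };

    // Returns the parsed date, or null if no pattern matches the whole string.
    static Date toDate(String t) {
        if (t == null) {
            return null;
        }
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false);
            format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed time zone
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(t, position);
            // Accept only if the pattern consumed the entire input.
            if (parsed != null && position.getIndex() == t.length()) {
                return parsed;
            }
        }
        return null;
    }
}
```

Checking that the whole input was consumed prevents a short pattern such as "yyyy" from silently accepting a longer string with trailing text.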

locationToName

public static String locationToName(Context context,
                                    String field,
                                    String value)
                             throws SQLException
Throws:
SQLException

search

public DiscoverResult search(Context context,
                             DiscoverQuery query)
                      throws SearchServiceException
Description copied from interface: SearchService
Convenience method to call search(Context, DSpaceObject, DiscoverQuery) with a null DSpace Object as scope (i.e. the whole repository).

Specified by:
search in interface SearchService
Parameters:
context - DSpace Context object
query - the discovery query object
Returns:
Throws:
SearchServiceException

search

public DiscoverResult search(Context context,
                             DSpaceObject dso,
                             DiscoverQuery query)
                      throws SearchServiceException
Description copied from interface: SearchService
Convenience method to call search(Context, DSpaceObject, DiscoverQuery, boolean) with includeWithdrawn=false.

Specified by:
search in interface SearchService
Parameters:
context - DSpace Context object
dso - a DSpace Object to use as scope of the search (only results within this object)
query - the discovery query object
Returns:
Throws:
SearchServiceException
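
The way the convenience overloads narrow down to the four-argument method can be sketched with stand-in types. This is not DSpace code: SearchOverloadSketch is hypothetical, the Context parameter is omitted, plain strings replace DSpaceObject and DiscoverQuery, and the returned text replaces DiscoverResult.

```java
// Hypothetical model of the overload chain; not DSpace code.
class SearchOverloadSketch {
    // Stand-in for search(Context, DiscoverQuery):
    // a null scope means the whole repository.
    static String search(String query) {
        return search(null, query);
    }

    // Stand-in for search(Context, DSpaceObject, DiscoverQuery):
    // withdrawn items are excluded.
    static String search(String scope, String query) {
        return search(scope, query, false);
    }

    // Stand-in for the four-argument overload, which does the actual work.
    static String search(String scope, String query, boolean includeWithdrawn) {
        String where = (scope == null) ? "repository" : scope;
        return "query=" + query + " scope=" + where + " withdrawn=" + includeWithdrawn;
    }
}
```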

search

public DiscoverResult search(Context context,
                             DSpaceObject dso,
                             DiscoverQuery discoveryQuery,
                             boolean includeWithdrawn)
                      throws SearchServiceException
Specified by:
search in interface SearchService
Parameters:
context - DSpace Context object
dso - a DSpace Object to use as scope of the search (only results within this object)
discoveryQuery - the discovery query object
includeWithdrawn - true to also include withdrawn items that match the query in the results
Returns:
Throws:
SearchServiceException

search

public DiscoverResult search(Context context,
                             DiscoverQuery discoveryQuery,
                             boolean includeWithdrawn)
                      throws SearchServiceException
Specified by:
search in interface SearchService
Parameters:
context - DSpace Context object
includeWithdrawn - true to also include withdrawn items that match the query in the results
Returns:
Throws:
SearchServiceException

resolveToSolrQuery

protected org.apache.solr.client.solrj.SolrQuery resolveToSolrQuery(Context context,
                                                                    DiscoverQuery discoveryQuery,
                                                                    boolean includeWithdrawn)

searchJSON

public InputStream searchJSON(Context context,
                              DiscoverQuery query,
                              DSpaceObject dso,
                              String jsonIdentifier)
                       throws SearchServiceException
Specified by:
searchJSON in interface SearchService
Throws:
SearchServiceException

searchJSON

public InputStream searchJSON(Context context,
                              DiscoverQuery discoveryQuery,
                              String jsonIdentifier)
                       throws SearchServiceException
Specified by:
searchJSON in interface SearchService
Throws:
SearchServiceException

retrieveResult

protected DiscoverResult retrieveResult(Context context,
                                        DiscoverQuery query,
                                        org.apache.solr.client.solrj.response.QueryResponse solrQueryResponse)
                                 throws SQLException
Throws:
SQLException

findDSpaceObject

protected static DSpaceObject findDSpaceObject(Context context,
                                               org.apache.solr.common.SolrDocument doc)
                                        throws SQLException
Throws:
SQLException

searchAsInputStream

public InputStream searchAsInputStream(DiscoverQuery query)
                                throws SearchServiceException,
                                       IOException
Simple means to return the search result as an InputStream

Throws:
SearchServiceException
IOException

search

public List<DSpaceObject> search(Context context,
                                 String query,
                                 int offset,
                                 int max,
                                 String... filterquery)

search

public List<DSpaceObject> search(Context context,
                                 String query,
                                 String orderfield,
                                 boolean ascending,
                                 int offset,
                                 int max,
                                 String... filterquery)
Specified by:
search in interface SearchService

toFilterQuery

public DiscoverFilterQuery toFilterQuery(Context context,
                                         String field,
                                         String operator,
                                         String value)
                                  throws SQLException
Description copied from interface: SearchService
Transforms the given string field and value into a filter query

Specified by:
toFilterQuery in interface SearchService
Parameters:
context - the DSpace context
field - the field of the filter query
value - the filter query value
Returns:
a filter query
Throws:
SQLException - ...

getRelatedItems

public List<Item> getRelatedItems(Context context,
                                  Item item,
                                  DiscoveryMoreLikeThisConfiguration mltConfig)
Specified by:
getRelatedItems in interface SearchService

toSortFieldIndex

public String toSortFieldIndex(String metadataField,
                               String type)
Description copied from interface: SearchService
Transforms the metadata field of the given sort configuration into the indexed field which we can then use in our solr queries

Specified by:
toSortFieldIndex in interface SearchService
Parameters:
metadataField - the metadata field
Returns:
the indexed field

transformFacetField

protected String transformFacetField(DiscoverFacetField facetFieldConfig,
                                     String field,
                                     boolean removePostfix)

transformDisplayedValue

protected String transformDisplayedValue(Context context,
                                         String field,
                                         String value)
                                  throws SQLException
Throws:
SQLException

transformAuthorityValue

protected String transformAuthorityValue(Context context,
                                         String field,
                                         String value)
                                  throws SQLException
Throws:
SQLException

transformSortValue

protected String transformSortValue(Context context,
                                    String field,
                                    String value)
                             throws SQLException
Throws:
SQLException

indexContent

public void indexContent(Context context,
                         DSpaceObject dso,
                         boolean force,
                         boolean commit)
                  throws SearchServiceException,
                         SQLException
Specified by:
indexContent in interface IndexingService
Throws:
SearchServiceException
SQLException

commit

public void commit()
            throws SearchServiceException
Specified by:
commit in interface IndexingService
Throws:
SearchServiceException


Copyright © 2012 DuraSpace. All Rights Reserved.