Class NameCentricRetrieval
- java.lang.Object
-
- de.julielab.gene.candidateretrieval.NameCentricRetrieval
-
- All Implemented Interfaces:
CandidateRetrieval, de.julielab.geneexpbase.candidateretrieval.CandidateRetrieval, Closeable, AutoCloseable
public class NameCentricRetrieval extends Object implements CandidateRetrieval
-
-
Field Summary
Fields (modifier and type, field name; description indented below where present)
static org.slf4j.Logger candidateLog
static de.julielab.geneexpbase.candidateretrieval.QueryGenerator CONJUNCTION
static de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION
static de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION_MINUS_1
static de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION_MINUS_2
static int JAROWINKLER_SCORER
static int LEVENSHTEIN_SCORER
static String LOGGER_NAME_CANDIDATES
static int LUCENE_MAX_HITS
    The maximum number of hits Lucene returns for a query.
static int LUCENE_SCORER
static int MAXENT_SCORER
static String MAXENT_SCORER_MODEL
    Default model for MaxEntScorer.
static String NAME_PRIO_DELIMITER
static de.julielab.geneexpbase.candidateretrieval.QueryGenerator NGRAM_2_3
static int SIMPLE_SCORER
static int TFIDF
static int TOKEN_JAROWINKLER_SCORER
-
Constructor Summary
Constructors
NameCentricRetrieval(Configuration config, de.julielab.geneexpbase.services.CacheService cacheService)
-
Method Summary
Methods (modifier and type, method; description indented below where present)
void close()
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, String organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> organisms, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention gm, Collection<String> taxId, de.julielab.geneexpbase.configuration.Parameters parameters, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, boolean loadFields, de.julielab.geneexpbase.configuration.Parameters parameters, int numReturnedHits, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, boolean loadFields, de.julielab.geneexpbase.configuration.Parameters parameters, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String originalSearchTerm, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, String organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> geneIdsFilter, Collection<String> organism, boolean loadFields, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> geneIdsFilter, Collection<String> organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getFamilyNames(de.julielab.geneexpbase.genemodel.GeneMention gm, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
    Searches the index for the given gene mention filtered for family names.
Set<GeneRecordHit> getGeneRecords(Collection<String> ids)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getIndexEntries(List<String> ids)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getIndexRecords(Collection<String> ids)
    The record-index equivalent to getIndexEntries(List).
de.julielab.geneexpbase.TermNormalizer getNormalizer()
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getOriginalNamesIndexRecords(Collection<String> geneIds)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> getOriginalNamesIndexRecords(Collection<String> geneIds, de.julielab.geneexpbase.genemodel.GeneName geneName)
List<String> getPriorityNames(String id, int priority)
List<String> getPriorityNames(Collection<String> ids, int priority)
Map<String,String> getPriorityNamesMap(Collection<String> ids, int priority)
de.julielab.geneexpbase.scoring.Scorer getScorer()
String getScorerInfo()
int getScorerType()
org.apache.lucene.search.spell.SpellChecker getSpellingChecker()
List<String> getSynonyms(String id)
de.julielab.geneexpbase.scoring.TFIDFScorer getTFIDFOnGeneRecordNames()
static AtomicLong getTotalCacheGettime()
static AtomicLong getTotalCachePuttime()
static AtomicLong getTotalLuceneQueryTime()
String mapGeneIdToTaxId(String geneId)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByBoWSynonyms(Collection<String> allSynonyms, Set<String> ids, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByExactSynonyms(Collection<String> allSynonyms, Set<String> geneIds)
List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByNGramSynonyms(String synonymsString, Set<String> geneIds)
org.apache.commons.lang3.tuple.Pair<Map<String,Double>,Map<String,Set<String>>> scoreSynonymsRecordIndex(String queryType, Map<String,Collection<de.julielab.geneexpbase.genemodel.GeneName>> ids2entities, Function<GeneRecordHit,String[]> synhit2namesFunc, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
org.apache.commons.lang3.tuple.Pair<Map<String,Double>,Map<String,List<String>>> scoreSynonymsRecordIndex(Collection<de.julielab.geneexpbase.genemodel.GeneName> allSynonyms, Set<String> ids, Function<GeneRecordHit,String[]> synhit2namesFunc, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
    Scores each synonym in allSynonyms against the IDs in ids.
void setFulltextFieldsToRecordHits(Collection<? extends de.julielab.geneexpbase.candidateretrieval.SynHit> recordHits, Collection<String> recordContextFieldNames)
void setNormalizer(de.julielab.geneexpbase.TermNormalizer normalizer)
de.julielab.geneexpbase.scoring.Scorer setScorerType(int type)
static void shutdownExecutor()
-
-
-
Field Detail
-
CONJUNCTION
public static final de.julielab.geneexpbase.candidateretrieval.QueryGenerator CONJUNCTION
-
DISJUNCTION
public static final de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION
-
DISJUNCTION_MINUS_1
public static final de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION_MINUS_1
-
DISJUNCTION_MINUS_2
public static final de.julielab.geneexpbase.candidateretrieval.QueryGenerator DISJUNCTION_MINUS_2
-
NGRAM_2_3
public static final de.julielab.geneexpbase.candidateretrieval.QueryGenerator NGRAM_2_3
-
NAME_PRIO_DELIMITER
public static final String NAME_PRIO_DELIMITER
- See Also:
- Constant Field Values
-
LOGGER_NAME_CANDIDATES
public static final String LOGGER_NAME_CANDIDATES
- See Also:
- Constant Field Values
-
SIMPLE_SCORER
public static final int SIMPLE_SCORER
- See Also:
- Constant Field Values
-
TOKEN_JAROWINKLER_SCORER
public static final int TOKEN_JAROWINKLER_SCORER
- See Also:
- Constant Field Values
-
MAXENT_SCORER
public static final int MAXENT_SCORER
- See Also:
- Constant Field Values
-
JAROWINKLER_SCORER
public static final int JAROWINKLER_SCORER
- See Also:
- Constant Field Values
-
LEVENSHTEIN_SCORER
public static final int LEVENSHTEIN_SCORER
- See Also:
- Constant Field Values
-
TFIDF
public static final int TFIDF
- See Also:
- Constant Field Values
-
LUCENE_SCORER
public static final int LUCENE_SCORER
- See Also:
- Constant Field Values
-
MAXENT_SCORER_MODEL
public static final String MAXENT_SCORER_MODEL
Default model for MaxEntScorer.
- See Also:
- Constant Field Values
-
candidateLog
public static final org.slf4j.Logger candidateLog
-
LUCENE_MAX_HITS
public static final int LUCENE_MAX_HITS
The maximum number of hits Lucene returns for a query.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
NameCentricRetrieval
@Inject public NameCentricRetrieval(Configuration config, de.julielab.geneexpbase.services.CacheService cacheService) throws de.julielab.geneexpbase.candidateretrieval.GeneCandidateRetrievalException
- Throws:
de.julielab.geneexpbase.candidateretrieval.GeneCandidateRetrievalException
-
-
Method Detail
-
getTotalCacheGettime
public static AtomicLong getTotalCacheGettime()
-
getTotalCachePuttime
public static AtomicLong getTotalCachePuttime()
-
getTotalLuceneQueryTime
public static AtomicLong getTotalLuceneQueryTime()
-
shutdownExecutor
public static void shutdownExecutor()
-
getNormalizer
public de.julielab.geneexpbase.TermNormalizer getNormalizer()
-
setNormalizer
public void setNormalizer(de.julielab.geneexpbase.TermNormalizer normalizer)
-
getScorer
public de.julielab.geneexpbase.scoring.Scorer getScorer()
-
getSpellingChecker
public org.apache.lucene.search.spell.SpellChecker getSpellingChecker()
- Specified by:
getSpellingChecker in interface CandidateRetrieval
-
setScorerType
public de.julielab.geneexpbase.scoring.Scorer setScorerType(int type) throws de.julielab.geneexpbase.candidateretrieval.GeneCandidateRetrievalException
- Throws:
de.julielab.geneexpbase.candidateretrieval.GeneCandidateRetrievalException
-
getScorerInfo
public String getScorerInfo()
-
getScorerType
public int getScorerType()
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String originalSearchTerm, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface de.julielab.geneexpbase.candidateretrieval.CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> organisms, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface de.julielab.geneexpbase.candidateretrieval.CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface de.julielab.geneexpbase.candidateretrieval.CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, boolean loadFields, de.julielab.geneexpbase.configuration.Parameters parameters, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, Collection<String> geneIdsFilter, Collection<String> organisms, boolean loadFields, de.julielab.geneexpbase.configuration.Parameters parameters, int numReturnedHits, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> geneIdsFilter, Collection<String> organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> geneIdsFilter, Collection<String> organism, boolean loadFields, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention geneMention, String organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, String organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(String geneMentionText, Collection<String> organism, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
mapGeneIdToTaxId
public String mapGeneIdToTaxId(String geneId)
- Specified by:
mapGeneIdToTaxId in interface de.julielab.geneexpbase.candidateretrieval.CandidateRetrieval
-
getIndexEntries
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getIndexEntries(List<String> ids) throws IOException
- Throws:
IOException
-
getIndexRecords
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getIndexRecords(Collection<String> ids) throws IOException
The record-index equivalent to getIndexEntries(List). Returns GeneRecordHit instances with all fields loaded.
- Parameters:
ids -
- Returns:
- Throws:
IOException
-
scoreIdsByNGramSynonyms
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByNGramSynonyms(String synonymsString, Set<String> geneIds)
-
scoreIdsByBoWSynonyms
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByBoWSynonyms(Collection<String> allSynonyms, Set<String> ids, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
- Specified by:
scoreIdsByBoWSynonyms in interface CandidateRetrieval
-
getCandidates
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getCandidates(de.julielab.geneexpbase.genemodel.GeneMention gm, Collection<String> taxId, de.julielab.geneexpbase.configuration.Parameters parameters, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
- Specified by:
getCandidates in interface CandidateRetrieval
-
scoreSynonymsRecordIndex
public org.apache.commons.lang3.tuple.Pair<Map<String,Double>,Map<String,List<String>>> scoreSynonymsRecordIndex(Collection<de.julielab.geneexpbase.genemodel.GeneName> allSynonyms, Set<String> ids, Function<GeneRecordHit,String[]> synhit2namesFunc, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
Scores each synonym in allSynonyms against the IDs in ids.
Each resulting SynHit adds its mention score to the ID it represents.
- Parameters:
allSynonyms -
ids -
qg -
- Returns:
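The per-ID accumulation described for scoreSynonymsRecordIndex can be pictured with a small, self-contained sketch. It is not the library's implementation: the Hit class and the accumulate method are hypothetical stand-ins introduced only for illustration, and the real method also returns a second map and takes further parameters that are not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names belong to the documented API.
final class ScoreAccumulationSketch {

    /** Hypothetical stand-in for a hit: the gene ID it represents and its mention score. */
    public static final class Hit {
        final String geneId;
        final double mentionScore;

        public Hit(String geneId, double mentionScore) {
            this.geneId = geneId;
            this.mentionScore = mentionScore;
        }
    }

    /** Adds each hit's mention score to the running total of the ID it represents. */
    public static Map<String, Double> accumulate(List<Hit> hits) {
        Map<String, Double> scoreById = new LinkedHashMap<>();
        for (Hit hit : hits) {
            // merge() inserts the score for a new ID or sums it onto the existing total.
            scoreById.merge(hit.geneId, hit.mentionScore, Double::sum);
        }
        return scoreById;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("1234", 1.0),
                new Hit("5678", 0.25),
                new Hit("1234", 0.5));
        System.out.println(accumulate(hits));
    }
}
```

In this sketch an ID that is matched by several synonyms ends up with a higher total than an ID matched by only one, which is the effect the description above suggests.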
-
scoreIdsByExactSynonyms
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> scoreIdsByExactSynonyms(Collection<String> allSynonyms, Set<String> geneIds)
-
getPriorityNamesMap
public Map<String,String> getPriorityNamesMap(Collection<String> ids, int priority)
-
getPriorityNames
public List<String> getPriorityNames(Collection<String> ids, int priority) throws IOException
- Throws:
IOException
-
close
public void close()
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface CandidateRetrieval
- Specified by:
close in interface Closeable
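Because the class implements Closeable and AutoCloseable, an instance can be managed with a try-with-resources statement so that close() runs even when a lookup fails. The sketch below shows the pattern with a hypothetical stand-in class (StandInRetrieval and its lookup method are assumptions for illustration); constructing a real NameCentricRetrieval needs a Configuration and a CacheService, which are not shown on this page.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; StandInRetrieval is not part of the documented API.
final class CloseSketch {

    /** Hypothetical stand-in for a retrieval component that holds resources. */
    static final class StandInRetrieval implements Closeable {
        private final List<String> log;

        StandInRetrieval(List<String> log) {
            this.log = log;
            log.add("open");
        }

        /** Pretends to query an index and records the call. */
        List<String> lookup(String term) {
            log.add("lookup:" + term);
            return List.of(term);
        }

        /** Declares no checked exception, like the close() documented above. */
        @Override
        public void close() {
            log.add("close");
        }
    }

    /** Opens the stand-in, performs one lookup, and lets try-with-resources close it. */
    public static List<String> run(String term) {
        List<String> log = new ArrayList<>();
        try (StandInRetrieval retrieval = new StandInRetrieval(log)) {
            retrieval.lookup(term);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("BRCA1"));
    }
}
```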
-
getFamilyNames
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getFamilyNames(de.julielab.geneexpbase.genemodel.GeneMention gm, de.julielab.geneexpbase.candidateretrieval.QueryGenerator queryGenerator)
Description copied from interface: CandidateRetrieval
Searches the index for the given gene mention filtered for family names.
- Specified by:
getFamilyNames in interface CandidateRetrieval
- Parameters:
gm - The gene mention to check for family names.
queryGenerator - The query generator to use.
- Returns:
-
getOriginalNamesIndexRecords
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getOriginalNamesIndexRecords(Collection<String> geneIds, de.julielab.geneexpbase.genemodel.GeneName geneName)
- Specified by:
getOriginalNamesIndexRecords in interface CandidateRetrieval
-
getOriginalNamesIndexRecords
public List<de.julielab.geneexpbase.candidateretrieval.SynHit> getOriginalNamesIndexRecords(Collection<String> geneIds)
- Specified by:
getOriginalNamesIndexRecords in interface CandidateRetrieval
-
getTFIDFOnGeneRecordNames
public de.julielab.geneexpbase.scoring.TFIDFScorer getTFIDFOnGeneRecordNames()
- Specified by:
getTFIDFOnGeneRecordNames in interface CandidateRetrieval
-
setFulltextFieldsToRecordHits
public void setFulltextFieldsToRecordHits(Collection<? extends de.julielab.geneexpbase.candidateretrieval.SynHit> recordHits, Collection<String> recordContextFieldNames)
- Specified by:
setFulltextFieldsToRecordHits in interface CandidateRetrieval
-
scoreSynonymsRecordIndex
public org.apache.commons.lang3.tuple.Pair<Map<String,Double>,Map<String,Set<String>>> scoreSynonymsRecordIndex(String queryType, Map<String,Collection<de.julielab.geneexpbase.genemodel.GeneName>> ids2entities, Function<GeneRecordHit,String[]> synhit2namesFunc, de.julielab.geneexpbase.candidateretrieval.QueryGenerator qg)
- Specified by:
scoreSynonymsRecordIndex in interface CandidateRetrieval
-
getGeneRecords
public Set<GeneRecordHit> getGeneRecords(Collection<String> ids)
-
-