Package generic.lsh.vector
Class LSHVectorFactory
- java.lang.Object
  - generic.lsh.vector.LSHVectorFactory
- Direct Known Subclasses:
WeightedLSHCosineVectorFactory
public abstract class LSHVectorFactory extends java.lang.Object
-
-
Field Summary
Fields
Modifier and Type          Field            Description
protected IDFLookup        idfLookup
protected int              settings
protected WeightFactory    weightFactory
-
Constructor Summary
Constructors
Constructor           Description
LSHVectorFactory()
-
Method Summary
Modifier and Type / Method / Description
abstract LSHVector buildVector(int[] feature)
    Generate an LSHVector from a feature set; individual features are integer hashes.
abstract LSHVector buildZeroVector()
    Generate vector with all coefficients zero.
double calculateSignificance(VectorCompare data)
    Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings.
double getSelfSignificance(LSHVector vector)
    Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings.
int getSettings()
double getSignificanceAddend()
double getSignificanceScale()
boolean isLoaded()
void readWeights(XmlPullParser parser)
    Read both the weights and the lookup hashes from an XML stream.
abstract LSHVector restoreVectorFromSql(java.lang.String sql)
    Generate an LSHVector based on string returned from SQL query. Factory generates weights based on term frequency info in the string and its internal IDF knowledge.
abstract LSHVector restoreVectorFromXml(XmlPullParser parser)
    Generate an LSHVector based on XML tag seen by pull parser.
void set(WeightFactory wFactory, IDFLookup iLookup, int settings)
    Load the factory with weights and the feature map.
-
-
-
Field Detail
-
weightFactory
protected WeightFactory weightFactory
-
idfLookup
protected IDFLookup idfLookup
-
settings
protected int settings
-
-
Method Detail
-
buildZeroVector
public abstract LSHVector buildZeroVector()
Generate a vector with all coefficients zero.
- Returns:
- the zero vector
-
buildVector
public abstract LSHVector buildVector(int[] feature)
Generate an LSHVector from a feature set. Individual features are integer hashes, and the integers MUST already be sorted. The same integer can occur more than once in the array (term frequency (TF) > 1). The factory decides internally how to create weights based on term frequency and any knowledge of Inverse Document Frequency (IDF).
- Parameters:
feature
- is the sorted array of integer features
- Returns:
- the newly minted LSHVector
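The input contract above (sorted integers, with duplicates encoding term frequency) can be illustrated with a minimal sketch. The class FeatureArrays and its methods toSortedFeatures and termFrequencies are hypothetical helpers written for this example; they are not part of generic.lsh.vector, and the sketch does not show how a concrete factory actually turns term frequency and IDF into weights.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of generic.lsh.vector: prepares an int[]
// that satisfies the documented precondition of buildVector(int[] feature).
final class FeatureArrays {

    // Returns a sorted copy of the raw feature hashes. Duplicates are kept on
    // purpose, because a repeated hash means term frequency greater than 1.
    static int[] toSortedFeatures(int[] rawHashes) {
        int[] sorted = Arrays.copyOf(rawHashes, rawHashes.length);
        Arrays.sort(sorted);
        return sorted;
    }

    // Counts how often each hash occurs in the sorted array. This only
    // illustrates what "term frequency" means for the input; the real factory
    // decides internally how to weight it.
    static Map<Integer, Integer> termFrequencies(int[] sortedFeatures) {
        Map<Integer, Integer> tf = new LinkedHashMap<>();
        for (int feature : sortedFeatures) {
            tf.merge(feature, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        int[] features = toSortedFeatures(new int[] { 42, 7, 42, 19 });
        System.out.println(Arrays.toString(features));  // prints [7, 19, 42, 42]
        System.out.println(termFrequencies(features));  // prints {7=1, 19=1, 42=2}
        // With a loaded concrete factory, the sorted array would then be
        // passed as: LSHVector vec = factory.buildVector(features);
    }
}
```

Sorting a copy rather than the caller's array is a design choice of this sketch, made so the raw feature order stays available to the caller.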
-
restoreVectorFromXml
public abstract LSHVector restoreVectorFromXml(XmlPullParser parser)
Generate an LSHVector based on the XML tag seen by the pull parser. The factory generates weights based on term frequency info in the XML tag and its internal IDF knowledge.
- Parameters:
parser
- is the XML parser
- Returns:
- the newly minted LSHVector
-
restoreVectorFromSql
public abstract LSHVector restoreVectorFromSql(java.lang.String sql) throws java.io.IOException
Generate an LSHVector based on the string returned from an SQL query. The factory generates weights based on term frequency info in the string and its internal IDF knowledge.
- Parameters:
sql
- is the column data string returned by an SQL query
- Returns:
- the newly minted LSHVector
- Throws:
java.io.IOException
-
set
public void set(WeightFactory wFactory, IDFLookup iLookup, int settings)
Load the factory with weights and the feature map.
- Parameters:
wFactory
- is the weight table of IDF and TF weights
iLookup
- is the map from features into the weight table
settings
- is an integer id for this particular weighting scheme
-
isLoaded
public boolean isLoaded()
- Returns:
- true if this factory has weights and lookup loaded
-
getSignificanceScale
public double getSignificanceScale()
- Returns:
- the weight table's significance scale for this factory
-
getSignificanceAddend
public double getSignificanceAddend()
- Returns:
- the weight table's significance addend for this factory
-
getSettings
public int getSettings()
- Returns:
- settings ID used to generate factory's current weights
-
getSelfSignificance
public double getSelfSignificance(LSHVector vector)
Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings.
- Parameters:
vector
- is the LSHVector
- Returns:
- the vector's significance score
-
calculateSignificance
public double calculateSignificance(VectorCompare data)
Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings.
- Parameters:
data
- is the comparison object produced when comparing two LSHVectors
- Returns:
- the significance score
-
readWeights
public void readWeights(XmlPullParser parser) throws org.xml.sax.SAXException
Read both the weights and the lookup hashes from an XML stream.
- Parameters:
parser
- is the XML parser
- Throws:
org.xml.sax.SAXException
-
-