Class LSHVectorFactory

    • Field Detail

      • settings

        protected int settings
    • Constructor Detail

      • LSHVectorFactory

        public LSHVectorFactory()
    • Method Detail

      • buildZeroVector

        public abstract LSHVector buildZeroVector()
        Generate a vector with all coefficients set to zero.
        Returns:
        the zero vector
      • buildVector

        public abstract LSHVector buildVector(int[] feature)
        Generate an LSHVector from a feature set. Individual features are integer hashes, and the array MUST already be sorted. The same integer can occur more than once in the array (term frequency (TF) > 1). The factory decides internally how to create weights based on term frequency and any knowledge of Inverse Document Frequency (IDF).
        Parameters:
        feature - is the sorted array of integer features
        Returns:
        the newly minted LSHVector
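
        Because the input is sorted, repeated hashes sit next to each other, so a term frequency can be recovered with a single pass over the array. The following is a minimal sketch of that idea only. The names TermFrequencySketch, Entry, and collapse are hypothetical and are not part of this API, and the sketch makes no claim about how a concrete factory actually assigns TF or IDF weights.

```java
import java.util.ArrayList;
import java.util.List;

public class TermFrequencySketch {

    /** One distinct feature hash and the number of times it occurred (hypothetical type). */
    public static final class Entry {
        public final int hash;
        public final int tf;

        Entry(int hash, int tf) {
            this.hash = hash;
            this.tf = tf;
        }
    }

    /**
     * Collapse a sorted array of feature hashes into (hash, tf) pairs.
     * Relies on the array being sorted so that equal hashes are adjacent.
     */
    public static List<Entry> collapse(int[] feature) {
        List<Entry> entries = new ArrayList<>();
        int i = 0;
        while (i < feature.length) {
            int hash = feature[i];
            int count = 0;
            // Consume the whole run of identical hashes.
            while (i < feature.length && feature[i] == hash) {
                count++;
                i++;
            }
            entries.add(new Entry(hash, count));
        }
        return entries;
    }

    public static void main(String[] args) {
        int[] feature = { 3, 3, 7, 19, 19, 19 };
        for (Entry e : collapse(feature)) {
            System.out.println("hash=" + e.hash + " tf=" + e.tf);
        }
    }
}
```

        If the array were not sorted, the same hash could appear in several separate runs and be counted as several distinct features, which is presumably why the sorted order is required.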
      • restoreVectorFromXml

        public abstract LSHVector restoreVectorFromXml(XmlPullParser parser)
        Generate an LSHVector based on the XML tag seen by the pull parser. The factory generates weights based on the term frequency info in the XML tag and its internal IDF knowledge.
        Parameters:
        parser - is the XML parser
        Returns:
        the newly minted LSHVector
      • restoreVectorFromSql

        public abstract LSHVector restoreVectorFromSql(java.lang.String sql)
                                                throws java.io.IOException
        Generate an LSHVector based on a string returned from an SQL query. The factory generates weights based on the term frequency info in the string and its internal IDF knowledge.
        Parameters:
        sql - is the column data string returned by an SQL query
        Returns:
        the newly minted LSHVector
        Throws:
        java.io.IOException
      • set

        public void set(WeightFactory wFactory,
                        IDFLookup iLookup,
                        int settings)
        Load the factory with weights and the feature map.
        Parameters:
        wFactory - is the weight table of IDF and TF weights
        iLookup - is the map from features into the weight table
        settings - is an integer id for this particular weighting scheme
      • isLoaded

        public boolean isLoaded()
        Returns:
        true if this factory has weights and lookup loaded
      • getSignificanceScale

        public double getSignificanceScale()
        Returns:
        the weight table's significance scale for this factory
      • getSignificanceAddend

        public double getSignificanceAddend()
        Returns:
        the weight table's significance addend for this factory
      • getSettings

        public int getSettings()
        Returns:
        the settings ID used to generate this factory's current weights
      • getSelfSignificance

        public double getSelfSignificance(LSHVector vector)
        Calculate a vector's significance as compared to itself, normalized for this factory's specific weight settings.
        Parameters:
        vector - is the LSHVector
        Returns:
        the vector's significance score
      • calculateSignificance

        public double calculateSignificance(VectorCompare data)
        Given comparison data generated by the LSHVector.compare() method, calculate the significance of any similarity between the two vectors, normalized for this factory's specific weight settings.
        Parameters:
        data - is the comparison object produced when comparing two LSHVectors
        Returns:
        the significance score
      • readWeights

        public void readWeights(XmlPullParser parser)
                         throws org.xml.sax.SAXException
        Read both the weights and the lookup hashes from an XML stream.
        Parameters:
        parser - is the XML parser
        Throws:
        org.xml.sax.SAXException