Package ghidra.program.model.correlate
Class HashStore
- java.lang.Object
-
- ghidra.program.model.correlate.HashStore
-
public class HashStore extends java.lang.Object
HashStore is a sorted, basic-block aware, store for Instruction "n-grams" to help quickly match similar sequences of Instructions between two functions. The Instructions comprising a single n-gram are hashed for quick lookup by the main matching algorithm (HashedFunctionAddressCorrelation). Hash diversity is important to minimize collisions, even though the number of hashes calculated for a single function pair match is small. Hashes are built and sorted respectively using the calcHashes() and insertHashes() methods. The main sort is on the number of collisions for a hash (indicating that there are duplicate or near duplicate instruction sequences), the hashes with fewer (or no) duplicates come first. The secondary sort is on "n", the number of Instructions in the n-gram, which effectively describes the significance of the match, or how unlikely the match is to occur at random. The main matching algorithm effectively creates a HashSort for both functions, and then in a loop calls hash = getFirstEntry() on one side to get the most significant possible match getEntry(has) to see if there is a matching n-gram on the other side If there is a match it is declared to the sort with the matchHash() call, allowing overlapping n-grams to be removed and deconflicting information to be updated. If there is no match, hashes can be removed with the removeHash() method to allow new hashes to move to the top of the sort. The store uses a couple of methods to help deconflict very similar sequences of instructions within the same function. Primarily, the sort is basic-block aware. All n-grams are contained within a single basic block, and when an initial match is found, hashes for other n-grams within that block (and its matching block on the other side) are modified so that n-grams within that block pair can only match each other.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
HashStore.NgramMatch
Class explicitly labeling (one-side of) a matching n-gram pair.
-
Constructor Summary
Constructors Constructor Description HashStore(Function a, TaskMonitor mon)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
calcHashes(int minLength, int maxLength, boolean wholeBlock, boolean matchOnly, HashCalculator hashCalc)
Calculate hashes for all blocksvoid
clearSort()
Clear the main sort structures, but preserve blocks and instructionsstatic void
extendMatch(int nGramSize, InstructHash srcInstruct, HashStore.NgramMatch srcMatch, InstructHash destInstruct, HashStore.NgramMatch destMatch, HashCalculator hashCalc)
Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram.Block
getBlock(Address addr)
Get the basic-block with the corresponding start AddressHashEntry
getEntry(Hash hash)
Get the HashEntry corresponding to a given hashHashEntry
getFirstEntry()
TaskMonitor
getMonitor()
int
getTotalInstructions()
java.util.List<Instruction>
getUnmatchedInstructions()
void
insertHashes()
Insert all hashes associated with unknown (i.e not matched) blocks and instructionsboolean
isEmpty()
void
matchHash(HashStore.NgramMatch match, java.util.List<Instruction> instResult, java.util.List<CodeBlock> blockResult)
Mark a particular n-gram hash and instruction as having a match.int
numMatchedInstructions()
void
removeHash(HashEntry hashEntry)
Remove a particular HashEntry.
-
-
-
Constructor Detail
-
HashStore
public HashStore(Function a, TaskMonitor mon) throws CancelledException
- Throws:
CancelledException
-
-
Method Detail
-
getTotalInstructions
public int getTotalInstructions()
- Returns:
- total number of Instructions in the whole function
-
numMatchedInstructions
public int numMatchedInstructions()
- Returns:
- number of instructions that have been matched so far
-
removeHash
public void removeHash(HashEntry hashEntry)
Remove a particular HashEntry. This may affect multiple instructions.- Parameters:
hashEntry
- is the entry
-
calcHashes
public void calcHashes(int minLength, int maxLength, boolean wholeBlock, boolean matchOnly, HashCalculator hashCalc) throws MemoryAccessException
Calculate hashes for all blocks- Parameters:
minLength
- is the minimum length of an n-gram for these passesmaxLength
- is the maximum length of an n-gram for these passeswholeBlock
- if true, allows blocks that are smaller than the minimum length to be considered as 1 n-gram.matchOnly
- if true, only generates n-grams for sequences in previously matched blockshashCalc
- is the hash function- Throws:
MemoryAccessException
-
insertHashes
public void insertHashes()
Insert all hashes associated with unknown (i.e not matched) blocks and instructions
-
matchHash
public void matchHash(HashStore.NgramMatch match, java.util.List<Instruction> instResult, java.util.List<CodeBlock> blockResult)
Mark a particular n-gram hash and instruction as having a match. Set of instructions covered by n-gram are removed, and data structures are updated- Parameters:
match
- is the n-gram being declared as a matchinstResult
- collects the explicit set of Instructions matchedblockResult
- collects the explicit set of CodeBlocks matched
-
extendMatch
public static void extendMatch(int nGramSize, InstructHash srcInstruct, HashStore.NgramMatch srcMatch, InstructHash destInstruct, HashStore.NgramMatch destMatch, HashCalculator hashCalc) throws MemoryAccessException
Try to extend a match on a pair of n-grams to the Instructions right before and right after the n-gram. The match is extended if the Instruction adjacent to the n-gram, and its corresponding pair on the other side, hash to the same value using the hash function. The NgramMatch objects are updated to reflect the original n-gram match plus any additional extension.- Parameters:
nGramSize
- is the original size of the matching n-gram.srcInstruct
- is the first Instruction in the "source" n-gramsrcMatch
- is the "source" NgramMatch object to be populatedestInstruct
- is the first Instruction in the "destination" n-gramdestMatch
- is the "destination" NgramMatch object to populatehashCalc
- is the hash function object- Throws:
MemoryAccessException
-
getUnmatchedInstructions
public java.util.List<Instruction> getUnmatchedInstructions()
- Returns:
- list of unmatched instructions across the whole function
-
clearSort
public void clearSort()
Clear the main sort structures, but preserve blocks and instructions
-
isEmpty
public boolean isEmpty()
- Returns:
- true if there are no n-grams left in the sort
-
getFirstEntry
public HashEntry getFirstEntry()
- Returns:
- the first HashEntry in the sort. The least number of matching n-grams and the biggest n-gram.
-
getEntry
public HashEntry getEntry(Hash hash)
Get the HashEntry corresponding to a given hash- Parameters:
hash
- is the Hash to match- Returns:
- the set of n-grams (HashEntry) matching this hash
-
getBlock
public Block getBlock(Address addr)
Get the basic-block with the corresponding start Address- Parameters:
addr
- is the starting address- Returns:
- the Block object
-
getMonitor
public TaskMonitor getMonitor()
- Returns:
- the TaskMonitor for this store
-
-