Class SleighAssemblerBuilder
- java.lang.Object
-
- ghidra.app.plugin.assembler.sleigh.SleighAssemblerBuilder
-
- All Implemented Interfaces:
AssemblerBuilder
public class SleighAssemblerBuilder extends java.lang.Object implements AssemblerBuilder
An AssemblerBuilder capable of supporting almost any SleighLanguage.
To build an assembler, please use a static method of the Assemblers class.
SLEIGH-based assembly is a bit of an experimental feature at this time. Nevertheless, it seems to have come along quite nicely. It's not quite as fast as disassembly, since after all, that's what SLEIGH was designed to do.
Overall, the method is fairly simple, though its implementation is a bit more complex. First, we gather every pair of pattern and constructor by traversing the decision tree used by disassembly. We then use the "print pieces" to construct a context-free grammar. Each production is associated with the one or more constructors that share the same sequence of print pieces. We then build a LALR(1) parser for the generated grammar. This now constitutes a generic parser for the given language. Note that this step takes some time, and may be better suited as a build-time step.
Because SLEIGH specifications are not generally concerned with eliminating ambiguity of printed instructions (rather, they do so only for instruction bytes), we must consider that the grammar could be ambiguous. To handle this, the action/goto table is permitted multiple entries per cell, and we allow backtracking. There are also cases where tokens are not actually separated by spaces. For example, in the ia.sinc file, there is JMP ... and J^cc, meaning the lexer must consider J as a token as well as JMP, introducing another source of possible backtracking. Despite that, parsing is completed fairly quickly.
To assemble, we first parse the textual instruction, yielding zero or more parse trees. No parse trees implies an error. For each parse tree, we attempt to resolve the instruction bytes, starting at the leaves and working upwards while tracking and solving context changes. The context changes must be considered in reverse. We read the context register of the children (a disassembler would write). We then assume there is at most one variable in the expression, solve for it, and write the solution to the appropriate field (a disassembler would read). If no solution exists, a semantic error is logged. Since it's possible a production in the parse tree is associated with multiple constructors, different combinations of constructors are explored as we move upward in the tree. If all possible combinations yield semantic errors, then the overall result is an error.
Some productions are "purely recursive," e.g., :^instruction lines in the SLEIGH. These are ignored during parser construction. Let such a production be given as I => I. When resolving the parse tree to bytes, if we encounter a production with I on the left-hand side, we then consider the possible application of the production I => I and its consequential constructors. Ideally, we could repeat this indefinitely, stopping when all further applications result in semantic errors; however, there is no guarantee in the SLEIGH specification that such an algorithm will actually halt, so only a bounded number of applications (1 by default) is attempted.
After all the context changes and operands are resolved, we apply the constructor patterns and proceed up the tree. Thus, each branch yields zero or more "resolved constructors," each of which specifies two masked blocks of data: one for the instruction, and one for the context. These are passed up to the parent production, which, having obtained results from all its children, attempts to apply the corresponding constructors. Once we've resolved the root node, any resolved constructors returned are taken as successfully assembled instruction bytes. If applicable, the corresponding context registers are compared to the context at the target address in the program and filtered for compatibility.
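The association between productions and constructors described above can be pictured with a small sketch. It is not the builder's actual code: SketchConstructor and ProductionGrouping are hypothetical names, and print pieces are modeled as plain strings. It only shows that constructors sharing one print-piece sequence end up attached to a single production.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a SLEIGH constructor: a name plus its print pieces. */
final class SketchConstructor {
    final String name;
    final List<String> printPieces;

    SketchConstructor(String name, String... printPieces) {
        this.name = name;
        this.printPieces = Arrays.asList(printPieces);
    }
}

/** Hypothetical helper: one production per distinct sequence of print pieces. */
final class ProductionGrouping {
    /** Map each print-piece sequence to the names of all constructors that print it. */
    static Map<List<String>, List<String>> group(List<SketchConstructor> constructors) {
        Map<List<String>, List<String>> productions = new LinkedHashMap<>();
        for (SketchConstructor c : constructors) {
            // Lists compare by content, so equal sequences share one map entry.
            productions.computeIfAbsent(c.printPieces, k -> new ArrayList<>()).add(c.name);
        }
        return productions;
    }
}
```

Because several constructors can hang off one production, the later byte-resolution step has to try each of them, which is where the exploration of constructor combinations comes from.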
-
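The J versus JMP situation from the class description can be illustrated with a minimal backtracking tokenizer. This is a sketch under assumptions, not the assembler's lexer: PrefixLexerSketch is a hypothetical name, the terminal list is supplied by hand, and "NZ" is used only as an illustrative condition code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split text into known terminals without requiring separators. */
final class PrefixLexerSketch {
    /** Return every complete tokenization of the text using the given terminals. */
    static List<List<String>> tokenize(String text, List<String> terminals) {
        List<List<String>> results = new ArrayList<>();
        tokenizeFrom(text, 0, terminals, new ArrayList<String>(), results);
        return results;
    }

    private static void tokenizeFrom(String text, int pos, List<String> terminals,
            List<String> current, List<List<String>> results) {
        if (pos == text.length()) {
            results.add(new ArrayList<>(current)); // consumed everything: record this split
            return;
        }
        for (String t : terminals) {
            // Empty terminals are skipped so that every step makes progress.
            if (!t.isEmpty() && text.startsWith(t, pos)) {
                current.add(t);
                tokenizeFrom(text, pos + t.length(), terminals, current, results);
                current.remove(current.size() - 1); // backtrack and try the next terminal
            }
        }
    }
}
```

With the terminals J, JMP, and NZ, the input "JMP" first tries J, reaches a dead end at "MP", backs off, and then succeeds with JMP, while "JNZ" succeeds as J followed by NZ.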
-
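The "masked blocks" that resolved constructors carry can be thought of as a mask of constrained bits plus the values those bits must take. The sketch below is a simplification under assumptions: MaskedBlockSketch is a hypothetical class, and it packs the block into a single int, whereas a real instruction or context block would span a sequence of bytes. It shows how a parent might merge the blocks of its children and detect a conflict.

```java
/** Hypothetical sketch of a masked block: which bits are fixed, and to what values. */
final class MaskedBlockSketch {
    final int mask;   // 1 bits mark the positions this block constrains
    final int value;  // required bit values; always zero outside the mask

    MaskedBlockSketch(int mask, int value) {
        this.mask = mask;
        this.value = value & mask;
    }

    /** Merge two blocks, or return null if they disagree on a bit both constrain. */
    MaskedBlockSketch combine(MaskedBlockSketch other) {
        int shared = this.mask & other.mask;
        if ((this.value & shared) != (other.value & shared)) {
            return null; // conflict: this combination cannot be encoded
        }
        return new MaskedBlockSketch(this.mask | other.mask, this.value | other.value);
    }
}
```

A null result here plays the role of a semantic error for one combination of constructors; other combinations may still succeed.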
Field Summary
Fields (Modifier and Type, Field):
protected java.util.Map<java.lang.String,AssemblySymbol> builtSymbols
protected AssemblyContextGraph ctxGraph
protected static DbgTimer dbg
protected AssemblyDefaultContext defaultContext
protected boolean generated
protected AssemblyGrammar grammar
protected SleighLanguage lang
protected AssemblyParser parser
-
Constructor Summary
Constructors (Constructor, Description):
SleighAssemblerBuilder(SleighLanguage lang)
- Construct an assembler builder for the given SLEIGH language
-
Method Summary
Methods (Modifier and Type, Method, Description):
protected void buildContext()
- Build the default context for the language
protected void buildContextGraph()
- Build the context transition graph for the language
protected void buildGrammar()
- Build the full grammar for the language
protected void buildParser()
- Build the parser for the language
protected AssemblyGrammar buildSubGrammar(SubtableSymbol subtable)
- Build a portion of the grammar representing a table of constructors
protected void generateAssembler()
- Do the actual work to construct an assembler from a SLEIGH language
SleighAssembler getAssembler(AssemblySelector selector)
- Build an assembler with the given selector callback
SleighAssembler getAssembler(AssemblySelector selector, Program program)
- Build an assembler with the given selector callback and program binding
protected int getBitSize(Constructor cons, OperandSymbol opsym)
- Obtain the size in bits of a textual operand.
protected AssemblyGrammar getGrammar()
- Get the built grammar for the language
SleighLanguage getLanguage()
- Get the language for which this instance builds an assembler
LanguageID getLanguageID()
- Get the ID of the language for which this instance builds an assembler
protected AssemblyParser getParser()
- Get the built parser for the language
protected AssemblySymbol getSymbolFor(Constructor cons, OperandSymbol opsym)
- Convert the given operand symbol to an AssemblySymbol. For subtables, this results in a non-terminal; for all others, the result is a terminal.
protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invNameSymbol(NameSymbol ns)
- Invert a name table to a map suitable for use with AssemblyStringMapTerminal
protected java.util.Map<java.lang.Long,java.lang.Integer> invValueMap(ValueMapSymbol vm)
- Invert a value map to a map suitable for use with AssemblyNumericMapTerminal
protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invVarnodeList(VarnodeListSymbol vnlist)
- Invert a varnode list to a map suitable for use with AssemblyStringMapTerminal
-
-
-
Field Detail
-
dbg
protected static final DbgTimer dbg
-
lang
protected SleighLanguage lang
-
grammar
protected AssemblyGrammar grammar
-
defaultContext
protected AssemblyDefaultContext defaultContext
-
ctxGraph
protected AssemblyContextGraph ctxGraph
-
parser
protected AssemblyParser parser
-
generated
protected boolean generated
-
builtSymbols
protected java.util.Map<java.lang.String,AssemblySymbol> builtSymbols
-
-
Constructor Detail
-
SleighAssemblerBuilder
public SleighAssemblerBuilder(SleighLanguage lang)
Construct an assembler builder for the given SLEIGH language
- Parameters:
lang
- the language
-
-
Method Detail
-
generateAssembler
protected void generateAssembler() throws SleighException
Do the actual work to construct an assembler from a SLEIGH language
- Throws:
SleighException
- if there's an issue accessing the language
-
getLanguageID
public LanguageID getLanguageID()
Description copied from interface: AssemblerBuilder
Get the ID of the language for which this instance builds an assembler
- Specified by:
getLanguageID in interface AssemblerBuilder
- Returns:
- the language ID
-
getLanguage
public SleighLanguage getLanguage()
Description copied from interface: AssemblerBuilder
Get the language for which this instance builds an assembler
- Specified by:
getLanguage in interface AssemblerBuilder
- Returns:
- the language
-
getAssembler
public SleighAssembler getAssembler(AssemblySelector selector)
Description copied from interface: AssemblerBuilder
Build an assembler with the given selector callback
- Specified by:
getAssembler in interface AssemblerBuilder
- Parameters:
selector
- the selector callback
- Returns:
- the built assembler
-
getAssembler
public SleighAssembler getAssembler(AssemblySelector selector, Program program)
Description copied from interface: AssemblerBuilder
Build an assembler with the given selector callback and program binding
- Specified by:
getAssembler in interface AssemblerBuilder
- Parameters:
selector
- the selector callback
program
- the bound program
- Returns:
- the built assembler
-
invVarnodeList
protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invVarnodeList(VarnodeListSymbol vnlist)
Invert a varnode list to a map suitable for use with AssemblyStringMapTerminal
- Parameters:
vnlist
- the varnode list symbol
- Returns:
- the inverted string map
-
invValueMap
protected java.util.Map<java.lang.Long,java.lang.Integer> invValueMap(ValueMapSymbol vm)
Invert a value map to a map suitable for use with AssemblyNumericMapTerminal
- Parameters:
vm
- the value map symbol
- Returns:
- the inverted numeric map
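As a rough illustration of what inverting a value map means, the sketch below turns an index-to-value table into a value-to-index lookup, which is the direction an assembler needs when it reads a number and must find the field encoding. ValueMapInversion is a hypothetical name, the table is a plain list rather than a ValueMapSymbol, and keeping the first index on duplicate values is an assumption, not documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: invert an index-to-value table into a value-to-index map. */
final class ValueMapInversion {
    static Map<Long, Integer> invert(List<Long> values) {
        Map<Long, Integer> inverted = new TreeMap<>();
        for (int index = 0; index < values.size(); index++) {
            // Assumption: when a value repeats, the first (lowest) index is kept.
            inverted.putIfAbsent(values.get(index), index);
        }
        return inverted;
    }
}
```

For a table holding 1, 2, 4, 8, looking up the value 4 yields index 2, the number that would be written into the instruction field.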
-
invNameSymbol
protected org.apache.commons.collections4.MultiValuedMap<java.lang.String,java.lang.Integer> invNameSymbol(NameSymbol ns)
Invert a name table to a map suitable for use with AssemblyStringMapTerminal
- Parameters:
ns
- the name symbol
- Returns:
- the inverted string map
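The multi-valued return type suggests that one printed name may correspond to more than one table index. A minimal sketch of that inversion follows. It uses only java.util collections in place of MultiValuedMap so that it stays self-contained, NameTableInversion is a hypothetical name, and skipping null slots is an assumption about how unused entries might be represented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: invert an index-to-name table into a name-to-indices map. */
final class NameTableInversion {
    static Map<String, List<Integer>> invert(List<String> names) {
        Map<String, List<Integer>> inverted = new LinkedHashMap<>();
        for (int index = 0; index < names.size(); index++) {
            String name = names.get(index);
            if (name == null) {
                continue; // assumption: null marks an unused slot
            }
            // A name may appear at several indices; keep all of them.
            inverted.computeIfAbsent(name, k -> new ArrayList<>()).add(index);
        }
        return inverted;
    }
}
```

Keeping every index lets the assembler try each candidate encoding for a name and discard the ones that later produce semantic errors.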
-
getSymbolFor
protected AssemblySymbol getSymbolFor(Constructor cons, OperandSymbol opsym)
Convert the given operand symbol to an AssemblySymbol. For subtables, this results in a non-terminal; for all others, the result is a terminal.
- Parameters:
cons
- the constructor to which the operand belongs
opsym
- the operand symbol to convert
- Returns:
- the converted assembly grammar symbol
-
getBitSize
protected int getBitSize(Constructor cons, OperandSymbol opsym)
Obtain the size in bits of a textual operand. This is a little odd, since the variables in pattern expressions do not have an explicit size. However, the value exported by a constructor's pCode may have an explicit size given (in bytes). Thus, there is a special case, where a constructor prints just one operand and exports that same operand with an explicit size. In that case, the size of the operand is printed according to that exported size.
For disassembly, this information is used simply to truncate the bits before they are displayed. For assembly, we must do two things: 1) ensure that the provided value fits in the given size, and 2) mask the goal when solving the pattern expression for the operand.
- Parameters:
cons
- the constructor from which the production is being derived
opsym
- the operand symbol corresponding to the grammatical symbol, whose size we wish to determine.
- Returns:
- the size of the operand in bits
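The two assembly-side uses of the bit size can be sketched as follows. This is illustrative only: OperandSizeSketch is a hypothetical class, values are held in a long, and accepting a value if it fits as either a signed or an unsigned quantity is an assumption about how "fits" might be interpreted.

```java
/** Hypothetical sketch of the two assembly-side checks that use an operand's bit size. */
final class OperandSizeSketch {
    /** True if the value fits in the given number of bits, read as signed or unsigned. */
    static boolean fits(long value, int bits) {
        if (bits <= 0) {
            throw new IllegalArgumentException("bits must be positive");
        }
        if (bits >= 64) {
            return true; // every long fits in 64 bits
        }
        long min = -(1L << (bits - 1)); // most negative signed value
        long max = (1L << bits) - 1;    // largest unsigned value
        return value >= min && value <= max;
    }

    /** Keep only the low bits of the goal, mirroring the truncation done for display. */
    static long maskGoal(long goal, int bits) {
        if (bits <= 0) {
            throw new IllegalArgumentException("bits must be positive");
        }
        if (bits >= 64) {
            return goal;
        }
        return goal & ((1L << bits) - 1);
    }
}
```

For an 8-bit operand, 255 and -128 both pass the fit check while 256 does not, and masking -1 leaves 0xFF as the goal to solve for.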
-
buildSubGrammar
protected AssemblyGrammar buildSubGrammar(SubtableSymbol subtable)
Build a portion of the grammar representing a table of constructors
- Parameters:
subtable
- the table
- Returns:
- the partial grammar
-
buildGrammar
protected void buildGrammar()
Build the full grammar for the language
-
buildContext
protected void buildContext()
Build the default context for the language
-
buildContextGraph
protected void buildContextGraph()
Build the context transition graph for the language
-
buildParser
protected void buildParser()
Build the parser for the language
-
getGrammar
protected AssemblyGrammar getGrammar()
Get the built grammar for the language
- Returns:
- the grammar
-
getParser
protected AssemblyParser getParser()
Get the built parser for the language
- Returns:
- the parser
-
-