Alongside executable files, many compilers also generate debug symbols. These contain additional metadata about the program that is not strictly required for its execution, but that helps immensely when debugging the target. This information includes, for instance, mappings from binary code to source code lines, as well as (local) variable names and their types.
While AsmResolver can currently locate PDB files (through the CodeView debug data directories in the PE), it cannot yet interpret the contents of PDB files or similar formats. This proposal is about adding read/write support for files containing these debug symbols, with the idea of changing the general package dependency graph to the following:
New packages are highlighted in green, while modified packages are highlighted in yellow. More packages may be introduced in the future for other formats (like MDB, CilDB or DWARF), which is why the base package is called AsmResolver.Symbols and not AsmResolver.Pdb. Future packages should follow a similar pattern: they all derive from AsmResolver.Symbols, and may have additional dependencies on existing packages for their domain. What is important is that the symbol packages can be excluded from the normal executable parsing libraries, so that consumers that do not need to process symbols do not have to include code they are not going to use.
AsmResolver.Symbols
This package serves as the base for all packages that deal with debug symbols, and is meant to unify read access to symbol information across different file formats. It does so by defining interfaces that are implemented by each of the packages deriving from it.
The exact interfaces are not fully known yet; they will probably emerge once we have a good understanding of what we generally want to extract from symbol files. In any case, they should ultimately expose at least the following fundamental types of debugging information:
Global symbols
Local variable symbols
Metadata about compilation units (Files that were used to compile the executable)
Sequence points (Mappings from offset ranges to source code line numbers)
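As a rough illustration of what such a unified view could look like, here is a minimal, language-neutral sketch written in Python. Every name in it (SymbolReader, SequencePoint, LocalVariable, InMemorySymbols, find_source_line) is hypothetical and chosen only for this example; the actual interfaces of AsmResolver.Symbols are, as noted above, still undecided.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class SequencePoint:
    """Maps the offset range [start_offset, end_offset) to a source line."""
    start_offset: int
    end_offset: int
    document: str
    line: int


@dataclass(frozen=True)
class LocalVariable:
    name: str
    type_name: str


class SymbolReader(Protocol):
    """Hypothetical format-agnostic view over a symbol file."""

    def global_symbols(self) -> Iterable[str]: ...
    def compilation_units(self) -> Iterable[str]: ...
    def local_variables(self, function: str) -> Iterable[LocalVariable]: ...
    def sequence_points(self, function: str) -> Iterable[SequencePoint]: ...


@dataclass
class InMemorySymbols:
    """Toy implementation standing in for a format-specific reader."""
    points: Dict[str, List[SequencePoint]] = field(default_factory=dict)
    variables: Dict[str, List[LocalVariable]] = field(default_factory=dict)

    def global_symbols(self) -> List[str]:
        return sorted(self.points)

    def compilation_units(self) -> List[str]:
        return sorted({p.document for ps in self.points.values() for p in ps})

    def local_variables(self, function: str) -> List[LocalVariable]:
        return self.variables.get(function, [])

    def sequence_points(self, function: str) -> List[SequencePoint]:
        return self.points.get(function, [])


def find_source_line(
    reader: SymbolReader, function: str, offset: int
) -> Optional[Tuple[str, int]]:
    """Resolve a code offset to (document, line) using only the interface."""
    for point in reader.sequence_points(function):
        if point.start_offset <= offset < point.end_offset:
            return point.document, point.line
    return None


symbols = InMemorySymbols(
    points={"main": [SequencePoint(0, 4, "main.c", 3),
                     SequencePoint(4, 9, "main.c", 4)]},
)
print(find_source_line(symbols, "main", 5))
```

The point of the sketch is that find_source_line only depends on the interface, so the same consumer code would work regardless of which symbol format backs the reader.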
This generalization is especially important for consumers that process .NET binaries, as these may be accompanied by either a Windows PDB or a file in the newer Portable PDB format.
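To illustrate why a consumer of .NET binaries benefits from a common abstraction, the following sketch shows how the two formats could be told apart from the first bytes of the file. It is written in Python purely for illustration and is not part of the proposed API. The function name detect_symbol_format and the returned labels are made up for this example; the magic values are assumptions taken from public format descriptions (the MSF magic strings as documented by LLVM, and the BSJB metadata signature from ECMA-335 that a standalone Portable PDB file starts with).

```python
from typing import Optional

# MSF 7.00 magic (32 bytes), used by Windows PDB v7 files.
MSF_V7_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"
# Prefix of the legacy v2 magic; the trailing bytes are deliberately not checked.
MSF_V2_PREFIX = b"Microsoft C/C++ program database 2.00\r\n"
# ECMA-335 metadata root signature.
PORTABLE_PDB_MAGIC = b"BSJB"


def detect_symbol_format(header: bytes) -> Optional[str]:
    """Guess the symbol file format from the first bytes of the file."""
    if header.startswith(MSF_V7_MAGIC):
        return "windows-pdb-v7"
    if header.startswith(MSF_V2_PREFIX):
        return "windows-pdb-v2"
    if header.startswith(PORTABLE_PDB_MAGIC):
        return "portable-pdb"
    return None
```

A consumer would then pick the matching reader package and continue to work against the shared interfaces only.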
AsmResolver.Symbols.Pdb
This package is meant for reading and writing PDB files that follow the Windows PDB v7 (and possibly legacy v2) file format. This format is widely used by compilers targeting the Windows platform (such as MSVC, LLVM-based compilers targeting Windows, and legacy VB and C# compilers).
We will not rely on Microsoft's official implementations (such as the DIA SDK). This is important because AsmResolver targets netstandard 2.0 with the intention that it should run on any platform, including non-Windows platforms. The package will therefore be fully responsible for reading and writing PDB files in this format.
A PDB file in this format is stored as a Multi-Stream Format (MSF) file. Since this is a completely different file format from PE, this package will be almost entirely self-contained and will not rely on existing AsmResolver packages other than AsmResolver.Symbols. This means that all models required to represent PDB files in this format will be defined here, as well as the code that implements the parsers and writers for them. Furthermore, the package will be designed in such a way that the PDB models are built on top of the MSF models, similar to how PEImage is built on top of PEFile. This keeps general usage of the API intuitive while still providing access to the lower-level structures, in case certain constructs (such as metadata streams for which the exact purpose or format is unknown) need to be preserved when rebuilding the file.
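To give an idea of what the lowest MSF layer has to deal with, here is a minimal sketch that parses the MSF 7.00 superblock found at the start of a PDB file. It is written in Python for illustration only. The field layout is an assumption based on LLVM's public description of the MSF format, and the names MsfSuperBlock and read_superblock are hypothetical, not part of the proposed package.

```python
import struct
from dataclasses import dataclass

MSF_V7_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"

# 32-byte magic followed by six little-endian 32-bit fields (56 bytes total).
_SUPERBLOCK = struct.Struct("<32s6I")


@dataclass(frozen=True)
class MsfSuperBlock:
    block_size: int            # size of every block in the file, in bytes
    free_block_map_block: int  # index of the active free block map
    num_blocks: int            # total number of blocks in the file
    num_directory_bytes: int   # size of the stream directory, in bytes
    block_map_addr: int        # block holding the directory's block list

    @property
    def directory_block_count(self) -> int:
        """Number of blocks needed to store the stream directory."""
        return -(-self.num_directory_bytes // self.block_size)  # ceiling division


def read_superblock(data: bytes) -> MsfSuperBlock:
    if len(data) < _SUPERBLOCK.size:
        raise ValueError("file is too small to contain an MSF superblock")
    magic, block_size, fpm, num_blocks, dir_bytes, _unknown, block_map = (
        _SUPERBLOCK.unpack_from(data)
    )
    if magic != MSF_V7_MAGIC:
        raise ValueError("not an MSF 7.00 file")
    if block_size == 0:
        raise ValueError("invalid block size")
    return MsfSuperBlock(block_size, fpm, num_blocks, dir_bytes, block_map)
```

Higher-level PDB models would sit on top of structures like this one, in the same way the text describes PEImage sitting on top of PEFile; streams whose meaning is unknown could stay at this raw level and be written back unchanged.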
AsmResolver.Symbols.PortablePdb and AsmResolver.PE
This package is meant for reading and writing PDB files that follow the new Portable PDB file format, which was introduced with .NET Core and is now the standard format used by modern VB and C# compilers.
Contrary to AsmResolver.Symbols.Pdb, this package will not contain raw models for the actual PDB file format. This may sound counterintuitive at first, but Portable PDB is in fact a direct extension of the .NET metadata format used in PE files, as specified in ECMA-335. In particular, the extension mostly defines new tables within the existing metadata model used in the tables stream (#~) of the binary, so the debug data takes the form of rows in new metadata tables. As such, to have a more streamlined and consistent API, the reading and writing of the individual models will be integrated directly into AsmResolver.PE (specifically the AsmResolver.PE.DotNet namespace). This should be fairly straightforward, since most of the parsing infrastructure for PE files and metadata streams is already in place. We simply have to define a new PdbStream representing the #Pdb metadata stream, extend types such as TablesStream and TableIndex, and define new Row structures to represent the new models defined by this format.
The sole remaining purpose of AsmResolver.Symbols.PortablePdb is to implement the interfaces defined by AsmResolver.Symbols, so that AsmResolver.PE does not depend on AsmResolver.Symbols. This avoids large amounts of code being pulled in by consumers that want to use AsmResolver.PE or AsmResolver.DotNet but have no need for symbol processing.
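As a sketch of how little is needed on top of the existing metadata parsing, the following Python snippet reads the header of the #Pdb stream and lists the additional table indices. The layout (a 20-byte PDB id, a 4-byte entry point token, a 64-bit bit vector of referenced type system tables, and one 32-bit row count per set bit) and the table indices are assumptions taken from the published Portable PDB specification. The names PdbStreamHeader and read_pdb_stream are hypothetical and do not reflect how AsmResolver.PE would model this.

```python
import struct
from dataclasses import dataclass
from typing import Dict

# Debug tables added by the Portable PDB format (index -> name).
DEBUG_TABLES = {
    0x30: "Document",
    0x31: "MethodDebugInformation",
    0x32: "LocalScope",
    0x33: "LocalVariable",
    0x34: "LocalConstant",
    0x35: "ImportScope",
    0x36: "StateMachineMethod",
    0x37: "CustomDebugInformation",
}

_FIXED_SIZE = 20 + 4 + 8  # PDB id + entry point token + table bit vector


@dataclass(frozen=True)
class PdbStreamHeader:
    pdb_id: bytes
    entry_point_token: int
    # Row counts of the type system tables (table index -> row count) that
    # live in the PE's own metadata and are referenced by the debug tables.
    type_system_row_counts: Dict[int, int]


def read_pdb_stream(data: bytes) -> PdbStreamHeader:
    if len(data) < _FIXED_SIZE:
        raise ValueError("#Pdb stream is too small")
    pdb_id = data[:20]
    entry_point, referenced = struct.unpack_from("<IQ", data, 20)
    # Each set bit marks a referenced table; one row count follows per set bit.
    tables = [index for index in range(64) if (referenced >> index) & 1]
    if len(data) < _FIXED_SIZE + 4 * len(tables):
        raise ValueError("#Pdb stream is truncated")
    counts = struct.unpack_from(f"<{len(tables)}I", data, _FIXED_SIZE)
    return PdbStreamHeader(pdb_id, entry_point, dict(zip(tables, counts)))
```

The row counts matter because the debug tables contain indices into tables of the original assembly, and the width of such an index depends on how many rows the referenced table has.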
Open Design Questions
How do we "integrate" this into AsmResolver.DotNet, if at all? Ideally, we want it to be as easy as possible to attach debug information to IMetadataMembers, but this may introduce tighter coupling that we really want to avoid.
Related
#59