Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sail configuration system #865

Open
wants to merge 9 commits into
base: sail2
Choose a base branch
from
Open

Sail configuration system #865

wants to merge 9 commits into from

Conversation

Alasdair
Copy link
Collaborator

@Alasdair Alasdair commented Jan 7, 2025

This commit introduces a method for configuring Sail models using a JSON configuration file. It introduces a new construct into the language:

config a.b.c : T

which gets the value of the json object a.b.c in the configuration file and interprets it as the Sail type T

Configuration options can be instantiated at compile time using

sail --config c.json <files>

which is useful for Sail targets like SMT or SystemVerilog, or they can be accessed at runtime, which allows for building e.g. a configurable emulator.

Copy link

github-actions bot commented Jan 7, 2025

Test Results

   12 files  ±  0     25 suites  +1   0s ⏱️ ±0s
  760 tests + 12    760 ✅ + 12  0 💤 ±0  0 ❌ ±0 
2 747 runs  +256  2 747 ✅ +256  0 💤 ±0  0 ❌ ±0 

Results for commit b0e90d3. ± Comparison against base commit cf168b5.

♻️ This comment has been updated with latest results.

/*
* This file sets the runtime JSON config file
*/
void sail_config_set_file(const char *path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done without global state and instead passing a sail_json struct to each of the functions?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, but that would break existing users of the generated C. It could be an optional thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was just thinking about potentially instantiating multiple instances of the same model with different config options (e.g. for fuzzing) which is not possible if there is only one global. But that can always be added as an incremental change. I'm not sure if this has any impact on @Timmmm's approach of using C++ classes/namespaces for different model instances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think removing global state is probably an orthogonal thing (which would be nice to fix one day!).

For this I will just do the same thing I do for the existing global callbacks - copy & paste them into a base class of the model. It's like this:

struct ModelRuntime {
    explicit ModelRuntime(Architecture arch);
...
    void setup_rts();
    void cleanup_rts();
...
    unit load_reservation(mach_bits addr);
    bool match_reservation(mach_bits addr) const;
    unit cancel_reservation(unit u);
...
};

namespace {
struct RiscvModel : public ModelRuntime {
    RiscvModel() : ModelRuntime(Architecture::RV32) {
    }
#include "riscv_model_rv32.c"
};
} // namespace

I will just reimplement all these sail_config_ functions in ModelRuntime. Probably not the best long term solution but I think it's fine for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think there are some nicer C++ json libraries you could wrap and include too. I opted for cJSON for now because it's easy to vendor, but it doesn't do exactly what I want (expose the raw JSON string data for integers so I can just pass them to the GMP parse functions and support arbitrary precision). I'm trying to resist the urge to write my own JSON parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think https://github.com/nlohmann/json is one of the most popular ones and looks to have a decently nice API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah nlohmann is nice but it also doesn't support arbitrary precision integers.

I'm trying to resist the urge to write my own JSON parser.

Ha, well I definitely wouldn't fancy doing that in C! One of my colleagues wrote one in C++ that supports uint64. I don't really understand why they didn't just use nlohmann but it has been surprisingly nice to have uint64s - especially because we generate the JSON with Python and it is very happy to output any size integer.

@Alasdair Alasdair force-pushed the config branch 5 times, most recently from f997f2b to 241e60f Compare January 8, 2025 11:46
Copy link
Contributor

@Timmmm Timmmm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't read the internal compiler code but this looks good. Some other questions:

  1. It doesn't seem like you can read the config as arrays/lists? That would be really helpful for PMAs. Currently we have this nonsense:
val init_pma_regions : unit -> unit
function init_pma_regions() = {
  // Clear regions so this is idempotent.
  pma_regions = [||];

  // Iterate backwards because pma_regions is a singly linked list that we
  // prepend to.
  foreach (i from (plat_pma_count() - 1) downto 0) {
    let attr : PMA = struct {
      cacheable                        = plat_pma_cacheable(i),
      coherent                         = plat_pma_coherent(i),
      executable                       = plat_pma_executable(i),
      readable                         = plat_pma_readable(i),
      writable                         = plat_pma_writable(i),
      readIdempotent                   = plat_pma_readIdempotent(i),
      writeIdempotent                  = plat_pma_writeIdempotent(i),
      relaxed                          = plat_pma_relaxed(i),
      misalignmentCausesAccessFault    = plat_pma_misalignmentCausesAccessFault(i),
      misalignmentCausesAlignmentFault = plat_pma_misalignmentCausesAlignmentFault(i),
      atomicSupport                    = plat_pma_atomicSupport(i),
      reservability                    = plat_pma_reservability(i),
      taggable                         = plat_pma_taggable(i),
      untaggableCausesAccessFault      = plat_pma_untaggableCausesAccessFault(i),
      supportsCboZero                  = plat_pma_supportsCboZero(i),
    };

    assert(not(attr.misalignmentCausesAccessFault & attr.misalignmentCausesAlignmentFault));
    assert(not(attr.taggable & attr.untaggableCausesAccessFault));

    let region : PMA_Region = struct {
      base       = physaddr(plat_pma_address(i)),
      size       = plat_pma_size(i),
      attributes = attr,
    };
    pma_regions = region :: pma_regions;
  };
}
  1. How would you recommend dealing with enums? E.g. plat_pma_atomicSupport() above returns
enum AtomicSupport = { AMONone, AMOSwap, AMOLogical, AMOArithmetic }

I think it would be easy to use strings - we already have

mapping atomic_support_name : AtomicSupport <-> string = {
  AMONone       <-> "AMONone",
  AMOSwap       <-> "AMOSwap",
  AMOLogical    <-> "AMOLogical",
  AMOArithmetic <-> "AMOArithmetic",
}

so could do:

   ... = config atomic_support_name(pma[i].atomic_support)

which is quite nice, but I vaguely recall you said something about the Sail compiler ensuring that string operations don't affect execution or something? Presumably that wouldn't work nicely with the SVA/SMT backends?

The alternative would be an integer mapping, but then that makes the JSON not very robust or readable.


void sail_config_unwrap_int(sail_int *n, const sail_config_json config)
{
char *str = cJSON_GetStringValue((cJSON *)config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it this means the JSON never uses JSON numbers - only strings of numbers?

I guess that makes sense, especially because cjson's number handling looks whack (silently clamping to INT_MAX???), but it's a slight shame. The JSON spec doesn't actually put a limit on integer size so we theoretically could support arbitrary precision integers "natively". Although I guess that might make it hard to generate them from some programming languages.... And it would require using a better parser. Hmmm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I probably do need to use integers for the JSON schema output to work correctly. So maybe I will need to find a better parser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Sail flag --config to statically instantiate the config does support arbitrary precision integers.

@@ -38,6 +38,9 @@ def test_c(name, c_opts, sail_opts, valgrind, compiler='cc'):
basename = os.path.splitext(os.path.basename(filename))[0]
tests[filename] = os.fork()
if tests[filename] == 0:
if basename.startswith('config'):
sail_opts += ' --c-include sail_config.h'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be included automatically?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, the idea is to avoid breaking existing Makefiles. To include it by default everything in lib/json would need to be in lib but that would make it harder to swap out the JSON implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok fair enough. I guess this is one of those things where a --target-version, or --edition type flag would help.

 |
 | fail/config_mismatch.sail:8.18-28:
 | 8 | print_endline(config a.b);
 |  | ^--------^ previous type found here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it infers the type of a config foo.bar from the context the first time you use it?

Copy link
Collaborator Author

@Alasdair Alasdair Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any type used as a configuration option, the requirement will be that it must be well-typed in the global typing context (so it cannot be dependent on runtime values in functions), and it should correspond to some precise JSON schema definition.

For the case where the configuration option is used twice as config x : T and config x : U, there are essentially two options.

  • Globally keep track of the most specific type we've seen. So if we see T first then when encountering U we check if T < U or U < T and remember the more specific type. If T and U are incomparable, then the error in that tests gets raised (this is what is currently implemented)
  • Keep track of all the types we've seen. Interpret each config instance separately with it's own type, and rely on the JSON schema output to ensure everything makes sense. In the case where two types are incompatible, we'll just generate an impossible schema (such as {"allOf": [{"type" : "string"}, {"type" : "integer"}]})

I am leaning towards switching to the second option, as it could allow lifting the well-typed in the global typing context restriction and it doesn't rely on subtyping (as it can fail when both U and T are existentially quantified).

@@ -0,0 +1,3 @@
{
"k" : [true, false, false, true, "not a bit"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look like a fun bit vector syntax! Could we also support Sail bit vector literals? I.e. "0x1234" and "0b0101"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I probably can't support underscores as separators due to how JSON schema works.

@Alasdair
Copy link
Collaborator Author

Alasdair commented Jan 8, 2025

It doesn't seem like you can read the config as arrays/lists? That would be really helpful for PMAs. Currently we have this nonsense:

This will be implemented before the PR is merged. I just have an assert false for the unimplemented cases right now.

@Timmmm
Copy link
Contributor

Timmmm commented Jan 8, 2025

A couple more questions (sorry!):

  1. For debugging I have:
    // If set to true this will print logging for Page Table Walks for
    // the specific address.
    std::optional<uint64_t> debug_ptw_address;

It would be cool if you could do:

let debug_ptw_address : option(xlenbits) = config debug_ptw_address

and have it convert JSON null (or absense) to none. Obviously not critical (I think that's the only option setting I have) but maybe it's easy to do.

  1. The C callbacks aren't actually type checked at all, so you can declare a function that returns range(1, 10) and then just return 20 from it and nothing will complain until it all breaks deep in the C at runtime somewhere. Would it be possible to make this do actual checks on the values? (Maybe it does already; I haven't checked.)

@Alasdair
Copy link
Collaborator Author

Alasdair commented Jan 8, 2025

The C callbacks aren't actually type checked at all, so you can declare a function that returns range(1, 10) and then just return 20 from it and nothing will complain until it all breaks deep in the C at runtime somewhere. Would it be possible to make this do actual checks on the values? (Maybe it does already; I haven't checked.)

My plan for this was to generate a JSON schema when building the model, then you can guarantee that any json passed at runtime won't violate type-safety as long as it validates against that schema.

@Timmmm
Copy link
Contributor

Timmmm commented Jan 8, 2025

My plan for this was to generate a JSON schema when building the model, then you can guarantee that any json passed at runtime won't violate type-safety as long as it validates against that schema.

That assumes you actually do the schema check though right? I think it might be quite easy to not do that?

Maybe it could do both - when you generate the schema, also generate type checking functions in C?

I guess that can be added later quite easily without any interface changes.

@Alasdair
Copy link
Collaborator Author

Maybe it could do both - when you generate the schema, also generate type checking functions in C?

I guess that can be added later quite easily without any interface changes

Yes. There is a slight issue that by the time we reach the C generation some type information has been lost, as the types are simplified before then. We could work around that in this case because of the additional requirement that we can generate a JSON schema that represents each type used in the configuration, but that extra information would need to be passed down somehow. I'm not sure this is worth doing in the first version, but I agree it would be nice to have.

@Alasdair Alasdair force-pushed the config branch 4 times, most recently from e57998b to 94781a6 Compare January 13, 2025 20:30
@Alasdair
Copy link
Collaborator Author

Since the meeting on Monday:

  1. The Sail source location information for each schema rule is included in the JSON Schema description field.
  2. Structs are now parsed from JSON objects.
  3. Bitvector parsing has been improved.

This is probably now in a state where it can be merged. There are a few missing things like union parsing (including allowing null for option(T)) and more tests would be good.

@Alasdair Alasdair force-pushed the config branch 4 times, most recently from 11687c8 to 9a396d6 Compare January 16, 2025 16:07
This commit introduces a method for configuring Sail models using
a JSON configuration file. It introduces a new construct into the language:
```
config a.b.c : T
```
which gets the value of the json object a.b.c in the configuration file
and interprets it as the Sail type T

Configuration options can be instantiated at compile time using
```
sail --config c.json <files>
```
which is useful for Sail targets like SMT or SystemVerilog, or
they can be accessed at runtime, which allows for building a configurable emulator
The init instruction used to just have the form:
```
ctyp id = cval
```
which was always equivalent to:
```
ctyp id
id = cval
```

However this doesn't work with some C and C++ constructs where we want
to generate something that is only valid as an initializer, i.e.
```C
const_sail_string[] = {"Hello", "World"};
```

To fix this the init instruction has the form
```
ctyp id = init
```
where `init` can either be a plain cval as before or a special initializer
form that can't be split into two instructions. Before this patch we just had
to hope that the cval that compiled into an initializer wouldn't be constant
propagated, but with this that is statically guaranteed.
Second, for each type [T] attached to a configuration node (which
may have been inferred from context by the type system), we will
also attempt to synthesise a JSON Schema. If [T] cannot be turned
into a valid schema then this function raises a fatal type error
exception. This schema cannot capture every possible invariant of
an ISA configuration, but it does provide enough to guarantee that
a configuration applied at runtime will not break type-safety
assuming it validates against the schema.

Parse more kinds of JSON values, including vectors and lists
This commit removes checking constraints from synonym expansion, so the type:
```
type foo('n), 'n >= 32 = bar('n)
```
when appearing as `foo(N)` will just expand directly to `bar(N)` without checking
`N >= 32`.

This is possible because the well-formedness check already checks this, i.e.
`foo(N)` is well-formed only if `N >= 32` holds, so we don't need to check
again when we do the expansion.

This fixes a bug where synonym expansion wasn't handling existentials correctly.
@Alasdair Alasdair force-pushed the config branch 4 times, most recently from dea22bc to 8389cfe Compare January 16, 2025 16:46
Make sure we allocate enough space for raw string parsing,
and add address santizer mode to set of C tests
@Alasdair Alasdair marked this pull request as ready for review January 16, 2025 18:02
@arichardson
Copy link
Contributor

Looking forward to this improvement! One thing I was wondering is if cJSON supports (or if we could change it to support JSON with comments (https://code.visualstudio.com/docs/languages/json#_json-with-comments) by stripping /* */ blocks and //? That could be quite useful for writing the config?

@Alasdair
Copy link
Collaborator Author

Part of the reason I'm using JSON is because it's a common format I can read in all the different Sail targets, so in that sense I'm actually using it in the more intended use case as a data-interchange format. If I start relying on extensions that becomes a bit more awkward, although comments are widely supported enough that I could do it.

I also want to generate a lot of the configuration from riscv-config, as it has a lot of information about WARL fields and similar. So it partly depends on whether that's successful or whether we do end up having to write our own configurations by hand.

@Timmmm
Copy link
Contributor

Timmmm commented Jan 16, 2025

cJSON does ignore comments.

@Alasdair
Copy link
Collaborator Author

In that case there isn't a problem, as the OCaml library we use (yojson) does too.

I already modified cJSON to include the raw string representation of numbers, so I could pass them to the GMP parse functions.

@arichardson
Copy link
Contributor

Looks like nlohmann/json also supports it (opt-it) so that is good: https://json.nlohmann.me/features/comments/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants