Skip to content
This repository has been archived by the owner on Apr 4, 2022. It is now read-only.

RFC - a place to discuss and refine the API #20

Closed
DesignByOnyx opened this issue Apr 20, 2020 · 17 comments
Closed

RFC - a place to discuss and refine the API #20

DesignByOnyx opened this issue Apr 20, 2020 · 17 comments

Comments

@DesignByOnyx
Copy link

The "RFC" (request for comments) is intended to provide a place to discuss features and such for this library prior to v1. The goal of this is to collect insight from different developers and discuss use cases we have all encountered in the wild. In particular, I'd like to tap into developer experiences with multiple schema and validation libraries such as yup, joi, AJV, JSON Schema, sequelize, mongoose, et al.

Remember to be kind and focus on providing constructive feedback.

@daffl
Copy link
Member

daffl commented Apr 20, 2020

I would say that Joi is to @feathersjs/schema what Express is to Feathers. The underlying foundation that can be replaced if necessary but solid enough to cover most needs while only requiring a small wrapper for the integration. Joi is especially interesting because of the chainable interface and large ecosystem of resources and available validations.

The difference to the lower level validation library is what schema definitions add:

  • An abstraction in the schema definition format that allows to swap out the validation library
  • TypeScript annotations natively built in
  • Most importantly: Asynchronous resolvers and value setters to declare and resolve associations. No validation library supports this and ORMs/ODMs add a lot of unnecessary database specific complexity while not supporting more complex validation (and data protection) use cases. In addition this allows to:
    • Protect properties based on the context (e.g. only admins can see or edit certain properties)
    • "Massage" queries to be safe based on the context (e.g. non-admins can only query their own entries)
    • Target schema based protocols like GraphQL and GRPC
    • Just like for Feathers and Express, allow fall-throughs to the low level target format as well. E.g. @feathersjs/schema-sequelize allows to easily add Sequelize specific property definitions and constraints like you would in a normal model

A lot of this can already be done in individual hooks (in fact, in a Feathers app itself, @featherjs/schema integration is of course done via hooks) but putting it together with your data model makes a lot more sense. Hooks are great for workflows like sending a notification but using them for schema constraints (often split up across different hooks and mixed in with other workflows) turned out to not always be ideal.

There are now npm packages for almost every schema format combination (JSON schema to SQL, GraphQL to SQL, SQL to GraphQL, JSON schema to GraphQL, Joi to JSON schema) but so far there does not seem to be anything that addresses the actual underlying problem of providing a JavaScript way to define a schema that is not tied to an ORM, database or specific validation library.

@DesignByOnyx
Copy link
Author

DesignByOnyx commented Apr 21, 2020

Thanks @daffl, that helps a lot and I'm glad we are kind of piggybacking on an established library, especially Joi now that it works in the browser (since September '19). Here are my initial comments (I tried to keep it short as possible :/):

  1. The resolver function is awesome - sooo useful and easy.

  2. Let's not allow (or show in docs) using primitive constructors in type definitions:

    // let's not do/show this:
    name: String,
    isValid: { type: Boolean },

    It creates ambiguity between primitive values and instance values... and many don't know the difference. I'm even inclined to say that constructors shouldn't be allowed period. If we want to allow it, then let's create Type.constructor(SomeCtor).

  3. For decorators can we support an api like this?:

    @Type.string().email().required();
    email;
    email: string; // typescript (is this even necessary??)

    It's more terse and slightly more "copy and paste-able" with non-decorator schemas.

  4. Can we leverage Joi's link() function for describing relational data, or make our own keyword. Let me describe:

    // instead of the current API:
    todos: { 
        type: [ 'todo' ]
        resolve: (ctx)  => ctx.service('todo').getList(...)
    }
    
    // maybe something like this established Joi syntax
    // feathers should be able to recognize this as a relationship mapping (see next bullet):
    todos: Type.array().items(Type.link('#todo'))
    
    // Or our own custom "relationship" function (see next bullet):
    todos: Type.array().items(
         // note: I like the pound sign for referencing other schemas:
        Type.$ref('#todo', (TodoModel, context) => TodoModel.getList(...))
    );
  5. Provide a query operator for relational data. In conjunction with the previous item ^^, this would allow feathers to load additional data based on a query syntax. The following are just conversation starters:

    // load all related todos
    todos.getList({ query: {
        $populate: ['todos']
    }});
    
    // or this
    todos.getList({ query: {
        $populate: {
            todos: { paginate: false, query: { foo: 'bar' })
        }
    }});
    
    // or this
    todos.getList({ query: {
        $populate: async (context) => { ... }
    }});

@DaddyWarbucks
Copy link

I like @DesignByOnyx notes on $populate. That is something I do regularly. But I do see some caveats with it being baked into the schema, namely that it will require the DB adapters to handle that $populate keyword and allow it to "pass through" the methods w/o affecting the underlying DB call. I believe this can already be handled in user land with the current resolve syntax.

user: {
    type: User,
    resolve: async (todo, context) => {

      // The dev needs to bypass the DB adapter by plucking off invalid query params and stashing
      // theme somewhere for use in after hooks, here it would have been done in a before hook
      // and placed on a prop called stashedQuery
      const { $populate  } = context.params.stashedQuery

      const { params, app } = context;
    
      if ($populate && $populate.includes('user')) {
        return app.service('users').get(todo.userId, params);
      }
    }
  }

Having that baked in would be cool, but it may be quite difficult for it to play nicely with DB adapters unless I am missing something?

I am also curious what happens when the user does not return anything from the resolve function, either because they used an if-block like above or if the resolve function just doesn't return anything. I assume that the result would just be { ...data, user: undefined }. This is probably fine for the context.result. I only bring it up because when working with withData and withQuery, returning undefined had adverse affects with different DBs. When making those hooks, I had been using fastJoin quite a bit, but was not a fan of how I had to mutate the record directly. For example,

resolver () => (todo, context) => {
  todo.user = await context.app.users.get(todouser_.id);
  return todo;
}

So, I made the API such that you returned the actual result and the hook took care of placing it onto todo.user similar to how this resolve function works. Like I said, this probably isn't an issue with context.result as much as it would be context.data or context.params.query, but I just wanted to point out some obstacles I have encountered with similar "resolver" pattern. In the case of the with* hooks, if the result is undefined, then the property is not added to the result at all rather than being added as undefined. Actually, now that I think about the value setters rather than just the resolvers, this may be important.

@daffl
Copy link
Member

daffl commented Apr 21, 2020

  1. 👍
  2. Makes sense for primitive types and keeps it consistent. I guess that's why TypeScript uses indicators like lowercase boolean and string. It would still be nice to have the same constructor references as you would in a TypeScript class but having only one way of doing things is definitely always the better choice especially since Constructor references do not allow for circular dependencies. This would require every schema to have a name though (it creates a unique internal one at the moment if it does not)
  3. That's how it originally was but TypeScript requires the type and you really don't want to duplicate the type indicator. Even I messed it up a few time and had a different type than the TS class property definition and things broke in really weird ways. The callback basically gets the correct type validator from the property definition:
  // Here `validator` is a string validator
  @property<Joi.StringValidator> (validator => validator.email().required())
  email: string;
  1. I think type linking that way (as mentioned in 2) makes sense but I'm not sure I follow how you would tell it what to query for. Say I have a user and want their todos I need some kind of resolver to tell it what to fetch (all todos with query: { userId: user.id }). Doing this declaratively (via a JS object syntax instead of a function that can return anything) is something I'd be very hesitant to support natively. It might make for a good plugin that returns standard relationship resolver functions but it does get into ORM complexity territory (my theory is that ORMs are so difficult not because of what they do but because of the DSL each one has to develop to declare dependencies).
  2. Yes, a populate (or $select) should definitely be in there. It basically defines which resolvers should run. It would not be passed to the database adapter.

@DaddyWarbucks These were my thoughts on resolvers (getters) and value (setters):

const User = schema({
  name: 'user'
}, {
  password: {
    type: Type.string().required(),
    async value (value, user, context) {
      const hashedPassword = await bcrypt(value);

      return hashedPassword;
    },
    async resolve (value, user, context) {
      // Hide password for external call
      if (context.params.provider) {
        return undefined;
      }
      
      return value;
    }
  },
  permission: {
    type: Type.string(),
    async value (value, user, context) {
      const { user } = context.params;

      // Only an admin can change the user permission
      // With authentication there always will be a user for external requests
      if (user && user.permission !== 'admin') {
        return undefined;
      }

      return value;
    }
  }
});

If you return undefined it will not be included in data (from the value setter) or result (from the resolve getter)

@DaddyWarbucks
Copy link

DaddyWarbucks commented Apr 21, 2020

Another feature I think we may want to explore is some "partial" validation. Often we need to run some hooks that only require part of the data in order to get the rest of the data, before being validated. For example this following hook is used to validate some data before sending it off to Cloudinary. Cloudinary then sends back its results which are merged with the original data and then validated and persisted. The following code is from very similar setup to feathers-schema where I store the schema on the service, etc.

const sendToCloudinary = async context => {
  const schema = context.service.schema.clonePartial([
    'entity_id',
    'entity_type',
    'document_type'
  ]);

  // I want to throw the same ValidationError here as what would be thrown from the
  // feathers-schema validation hook, but I am not ready to validate the whole thing yet
  const cloudinaryStuff = await schema.validateData(context.data);

  const cloudinaryRes = await Cloudinary.doTheThing(cloudinaryStuff);

  context.data = { ...context.data, cloudinaryRes };

  // Now the context.data is ready to be validated against the whole schema

  return context;
};

Note the clonePartial method on the schema. This method returns a new Schema that is a subset of the total schema. Also note the validateData method. That method simply calls the validateAsync method and wraps the error in ValidationError. I also have a validateQuery method that also simply calls the validateAsync but wraps the error in a QueryError. I am not sure if that is helpful, just throwing out some experience from a similar setup.

This could be handled manually, but it requires the developer to create an additional Schema and also manually throw the ValidationError.

Another suggestion, we may also want to let the developer pass in the schema via props. For example change this line: https://github.com/feathersjs/schema/blob/c9f345b23263c157efc32224f82eb1fc58e714ff/packages/schema-hooks/src/index.ts#L19
To something like

const schema = getSchema(context.params.schema || service?.options?.Schema || service.Schema);

Perhaps that is a little too close to a validation hook rather than a "This schema describes this service" type of thing, but just a suggestion.

@KidkArolis
Copy link

Hi, I'm not very familiar with the scope and goals of @featherjs/schema, but wanted to share a couple of learnings from using ajv with Feathers.

Ajv async validators

Ajv does support async validators, I use that feature to check things based on context or database. E.g. I've build custom validation keywords such as:

  • foreignKey (checks DB for that key)
  • internal (checks if provider is set before allowing to modify that field)
  • permission (for some field based authorization based on permissions in DB)

This isn't to try and sway you away from yup, just wanted to point out that ajv supports this. The main difference is that it doesn't support inline validation functions (that I know of). For those cases I tend to do extra work in my hooks.

Read vs create vs update

For all my resources, I tend to need 3 things:

  • what's the schema for creating
  • what's the schema for updating
  • what fields can the requesting user read

For that I actually export each of these definitions separately. I was wondering how you were thinking about handling that here.

A typical schema file (e.g. foo.schemas.js) ends up looking like this:

const fields = [
  'id',
  'personId',
  'name',
  'email',
  'phoneNumber',
  'formattedPhoneNumber', // derived in a hook
  'relationship',
  'primary',
  'createdAt',
  'updatedAt',
]

const patch = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    email: { type: 'string', nullable: true },
    phoneNumber: { type: 'string', nullable: true },
    relationship: { type: 'string', nullable: true },
  },
  additionalProperties: false,
}

const create = {
  ...patch,
  properties: {
    ...patch.properties,
    personId: { type: 'string' },
    primary: { type: 'boolean' },
  },
  required: ['personId', 'name'],
}

module.exports.create = create
module.exports.patch = patch

module.exports.read = {
  owner: fields,
  admin: fields,
  user fields,
  self: fields,
}

@DesignByOnyx
Copy link
Author

DesignByOnyx commented Apr 28, 2020

@daffl - I totally agree with you on this point:

  1. ........ I need some kind of resolver to tell it what to fetch (all todos with query: { userId: user.id }). Doing this declaratively (via a JS object syntax) ... does get into ORM complexity.

I agree and would like to avoid that at all costs. I guess my concern was more in describing the relationship of the data rather than how to hydrate the data. These are two separate things, and if the schema focused on describing the relationship, then tools/extensions/plugins can be built to take care of "how" the data is hydrated. This ties into my comment about AJV below.

@DaddyWarbucks - partial schema validation is possible with Joi, though the way they recommend doing it is kind of wonky to me. I second your recommendations about providing a validateData method which wraps the error in a standard ValidationError.

@KidkArolis - I am with you on AJV, which (for those who don't know) is a tool that compiles JSON Schema with the ability to provide custom extensions. My biggest problem with Joi/Yup is that they are so rich that it's not shareable with different languages. This is particularly painful if you have a non-JavaScript backend service with which you need to stay in sync. @daffl mentioned earlier that there are tools to convert in every direction, but in practice I have found that these tools are basically useless. JSON Schema is the "least common denominator" and the most interoperable due to its static nature. Individual teams can share and extend these static schemas in their own native languages. If done thoughtfully, these language-specific extensions can be tested in a uniform way with uniform error messaging. The same cannot be said for sharing Joi/Yup schemas.

If I could have my druthers, I would make the core of feathers-schema based on AJV, encouraging users to describe their models with static JSON Schema, writing custom "resolver" functions and other extensions separate from the static schemas. However, I'm totally fine with sticking with Joi due to the ecosystem surrounding it.

@gustojs
Copy link

gustojs commented May 17, 2020

Just wondering, is the RFC taking into consideration the plans to make the core of Feathersjs compatible with deno? Will Feathersjs + deno users have to wait for Joi/Ajv to be released on deno too?

@daffl
Copy link
Member

daffl commented May 26, 2020

I am working on making the core libraries Deno compatible and seeing how much work that already is (I basically have to create our own Deno and Node compatible replacements for every module that's being used) this is probably not going to happen right away for schema definitions.

As mentioned in feathersjs/feathers#1964 initial Deno support will focus on core functionality only and not include any module that has significant third party dependencies.

@daffl
Copy link
Member

daffl commented Jul 6, 2020

I retract my previous statement. Given that Joi's future is somewhat uncertain I opted for making schema core validation library independent and have it stay flexible enough to use it with whichever validation library you choose.

The core concept here are schema functions (async (value, object context, metadata)) that can be used for types, values (validation) and resolvers:

import { schema, string, number, boolean } from '@feathersjs/schema';

const User = schema({
  name: 'users'
}, {
  email: {
    type: string(),
    resolve: async (email, user, context, metadata) {
      if (!context.params.user.isAdmin) {
        // Non admins can not see the email
        return undefined;
      }

      return email;
    }
  },
  age: {
    type: number(),
    value: async (age, user, context, metadata) {
      if (age < 0) {
        throw new Error('Age can not be negative');
      }

      return age;
    }
  },
  enabled: {
    type: boolean(),
    value: async (enabled, user, context, metadata) {
      if (context.params.provider) {
        throw new Error('Can not be set externally');
      }

      return enabled;
    }
  }
});

It is possible to make chainable schema functions (e.g. value: required(email())) and add a description to it so the schema can be introspected (e.g. required(email()) would return something like { required: true, email: true }.

@DesignByOnyx
Copy link
Author

DesignByOnyx commented Jul 6, 2020

I trust that you don't make any decision lightly, but this starts to tread into dangerous maintenance territory. It also feels kind of can-define'y in a not-so-good way: What's the difference between resolve and value? What about synchronous variants on both of these functions? Can users handle the difference between sync/async? Is there "pending" and "error" states around the async functions? Is the "value" a promise or the resolved value? [insert all of the asynchrony troubles from can-define here].

In an effort to not be redundant, my previous comment highlights my thoughts around AJV and the separation of 1) static schema descriptions (JSON Schema) and 2) how instances get hydrated/validated. I would greatly encourage that feathers-schema keeps from conflating those concepts (which is what can-define suffered from - not trying to pick on that tool, I just know you're intimate with it).

@daffl
Copy link
Member

daffl commented Jul 7, 2020

To answer the (easier) first two questions:

  • There will be no synchronous anything. All functions async, always.
  • The difference is that value converts (sets) the (incoming) value and resolve resolves (outgoing) value (same thing that is done now - often quite awkwardly - in hooks). This could also be done with two different schemas (good point, thank you for bringing that up).

I really don't feel static schema descriptions are the best way to do this. JSON schema isn't bad but it's one of those things that developers (sometimes) use but don't actually like using:

  • It's much more limited in its extensibility and flexibility than plain JS
  • After quite a bit of research around the JSON schema to anything else (Sequelize, GraphQL, Joi etc.) converters, none of them appeared to work well (or were unmaintained, assuming the authors gave up on that approach)
  • There is no widely used ORM using JSON schema for its definitions. There must be a reason for this other than that everybody wanted to reinvent the wheel.
  • As I mentioned before, to me the whole point of this project is that we already have a DSL for doing all of this and it's called JavaScript (or TypeScript if you want). If I have a converter function that does something wouldn't it be the best place to add metadata to that describes what it does instead of having to put it somewhere else?

Do you have an example why it is bad to conflate those two things? To me they really belong together and it is basically what e.g. GraphQLJS is doing (one might say successfully). I think can-define's weakness was more that it wasn't really universally usable and not targeting anything useful (you basically had to re-define your server schema in a custom DSL again on the client).

@daffl
Copy link
Member

daffl commented Jul 7, 2020

Maybe value and resolve is not super aptly named (set and get doesn't really fit though either) but the user schema is a good example for unifying things that currently have to be done with (imo clunky) hashPassword and protect hooks:

const UserSchema = schema({
  name: 'user'
}, {
  email: {
    type: string()
  },
  password: {
    type: string(),
    value: async password => {
      // Hash the password before saving to the database
      return bcrypt(password);
    },
    resolve: async (password, user, context) {
      // Never return the actual value for external requests
      if (context.params.provider) {
        return undefined;
      }

      return password;
    }
  }
});

This also allows assembling queries with more complex conditions in one place:

const MessageQuerySchema = schema({
  name: 'message-query'
}, {
  userId: {
    type: string(),
    value: async (userId, query, context) => {
      const { user } = context.params;
      
      // Non admins can only query their own messages
      return user.isAdmin ? userId : user.id;
    }
  },

  private: {
    type: boolean(),
    value: async (private, query, context) => {
      const { user } = context.params;

      // Free plan members can never query private messages
      return user.plan === 'free' ? false : private;
    }
  }
});

@KidkArolis
Copy link

KidkArolis commented Jul 7, 2020 via email

@daffl
Copy link
Member

daffl commented Aug 22, 2020

Sorry for being a little slow but I just wanted to let you know that you fine folks are right 😄 The thing I'd like to solve here is context based resolving of properties, not re-inventing all of validation and coercion. I think a pluggable solution (similar to Feathers transport adapters) with JSON schema as the internal format should be doable here.

@marshallswain marshallswain pinned this issue Aug 28, 2020
@marshallswain marshallswain unpinned this issue Aug 28, 2020
@mrfrase3
Copy link

So a bit of a distraction from value/resolve, I was just pondering this and thought it might be a useful fit into feathers-schema.

Let's say we have a database with three services:

  • students
  • classes
  • rooms
    classes has a studentIds and roomId field.

How would one easily/cleanly query all the students that have classes in one room, or what rooms a student has classes in? (assuming we don't care about what classes for this query)

You could query the classes, and then pull the students/rooms, but this can lead to a lot of boilerplate code in the client. I've kind of gotten around this by implementing hooks on find that will detect a field and swap it out, e.g. detect a roomId field in a find on the students service, do a find for all classes with that roomId, and then modify the student query to be like { _id: { $in: classes.reduce((a, class) => [...a, ...class.studentIds], []) } }, throwing theses hooks everywhere is in no-way elegant, doesn't account for nested queries, and doesn't get documented in the schema so the other developers on my team don't even know they can do it. The other solution is to save roomIds/studentIds to the students/rooms services, but this creates magic values/two sources of truth that have to be updated when the other changes.

What would be cool is the ability to define a "virtual ID" of sorts in the schema, that has a definition of where the relationship is actually stored. Say on the students service there is a roomVIds field (or similar) which isn't actually stored in the database but queriable and populates when resolved.

Food for thought.

@daffl
Copy link
Member

daffl commented Apr 22, 2021

Thank you again for the feedback everybody! I moved the discussion with the latest proposal to feathersjs/feathers#2312 and development to https://github.com/feathersjs/feathers/tree/schema since it currently makes more sense to have it in core vs. in a separate repository. I will archive this repository for now but we may bring it back later if it turns out to be something that could be generally useful.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants