December 12, 2017

GraphQL Schema Stitching explained: Schema Delegation

Understanding GraphQL schema stitching (Part II)

In the last article, we discussed the ins and outs of remote (executable) schemas. These remote schemas are the foundation for a set of tools and techniques referred to as schema stitching.

Schema stitching is a brand new topic in the GraphQL community. In general, it refers to the act of combining and connecting multiple GraphQL schemas (or schema definitions) to create a single GraphQL API.

There are two major concepts in schema stitching:

  • Schema delegation: The core idea of schema delegation is to forward (delegate) the invocation of a specific resolver to another resolver. In essence, the respective fields of the schema definitions are being “rewired”.
  • Schema merging: Schema merging is the idea of creating the union of two (or more) existing GraphQL APIs. This is not problematic if the involved schemas are entirely disjoint. If they are not, there needs to be a way to resolve their naming conflicts.

Notice that in most cases, delegation and merging will actually be used together and we’ll end up with a hybrid approach that uses both. In this article series, we’ll cover them separately to make sure each concept can be well understood by itself.

Example: Building a custom GitHub API

Let’s start with an example based on the public GitHub GraphQL API. Assume we want to build a small app that provides information about the Prisma GitHub organization.

The API we need for the app should expose the following capabilities:

  • retrieve information about the Prisma organization (like its ID, email address, avatar URL or the pinned repositories)
  • retrieve a list of repositories from the Prisma organization by their names
  • retrieve a short description about the app itself

Let’s explore the Query type from GitHub’s GraphQL schema definition to see how we can map our requirements to the schema’s root fields.

Requirement 1: Retrieve info about the Prisma organization

The first feature, retrieving information about the Prisma organization, can be achieved by using the repositoryOwner root field on the Query type:

type Query {

  # ...

  # Lookup a repository owner (ie. either a User or an Organization) by login.
  repositoryOwner(
    # The username to lookup the owner by.
    login: String!
  ): RepositoryOwner

  # ...

}

We can send the following query to ask for information about the Prisma organization:

query {
  repositoryOwner(login: "prismagraphql") {
    id
    url
    pinnedRepositories(first:100) {
      edges {
        node {
          name
        }
      }
    }
    # ask for more data here
  }
}

It works when we provide "prismagraphql" as the login to the repositoryOwner field.

One issue here is that we can’t ask for the email in a straightforward way, because RepositoryOwner is only an interface that doesn’t have an email field. However, since we know that the concrete type of the Prisma organization is indeed Organization, we can work around this issue by using an inline fragment inside the query:

query {
  repositoryOwner(login: "prismagraphql") {
    id
    ... on Organization {
      email
    }
  }
}

Ok, so this will work but we’re already hitting some friction points that don’t allow for a straightforward use of the GitHub GraphQL API for the purpose of our app.

Ideally, our API would just expose a root field that lets us ask directly for the info we want, without having to provide an argument with every query, and that lets us ask for fields on Organization directly:

type Query {
  prismagraphql: Organization!
}

Requirement 2: Retrieve a list of Prisma repositories by name

How about the second requirement, retrieving a list of the Prisma repositories by their names? Looking at the Query type again, this is a bit more complicated. The API doesn’t let us retrieve a list of repositories by name directly. Instead, we can ask for single repositories by providing the owner and the repo’s name using the following root field:

type Query {

  # ...

  # Lookup a given repository by the owner and repository name.
  repository(
    # The login field of a user or organization
    owner: String!

    # The name of the repository
    name: String!
  ): Repository

  # ...

}

Here’s a corresponding query:

query {
  repository(owner: "prismagraphql", name: "graphql-yoga") {
    name
    description
    # ask for more data here
  }
}

However, what we actually want for our app (to avoid having to make multiple requests) is a root field looking as follows:

type Query {
  prismagraphqlRepositories(names: [String!]): [Repository!]!
}

Requirement 3: Retrieve short description about the app itself

Our API should be able to return a sentence describing our app, such as "This app provides information about the Prisma GitHub organization".

This is of course a completely custom requirement that we can’t fulfil based on the GitHub API. Instead, we need to implement it ourselves, potentially with a simple Query root field like this:

type Query {
  info: String!
}

Defining the application schema

We’re now aware of the required capabilities of our API and the ideal Query type we need to define for the schema:

type Query {
  prismagraphql: Organization!
  prismagraphqlRepositories(names: [String!]): [Repository!]!
  info: String!
}

Obviously, this schema definition in itself is incomplete: it misses the definitions for the Organization and the Repository types. One straightforward way of solving this problem is to just manually copy and paste the definitions from GitHub’s schema definition.

This approach quickly becomes cumbersome, since these type definitions themselves depend on other types in the schema (for example, the Repository type has a field codeOfConduct of type CodeOfConduct) which you then need to manually copy over as well. There is no limit to how deep this dependency chain goes into the schema and you might even end up copying the full schema definition by hand.
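
To get a feel for how quickly this dependency chain grows, here is a minimal sketch that computes every type you would have to copy for a given starting type. The fieldTypes map is a made-up, heavily trimmed stand-in for GitHub's schema (only the CodeOfConduct dependency comes from the example above; the other entries are illustrative assumptions), and collectDependencies is a hypothetical helper name.

```javascript
// Hypothetical, heavily trimmed excerpt: type name -> types referenced by its fields
const fieldTypes = {
  Repository: ['CodeOfConduct', 'RepositoryOwner', 'Language'],
  CodeOfConduct: [],
  RepositoryOwner: ['Repository'],
  Language: [],
  Organization: ['Repository', 'Team'],
  Team: ['Organization'],
}

// Collect every type reachable from `root`, following field types transitively
function collectDependencies(root, graph) {
  const seen = new Set()
  const stack = [root]
  while (stack.length > 0) {
    const current = stack.pop()
    if (seen.has(current)) continue // also guards against cycles
    seen.add(current)
    for (const dependency of graph[current] || []) {
      stack.push(dependency)
    }
  }
  return [...seen].sort()
}

console.log(collectDependencies('Repository', fieldTypes))
console.log(collectDependencies('Organization', fieldTypes))
```

Even in this tiny excerpt, copying Repository drags in three more types, and Organization pulls in all six.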

Note that when manually copying over types, there are three ways this can be done:

  • The entire type is copied over, no additional fields are added
  • The entire type is copied over and additional fields are added (or existing ones are renamed)
  • Only a subset of the type’s fields are copied over

The first approach of simply copying over the full type is the most straightforward. This can be automated using graphql-import, as explained in the next section.

If additional fields are added to the type definition or existing ones are renamed, you need to make sure to implement corresponding resolvers as the underlying API of course cannot take care of resolving these new fields.

Lastly, you might decide to only copy over a subset of the type’s fields. This can be desirable if you don’t want to expose all the fields of a type (the underlying schema might have a password field on the User type which you don’t want to be exposed in your application schema).

Importing GraphQL type definitions

The package graphql-import saves you from that manual work by letting you share type definitions across different .graphql files. You can import types from another GraphQL schema definition like so:

# import Repository from "./github.graphql"
# import Organization from "./github.graphql"

type Query {
  info: String!
  prismagraphqlRepositories(names: [String!]): [Repository!]!
  prismagraphql: Organization!
}

In your JavaScript code, you can now use the importSchema function and it will resolve the dependencies for you, ensuring your schema definition is complete.
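
As a rough mental model of the import syntax, the sketch below extracts the import comments from a schema definition string. It is an illustration only and not how graphql-import is actually implemented; the parseImportComments helper and its regular expression are assumptions made for this example.

```javascript
// Toy parser for lines such as: # import Repository from "./github.graphql"
// (illustrative only; graphql-import's real implementation differs)
function parseImportComments(sdl) {
  const pattern = /^#\s*import\s+(.+?)\s+from\s+["'](.+?)["']\s*$/
  return sdl
    .split('\n')
    .map(line => line.trim().match(pattern))
    .filter(match => match !== null)
    .map(match => ({
      // This toy version accepts a comma-separated list of type names
      types: match[1].split(',').map(name => name.trim()),
      from: match[2],
    }))
}

const appSchema = `
# import Repository from "./github.graphql"
# import Organization from "./github.graphql"

type Query {
  info: String!
}
`

console.log(parseImportComments(appSchema))
```

A real resolver would then load each referenced file and pull in the named types together with everything they depend on.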

Implementing the API

With the above schema definition, we’re only halfway there. What’s still missing is the schema’s implementation in the form of resolver functions.

If you’re feeling lost at this point, make sure to read this article which introduces the basic mechanics and inner workings of GraphQL schemas.

Let’s think about how to implement these resolvers! A first version could look as follows:

const { importSchema } = require('graphql-import')

// Import the application schema, including the
// types it depends on from `schemas/github.graphql`
const typeDefs = importSchema('schemas/app.graphql')

// Implement resolver functions for our three custom
// root fields on the `Query` type
const resolvers = {
  Query: {
    info: (parent, args) => 'This app provides information about the Prisma GitHub organization',
    prismagraphqlRepositories: (parent, { names }, context, info) => {
      // ???
    },
    prismagraphql: (parent, args, context, info) => {
      // ???
    }
  }
}

The resolver for info is trivial: we can return a simple string describing our app. But how do we deal with the ones for prismagraphql and prismagraphqlRepositories, where we actually need to return information from the GitHub GraphQL API?

The naive way of implementing this here would be to look at the info argument to retrieve the selection set of the incoming query — then construct another GraphQL query from scratch that has the same selection set and send it to the GitHub API. This can even be facilitated by creating a remote schema for the GitHub GraphQL API but overall is still quite a verbose and cumbersome process.
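
To make the naive approach concrete, here is a minimal sketch that prints the selection set of an incoming query back into a query string. The fieldNode object is hand-written to mimic the shape of the field nodes found in info.fieldNodes in GraphQL.js, and printSelectionSet is a hypothetical helper. Arguments on nested fields, aliases, fragments and variables are all ignored, which hints at why a complete version gets verbose quickly.

```javascript
// Hand-written stand-in for one entry of `info.fieldNodes` (simplified AST)
const fieldNode = {
  kind: 'Field',
  name: { value: 'prismagraphql' },
  selectionSet: {
    selections: [
      { kind: 'Field', name: { value: 'id' } },
      { kind: 'Field', name: { value: 'url' } },
    ],
  },
}

// Recursively print a selection set; ignores arguments, aliases and fragments
function printSelectionSet(selectionSet) {
  if (!selectionSet) return ''
  const fields = selectionSet.selections
    .filter(selection => selection.kind === 'Field')
    .map(field => field.name.value + printSelectionSet(field.selectionSet))
  return ` { ${fields.join(' ')} }`
}

// Rebuild a query for GitHub's `repositoryOwner` root field by hand
const query = `query { repositoryOwner(login: "prismagraphql")${printSelectionSet(fieldNode.selectionSet)} }`
console.log(query)
// prints: query { repositoryOwner(login: "prismagraphql") { id url } }
```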

This is exactly where schema delegation comes into play! We saw before that GitHub’s schema exposes two root fields that (somewhat) cater to our requirements: repositoryOwner and repository. We can now leverage this to save the work of creating a completely new query and instead forward the incoming one.

Delegating to other schemas

So, rather than trying to construct a whole new query, we simply take the incoming query and delegate its execution to another schema. The API we’re going to use for that is called delegateToSchema and is provided by graphql-tools.

delegateToSchema receives seven arguments (in the following order):

  1. schema: An executable instance of GraphQLSchema (this is the target schema we want to delegate the execution to)
  2. fragmentReplacements: An object containing inline fragments (this is for more advanced cases we’ll not discuss in this article)
  3. operation: A string with one of three values ("query", "mutation" or "subscription") indicating to which root type we want to delegate
  4. fieldName: The name of the root field we want to delegate to
  5. args: The input arguments for the root field we’re delegating to
  6. context: The context object that’s passed through the resolver chain of the target schema
  7. info: An object containing information about the query to be delegated

In order for us to use this approach, we first need an executable instance of GraphQLSchema that represents the GitHub GraphQL API. We can obtain it using makeRemoteExecutableSchema from graphql-tools.

Notice that GitHub’s GraphQL API requires authentication, so you’ll need an authentication token to make this work. You can follow this guide to obtain one.

In order to create the remote schema for the GitHub API, we need two things:

  • its schema definition (in the form of a GraphQLSchema instance)
  • an HttpLink that knows how to fetch data from it

We can achieve this using the following code:

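const fs = require('fs')
const { makeExecutableSchema, makeRemoteExecutableSchema } = require('graphql-tools')
// `GitHubLink` (described below) and `TOKEN` (your personal access token)
// are assumed to be defined elsewhere in the project

// Read GitHub's schema definition from a local file
const gitHubTypeDefs = fs.readFileSync('./schemas/github.graphql', { encoding: 'utf8' })

// Instantiate `GraphQLSchema` from the schema definition
const introspectionSchema = makeExecutableSchema({ typeDefs: gitHubTypeDefs })

// Create an `HttpLink` using a personal auth token
const link = new GitHubLink(TOKEN)

// Create the remote executable schema based on the schema definition and link
const schema = makeRemoteExecutableSchema({
  schema: introspectionSchema,
  link,
})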

GitHubLink is just a simple wrapper on top of HttpLink, providing a bit of convenience around creating the required Link component.

Awesome, we now have an executable version of the GitHub GraphQL API that we can delegate to in our resolvers! 🎉 Let’s start by implementing the prismagraphql resolver first:

const resolvers = {
  Query: {
    // ... other resolvers
    prismagraphql: (parent, args, context, info) => {
      return delegateToSchema(
        schema,
        {},
        'query',
        'repositoryOwner',
        {login: 'prismagraphql'},
        context,
        info
      )
    }
  }
}

We’re passing the seven arguments expected by the delegateToSchema function. Overall there are no surprises: the schema is the remote executable schema for the GitHub GraphQL API. We want to delegate execution of our own prismagraphql query to the repositoryOwner query from GitHub’s API. Since that field expects a login argument, we provide it with "prismagraphql" as its value. Finally, we simply pass the info and context objects on through the resolver chain.

The resolver for prismagraphqlRepositories can be approached in a similar fashion, yet it’s a bit trickier. What makes it different from the previous implementation is that the types of our prismagraphqlRepositories: [Repository!]! and the original field repository: Repository from GitHub’s schema definition don’t match up as nicely as before. We now need to return an array of repos, instead of a single one.

Therefore, we use Promise.all to delegate multiple queries at once and bundle their execution results into a single promise that resolves to an array of repositories:

const resolvers = {
  Query: {
    // ... other resolvers
    prismagraphqlRepositories: (parent, { names }, context, info) => {
      return Promise.all(
        names.map(name => {
          return delegateToSchema(
            schema,
            {},
            'query',
            'repository',
            { owner: 'prismagraphql', name },
            context,
            info,
          )
        })
      )
    },
  }
}
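
One caveat worth keeping in mind: GitHub’s repository field is nullable, while our field is declared as [Repository!]!, and the names argument itself is optional. The following is a minimal sketch of a more defensive variant. To keep it runnable on its own, the delegation call is passed in as a function (delegateRepository, a hypothetical stand-in for the delegateToSchema call above), and silently dropping unknown repositories is just one possible policy.

```javascript
// `delegateRepository` is a hypothetical stand-in for the delegateToSchema call:
// it takes a repository name and returns a promise for the repository (or null)
function resolveRepositories(names, delegateRepository) {
  // `names` is optional in our schema, so fall back to an empty list
  const requested = names || []
  return Promise.all(requested.map(name => delegateRepository(name)))
    // Drop repositories that were not found so that `[Repository!]!` holds
    .then(repositories => repositories.filter(repository => repository !== null))
}

// Fake delegate for demonstration purposes only
const knownRepositories = { 'graphql-yoga': { name: 'graphql-yoga' } }
const fakeDelegate = name => Promise.resolve(knownRepositories[name] || null)

resolveRepositories(['graphql-yoga', 'does-not-exist'], fakeDelegate)
  .then(repositories => console.log(repositories))
```

In the actual resolver, the second argument would be a small function wrapping the delegateToSchema call with the current context and info.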

This is it! We have now implemented all three resolvers for our custom GraphQL API. While the first one (for info) is trivial and simply returns a custom string, prismagraphql and prismagraphqlRepositories are using schema delegation to forward execution of queries to the underlying GitHub API.

If you want to see a working example of this code, check out this repository.

Schema delegation with graphql-tools

In the above example of building a custom GraphQL API on top of GitHub, we saw how delegateToSchema can save us from writing boilerplate code for query execution. Instead of constructing a new query from scratch and sending it over with fetch, graphql-request or some other HTTP tool, we can use the API provided by graphql-tools to delegate the execution of the query to another (executable) instance of GraphQLSchema. Conveniently, this instance can be created as a remote schema.

At a high level, delegateToSchema simply acts as a “proxy” for the execute function from GraphQL.js. This means that under the hood it will reassemble a GraphQL query (or mutation) based on the information passed as arguments. Once the query has been constructed, all it does is invoke execute with the schema and the query.
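
Conceptually, the reassembling step can be pictured as follows. This is a deliberately simplified sketch and not the code graphql-tools actually runs: the buildDelegatedQuery helper is hypothetical, takes the selection set as a ready-made string, and serializes argument values with JSON.stringify, which only works for simple scalars such as strings and numbers.

```javascript
// Hypothetical helper: assemble an operation string for a single root field
function buildDelegatedQuery(operation, fieldName, args, selection) {
  const argList = Object.keys(args)
    .map(key => `${key}: ${JSON.stringify(args[key])}`)
    .join(', ')
  // Only add parentheses when there is at least one argument
  const field = argList ? `${fieldName}(${argList})` : fieldName
  return `${operation} { ${field} ${selection} }`
}

console.log(
  buildDelegatedQuery(
    'query',
    'repository',
    { owner: 'prismagraphql', name: 'graphql-yoga' },
    '{ name description }'
  )
)
// prints: query { repository(owner: "prismagraphql", name: "graphql-yoga") { name description } }
```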

Consequently, schema delegation doesn’t necessarily require the target schema to be a remote schema; it can also be done with local schemas. In that regard, schema delegation is a very flexible tool, and you might even want to delegate inside the same schema. This is basically the approach taken in mergeSchemas from graphql-tools, where multiple schemas are first merged into a single one, then the resolvers are rewired.

In essence, schema delegation is about being able to easily forward queries to an existing GraphQL API.

Schema binding: An easy way to reuse GraphQL APIs

Equipped with our newly acquired knowledge about schema delegation, we can introduce a new concept which is nothing but a thin convenience layer on top of schema delegation, called schema binding.

Bindings for public GraphQL APIs

The core idea of a schema binding is to provide an easy way of making an existing GraphQL API reusable so that other developers can pull it into their projects via NPM. This allows for an entirely new approach to building GraphQL “gateways” where it’s extremely easy to combine the functionality of multiple GraphQL APIs.

With a dedicated binding for the GitHub API, we can now simplify the example from above. Rather than creating the remote executable schema by hand, this part is now done by the graphql-binding-github package. Here’s what the full implementation looks like where all the initial setup code we previously needed to delegate to the GitHub API is removed:

const { GitHub } = require('graphql-binding-github')
const { GraphQLServer } = require('graphql-yoga')
const { importSchema } = require('graphql-import')

const TOKEN = '__YOUR_GITHUB__TOKEN__' // https://developer.github.com/v4/guides/forming-calls/#authenticating-with-graphql
const github = new GitHub(TOKEN)

const typeDefs = importSchema('schemas/app.graphql')

const resolvers = {
  Query: {
    info: (parent, args) => 'This app provides information about the Prisma GitHub organization',
    prismagraphqlRepositories: (parent, { names }, context, info) => {
      return Promise.all(
        names.map(name => {
          return github.delegate(
            'query',
            'repository',
            { owner: 'prismagraphql', name },
            context,
            info,
          )
        })
      )
    },
    prismagraphql: (parent, args, context, info) => {
      return github.delegate(
        'query',
        'repositoryOwner',
        {login: 'prismagraphql'},
        context,
        info
      )
    }
  }
}

const server = new GraphQLServer({ typeDefs, resolvers })
server.start(() => console.log('Server running on http://localhost:4000'))

Instead of creating the remote schema ourselves, we simply instantiate the GitHub class imported from graphql-binding-github and use its delegate function. It will then use delegateToSchema under the hood to actually perform the request.
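
A rough sketch of how such a binding might wrap delegateToSchema is shown below. The createBinding factory is hypothetical and is not the actual graphql-binding-github source. The delegation function is injected so that the snippet runs without graphql-tools installed, and a stub simply echoes the arguments it receives.

```javascript
// Hypothetical factory: fixes the target schema so callers only pass five arguments
function createBinding(schema, delegateToSchema) {
  return {
    delegate(operation, fieldName, args, context, info) {
      // Same positional order as the seven-argument call described earlier,
      // with an empty object for fragmentReplacements
      return delegateToSchema(schema, {}, operation, fieldName, args, context, info)
    },
  }
}

// Stub that echoes its arguments instead of executing a query
const stubDelegateToSchema = (...received) => received
const binding = createBinding('<remote GitHub schema>', stubDelegateToSchema)

console.log(binding.delegate('query', 'repositoryOwner', { login: 'prismagraphql' }, {}, {}))
```

The design choice here is simply partial application: the schema and the fragment replacements are fixed once, so every resolver call gets shorter.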

Schema bindings for public GraphQL APIs can be shared among developers. Besides graphql-binding-github, there is already a binding available for the Yelp GraphQL API: graphql-binding-yelp by Devan Beitel.

Auto-generated delegate functions

The API for these sorts of schema bindings can even be improved to a level where delegate functions are automatically generated. Rather than writing the following github.delegate('query', 'repository', ... ), the binding could expose a function named after the corresponding root field: github.query.repository( ... ).

When these delegate functions are generated in a build-step and based on a strongly typed language (like TypeScript or Flow), this approach will even provide compile-time type safety for interacting with other GraphQL APIs!
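
As a minimal sketch of the idea (done at runtime here rather than in a build step), delegate functions can be derived from a list of root field names. The generateDelegates helper and the fakeBinding object are assumptions for illustration and do not reflect how any particular binding package is implemented.

```javascript
// Hypothetical helper: expose one function per root field,
// e.g. github.query.repository(args, context, info)
function generateDelegates(binding, rootFields) {
  const generated = {}
  for (const operation of Object.keys(rootFields)) {
    generated[operation] = {}
    for (const fieldName of rootFields[operation]) {
      generated[operation][fieldName] = (args, context, info) =>
        binding.delegate(operation, fieldName, args, context, info)
    }
  }
  return generated
}

// Fake binding that reports which delegation it would perform
const fakeBinding = {
  delegate: (operation, fieldName, args) =>
    `${operation}.${fieldName}(${JSON.stringify(args)})`,
}

const github = generateDelegates(fakeBinding, { query: ['repository', 'repositoryOwner'] })
console.log(github.query.repository({ owner: 'prismagraphql', name: 'graphql-yoga' }))
// prints: query.repository({"owner":"prismagraphql","name":"graphql-yoga"})
```

A build-step generator would emit the same functions as source code, which is what makes static typing of arguments and results possible.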

To get a glimpse of what this approach looks like, check out the prisma-binding repository, which lets you easily generate schema bindings for Prisma services and uses the mentioned approach of automatically generating delegate functions.

Summary

This is our second article of the series “Understanding GraphQL schema stitching”. In the first article, we did some groundwork and learned about remote (executable) schemas which are the foundation for most schema stitching scenarios.

In this article, we mainly discussed the concept of schema delegation by providing a comprehensive example based on the GitHub GraphQL API (the code for the example is available here). Schema delegation is a mechanism to forward (delegate) the execution of a resolver function to another resolver in a different (or even the same) GraphQL schema. Its key benefit is that we don’t have to construct an entirely new query from scratch but instead can reuse and forward (parts of) the incoming query.

When using schema delegation as the foundation, it is possible to create dedicated NPM packages to easily share reusable schema bindings for existing GraphQL APIs. To get a sense of what these look like, you can check out the bindings for the GitHub API as well as prisma-binding, which lets you easily generate bindings for any Prisma service.
