
Quickstart

Matthieu Monsch edited this page Jan 2, 2017 · 44 revisions

Types

What is a Type?

Each Avro type maps to a corresponding JavaScript Type:

  • int maps to IntType.
  • arrays map to ArrayTypes.
  • records map to RecordTypes.
  • etc.

An instance of a Type knows how to decode and encode its corresponding values. For example, the StringType knows how to handle JavaScript strings:

const avro = require('avsc');

const stringType = new avro.types.StringType();
const buf = stringType.toBuffer('Hi'); // Buffer containing the Avro encoding of 'Hi'.
const str = stringType.fromBuffer(buf); // === 'Hi'

The toBuffer and fromBuffer methods above are convenience functions which encode and decode a single object into/from a standalone buffer.
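To make "Avro encoding" concrete, here is a hand-rolled sketch of how a string is laid out in Avro's binary format. It follows the Avro specification, not avsc's own code: a zig-zag varint length followed by the UTF-8 bytes. The helper names encodeLong and encodeString are hypothetical. In practice, stringType.toBuffer does this for us.

```javascript
// Hand-rolled illustration of Avro's binary encoding for `string`
// (hypothetical helpers, not avsc's implementation).

// Zig-zag maps signed integers to unsigned ones (0 -> 0, -1 -> 1, 1 -> 2, ...),
// then the result is written 7 bits at a time, low-order group first.
function encodeLong(n) {
  let z = n >= 0 ? n * 2 : -n * 2 - 1; // Zig-zag without 32-bit overflow.
  const bytes = [];
  while (z >= 0x80) {
    bytes.push((z % 0x80) | 0x80); // Low 7 bits, with the continuation bit set.
    z = Math.floor(z / 0x80);
  }
  bytes.push(z);
  return Buffer.from(bytes);
}

// A string is its UTF-8 byte length (as a long) followed by the bytes themselves.
function encodeString(str) {
  const utf8 = Buffer.from(str, 'utf8');
  return Buffer.concat([encodeLong(utf8.length), utf8]);
}

console.log(encodeString('Hi')); // <Buffer 04 48 69>
```

The leading 0x04 is the length 2 after zig-zag encoding, followed by the bytes for 'H' and 'i'.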

Each type also provides a variety of other methods. Here are a few (refer to the API documentation for the full list):

  • JSON-encoding:

    const jsonString = stringType.toString('Hi'); // === '"Hi"'
    const str = stringType.fromString(jsonString); // === 'Hi'
  • Validity checks:

    const b1 = stringType.isValid('hello'); // === true ('hello' is a valid string.)
    const b2 = stringType.isValid(-2); // === false (-2 is not.)
  • Random object generation:

    const s = stringType.random(); // A random string.

How do I get a Type?

It is possible to instantiate types directly by calling their constructors (available in the avro.types namespace; this is what we did earlier), but in the vast majority of use cases types are generated automatically by parsing an existing schema.

avsc exposes a static method, Type.forSchema, to do the heavy lifting and generate a type from its Avro schema definition:

// Equivalent to what we did earlier.
const stringType = avro.Type.forSchema('string');

// A slightly more complex type.
const mapType = avro.Type.forSchema({type: 'map', values: 'long'});

// The sky is the limit!
const personType = avro.Type.forSchema({
  name: 'Person',
  type: 'record',
  fields: [
    {name: 'name', type: 'string'},
    {name: 'phone', type: ['null', 'string'], default: null},
    {name: 'address', type: {
      name: 'Address',
      type: 'record',
      fields: [
        {name: 'city', type: 'string'},
        {name: 'zip', type: 'int'}
      ]
    }}
  ]
});

Of course, all the type methods are available. For example:

personType.isValid({
  name: 'Ann',
  phone: null,
  address: {city: 'Cambridge', zip: 2139} // No leading zero: `02139` is a syntax error in strict-mode JavaScript.
}); // === true

personType.isValid({
  name: 'Bob',
  phone: '617-000-1234', // ['null', 'string'] is unambiguous, so by default its values are not wrapped.
  address: {city: 'Boston'}
}); // === false (Missing the zip code.)
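For the curious, the following sketch shows how Ann's record above would be laid out in binary. It is hand-written from the Avro specification, not taken from avsc internals. It assumes a hypothetical encodePerson helper that takes phone as either null or a plain string. Records are just their fields' encodings concatenated in schema order. Union values are prefixed with their branch index.

```javascript
// Hand-rolled sketch of Avro's binary encoding of a Person record
// (hypothetical helpers; avsc's personType.toBuffer does this for us).

function encodeLong(n) {
  let z = n >= 0 ? n * 2 : -n * 2 - 1; // Zig-zag.
  const bytes = [];
  while (z >= 0x80) {
    bytes.push((z % 0x80) | 0x80); // 7 bits per byte, continuation bit set.
    z = Math.floor(z / 0x80);
  }
  bytes.push(z);
  return Buffer.from(bytes);
}

function encodeString(str) {
  const utf8 = Buffer.from(str, 'utf8');
  return Buffer.concat([encodeLong(utf8.length), utf8]);
}

// A union value is its branch index (as a long) followed by that branch's
// encoding; `null` itself encodes to zero bytes.
function encodePerson(person) {
  const phone = person.phone === null
    ? encodeLong(0) // Branch 0: null.
    : Buffer.concat([encodeLong(1), encodeString(person.phone)]); // Branch 1: string.
  return Buffer.concat([
    encodeString(person.name),
    phone,
    encodeString(person.address.city),
    encodeLong(person.address.zip), // `int` uses the same varint encoding as `long`.
  ]);
}

const buf = encodePerson({
  name: 'Ann',
  phone: null,
  address: {city: 'Cambridge', zip: 2139},
});
console.log(buf.length); // 17
```

Note that no field names appear in the output. Only the schema gives these 17 bytes their meaning, which is why Avro files and messages always travel with, or refer to, their schema.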

For advanced use cases, Type.forSchema also accepts a few options, which are detailed in the API documentation.

What about Avro files?

Avro files (meaning Avro object container files) hold serialized Avro records along with their schema. Reading them is as simple as calling createFileDecoder:

const personStream = avro.createFileDecoder('./persons.avro');

personStream is a readable stream of decoded records. For example, we can use it to process only the people living in San Francisco:

personStream.on('data', function (person) {
  if (person.address.city === 'San Francisco') {
    doSomethingWith(person);
  }
});

If we need the records' type or the file's codec, they are available by listening to the 'metadata' event:

personStream.on('metadata', function (type, codec) { /* Something useful. */ });
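The codec names how each data block in the file is compressed. The Avro specification requires implementations to support two codecs, "null" and "deflate", where "deflate" means raw RFC 1951 deflate. Purely as an illustration, the hypothetical helper below handles both. avsc takes care of this internally, so this is not something we need to write ourselves.

```javascript
const zlib = require('zlib');

// Hypothetical helper (not part of avsc): decompress one data block of an
// Avro container file, given the codec name from the file's header.
// Avro's "deflate" codec is raw deflate (no zlib header or checksum),
// hence inflateRawSync rather than inflateSync.
function decompressBlock(codec, block) {
  switch (codec) {
    case 'null':
      return block; // Stored uncompressed.
    case 'deflate':
      return zlib.inflateRawSync(block);
    default:
      throw new Error(`unsupported codec: ${codec}`);
  }
}

// Round trip: compress some bytes as a writer would, then decompress them.
const original = Buffer.from('some serialized records');
const compressed = zlib.deflateRawSync(original);
console.log(decompressBlock('deflate', compressed).equals(original)); // true
```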

To read a file's header synchronously, there is also an extractFileHeader method:

const header = avro.extractFileHeader('persons.avro');
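The header has a fixed layout defined by the Avro specification. It starts with four magic bytes ('O', 'b', 'j', then the version byte 1). Next comes a metadata map holding avro.schema and, optionally, avro.codec, followed by a 16-byte sync marker. A cheap sanity check on a buffer could look like the sketch below. looksLikeAvroContainer is a hypothetical name and is not part of avsc.

```javascript
// Sketch (not avsc code): an Avro object container file begins with the
// magic bytes "Obj" followed by the format version byte 1.
const MAGIC = Buffer.from([0x4f, 0x62, 0x6a, 0x01]);

function looksLikeAvroContainer(buf) {
  return buf.length >= MAGIC.length &&
    buf.subarray(0, MAGIC.length).equals(MAGIC);
}

console.log(looksLikeAvroContainer(Buffer.from('Obj\x01...'))); // true
console.log(looksLikeAvroContainer(Buffer.from('{"a": 1}'))); // false
```

This only checks the first four bytes. Fully parsing the metadata map and sync marker is what extractFileHeader is for.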

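Writing to an Avro container file is possible using createFileEncoder, which returns a writable stream. Each record written to it is encoded with the given type and appended to the file:

// Here we reuse personType from above as the records' type.
const encoder = avro.createFileEncoder('./processed.avro', personType);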

Next steps

The API documentation provides a comprehensive list of available functions and their options. The Advanced usage section goes through a few examples to show how the API can be used, including remote procedure calls.

Services

Using Avro's RPC interface, we can implement portable and "type-safe" APIs:

  • Clients and servers can be implemented once and reused for many different communication protocols (in-memory, TCP, HTTP, etc.).
  • All data that flows through the API is automatically validated using its corresponding schema. Function arguments and return values are therefore guaranteed to match the type specified in the API.

In this section, we'll walk through an example of building a simple link management service similar to Bitly.

Defining the Service

The first step to creating a service is to define its protocol, describing the available API calls and their signatures. There are a couple of ways of doing so: we can write JSON definitions directly, or we can use Avro's IDL syntax (which can then be compiled to JSON definitions). The latter is typically more convenient, so we will use it here.

/** A simple service to shorten URLs. */
protocol LinkService {

  /** Map a URL to an alias. */
  null createAlias(string alias, string url);

  /** Expand an alias, returning null if the alias doesn't exist. */
  union { null, string } expandAlias(string alias);
}

With the above spec saved to a file, say LinkService.avdl, we can instantiate the corresponding service as follows:

// We first compile the IDL specification into a JSON protocol.
avro.assembleProtocol('./LinkService.avdl', function (err, protocol) {
  if (err) {
    throw err; // E.g. the file is missing or isn't valid IDL.
  }
  // From which we can create our service.
  const service = avro.Service.forProtocol(protocol);
});
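For reference, the JSON protocol produced from the IDL above should be roughly equivalent to the hand-written object below. It follows the protocol declaration format in the Avro specification. This is a sketch: the exact object assembleProtocol returns may differ in details, for example in how doc comments are carried over. Writing such an object by hand is the "JSON definitions" route mentioned earlier.

```javascript
// Hand-written JSON protocol equivalent to the LinkService IDL
// (a sketch; assembleProtocol's actual output may differ slightly).
const linkServiceProtocol = {
  protocol: 'LinkService',
  doc: 'A simple service to shorten URLs.',
  messages: {
    createAlias: {
      doc: 'Map a URL to an alias.',
      request: [
        {name: 'alias', type: 'string'},
        {name: 'url', type: 'string'},
      ],
      response: 'null',
    },
    expandAlias: {
      doc: "Expand an alias, returning null if the alias doesn't exist.",
      request: [{name: 'alias', type: 'string'}],
      response: ['null', 'string'], // The IDL's `union { null, string }`.
    },
  },
};

console.log(Object.keys(linkServiceProtocol.messages)); // [ 'createAlias', 'expandAlias' ]
```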

The service object can then be used to generate clients and servers, as described in the following sections.

Server implementation

So far, we haven't said anything about how API responses will be computed. This is where servers come in: servers provide the logic powering our API.

const urlCache = new Map(); // We'll use an in-memory map to store links.

// We instantiate a server corresponding to our API and implement both calls.
const server = service.createServer()
  .onCreateAlias(function (alias, url, cb) {
    if (urlCache.has(alias)) {
      cb(new Error('alias already exists'));
    } else {
      urlCache.set(alias, url); // Add the alias to the cache.
      cb();
    }
  })
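  .onExpandAlias(function (alias, cb) {
    // Return null, not undefined, for unknown aliases: the response type is union { null, string }.
    cb(null, urlCache.has(alias) ? urlCache.get(alias) : null);
  });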

Notice that no part of the above implementation is coupled to a particular communication scheme (e.g. HTTP, TCP, AMQP): the code we wrote is transport-agnostic. The following section shows how to instantiate two different clients.
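One practical consequence is that the handler logic can be exercised directly, with no transport (or even avsc) involved. The sketch below pulls the two handlers above out into plain callback-style functions. The standalone names createAlias and expandAlias are hypothetical and are used only for illustration.

```javascript
// Sketch: the server's handler logic as plain (err, result)-style functions,
// runnable without any transport or avsc.
const urlCache = new Map();

function createAlias(alias, url, cb) {
  if (urlCache.has(alias)) {
    cb(new Error('alias already exists'));
  } else {
    urlCache.set(alias, url);
    cb();
  }
}

function expandAlias(alias, cb) {
  // Unknown aliases yield null, matching the union { null, string } response.
  cb(null, urlCache.has(alias) ? urlCache.get(alias) : null);
}

createAlias('hn', 'https://news.ycombinator.com/', (err) => {
  if (err) throw err;
  expandAlias('hn', (err, url) => console.log(url)); // https://news.ycombinator.com/
});
// A second registration of the same alias is rejected.
createAlias('hn', 'https://example.com/', (err) => console.log(err.message)); // alias already exists
```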

Calling our service

The simplest way to call our service is to use an in-memory client, passing in our server above as an option to service.createClient:

const client = service.createClient({server});

// We first send a request to create an alias.
client.createAlias('hn', 'https://news.ycombinator.com/', function (err) {
  if (err) {
    throw err; // E.g. the alias already exists.
  }
  // Which we can now expand.
  client.expandAlias('hn', function (err, url) {
    if (err) {
      throw err;
    }
    console.log(`hn is currently aliased to ${url}`);
  });
});

We can also use the same server and client to communicate over any binary stream, for example TCP sockets:

const net = require('net');

// Set up the server to listen for incoming connections on port 24950.
net.createServer()
  .on('connection', function (con) { server.createChannel(con); })
  .listen(24950);

// And create a matching client (named tcpClient to avoid clashing with `client` above):
const tcpClient = service.createClient({transport: net.connect(24950)});