
GSIP 69 Catalog scalability enhancements

jdeolive edited this page Jun 11, 2014 · 1 revision

GSIP 69 - Catalog scalability enhancements

Overview

Improved vertical scalability of Catalog resources (i.e. being able to efficiently manage hundreds of thousands of layers, styles, etc).

Proposed By

Gabriel Roldan

Assigned to Release

GeoServer 2.3.x master branch.

State

Under Discussion, In Progress, Completed, Rejected, Deferred

Motivation

With the arrival of Virtual Services (GSIP 44 - Virtual services with workspaces), Workspace Local Services (GSIP 66 - Workspace Local Services), and Workspace Local Settings (GSIP 67 - Workspace Local Settings), GeoServer becomes better suited to multitenancy, and hence supporting a large number of configuration resources becomes even more important.

Prior art in this regard includes the development of the [DBConfig Module], which externalizes the storage of the configuration objects to an RDBMS using Hibernate O/R mapping, and hence lets the Catalog scale up to an unbounded number of workspaces, stores, layers, etc.

Regardless of the Catalog backend’s ability to scale up, GeoServer itself doesn’t scale gracefully as the number of configuration objects in the catalog increases. The current Catalog API design assumes that full scans and defensive copies of lists of catalog resources are cheap in both processing time and memory consumption.

This proposal aims to solve this problem in a way that allows any API change to be adopted progressively throughout the code base, wherever the benefits are clear and measurable.

Scope

In Scope

Given a relatively large number of Catalog configuration objects:

  • Identify some exemplary use cases that result in scalability/performance bottlenecks throughout the GeoServer code base;
  • Identify the requirements and main QA goals needed to satisfactorily solve the problems described in the use cases;
  • Design Catalog API enhancements that fulfill the requirements;
  • Validate the API design by providing more than one concrete backend implementation, and by upgrading the Catalog client code from the exemplary use cases;
  • Provide general guidelines on how and when to progressively adopt the new API methods.

Not in Scope

  • It is not in this proposal’s scope to allow applications outside GeoServer to directly edit the backend’s (RDBMS or other) configuration objects. CatalogFacade and GeoServerFacade implementations are free to use whatever storage format and mechanisms they see fit. That said, this proposal doesn’t forbid Catalog/Config backend implementations from allowing applications outside GeoServer to directly edit the configuration objects.

Use Case Drivers

Check the [GSIP 69 - Use Cases](GSIP 69 - Use Cases) page for further detail.

Requirements

To address the above use cases, the Catalog API change proposal shall meet the following high-level requirements and QA goals:

  1. Filtering: Shall allow for filtering of catalog objects through arbitrary query criteria;
  2. Streaming: Shall allow for a streamed approach to catalog objects retrieval;
  3. Paging: Shall allow for paged queries. Catalog backends shall provide a consistent “natural order” of resources, which doesn’t need to be based on id or any other prescribed property;
  4. Leverage query engines: Shall allow any in-process filtering criteria to be moved back to the backend, allowing for optimization in the common cases;
  5. Query generality: in-process filtering shall work out of the box for the general case;
  6. Compactness: API changes should be additive and minimal;
  7. Usability: Ease of use and compactness are highly desired;
  8. Incremental adoption: Shall allow for progressive/iterative adoption;
  9. Leverage sub-system cohesion: Shall introduce no external dependencies at the API level.
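
As a rough illustration of how requirements 1 to 5 might fit into a single query method, here is a minimal in-memory sketch. It is not the proposed Catalog API (that is described on the API Proposal page); the class `InMemoryCatalogSketch` and its method signatures are hypothetical, and it uses Java 8 streams for brevity. Insertion order plays the role of the backend's "natural order".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Stream;

// Hypothetical stand-in for a catalog backend; not the GeoServer Catalog API.
final class InMemoryCatalogSketch<T> {

    // Insertion order acts as the consistent "natural order" (requirement 3).
    private final List<T> objects = new ArrayList<>();

    void add(T object) {
        objects.add(object);
    }

    // Filtering (1) and query generality (5): any predicate is evaluated in-process.
    // A database-backed implementation could translate known filters to a COUNT query (4).
    int count(Predicate<? super T> filter) {
        int matches = 0;
        for (T object : objects) {
            if (filter.test(object)) {
                matches++;
            }
        }
        return matches;
    }

    // Streaming (2) and paging (3): the caller gets an iterator over one page,
    // never a defensive copy of the whole collection. Null means "not requested".
    Iterator<T> list(Predicate<? super T> filter, Integer offset, Integer limit,
            Comparator<? super T> sortOrder) {
        Stream<T> stream = objects.stream().filter(filter);
        if (sortOrder != null) {
            stream = stream.sorted(sortOrder);
        }
        if (offset != null) {
            stream = stream.skip(offset);
        }
        if (limit != null) {
            stream = stream.limit(limit);
        }
        return stream.iterator();
    }

    public static void main(String[] args) {
        InMemoryCatalogSketch<String> catalog = new InMemoryCatalogSketch<>();
        for (String name : new String[] {"roads", "rivers", "cities", "rails"}) {
            catalog.add(name);
        }
        // Second page (page size 1) of the names starting with "r", sorted by name.
        Iterator<String> page =
                catalog.list(n -> n.startsWith("r"), 1, 1, Comparator.naturalOrder());
        while (page.hasNext()) {
            System.out.println(page.next());
        }
    }
}
```

Keeping filter, offset, limit and sort order as arguments of one call is what would let a backend with a query engine evaluate all of them at once, while a simple backend can fall back to in-process evaluation as above.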

Proposed Catalog API extensions

Check the [GSIP 69 - API Proposal](GSIP 69 - API Proposal) page for further detail.

API Validation

This section presents two ways of validating the Catalog API extension from this proposal. First, we’ll migrate the code from the use cases to the new API to verify its usability and correctness. Then we’ll provide a couple of Catalog backend implementations to verify its implementability and effectiveness.

Migration of identified sample offending code

[GSIP 69 - Use Case Code Migration](GSIP 69 - Use Case Code Migration)

Multiple Back-End Implementations

In addition to the default Catalog implementation, a JDBC-based catalog and configuration storage has been developed.

The current prototype for the JDBC backend is located at this github branch. The jdbcconfig community module is based on the spring-jdbc framework, and uses an RDBMS (either H2 or PostgreSQL at the time of writing) as a key/value store with extra indices for the ‘searchable’ properties of Catalog objects. The key in this single-table store is the object identifier and the value is its XStream representation, leveraging exactly the same serialization mechanism GeoServer uses for the on-disk catalog persistence. This minimizes maintenance costs as the Catalog and configuration object model evolves, since only the XStream persistence code has to be maintained for both the on-disk and database backends.
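
The storage layout described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class `KeyValueConfigStoreSketch`, its method names, and the "property=value" index key are hypothetical, the maps stand in for database tables, and plain strings stand in for the XStream output. It does not reflect the actual jdbcconfig schema.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the layout; the real module stores this in an RDBMS.
final class KeyValueConfigStoreSketch {

    // Object store: object id -> serialized form (XStream XML in the real module).
    private final Map<String, String> blobsById = new LinkedHashMap<>();

    // Property index: "property=value" -> ids of the objects holding that value.
    private final Map<String, Set<String>> idsByProperty = new HashMap<>();

    // Stores the blob and indexes its searchable properties.
    // Simplification: stale index entries are not removed when an id is overwritten.
    void put(String id, String serialized, Map<String, String> searchableProperties) {
        blobsById.put(id, serialized);
        for (Map.Entry<String, String> entry : searchableProperties.entrySet()) {
            String indexKey = entry.getKey() + "=" + entry.getValue();
            idsByProperty.computeIfAbsent(indexKey, k -> new TreeSet<>()).add(id);
        }
    }

    // Lookup by key: returns the serialized object, or null if the id is unknown.
    String get(String id) {
        return blobsById.get(id);
    }

    // Query by indexed property: answered from the index, without deserializing any blob.
    Set<String> findIds(String property, String value) {
        return idsByProperty.getOrDefault(property + "=" + value, Collections.emptySet());
    }

    public static void main(String[] args) {
        KeyValueConfigStoreSketch store = new KeyValueConfigStoreSketch();
        store.put("layer-1", "<layer><name>roads</name></layer>",
                Collections.singletonMap("name", "roads"));
        for (String id : store.findIds("name", "roads")) {
            System.out.println(id + " -> " + store.get(id));
        }
    }
}
```

The point of the split is that queries touch only the index, while the serialized value stays opaque to the database, so changes to the object model would require no schema migration beyond the set of indexed properties.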

API Adoption Guidelines

  • If you need to get a count of Catalog objects, use the count method instead of getXXX().size():

    int allLayers = catalog.count(LayerInfo.class, Predicates.acceptAll());
    int workspaceLayers = catalog.count(LayerInfo.class, Predicates.equal("resource.workspace.id", workspaceId));

  • If only a subset of objects is needed, consider using a Filter instead of in-process filtering:

    //BAD:
    for (LayerInfo layer : catalog.getLayers()) {
        if ("topp".equals(layer.getResource().getStore().getWorkspace().getName())) {
            // do something with layer
        }
    }

    //GOOD:
    Filter filter = Predicates.equal("resource.store.workspace.name", "topp");
    Iterator<LayerInfo> layers = catalog.list(LayerInfo.class, filter);
    try {
        while (layers.hasNext()) {
            LayerInfo layer = layers.next();
            // do something with layer
        }
    } finally {
        // release any backend resources held by the iterator
        CloseableIteratorAdapter.close(layers);
    }

  • Push sorting to the backend:

    //BAD:
    List<StyleInfo> styles = new ArrayList<StyleInfo>(catalog.getStyles());
    Comparator<StyleInfo> comparator = new Comparator<StyleInfo>() {
        @Override
        public int compare(StyleInfo s1, StyleInfo s2) {
            return s1.getName().compareTo(s2.getName());
        }
    };
    Collections.sort(styles, comparator);

    //GOOD:
    boolean ascending = true;
    SortBy sortOrder = Predicates.sortBy("name", ascending);
    Iterator<StyleInfo> styles = catalog.list(StyleInfo.class, acceptAll(), null, null, sortOrder);

  • Use the catalog backend’s paging, even if what you really want is a List and not an Iterator:

    int startIndex = 50;
    int pageSize = 25;
    //BAD:
    List<LayerInfo> layers = catalog.getLayers();
    List<LayerInfo> page = layers.subList(startIndex, startIndex + pageSize);

    //GOOD:
    Iterator<LayerInfo> pageIterator = catalog.list(LayerInfo.class, acceptAll(), startIndex, pageSize, null);
    List<LayerInfo> page;
    try {
        page = com.google.common.collect.Lists.newArrayList(pageIterator);
    } finally {
        CloseableIteratorAdapter.close(pageIterator);
    }
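
The try/finally pattern in these guidelines recurs wherever a page is copied into a List, so it may be worth wrapping in a small helper. The following is a minimal sketch, assuming the iterator type can be treated as an `AutoCloseable`; the `ClosingIterator` interface, the `Tracking` wrapper and the `toList` helper are hypothetical stand-ins written for this example, not GeoServer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PageDrainSketch {

    // Hypothetical stand-in for an iterator that holds backend resources.
    interface ClosingIterator<T> extends Iterator<T>, AutoCloseable {
        @Override
        void close(); // narrowed so that it throws no checked exception
    }

    // Copies the remaining elements into a list and always closes the iterator,
    // even if iteration fails part way through.
    static <T> List<T> toList(ClosingIterator<T> iterator) {
        try (ClosingIterator<T> closing = iterator) {
            List<T> page = new ArrayList<>();
            while (closing.hasNext()) {
                page.add(closing.next());
            }
            return page;
        }
    }

    // Demonstration-only wrapper that records whether close() was called.
    static final class Tracking<T> implements ClosingIterator<T> {
        private final Iterator<T> delegate;
        boolean closed;

        Tracking(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        Tracking<String> names = new Tracking<>(Arrays.asList("roads", "rivers").iterator());
        List<String> page = toList(names);
        System.out.println(page.size() + " items, closed=" + names.closed);
    }
}
```

With such a helper, client code that needs a List only ever materializes one page, and cannot forget to release the iterator.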

Feedback

This section should contain feedback provided by PSC members who may have a problem with the proposal.

Backwards Compatibility

Backwards compatibility is preserved since the API changes are additive only. All existing code using the current API will keep working untouched.

Voting

  • Andrea Aime: +1
  • Alessio Fabiani:
  • Ben Caradoc Davies: +1
  • Gabriel Roldan: +1
  • Justin Deoliveira: +1
  • Jody Garnett: +1
  • Simone Giannecchini: +1

Links
