From Types in Elasticsearch to Type-Less Indices in OpenSearch

In earlier versions of Elasticsearch, types were a convenient way to organize documents into categories within a single index. Types were eventually deprecated, however, pushing developers toward a type-less structure in Elasticsearch 7 and later. While recently migrating a Rails application from Elasticsearch 2 to OpenSearch 2, we needed a way to replicate this behavior within the constraints of OpenSearch. In this post, we discuss how types were used to organize documents and how to achieve the same behavior in OpenSearch.

Understanding Types in Elasticsearch

Before Elasticsearch 7, types made it possible to keep documents with different mapping definitions in a single index, making it easier to categorize documents and to search across several kinds of them with one call.

In this example, if we take a look at /content_resources/_mapping, we can identify 5 types of documents stored in one index, ContentResource:

{
  "content_resources": {
    "mappings": {
      "Document": {...},
      "File": {...},
      "Image": {...},
      "Audio": {...},
      "Video": {...}
    }
  }
}

Because all 5 types live in the ContentResource index, a single search against that index can query and return results from any of them.

If we take a closer look at an individual document, we can see that each one carries a _type metadata field, which defines the type that the document belongs to:

{
  "_index": "content_resources",
  "_type": "Image",
  "_id": "14",
  "_score": 1,
  "_source": {...}
}

While types provided the benefit of categorization, they were deprecated because of limitations that caused recurring issues, such as:

  • Mapping conflicts: Fields with the same name in different types of one index had to share the same mapping. If two types defined the same field with a different format, Elasticsearch could not reconcile the difference within the index. Managing mappings and routing across multiple types in large clusters also created more overhead.
  • Performance impact: Types added storage and processing overhead because each type introduced extra metadata, increasing the storage footprint across large datasets. Managing multiple types within a single index required additional schema layers, which made indexing slower and added resource strain on servers. Moreover, even when querying a single type, Elasticsearch had to process multiple mappings, slowing down retrieval for large, multi-type indices.
  • ID conflicts: In Elasticsearch, document IDs were shared across types within an index, which meant that two documents with the same ID in different types could conflict. Moving to a type-less structure avoided this risk by isolating documents within their own indices.

The deprecation was announced in Elasticsearch 6, which still supported one type per index to give users a chance to start moving to a single-type structure. The removal of types became concrete with the release of Elasticsearch 7, leading to type-less indices.

Replicating Types in OpenSearch

OpenSearch is a fork of Elasticsearch 7, which means that it inherits the deprecation and the type-less indices. For us, this meant that while the application used a version of Elasticsearch that allowed types, OpenSearch would not, so we had to come up with a solution that behaved the same way and was safe.

Our goal was to ensure that a search on the ContentResource index could still cover the 5 different types of resource documents efficiently. To solve this problem, we turned to aliases, added through the OpenSearch alias API.

Aliases group multiple indices under a single alias name, making it easier to search and manage related data spread across several indices. This approach works well if you separate what would have been document types into individual indices but want to query them together when needed. An added benefit is that each index can still be managed independently (updating mappings, etc.), which avoids the conflicts that led to types’ deprecation: aliases enable simultaneous querying across multiple indices without enforcing a single schema, so each index can evolve without impacting the others in the alias. While some fields should ideally be consistent across indices to support uniform queries, the indices don’t need to share the exact same schema, which allows flexibility.

Conversely, because each type now resides in a separate index, managing and updating mappings individually may require more overhead than it did with types. Querying through aliases can also be more complex: each index must be maintained individually, and aggregations across indices with differing mappings may yield unexpected results if fields aren’t aligned.
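
One way to catch misaligned fields before they skew an aggregation is to compare the mappings of every index behind the alias. The following is a minimal Python sketch rather than code from our migration: find_misaligned_fields is a hypothetical helper, the sample data is invented, and the input is assumed to have the shape of a type-less GET /content_resources/_mapping response (index name, then mappings, then properties). It only inspects top-level fields.

```python
from collections import defaultdict


def find_misaligned_fields(mapping_response):
    """Return top-level fields whose data type differs between indices.

    The result maps each such field to {index_name: data_type}.
    """
    types_by_field = defaultdict(dict)
    for index_name, index_info in mapping_response.items():
        properties = index_info.get("mappings", {}).get("properties", {})
        for field_name, definition in properties.items():
            # Object fields carry no "type" key, so label them "object".
            types_by_field[field_name][index_name] = definition.get("type", "object")
    return {
        field: per_index
        for field, per_index in types_by_field.items()
        if len(set(per_index.values())) > 1
    }


# Invented sample: "duration" is numeric in one index and a keyword in another.
sample_response = {
    "audios": {"mappings": {"properties": {
        "title": {"type": "text"},
        "duration": {"type": "integer"},
    }}},
    "videos": {"mappings": {"properties": {
        "title": {"type": "text"},
        "duration": {"type": "keyword"},
    }}},
}

misaligned = find_misaligned_fields(sample_response)
print(misaligned)
```

Any field reported here is a candidate for alignment before it is used in a cross-index aggregation or sort.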

The steps we followed to complete this restructuring:

  1. We created individual index and mapping definitions for each of the 5 types, which gave us 5 new indices to work with.
  2. We removed the ContentResource index entirely. Since we separated each type into its own index, the ContentResource index became unnecessary. By deleting it and creating individual indices, we ensured each document type would be housed in a dedicated index.
  3. Once the indices were created, we sent a request to add an alias named content_resources to each of these newly created indices:
POST /_aliases
{
  "actions": [
    { "add": { "index": "documents", "alias": "content_resources" } },
    { "add": { "index": "files", "alias": "content_resources" } },
    { "add": { "index": "images", "alias": "content_resources" } },
    { "add": { "index": "audios", "alias": "content_resources" } },
    { "add": { "index": "videos", "alias": "content_resources" } }
  ]
}
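
Steps 1 and 3 can also be scripted. The sketch below, in plain Python with no client library, only builds the requests as (method, path, body) tuples so they can be reviewed before being sent with whatever HTTP client you prefer. The field definitions in INDEX_FIELDS are hypothetical and heavily trimmed, and both helper functions are names invented for this example.

```python
import json

ALIAS = "content_resources"

# Hypothetical, trimmed-down field definitions; real mappings would be richer.
INDEX_FIELDS = {
    "documents": {"title": {"type": "text"}},
    "files": {"title": {"type": "text"}},
    "images": {"title": {"type": "text"}, "width": {"type": "integer"}},
    "audios": {"title": {"type": "text"}, "duration": {"type": "integer"}},
    "videos": {"title": {"type": "text"}, "duration": {"type": "integer"}},
}


def create_index_request(index, fields):
    """Step 1: one type-less index per former type (PUT /<index>)."""
    return ("PUT", f"/{index}", {"mappings": {"properties": fields}})


def alias_request(indices, alias):
    """Step 3: attach every index to the shared alias (POST /_aliases)."""
    actions = [{"add": {"index": index, "alias": alias}} for index in indices]
    return ("POST", "/_aliases", {"actions": actions})


requests_to_send = [
    create_index_request(index, fields) for index, fields in INDEX_FIELDS.items()
]
requests_to_send.append(alias_request(list(INDEX_FIELDS), ALIAS))

for method, path, body in requests_to_send:
    print(method, path, json.dumps(body))
```

The last tuple carries the same body as the POST /_aliases request shown above, generated from the list of index names instead of written by hand.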

With this setup, a query against the content_resources alias searches all 5 indices, as well as any index added to the alias later.

It is important to note that when moving to a type-less structure, any code that relies on the _type metadata should be updated as well. An alternative approach, such as including a type field in each document to indicate its category, could also allow users to still filter by “type” within their type-less setup.
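
Both options can be sketched in Python. The snippet is illustrative only: the resource_type and title field names, the INDEX_TO_TYPE table, and both helpers are assumptions made for this example, not part of the application. It relies on two behaviors: each search hit reports the concrete index it came from in _index, even when the search went through an alias, and a term filter matches exact values on a keyword field.

```python
# Hypothetical lookup from concrete index name to the old type name.
INDEX_TO_TYPE = {
    "documents": "Document",
    "files": "File",
    "images": "Image",
    "audios": "Audio",
    "videos": "Video",
}


def legacy_type(hit):
    """Derive the former _type value from the hit's _index metadata."""
    return INDEX_TO_TYPE[hit["_index"]]


def type_filtered_search(text, type_name):
    """Build a search body that filters on an explicit resource_type field.

    Assumes every document stores its category in a keyword field named
    resource_type and has a title text field (both hypothetical).
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {"resource_type": type_name}}],
            }
        }
    }


# A hit shaped like the earlier example, minus the removed _type field.
hit = {"_index": "images", "_id": "14", "_score": 1, "_source": {}}
search_body = type_filtered_search("sunset", "Image")

print(legacy_type(hit))
print(search_body)
```

The body would be sent to /content_resources/_search, so the filter narrows results by category while the alias still fans the query out to every index.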

Conclusion

While migrating away from types requires an initial restructuring, the alias-based setup in OpenSearch offers flexibility and scalability advantages. For those moving away from types, aliases are a practical way to manage multiple document types: by using a separate index for each type and adding them all to a single alias, you avoid mapping conflicts while keeping cross-type queries easy. The setup may seem more complex at first, but the long-term benefits of schema isolation and easier maintenance outweigh the initial effort. Give aliases a try to experience structured, efficient document management in OpenSearch!
