Elasticsearch has two ways of representing null
values:
- Implicit, i.e. by absence / non-existance of a key and
- Explicit, i.e. by defining a certain value as representing
null
in a the index mapping.
Note: The two are not easily combinable within one field, i.e. querying for (not) null
is not a simple query in this case.
Implicit null
values are simply keys that are not set in the document.
Just omit the key that should e considered null
.
PUT my-null-index/_doc/3
{
}
PUT my-null-index/_doc/4
{
"status_code": "123"
}
Checking for null
is done with an exists
query in the implicit case. Either positive (i.e. NOT NULL
) or, in the following negative (i.e. NULL
):
GET my-null-index/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "status_code"
}
}
}
}
}
A wildcard
or query_string
query with only an asterisk as search term has the same effect, as it should be handled as an exists
query internally (TODO check if this is true for ES or only for Lucene directly...:
GET my-null-index/_search
{
"query": {
"wildcard": {
"status_code": {
"value": "*"
}
}
}
}
or
GET my-null-index/_search
{
"query": {
"query_string": {
"fields": [
"status_code*"
],
"query": "*"
}
}
}
or inline:
GET my-null-index/_search?q=status_code:*
This will return record 3
from the example above.
Explicit null
values are set per field when creating or modifying an index mapping by specifying the key "null_value"
:
PUT my-null-index
{
"mappings": {
"properties": {
"status_code": {
"type": "keyword",
"null_value": "NULL"
}
}
}
}
Documents that are indexed with the respective field being set to null
will then be assigned the value of null_value
instead.
The following three documents have the field status_code
set differently, once to the null_value
, once to an empty list, and once not at all, i.e. implicitly null
.
PUT my-null-index/_doc/1
{
"status_code": null
}
PUT my-null-index/_doc/2
{
"status_code": []
}
PUT my-null-index/_doc/3
{
}
A query for explicit null
values is done with a term
query:
GET my-null-index/_search
{
"query": {
"term": {
"status_code": "NULL"
}
}
}
This query to get the null
records from the example data above will return record 1
, while paradoxically, the query for the implicit null
values above will return records 2
and 3
.