
[BUG] Unified highlighter does not highlight nested fields when match_phrase_prefix is used #19106

@myronmarston

Describe the bug

When using the unified highlighter on a search that uses match_phrase_prefix on a nested field, no highlights are returned.

Notably, highlights work in all of these slightly different cases:

  • Using the unified highlighter with match_phrase_prefix on a non-nested field
  • Using the unified highlighter with match on a nested field
  • Using the unified highlighter with match_phrase on a nested field
  • Using the unified highlighter with match_bool_prefix on a nested field
  • Using the plain highlighter with match_phrase_prefix on a nested field

In addition, the failing combination works correctly on recent versions of Elasticsearch (I tested on 9.1.0).
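The cases listed above differ only in the inner query type and the highlighter type. As a minimal sketch, a single helper could generate the request body for any combination. The function name `build_nested_query` is hypothetical and is not part of the reproduction script; the field names match the mapping used in the script.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): print a search body
# that runs the given query type against the nested field and requests
# highlights with the given highlighter.
#   $1: inner query type, e.g. match_phrase_prefix, match, match_bool_prefix
#   $2: highlighter type, e.g. unified or plain
#   $3: query text (assumed to need no JSON escaping)
build_nested_query() {
    local query_type="$1"
    local highlighter="$2"
    local text="$3"

    cat <<EOF
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "$query_type": {
          "nested_field.description": "$text"
        }
      }
    }
  },
  "highlight": {
    "type": "$highlighter",
    "fields": {
      "nested_field.description": {}
    }
  }
}
EOF
}

# The failing combination, and the same query with the plain highlighter
FAILING_QUERY=$(build_nested_query "match_phrase_prefix" "unified" "searchable")
PASSING_QUERY=$(build_nested_query "match_phrase_prefix" "plain" "searchable")
```

Either body can be sent to the `_search` endpoint with `curl -d`, in the same way the script sends its own queries.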

Related component

Search:Query Insights

To Reproduce

Create an executable script with this content:

#!/bin/bash

# OpenSearch/Elasticsearch Highlighting Bug Demonstration Script
# Bug: unified highlighter does not correctly highlight nested fields when match_phrase_prefix is used
# Author: Generated for bug report
# Usage: ./opensearch_highlighting_bug_demo.sh [opensearch|elasticsearch] [version]

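set -e

# Fail fast if a required tool is missing; without jq every test would be reported as FAIL
for cmd in docker curl jq; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
        echo "Error: '$cmd' is required but was not found in PATH"
        exit 1
    fi
done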

# Default values
ENGINE="opensearch"
VERSION="3.1.0"
CONTAINER_NAME="search_engine_test"
PORT="9200"

# Arrays to track test results
declare -a TEST_RESULTS
declare -a TEST_DESCRIPTIONS

# Parse command line arguments
if [ $# -ge 1 ]; then
    ENGINE="$1"
fi

if [ $# -ge 2 ]; then
    VERSION="$2"
fi

# Validate engine choice
if [[ "$ENGINE" != "opensearch" && "$ENGINE" != "elasticsearch" ]]; then
    echo "Error: Engine must be 'opensearch' or 'elasticsearch'"
    echo "Usage: $0 [opensearch|elasticsearch] [version]"
    exit 1
fi

echo "========================================="
echo "OpenSearch/Elasticsearch Highlighting Bug Demo"
echo "Engine: $ENGINE"
echo "Version: $VERSION"
echo "========================================="

# Function to wait for the search engine to be ready
wait_for_engine() {
    echo "Waiting for $ENGINE to be ready..."
    local max_attempts=30
    local attempt=1

    while [ $attempt -le $max_attempts ]; do
        # Try to get a response
        if curl -s -f "http://localhost:$PORT" > /dev/null 2>&1; then
            echo "$ENGINE is ready!"
            return 0
        fi

        # Check if container is still running on failure
        if [ $attempt -eq 15 ] && ! docker ps --filter "name=$CONTAINER_NAME" --format "{{.Names}}" | grep -q "$CONTAINER_NAME"; then
            echo "ERROR: Container $CONTAINER_NAME stopped running!"
            echo "Container logs:"
            docker logs "$CONTAINER_NAME" 2>/dev/null || echo "No logs available"
            exit 1
        fi

        printf "."
        sleep 2
        ((attempt++))
    done

    echo ""
    echo "Error: $ENGINE failed to start within expected time"
    echo "Container logs:"
    docker logs "$CONTAINER_NAME" 2>/dev/null || echo "No logs available"
    exit 1
}

# Function to check if highlighting worked
check_highlighting() {
    local response="$1"
    local field_name="$2"

    # Check if the response contains highlight data for the specified field
    if echo "$response" | jq -e ".hits.hits[0].highlight[\"$field_name\"]" > /dev/null 2>&1; then
        return 0  # Highlighting worked
    else
        return 1  # No highlighting
    fi
}

# Function to make HTTP requests with error handling and result tracking
make_request() {
    local method="$1"
    local url="$2"
    local data="$3"
    local description="$4"
    local test_type="$5"
    local highlight_field="$6"

    echo "--- $description ---"

    if [ -n "$data" ]; then
        response=$(curl -s -X "$method" -H "Content-Type: application/json" -d "$data" "$url")
    else
        response=$(curl -s -X "$method" "$url")
    fi

    # If this is a test request, check the highlighting and record the result
    if [ -n "$test_type" ] && [ -n "$highlight_field" ]; then
        if check_highlighting "$response" "$highlight_field"; then
            TEST_RESULTS+=("PASS")
            echo "✅ RESULT: Highlighting worked correctly"
        else
            TEST_RESULTS+=("FAIL")
            echo "❌ RESULT: No highlighting found"
        fi
        TEST_DESCRIPTIONS+=("$test_type")
    else
        # For non-test requests, just show success/failure
        if echo "$response" | jq -e '.error' > /dev/null 2>&1; then
            echo "❌ Request failed"
            echo "$response" | jq '.error' 2>/dev/null || echo "$response"
        else
            echo "✅ Request successful"
        fi
    fi
    echo ""
}

# Clean up any existing container
echo "Cleaning up any existing containers..."
docker stop "$CONTAINER_NAME" 2>/dev/null || true
docker rm "$CONTAINER_NAME" 2>/dev/null || true

# Start the appropriate search engine
echo "Starting $ENGINE:$VERSION..."

if [ "$ENGINE" = "opensearch" ]; then
    echo "Starting OpenSearch with security disabled..."

    # For OpenSearch 3.x, we need different configuration
    if [[ "$VERSION" =~ ^3\. ]]; then
        echo "Using OpenSearch 3.x configuration..."
        docker run -d \
            --name "$CONTAINER_NAME" \
            -p "$PORT:9200" \
            -e "discovery.type=single-node" \
            -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Admin123!" \
            -e "DISABLE_INSTALL_DEMO_CONFIG=true" \
            -e "DISABLE_SECURITY_PLUGIN=true" \
            -e "bootstrap.memory_lock=false" \
            opensearchproject/opensearch:"$VERSION"
    else
        echo "Using OpenSearch 2.x configuration..."
        docker run -d \
            --name "$CONTAINER_NAME" \
            -p "$PORT:9200" \
            -e "discovery.type=single-node" \
            -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=Admin123!" \
            -e "plugins.security.disabled=true" \
            opensearchproject/opensearch:"$VERSION"
    fi
else
    echo "Starting Elasticsearch with security disabled..."
    docker run -d \
        --name "$CONTAINER_NAME" \
        -p "$PORT:9200" \
        -e "discovery.type=single-node" \
        -e "xpack.security.enabled=false" \
        elasticsearch:"$VERSION"
fi

# Wait for the engine to be ready
wait_for_engine

# Create index with nested field mapping
INDEX_NAME="test_highlighting"
MAPPING='{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "nested_field": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          },
          "description": {
            "type": "text"
          }
        }
      }
    }
  }
}'

make_request "PUT" "http://localhost:$PORT/$INDEX_NAME" "$MAPPING" "Creating index with nested field mapping"

# Index a test document
DOCUMENT='{
  "title": "Sample Document Title",
  "content": "This is the main content of the document with some searchable text",
  "nested_field": [
    {
      "name": "First nested item",
      "description": "This is a description for the first nested item with searchable content"
    },
    {
      "name": "Second nested item",
      "description": "Another description for the second nested item with more searchable text"
    }
  ]
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_doc/1" "$DOCUMENT" "Indexing test document"

# Refresh the index to make the document searchable
make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_refresh" "" "Refreshing index"

echo ""
echo "========================================="
echo "DEMONSTRATING THE BUG"
echo "========================================="

# Test 1: match_phrase_prefix with nested field using unified highlighter (BUG in OpenSearch, works in Elasticsearch)
echo ""
echo "TEST 1: match_phrase_prefix + nested field + unified highlighter"

QUERY1='{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match_phrase_prefix": {
          "nested_field.description": "searchable"
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "nested_field.description": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY1" "TEST 1: match_phrase_prefix + nested + unified highlighter" "match_phrase_prefix + nested + unified" "nested_field.description"

# Test 2: match_phrase_prefix with non-nested field using unified highlighter (WORKS)
echo ""
echo "TEST 2: match_phrase_prefix + non-nested field + unified highlighter"

QUERY2='{
  "query": {
    "match_phrase_prefix": {
      "content": "searchable"
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "content": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY2" "TEST 2: match_phrase_prefix + non-nested + unified highlighter" "match_phrase_prefix + non-nested + unified" "content"

# Test 3: match (not match_phrase_prefix) with nested field using unified highlighter (WORKS)
echo ""
echo "TEST 3: match + nested field + unified highlighter"

QUERY3='{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match": {
          "nested_field.description": "searchable"
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "nested_field.description": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY3" "TEST 3: match + nested + unified highlighter" "match + nested + unified" "nested_field.description"

# Test 4: match_phrase with nested field using unified highlighter (WORKS)
echo ""
echo "TEST 4: match_phrase + nested field + unified highlighter"

QUERY4='{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match_phrase": {
          "nested_field.description": "searchable content"
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "nested_field.description": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY4" "TEST 4: match_phrase + nested + unified highlighter" "match_phrase + nested + unified" "nested_field.description"

# Test 5: match_bool_prefix with nested field using unified highlighter (WORKS)
echo ""
echo "TEST 5: match_bool_prefix + nested field + unified highlighter"

QUERY5='{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match_bool_prefix": {
          "nested_field.description": "searchable"
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "nested_field.description": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY5" "TEST 5: match_bool_prefix + nested + unified highlighter" "match_bool_prefix + nested + unified" "nested_field.description"

# Test 6: match_phrase_prefix with nested field using plain highlighter (WORKS)
echo ""
echo "TEST 6: match_phrase_prefix + nested field + plain highlighter"

QUERY6='{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match_phrase_prefix": {
          "nested_field.description": "searchable"
        }
      }
    }
  },
  "highlight": {
    "type": "plain",
    "fields": {
      "nested_field.description": {}
    }
  }
}'

make_request "POST" "http://localhost:$PORT/$INDEX_NAME/_search" "$QUERY6" "TEST 6: match_phrase_prefix + nested + plain highlighter" "match_phrase_prefix + nested + plain" "nested_field.description"

echo ""
echo "========================================="
echo "TEST RESULTS SUMMARY"
echo "========================================="
echo ""
echo "Engine: $ENGINE $VERSION"
echo ""

for i in "${!TEST_RESULTS[@]}"; do
    result="${TEST_RESULTS[$i]}"
    description="${TEST_DESCRIPTIONS[$i]}"

    if [ "$result" = "PASS" ]; then
        echo "✅ PASS: $description"
    else
        echo "❌ FAIL: $description"
    fi
done

# Cleanup function
cleanup() {
    echo "Cleaning up..."
    docker stop "$CONTAINER_NAME" 2>/dev/null || true
    docker rm "$CONTAINER_NAME" 2>/dev/null || true
}

# Ask user if they want to keep the container running
echo "Would you like to keep the $ENGINE container running for further testing? (y/N)"
read -r response
if [[ "$response" =~ ^[Yy]$ ]]; then
    echo "Container '$CONTAINER_NAME' is still running on port $PORT"
    echo "You can access it at: http://localhost:$PORT"
    echo "To stop it later, run: docker stop $CONTAINER_NAME && docker rm $CONTAINER_NAME"
else
    cleanup
    echo "Container cleaned up."
fi

echo ""
echo "Script completed successfully!"

Then run it with elasticsearch or opensearch and a version:

$ ./opensearch_highlighting_bug_demo.sh opensearch 3.1.0

Expected behavior

I expect output indicating that highlights were returned. Here's example successful output from running the script against Elasticsearch 9.1.0:

$ ./opensearch_highlighting_bug_demo.sh elasticsearch 9.1.0
=========================================
OpenSearch/Elasticsearch Highlighting Bug Demo
Engine: elasticsearch
Version: 9.1.0
=========================================
Cleaning up any existing containers...
Starting elasticsearch:9.1.0...
Starting Elasticsearch with security disabled...
7543208dbc107a7d51fe7ccf59995e1942de11d5f5f8cd8fbf90035cc56aa4b5
Waiting for elasticsearch to be ready...
......elasticsearch is ready!
--- Creating index with nested field mapping ---
✅ Request successful

--- Indexing test document ---
✅ Request successful

--- Refreshing index ---
✅ Request successful


=========================================
DEMONSTRATING THE BUG
=========================================

TEST 1: match_phrase_prefix + nested field + unified highlighter
--- TEST 1: match_phrase_prefix + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 2: match_phrase_prefix + non-nested field + unified highlighter
--- TEST 2: match_phrase_prefix + non-nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 3: match + nested field + unified highlighter
--- TEST 3: match + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 4: match_phrase + nested field + unified highlighter
--- TEST 4: match_phrase + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 5: match_bool_prefix + nested field + unified highlighter
--- TEST 5: match_bool_prefix + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 6: match_phrase_prefix + nested field + plain highlighter
--- TEST 6: match_phrase_prefix + nested + plain highlighter ---
✅ RESULT: Highlighting worked correctly


=========================================
TEST RESULTS SUMMARY
=========================================

Engine: elasticsearch 9.1.0

✅ PASS: match_phrase_prefix + nested + unified
✅ PASS: match_phrase_prefix + non-nested + unified
✅ PASS: match + nested + unified
✅ PASS: match_phrase + nested + unified
✅ PASS: match_bool_prefix + nested + unified
✅ PASS: match_phrase_prefix + nested + plain
Would you like to keep the elasticsearch container running for further testing? (y/N)
n
Cleaning up...
search_engine_test
search_engine_test
Container cleaned up.

Script completed successfully!

In contrast, here's the output I get from OpenSearch 3.1.0:

$ ./opensearch_highlighting_bug_demo.sh opensearch 3.1.0
=========================================
OpenSearch/Elasticsearch Highlighting Bug Demo
Engine: opensearch
Version: 3.1.0
=========================================
Cleaning up any existing containers...
Starting opensearch:3.1.0...
Starting OpenSearch with security disabled...
Using OpenSearch 3.x configuration...
ce8cee86e80f28b51dd5efc6a368d441b41eb40ddaa8e6d3447f7e866e719431
Waiting for opensearch to be ready...
....opensearch is ready!
--- Creating index with nested field mapping ---
✅ Request successful

--- Indexing test document ---
✅ Request successful

--- Refreshing index ---
✅ Request successful


=========================================
DEMONSTRATING THE BUG
=========================================

TEST 1: match_phrase_prefix + nested field + unified highlighter
--- TEST 1: match_phrase_prefix + nested + unified highlighter ---
❌ RESULT: No highlighting found


TEST 2: match_phrase_prefix + non-nested field + unified highlighter
--- TEST 2: match_phrase_prefix + non-nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 3: match + nested field + unified highlighter
--- TEST 3: match + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 4: match_phrase + nested field + unified highlighter
--- TEST 4: match_phrase + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 5: match_bool_prefix + nested field + unified highlighter
--- TEST 5: match_bool_prefix + nested + unified highlighter ---
✅ RESULT: Highlighting worked correctly


TEST 6: match_phrase_prefix + nested field + plain highlighter
--- TEST 6: match_phrase_prefix + nested + plain highlighter ---
✅ RESULT: Highlighting worked correctly


=========================================
TEST RESULTS SUMMARY
=========================================

Engine: opensearch 3.1.0

❌ FAIL: match_phrase_prefix + nested + unified
✅ PASS: match_phrase_prefix + non-nested + unified
✅ PASS: match + nested + unified
✅ PASS: match_phrase + nested + unified
✅ PASS: match_bool_prefix + nested + unified
✅ PASS: match_phrase_prefix + nested + plain
Would you like to keep the opensearch container running for further testing? (y/N)
n
Cleaning up...
search_engine_test
search_engine_test
Container cleaned up.

Script completed successfully!
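Since Test 6 passes, one possible stopgap is to request the plain highlighter only for the affected nested field while leaving the unified highlighter in place for everything else. The sketch below is untested against OpenSearch 3.1.0; it assumes that the per-field `type` highlight option behaves like the top-level `"type": "plain"` setting used in Test 6. The function name `workaround_query` is hypothetical.

```shell
#!/bin/bash

# Hypothetical workaround body (not part of the original script): the same
# query as Test 1, but the highlighter type is overridden to "plain" for the
# nested field only. Assumes the per-field override matches Test 6 behavior.
workaround_query() {
    cat <<'EOF'
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "match_phrase_prefix": {
          "nested_field.description": "searchable"
        }
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "nested_field.description": { "type": "plain" }
    }
  }
}
EOF
}

# Usage, with the variables from the reproduction script:
# curl -s -X POST -H "Content-Type: application/json" \
#     -d "$(workaround_query)" "http://localhost:$PORT/$INDEX_NAME/_search"
```

This does not address the underlying bug. The plain highlighter can also produce different fragments than the unified one, so the output may not be identical.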

Additional Details

Plugins

None.

Screenshots

None, but see the script output above.

Host/Environment

  • OS: macOS, but we've also observed this on Linux (e.g. via managed AWS OpenSearch)
  • Version: 3.1.0, but it seems to be a bug in all OpenSearch versions

Additional context

None.
