Connecting Wazuh to Quickwit: A Tale of Hubris, Header Files, and Claude-Driven Development

Baptiste Leterrier

23 Dec 2025 • 18 min read

Why Would Anyone Do This to Themselves?

Picture this: You have Wazuh, a perfectly functional security monitoring platform that uses Wazuh Indexer (a fork of OpenSearch, itself forked from Elasticsearch) as its backend. It works. People use it. Life is good.

But then you discover Quickwit, a search engine built as if Rust programmers decided OpenSearch was too bloated and said "hold my artisanal, locally-sourced, memory-safe beer."

The Tale of Two Indexers

OpenSearch is great and well-known. It's got:

✅ Battle-tested in production
✅ Every feature you could possibly want
✅ Documentation so extensive you could print it and use it as a door
❌ Resource consumption that makes Chrome look lightweight
❌ JVM heap settings that require a PhD in numerology
❌ Shards

Quickwit is the new hotness:

✅ Written in Rust
✅ Actually designed for log data from the ground up
✅ Object storage native (S3 go brrr)
✅ Small resource footprint
❌ Documentation that assumes you already know what you're doing
❌ API compatibility that's more "inspired by" than "compatible with"
❌ The exciting feeling of being an early adopter (read: guinea pig)

So naturally, I decided to make them talk to each other.

Act I: The Proxy of False Hope

My first brilliant idea: "I'll just write a proxy that translates OpenSearch requests to Quickwit! How hard could it be?"

"It was, in fact, very hard." Photo by Georges Biard, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=72580707

The beauty of this approach was its simplicity. The horror was... everything else.

# Actual conversation between the systems:
Wazuh: "I need to bulk index these 10,000 documents with nested fields and custom routing"
Proxy: "Quickwit says... uh... 'yes'?"
Quickwit: "What's a routing?"

Act II: Going Deep (Into the C++ Abyss)

After the proxy approach proved about as stable as a house of cards in a hurricane, I decided to do what any reasonable person would do: dive into Wazuh's C++ codebase and implement native Quickwit support.

The HPP Files Incident

Coming from Python, where everything is a dictionary and types are more of a suggestion than a rule, C++ header files were... an experience.

// What I expected:
import quickwit

// What I got:
#include "indexer_connector/include/serverSelector.hpp"
#include "shared_modules/utils/monitoring.hpp"
#include "why_are_there_so_many_headers.hpp"
#include "seriously_another_one.hpp"
#include "help.hpp"

The revelation that .hpp files are just headers pretending to be hip and modern (as opposed to .h files which are headers that have given up on life) was just the beginning.

Act III: Claude Ex Machina

This is where our story takes a turn. Enter Claude, my AI pair programmer, who approached this codebase with confidence.

Looking at the commit messages, you can actually track my descent into madness:

The Early Optimism Phase

"Add Quickwit indexer integration to Wazuh 4.14.1"

The Reality Sets In

"Fix incomplete ServerSelector type error and conflicting declaration"

The AsyncDispatcher Arc

"Fix AsyncDispatcher copy-constructibility and overload collision to support move-only types"

This is where Claude really shined, implementing move semantics with the casual air of someone who definitely understands what an rvalue reference is and isn't just pattern-matching from Stack Overflow answers.

The Grand Finale

"Add automatic Quickwit index creation for wazuh-states indexes"

44 commits later, we finally got to actually creating Quickwit indexes! I like this dynamic index-creation approach (it resembles the proxy solution), but this is a PoC. In the end, indexes should be created with the correct mappings when the Wazuh process initializes, as they are with Wazuh Indexer.

1. Architectural Overview

1.1 Integration Strategy

I chose a unified codebase approach rather than creating separate connectors for each indexer type. This architectural decision provides several key advantages:

Single maintenance surface: One codebase to maintain, test, and deploy
Reduced code duplication: Shared logic for common operations (connection management, error handling, retry mechanisms)
Simplified deployment: No need for operators to manage multiple connector versions
Seamless migration path: Organizations can switch between indexers through configuration changes alone
Consistent behavior: Unified error handling and operational characteristics across both backends

The implementation uses runtime type detection based on configuration, allowing the same binary to adapt its behavior dynamically.

1.2 Technical Foundation: NDJSON vs. Bulk API

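A fundamental difference between Quickwit and OpenSearch lies in their ingestion formats. Both accept newline-delimited JSON, but OpenSearch's Bulk API interleaves action lines with the documents, while Quickwit expects documents only: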

OpenSearch Bulk API Format

POST /_bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }

Quickwit NDJSON Format

{"timestamp": "2024-01-01", "message": "log entry"}
{"timestamp": "2024-01-02", "message": "another entry"}

Pure document-only format without action lines, optimized for append-only log data ingestion.

This architectural difference reflects Quickwit's design philosophy: optimized for high-throughput log ingestion with immutable data patterns, whereas OpenSearch maintains flexibility for various document operations (create, update, delete).
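To make the contrast concrete, here is a minimal sketch that turns the same batch of already-serialized, single-line JSON documents into each request body. The function names are hypothetical and are not the connector's actual code. The OpenSearch body would go to /_bulk. The Quickwit body would go to the index's ingest endpoint, /api/v1/<index_id>/ingest, which is why the payload does not need to name the index.

```cpp
#include <string>
#include <vector>

// OpenSearch _bulk body: an action line before every document.
// The index name is assumed to be safe to embed without JSON escaping.
std::string buildOpenSearchBulkBody(const std::string& index,
                                    const std::vector<std::string>& docs)
{
    std::string body;
    for (const auto& doc : docs)
    {
        body += R"({"index":{"_index":")" + index + "\"}}\n";
        body += doc + "\n";
    }
    return body;
}

// Quickwit ingest body: documents only, one per line (the index is in the URL)
std::string buildQuickwitNdjsonBody(const std::vector<std::string>& docs)
{
    std::string body;
    for (const auto& doc : docs)
    {
        body += doc + "\n";
    }
    return body;
}
```

The only structural difference is the action line. Everything Bulk expresses through that line (index name, document id, operation type) has no per-document equivalent on the Quickwit side.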

2. Implementation Evolution

Phase 1: Core Connectivity

Indexer Type Detection

Detecting the indexer type at runtime enables conditional behavior without code duplication, reusing the same connection management infrastructure for both backends.

A new parameter in ossec.conf sets the indexer type to "quickwit". Simple, elegant.

The connector also dynamically selects appropriate endpoints based on the detected indexer type.
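The detection code itself is not reproduced here, so what follows is a minimal sketch of the idea under stated assumptions. IndexerType, parseIndexerType and ingestEndpoint are hypothetical names. The input is whatever string the new ossec.conf parameter holds. Falling back to OpenSearch for any value other than "quickwit" is my assumption and may not match what the connector does. The endpoint paths are the public ones: /_bulk for OpenSearch and /api/v1/<index_id>/ingest for Quickwit.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum; the real connector's names may differ
enum class IndexerType { OpenSearch, Quickwit };

// Assumption: only the exact value "quickwit" switches backends
IndexerType parseIndexerType(const std::string& configured)
{
    return configured == "quickwit" ? IndexerType::Quickwit : IndexerType::OpenSearch;
}

// Pick the ingestion endpoint for a given backend and index
std::string ingestEndpoint(IndexerType type, const std::string& index)
{
    switch (type)
    {
        case IndexerType::Quickwit:
            return "/api/v1/" + index + "/ingest"; // index is part of the path
        case IndexerType::OpenSearch:
            return "/_bulk"; // index is named in each action line instead
    }
    throw std::logic_error("unknown indexer type");
}
```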

Phase 2: Data Format Adaptation

Builder Functions

New specialized builder functions handle the format differences.

The builderQuickwitDelete() function was implemented as a no-op. Quickwit's ingestion path has no per-document delete, and it removes data through retention policies rather than explicit delete operations. That design choice aligns with immutable log data patterns.
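Here is a minimal sketch of how a no-op delete can fit into batch building. It assumes a hypothetical Operation type, because the real builder signatures are not shown here. The sketch counts the skipped deletes instead of discarding them silently, so the caller can log them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operation record; the real connector uses its own types
enum class OpType { Index, Delete };

struct Operation
{
    OpType type;
    std::string id;  // meaningful for OpenSearch, ignored by Quickwit
    std::string doc; // serialized JSON, empty for deletes
};

struct QuickwitBatch
{
    std::string body;              // NDJSON for the ingest endpoint
    std::size_t skippedDeletes{0}; // deletes turned into no-ops
};

// Keep index operations; turn deletes into counted no-ops
QuickwitBatch buildQuickwitBatch(const std::vector<Operation>& ops)
{
    QuickwitBatch batch;
    for (const auto& op : ops)
    {
        if (op.type == OpType::Delete)
        {
            ++batch.skippedDeletes; // retention policies handle removal instead
            continue;
        }
        batch.body += op.doc + "\n";
    }
    return batch;
}
```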

Beyond the format change, the schema itself had to evolve.

Phase 3: Schema Validation & Data Normalization

Quickwit's stricter schema validation requirements necessitated defensive data transformations:

Timestamp Injection

if (!hasTimestamp)
{
    // No timestamp in the sample: prepend a datetime mapping accepting RFC 3339 or Unix timestamps
    nlohmann::json timestampField;
    timestampField["name"] = "timestamp";
    timestampField["type"] = "datetime";
    timestampField["input_formats"] = nlohmann::json::array({"rfc3339", "unix_timestamp"});
    timestampField["fast"] = true;
    timestampField["indexed"] = true;
    fieldMappings.insert(fieldMappings.begin(), timestampField);
}

Array Field Normalization

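The Quickwit mappings I ended up with expect single-valued fields in places where Wazuh produces arrays (process.args is the case handled below). The workaround is to serialize such arrays as JSON strings. This keeps the content intact while satisfying the schema. The same block also stamps any document lacking a timestamp with the current UTC time in RFC 3339 format: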

try
{
    // Parse the document
    auto doc = nlohmann::json::parse(data);
    // Add timestamp field if it doesn't exist
    if (!doc.contains("timestamp") && !doc.contains("@timestamp"))
    {
        // Get current time in RFC3339 format
        auto now = std::chrono::system_clock::now();
        auto time_t_now = std::chrono::system_clock::to_time_t(now);
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) % 1000;
        std::stringstream ss;
        ss << std::put_time(std::gmtime(&time_t_now), "%Y-%m-%dT%H:%M:%S");
        ss << '.' << std::setfill('0') << std::setw(3) << ms.count() << 'Z';
        doc["timestamp"] = ss.str();
    }
    // Fix process.args if it's an array - convert to JSON string
    if (doc.contains("process") && doc["process"].is_object())
    {
        auto& process = doc["process"];
        if (process.contains("args") && process["args"].is_array())
        {
            // Convert array to JSON string representation
            process["args"] = process["args"].dump();
        }
    }
    // Serialize and append
    bulkData.append(doc.dump());
    bulkData.append("\n");
}
catch (const nlohmann::json::exception& e)
{
    // If JSON parsing fails, append the original data
    logWarn(IC_NAME, "Failed to parse document for Quickwit index '%s': %s",
            std::string(index).c_str(), e.what());
    bulkData.append(data);
    bulkData.append("\n");
}
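This keeps every document ingestible within Quickwit's constraints, at the cost of changing the shape of array fields such as process.args. It also requires correct doc mappings. Fortunately, Quickwit allows an index's doc mapping to be updated after creation. Here is an example mapping: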
{
  "field_mappings": [
    {
      "name": "agent",
      "type": "object",
      "field_mappings": [
        {
          "fast": {
            "normalizer": "raw"
          },
          "fieldnorms": false,
          "indexed": true,
          "name": "id",
          "record": "basic",
          "stored": true,
          "tokenizer": "raw",
          "type": "text"
        },
        {
          "fast": {
            "normalizer": "raw"
          },
          "fieldnorms": false,
          "indexed": true,
          "name": "name",
          "record": "basic",
          "stored": true,
          "tokenizer": "raw",
          "type": "text"
        },
        {
          "fast": {
            "normalizer": "raw"
          },
          "fieldnorms": false,
          "indexed": true,
          "name": "version",
          "record": "basic",
          "stored": true,
          "tokenizer": "raw",
          "type": "text"
        }
      ]
    },
    {
      "name": "group",
      "type": "object",
      "field_mappings": [
        {
          "coerce": true,
          "fast": true,
          "indexed": true,
          "name": "id",
          "output_format": "number",
          "stored": true,
          "type": "i64"
        },
        {
          "coerce": true,
          "fast": true,
          "indexed": true,
          "name": "id_signed",
          "output_format": "number",
          "stored": true,
          "type": "i64"
        },
        {
          "fast": false,
          "indexed": true,
          "name": "is_hidden",
          "stored": true,
          "type": "bool"
        },
        {
          "fast": {
            "normalizer": "raw"
          },
          "fieldnorms": false,
          "indexed": true,
          "name": "name",
          "record": "basic",
          "stored": true,
          "tokenizer": "raw",
          "type": "text"
        },
        {
          "fast": {
            "normalizer": "raw"
          },
          "fieldnorms": false,
          "indexed": true,
          "name": "users",
          "record": "basic",
          "stored": true,
          "tokenizer": "raw",
          "type": "text"
        }
      ]
    },
    {
      "name": "timestamp",
      "type": "datetime",
      "fast": true,
      "fast_precision": "seconds",
      "indexed": true,
      "input_formats": [
        "rfc3339",
        "unix_timestamp"
      ],
      "output_format": "rfc3339",
      "stored": true
    },
    {
      "name": "wazuh",
      "type": "object",
      "field_mappings": [
        {
          "field_mappings": [
            {
              "fast": {
                "normalizer": "raw"
              },
              "fieldnorms": false,
              "indexed": true,
              "name": "name",
              "record": "basic",
              "stored": true,
              "tokenizer": "raw",
              "type": "text"
            }
          ],
          "name": "cluster",
          "type": "object"
        },
        {
          "field_mappings": [
            {
              "fast": {
                "normalizer": "raw"
              },
              "fieldnorms": false,
              "indexed": true,
              "name": "version",
              "record": "basic",
              "stored": true,
              "tokenizer": "raw",
              "type": "text"
            }
          ],
          "name": "schema",
          "type": "object"
        }
      ]
    }
  ],
  "tag_fields": [],
  "store_source": false,
  "index_field_presence": false,
  "timestamp_field": null,
  "mode": "dynamic",
  "dynamic_mapping": {
    "indexed": true,
    "tokenizer": "raw",
    "record": "basic",
    "stored": true,
    "expand_dots": true,
    "fast": {
      "normalizer": "raw"
    }
  },
  "max_num_partitions": 200,
  "tokenizers": []
}

Phase 4: Automatic Recovery & Failsafe Mechanisms

Dynamic Index Creation

A failsafe mechanism automatically creates missing indexes, inferring their schema from a sample document:

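#include <string>
#include <nlohmann/json.hpp>

// Helpers declared only to show the flow. In the real code inferType is a lambda
// applying the heuristics listed below; postToQuickwit is a hypothetical HTTP wrapper.
std::string inferType(const nlohmann::json& value);
void postToQuickwit(const std::string& path, const std::string& body);

// Build one field mapping, recursing into nested objects
nlohmann::json buildFieldMapping(const std::string& name, const nlohmann::json& value)
{
    nlohmann::json mapping = {{"name", name}, {"type", inferType(value)}};
    if (value.is_object())
    {
        mapping["field_mappings"] = nlohmann::json::array();
        for (const auto& item : value.items())
        {
            mapping["field_mappings"].push_back(buildFieldMapping(item.key(), item.value()));
        }
    }
    return mapping;
}

void createQuickwitIndexDynamic(const std::string& indexName, const std::string& sampleData)
{
    // Infer field types from the sample document
    const auto sampleDoc = nlohmann::json::parse(sampleData);
    auto fieldMappings = nlohmann::json::array();
    for (const auto& item : sampleDoc.items())
    {
        fieldMappings.push_back(buildFieldMapping(item.key(), item.value()));
    }

    // Minimal index configuration; flags, timestamp injection, version and
    // indexing/search settings are described below and omitted here
    nlohmann::json indexConfig = {
        {"index_id", indexName},
        {"doc_mapping", {{"mode", "dynamic"}, {"field_mappings", fieldMappings}}}};

    // POST to /api/v1/indexes
    postToQuickwit("/api/v1/indexes", indexConfig.dump());
}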

First, we parse sampleData into a nlohmann::json object. Then we build a field_mappings array by iterating sample document keys and inferring a Quickwit field type for each value using an inferType lambda:

strings → "text" by default, but:
strings containing 'T' and 'Z' → "datetime"
strings with three dots '.' → "ip"
integer numbers → "i64"
floating-point numbers → "f64"
booleans → "bool"
objects → "object"
default fallback → "text"

We then set Quickwit flags: indexed = true for every field; fast = true for text, i64, f64, ip and datetime; and tokenizer = "raw" for text.
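Below is a standalone sketch of the string heuristics and flag rules above. It uses plain strings instead of nlohmann::json, so it compiles without dependencies. The function names are hypothetical, and "exactly three dots" is my reading of "three dots". The tests deliberately include a false positive: any string containing both an uppercase T and an uppercase Z is classified as a datetime. That is exactly the kind of wrong guess that can later cause documents to be rejected.

```cpp
#include <algorithm>
#include <string>

// String heuristics as listed above (exactly three dots is an assumption)
std::string inferStringType(const std::string& s)
{
    const bool looksLikeDatetime =
        s.find('T') != std::string::npos && s.find('Z') != std::string::npos;
    if (looksLikeDatetime)
        return "datetime";
    if (std::count(s.begin(), s.end(), '.') == 3)
        return "ip";
    return "text";
}

// Leaf field mapping with the flags described above, serialized as JSON
std::string leafFieldMapping(const std::string& name, const std::string& type)
{
    std::string m = R"({"name":")" + name + R"(","type":")" + type + R"(","indexed":true)";
    const bool fast = type == "text" || type == "i64" || type == "f64" ||
                      type == "ip" || type == "datetime";
    if (fast)
        m += R"(,"fast":true)";
    if (type == "text")
        m += R"(,"tokenizer":"raw")";
    return m + "}";
}
```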

We then ensure a timestamp field exists. If the sample has none, a "timestamp" field mapping of type "datetime" (with input_formats and fast/indexed flags) is inserted at the beginning.

Finally, we build the Quickwit index configuration JSON (version, index_id, doc_mapping.field_mappings, mode "dynamic", some indexing/search settings, and timestamp_field if added) and we POST it to the Quickwit API.

The "Dynamic" Elephant in the Room

One of the harshest wake-up calls in this project was realizing just how spoiled we are by OpenSearch’s dynamic mapping.

OpenSearch is the golden retriever of databases: You throw a JSON object at it, and it enthusiastically catches it. New field? "I'll map it!" Weird data type? "I'll guess!" It doesn't always guess right (mapping a version number 1.20 as a float instead of a string is a classic headache), but it rarely rejects the food you give it.

Quickwit is a German librarian: It has rules. It likes order. While Quickwit does support a "dynamic" mode (which we are using), it is far less forgiving about structural ambiguity, especially when you want performance.

The Array Problem: In Wazuh, a field like process.args might be a single string in one log and an array of strings in the next. OpenSearch shrugs. Quickwit’s dynamic mode can handle this, but if you want to query it efficiently as a native column, you have to be consistent.

The Type Mismatch: If you defined a field as an integer yesterday, and a rogue agent sends a string today, OpenSearch might try to coerce it or just drop the field. Quickwit is more likely to look at you with disappointment.

This is why my C++ code has to do so much "defensive normalization" (serializing arrays to strings, checking timestamps). We are essentially building a compliance layer between Wazuh’s chaotic "send whatever you find" approach and Quickwit’s disciplined storage engine.

3. Engineering Decisions & Trade-offs

You can see that all the implementations here are quick(wit) and not very clean. I'm a CISO, not a developer, but I'm good at weighing pros and cons and suggesting mitigations.

Data Transformation Philosophy

Defensive transformation vs strict validation:

Pro: Maximizes data ingestion success rate and provides automatic recovery from common issues
Con: May obscure original data shapes (arrays become strings)
Mitigation: Comprehensive logging of transformations for audit trails

Error Handling Strategy

Prioritizes data preservation:

Pro: Attempts automatic recovery (index creation, data normalization)
Con: The original data type is not always preserved, and dynamic index generation may infer unexpected types for new fields.
Mitigation: Never silently drop documents

4. Performance Considerations

Batch Processing

The NDJSON format enables efficient batch processing:

Reduced HTTP overhead (single request for multiple documents)
Optimized network utilization
Batch sizes tunable to deployment characteristics (in this PoC, the limit is still hardcoded)
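Because batching is where the HTTP savings come from, here is a minimal sketch of size-bounded NDJSON batching. splitIntoBatches and maxBytes are hypothetical names. Assuming "MB" means MiB, the PoC's 5 MB limit would correspond to a call like splitIntoBatches(docs, 5 * 1024 * 1024). A document larger than the limit gets a batch of its own instead of being dropped, in line with the "never silently drop documents" rule.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split serialized documents into NDJSON bodies of at most maxBytes each.
// An oversized document still gets its own batch rather than being dropped.
std::vector<std::string> splitIntoBatches(const std::vector<std::string>& docs,
                                          std::size_t maxBytes)
{
    std::vector<std::string> batches;
    std::string current;
    for (const auto& doc : docs)
    {
        const std::size_t needed = doc.size() + 1; // +1 for the trailing newline
        if (!current.empty() && current.size() + needed > maxBytes)
        {
            batches.push_back(std::move(current));
            current.clear(); // moved-from string: reset to a known empty state
        }
        current += doc;
        current += '\n';
    }
    if (!current.empty())
        batches.push_back(std::move(current));
    return batches;
}
```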

System impact

Quickwit can keep its indexes on S3-compatible object storage, which is more durable, less complex to operate, and more flexible than the local disks OpenSearch nodes rely on.

Because Quickwit nodes do not hold the indexed data themselves (it lives in object storage), losing a node does not mean losing data. For critical security logs, that is exactly what you want.

But I need to be absolutely transparent here: I have not stress-tested this.

In my lab, with a single agent, this integration hums along beautifully. The C++ binary barely registers on CPU usage, and memory consumption is a flat line. But I have no idea what happens when you point 10,000 agents generating 50,000 EPS at it.

The Bottleneck Unknown: Will the C++ curl implementation choke on thousands of concurrent connections? I don't know.
The Batch Size Gamble: Is my hardcoded batch size of 5MB optimal for high throughput? Probably not.
The "Thundering Herd": What happens when Quickwit momentarily slows down to merge splits? Does the Wazuh buffer fill up and crash the service?

OpenSearch’s scaling characteristics are battle-scarred and well-documented (usually involving adding more RAM until the problem goes away). This Quickwit connector is currently a "functional prototype." If you deploy this into a massive production environment without testing, you are braver than I am.

5. Overall conclusion: The Morning After

So, did we kill OpenSearch? No. Did we create a monster? Maybe just a little one.

After diving into the C++ abyss and emerging with a binary that actually compiles, here is the sober reality of making Wazuh talk to Quickwit.

Where is your Dashboard?

Let’s address the elephant in the server room: There is no GUI (hence the lack of images to illustrate what I did). The Wazuh Dashboard (a fork of Kibana/OpenSearch Dashboards) is tightly coupled to the OpenSearch API. It expects specific aggregations, mappings, and behaviors that Quickwit simply doesn't emulate yet.

Theory vs. Reality

This project proved that just because an architectural idea looks beautiful on a whiteboard (or in a chat with Claude), it doesn't mean it's practical for immediate production use.

The True Use Case: The Immutable Vault

However, this wasn't just an exercise. While Quickwit isn't a drop-in replacement for the "Hot" data layer of Wazuh yet, it shines as a WORM (Write-Once-Read-Many) solution.

In the security world, compliance is king and evidence is sacred. OpenSearch creates mutable indices—great for updating documents, terrible for proving chain of custody. Quickwit, with its append-only nature and S3-native storage, is the perfect candidate for:

Compliance Archival: Storing 5 years of logs on S3 for pennies, effectively immutable.
Evidence Preservation: Imagine a feature in Wazuh where you finish a Threat Hunting exercise and click "Archive to Evidence". Instead of leaving those logs in a mutable OpenSearch index (which rotates out after 30 days), the system migrates that specific slice of time to a Quickwit index. It becomes a frozen, searchable artifact for future correlation or legal discovery.

Final Thoughts

This journey highlights the absolute beauty of Open Source. If Wazuh were a proprietary black box, this blog post would have been a feature request ticket sitting in a vendor's queue labeled "WontFix" until the heat death of the universe.

Because it's open source, I was able to rip open the hood, misuse C++ headers, rely too heavily on AI, and actually make the thing work in my own lab. And that, despite the headaches, is why we do this to ourselves.

The repository is here: https://github.com/Baptiste-Leterrier/wazuh-quickwit

Happy Christmas! 🎄