Connecting Wazuh to Quickwit: A Tale of Hubris, Header Files, and Claude-Driven Development
Why Would Anyone Do This to Themselves?
Picture this: You have Wazuh, a perfectly functional security monitoring platform that uses Wazuh Indexer (née OpenSearch, itself forked from Elasticsearch) as its backend. It works. People use it. Life is good.
But then you discover Quickwit—a search engine that's like if Rust programmers decided OpenSearch was too bloated and said "hold my artisanal, locally-sourced, memory-safe beer."
The Tale of Two Indexers
OpenSearch is great and well-known. It's got:
- ✅ Battle-tested in production
- ✅ Every feature you could possibly want
- ✅ Documentation so extensive you could print it and use it as a door
- ❌ Resource consumption that makes Chrome look lightweight
- ❌ JVM heap settings that require a PhD in numerology
- ❌ Shards
Quickwit is the new hotness:
- ✅ Written in Rust
- ✅ Actually designed for log data from the ground up
- ✅ Object storage native (S3 go brrr)
- ✅ Small resource footprint
- ❌ Documentation that assumes you already know what you're doing
- ❌ API compatibility that's more "inspired by" than "compatible with"
- ❌ The exciting feeling of being an early adopter (read: guinea pig)
So naturally, I decided to make them talk to each other.
Act I: The Proxy of False Hope
My first brilliant idea: "I'll just write a proxy that translates OpenSearch requests to Quickwit! How hard could it be?"

The beauty of this approach was its simplicity. The horror was... everything else.
# Actual conversation between the systems:
Wazuh: "I need to bulk index these 10,000 documents with nested fields and custom routing"
Proxy: "Quickwit says... uh... 'yes'?"
Quickwit: "What's a routing?"
Act II: Going Deep (Into the C++ Abyss)
After the proxy approach proved about as stable as a house of cards in a hurricane, I decided to do what any reasonable person would do: dive into Wazuh's C++ codebase and implement native Quickwit support.
The HPP Files Incident
Coming from Python, where everything is a dictionary and types are more of a suggestion than a rule, C++ header files were... an experience.
// What I expected:
import quickwit
// What I got:
#include "indexer_connector/include/serverSelector.hpp"
#include "shared_modules/utils/monitoring.hpp"
#include "why_are_there_so_many_headers.hpp"
#include "seriously_another_one.hpp"
#include "help.hpp"
The revelation that .hpp files are just headers pretending to be hip and modern (as opposed to .h files which are headers that have given up on life) was just the beginning.
Act III: Claude Ex Machina
This is where our story takes a turn. Enter Claude, my AI pair programmer, who approached this codebase with confidence.
Looking at the commit messages, you can actually track my descent into madness:
The Early Optimism Phase
"Add Quickwit indexer integration to Wazuh 4.14.1"
The Reality Sets In
"Fix incomplete ServerSelector type error and conflicting declaration"
The AsyncDispatcher Arc
"Fix AsyncDispatcher copy-constructibility and overload collision to support move-only types"
This is where Claude really shone, implementing move semantics with the casual air of someone who definitely understands what an rvalue reference is and isn't just pattern-matching from Stack Overflow answers.
The Grand Finale
"Add automatic Quickwit index creation for wazuh-states indexes"
44 commits later, we finally got to actually creating Quickwit indexes! I like this dynamic index creation approach (it resembles the proxy solution), but this is a PoC. In the end, indexes should be correctly created when the Wazuh process initializes, as they are with Wazuh Indexer.
1. Architectural Overview
1.1 Integration Strategy
I chose a unified codebase approach rather than creating separate connectors for each indexer type. This architectural decision provides several key advantages:
- Single maintenance surface: One codebase to maintain, test, and deploy
- Reduced code duplication: Shared logic for common operations (connection management, error handling, retry mechanisms)
- Simplified deployment: No need for operators to manage multiple connector versions
- Seamless migration path: Organizations can switch between indexers through configuration changes alone
- Consistent behavior: Unified error handling and operational characteristics across both backends
The implementation uses runtime type detection based on configuration, allowing the same binary to adapt its behavior dynamically.
1.2 Technical Foundation: NDJSON vs. Bulk API
A fundamental difference between Quickwit and OpenSearch lies in their data ingestion formats:
OpenSearch Bulk API Format
POST /_bulk
{ "delete": { "_index": "movies", "_id": "tt2229499" } }
{ "index": { "_index": "movies", "_id": "tt1979320" } }
{ "title": "Rush", "year": 2013 }
{ "create": { "_index": "movies", "_id": "tt1392214" } }
{ "title": "Prisoners", "year": 2013 }
{ "update": { "_index": "movies", "_id": "tt0816711" } }
{ "doc" : { "title": "World War Z" } }
Quickwit NDJSON Format
{"timestamp": "2024-01-01", "message": "log entry"}
{"timestamp": "2024-01-02", "message": "another entry"}
Pure document-only format without action lines, optimized for append-only log data ingestion.
This architectural difference reflects Quickwit's design philosophy: optimized for high-throughput log ingestion with immutable data patterns, whereas OpenSearch maintains flexibility for various document operations (create, update, delete).
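To make the difference concrete, here is a minimal sketch (not the actual connector code) of how the same documents end up in each payload. The function names are hypothetical, and each input string is assumed to be one already-serialized JSON document on a single line. The index name is not escaped, which is fine for an illustration but not for real input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// OpenSearch-style bulk body: an action line precedes every document.
// Hypothetical helper; only the "index" action is shown.
std::string buildBulkBody(const std::string& index, const std::vector<std::string>& docs)
{
    std::string body;
    for (const auto& doc : docs)
    {
        body += R"({"index":{"_index":")" + index + R"("}})";
        body += '\n';
        body += doc;
        body += '\n';
    }
    return body;
}

// Quickwit-style NDJSON body: documents only, one per line.
// The target index is carried by the request URL, not by the payload.
std::string buildNdjsonBody(const std::vector<std::string>& docs)
{
    std::string body;
    for (const auto& doc : docs)
    {
        body += doc;
        body += '\n';
    }
    return body;
}
```

For two documents, the bulk body is four lines long and the NDJSON body is two: the action lines are the whole difference on the wire.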
2. Implementation Evolution
Phase 1: Core Connectivity
Indexer Type Detection
A new parameter in ossec.conf sets the indexer type; set it to "quickwit". Simple, elegant.
This enables conditional behavior without code duplication, using the same connection management infrastructure for both backends.
The connector also dynamically selects the appropriate endpoints based on the detected indexer type.
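As a rough sketch of what this detection and endpoint selection can look like: the enum, the function names, and the choice to treat an empty or "opensearch" value as the default are all assumptions made for illustration, not the names used in the repository. The two paths are the OpenSearch bulk endpoint shown earlier and Quickwit's per-index ingest endpoint.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class IndexerType { OpenSearch, Quickwit };

// Hypothetical: map the configured type string to an enum value.
// An empty value falling back to OpenSearch is an assumption.
IndexerType parseIndexerType(const std::string& configured)
{
    if (configured.empty() || configured == "opensearch")
    {
        return IndexerType::OpenSearch;
    }
    if (configured == "quickwit")
    {
        return IndexerType::Quickwit;
    }
    throw std::invalid_argument("Unknown indexer type: " + configured);
}

// Pick the ingestion path for the detected backend.
std::string ingestEndpoint(IndexerType type, const std::string& index)
{
    switch (type)
    {
        case IndexerType::Quickwit: return "/api/v1/" + index + "/ingest";
        case IndexerType::OpenSearch: return "/_bulk";
    }
    throw std::logic_error("Unhandled indexer type");
}
```

Everything downstream (connection handling, retries) can then stay shared, with only these two decisions branching on the type.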
Phase 2: Data Format Adaptation
Builder Functions
New specialized builder functions handle the format differences:
The builderQuickwitDelete() function was implemented as a no-op, acknowledging that Quickwit handles data deletion through retention policies rather than explicit delete operations—a design choice aligned with immutable log data patterns.
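A simplified sketch of the two delete builders side by side. Only the name builderQuickwitDelete comes from the project; builderBulkDelete and both signatures are assumptions for illustration, and identifiers are not escaped.

```cpp
#include <cassert>
#include <string>

// OpenSearch: a delete is an action line appended to the bulk body.
// Hypothetical name and signature.
void builderBulkDelete(std::string& bulkData, const std::string& id, const std::string& index)
{
    bulkData += R"({"delete":{"_index":")" + index + R"(","_id":")" + id + R"("}})";
    bulkData += '\n';
}

// Quickwit: nothing is appended. Old data is expected to age out
// through retention policies instead of per-document deletes.
void builderQuickwitDelete(std::string& /*bulkData*/, const std::string& /*id*/, const std::string& /*index*/)
{
}
```

Keeping the same signature for both lets the caller stay unaware of which backend it is feeding.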
Other than that, schema evolution was needed.
Phase 3: Schema Validation & Data Normalization
Quickwit's stricter schema validation requirements necessitated defensive data transformations:
Timestamp Injection
if (!hasTimestamp)
{
    nlohmann::json timestampField;
    timestampField["name"] = "timestamp";
    timestampField["type"] = "datetime";
    timestampField["input_formats"] = nlohmann::json::array({"rfc3339", "unix_timestamp"});
    timestampField["fast"] = true;
    timestampField["indexed"] = true;
    fieldMappings.insert(fieldMappings.begin(), timestampField);
}
Array Field Normalization
Quickwit schemas often expect single-valued fields where Wazuh produces arrays. The solution: serialize arrays as JSON strings to maintain data integrity while satisfying schema requirements:
try
{
    // Parse the document
    auto doc = nlohmann::json::parse(data);
    // Add timestamp field if it doesn't exist
    if (!doc.contains("timestamp") && !doc.contains("@timestamp"))
    {
        // Get current time in RFC3339 format
        auto now = std::chrono::system_clock::now();
        auto time_t_now = std::chrono::system_clock::to_time_t(now);
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) % 1000;
        std::stringstream ss;
        ss << std::put_time(std::gmtime(&time_t_now), "%Y-%m-%dT%H:%M:%S");
        ss << '.' << std::setfill('0') << std::setw(3) << ms.count() << 'Z';
        doc["timestamp"] = ss.str();
    }
    // Fix process.args if it's an array - convert to JSON string
    if (doc.contains("process") && doc["process"].is_object())
    {
        auto& process = doc["process"];
        if (process.contains("args") && process["args"].is_array())
        {
            // Convert array to JSON string representation
            process["args"] = process["args"].dump();
        }
    }
    // Serialize and append
    bulkData.append(doc.dump());
    bulkData.append("\n");
}
catch (const nlohmann::json::exception& e)
{
    // If JSON parsing fails, append the original data
    logWarn(IC_NAME, "Failed to parse document for Quickwit index '%s': %s",
            std::string(index).c_str(), e.what());
    bulkData.append(data);
    bulkData.append("\n");
}
This approach preserves data fidelity while working within Quickwit's constraints. But we need correct doc mappings. Fortunately, doc mapping can be dynamically updated in Quickwit.
{
"field_mappings": [
{
"name": "agent",
"type": "object",
"field_mappings": [
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "id",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
},
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "name",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
},
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "version",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
}
]
},
{
"name": "group",
"type": "object",
"field_mappings": [
{
"coerce": true,
"fast": true,
"indexed": true,
"name": "id",
"output_format": "number",
"stored": true,
"type": "i64"
},
{
"coerce": true,
"fast": true,
"indexed": true,
"name": "id_signed",
"output_format": "number",
"stored": true,
"type": "i64"
},
{
"fast": false,
"indexed": true,
"name": "is_hidden",
"stored": true,
"type": "bool"
},
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "name",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
},
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "users",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
}
]
},
{
"name": "timestamp",
"type": "datetime",
"fast": true,
"fast_precision": "seconds",
"indexed": true,
"input_formats": [
"rfc3339",
"unix_timestamp"
],
"output_format": "rfc3339",
"stored": true
},
{
"name": "wazuh",
"type": "object",
"field_mappings": [
{
"field_mappings": [
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "name",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
}
],
"name": "cluster",
"type": "object"
},
{
"field_mappings": [
{
"fast": {
"normalizer": "raw"
},
"fieldnorms": false,
"indexed": true,
"name": "version",
"record": "basic",
"stored": true,
"tokenizer": "raw",
"type": "text"
}
],
"name": "schema",
"type": "object"
}
]
}
],
"tag_fields": [],
"store_source": false,
"index_field_presence": false,
"timestamp_field": null,
"mode": "dynamic",
"dynamic_mapping": {
"indexed": true,
"tokenizer": "raw",
"record": "basic",
"stored": true,
"expand_dots": true,
"fast": {
"normalizer": "raw"
}
},
"max_num_partitions": 200,
"tokenizers": []
}
Phase 4: Automatic Recovery & Failsafe Mechanisms
Dynamic Index Creation
A sophisticated failsafe mechanism automatically creates missing indexes:
void createQuickwitIndexDynamic(const std::string& indexName,
                                const Document& sampleDoc) {
    QuickwitSchema schema;
    // Infer field types from sample document
    for (const auto& [field, value] : sampleDoc) {
        FieldType type = inferType(value);
        if (isNestedObject(value)) {
            // Recursive field mapping for nested structures
            schema.addFieldMapping(buildFieldMapping(field, value));
        } else {
            schema.addField(field, type);
        }
    }
    // Generate Quickwit index configuration
    auto indexConfig = schema.toQuickwitConfig();
    // POST to /api/v1/indexes
    httpClient.post("/api/v1/indexes", indexConfig);
}
First, we parse the sample document into a nlohmann::json object. Then we build a field_mappings array by iterating over the sample document's keys and inferring a Quickwit field type for each value using an inferType lambda:
- strings → "text" by default, but:
  - strings containing 'T' and 'Z' → "datetime"
  - strings with three dots '.' → "ip"
- integer numbers → "i64"
- floating numbers → "f64"
- booleans → "bool"
- objects → "object"
- default fallback → "text"
We add Quickwit flags: indexed = true; fast = true for text/i64/f64/ip/datetime; tokenizer = "raw" for text.
We ensure a timestamp field exists: if not present, we insert a "timestamp" field mapping of type "datetime" (with input_formats and fast/indexed flags) at the beginning.
Finally, we build the Quickwit index configuration JSON (version, index_id, doc_mapping.field_mappings, mode "dynamic", some indexing/search settings, and timestamp_field if added) and we POST it to the Quickwit API.
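The inference rules above can be sketched as follows. This is a simplified stand-in, not the lambda from the repository: the real code inspects nlohmann::json values, whereas here a small hypothetical JsonKind enum describes the value so the example needs only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the kind of JSON value being inspected.
enum class JsonKind { String, Integer, Float, Boolean, Object, Other };

// Map a value to a Quickwit field type using the heuristics listed above.
// `text` is only consulted when the value is a string.
std::string inferType(JsonKind kind, const std::string& text = "")
{
    switch (kind)
    {
        case JsonKind::String:
            if (text.find('T') != std::string::npos && text.find('Z') != std::string::npos)
            {
                return "datetime";
            }
            if (std::count(text.begin(), text.end(), '.') == 3)
            {
                return "ip";
            }
            return "text";
        case JsonKind::Integer: return "i64";
        case JsonKind::Float: return "f64";
        case JsonKind::Boolean: return "bool";
        case JsonKind::Object: return "object";
        case JsonKind::Other: break;
    }
    return "text"; // default fallback
}

// fast = true for text/i64/f64/ip/datetime, as described above.
bool isFastField(const std::string& type)
{
    return type == "text" || type == "i64" || type == "f64" || type == "ip" || type == "datetime";
}
```

The string heuristics are deliberately loose: any string with three dots is classified as "ip", and any string containing both 'T' and 'Z' as "datetime". That is acceptable for a PoC, but it is one more reason to prefer explicit doc mappings in the long run.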
The "Dynamic" Elephant in the Room
One of the harshest wake-up calls in this project was realizing just how spoiled we are by OpenSearch’s dynamic mapping.
OpenSearch is the golden retriever of databases: You throw a JSON object at it, and it enthusiastically catches it. New field? "I'll map it!" Weird data type? "I'll guess!" It doesn't always guess right (mapping a version number 1.20 as a float instead of a string is a classic headache), but it rarely rejects the food you give it.
Quickwit is a German librarian: It has rules. It likes order. While Quickwit does support a "dynamic" mode (which we are using), it is far less forgiving about structural ambiguity, especially when you want performance.
The Array Problem: In Wazuh, a field like process.args might be a single string in one log and an array of strings in the next. OpenSearch shrugs. Quickwit’s dynamic mode can handle this, but if you want to query it efficiently as a native column, you have to be consistent.
The Type Mismatch: If you defined a field as an integer yesterday, and a rogue agent sends a string today, OpenSearch might try to coerce it or just drop the field. Quickwit is more likely to look at you with disappointment.
This is why my C++ code has to do so much "defensive normalization" (serializing arrays to strings, checking timestamps). We are essentially building a compliance layer between Wazuh’s chaotic "send whatever you find" approach and Quickwit’s disciplined storage engine.
3. Engineering Decisions & Trade-offs
You can see that all implementations here are quick(wit) and not very clean. I'm a CISO, not a developer, but I'm good at finding pros and cons and suggesting mitigation.
Data Transformation Philosophy
Defensive transformation vs strict validation:
- Pro: Maximizes data ingestion success rate and provides automatic recovery from common issues
- Con: May obscure original data shapes (arrays become strings)
- Mitigation: Comprehensive logging of transformations for audit trails
Error Handling Strategy
Prioritizes data preservation:
- Pro: Attempt automatic recovery (index creation, data normalization)
- Con: Original data type is not preserved, and new types can be uncovered by our dynamic index generation.
- Mitigation: Never silently drop documents
4. Performance Considerations
Batch Processing
The NDJSON format enables efficient batch processing:
- Reduced HTTP overhead (single request for multiple documents)
- Optimized network utilization
- Configurable batch sizes based on deployment characteristics
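A minimal sketch of size-bounded batching, assuming a hypothetical NdjsonBatcher class (the connector's real buffering is not shown in this post). Documents accumulate until the next one would push the batch past a byte limit, at which point the batch is sealed and a new one starts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical size-bounded NDJSON batcher.
class NdjsonBatcher
{
public:
    explicit NdjsonBatcher(std::size_t maxBytes) : m_maxBytes(maxBytes) {}

    // Append one serialized document. If it would push the current
    // batch past the limit, the current batch is sealed first.
    void add(const std::string& doc)
    {
        const std::size_t needed = doc.size() + 1; // +1 for the newline
        if (!m_current.empty() && m_current.size() + needed > m_maxBytes)
        {
            flush();
        }
        m_current += doc;
        m_current += '\n';
    }

    // Seal the current batch (if any) so it can go out as one HTTP request.
    void flush()
    {
        if (!m_current.empty())
        {
            m_ready.push_back(std::move(m_current));
            m_current.clear();
        }
    }

    const std::vector<std::string>& ready() const { return m_ready; }

private:
    std::size_t m_maxBytes;
    std::string m_current;
    std::vector<std::string> m_ready;
};
```

A document larger than the limit still goes out, alone in its own batch, rather than being dropped. Making maxBytes a configuration value instead of a constant is what turns the batch size into something you can tune per deployment.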
System impact
The datastore can be backed by simpler S3-based storage, which is more durable, less complex, and more flexible.
Because Quickwit nodes do not store data, losing a node does not mean losing data, which, in the context of critical security logs, is exactly what you want.
But I need to be absolutely transparent here: I have not stress-tested this.
In my lab, with a single agent, this integration hums along beautifully. The C++ binary barely registers on CPU usage, and memory consumption is a flat line. But I have no idea what happens when you point 10,000 agents generating 50,000 EPS at it.
- The Bottleneck Unknown: Will the C++ curl implementation choke on thousands of concurrent connections? I don't know.
- The Batch Size Gamble: Is my hardcoded batch size of 5MB optimal for high throughput? Probably not.
- The "Thundering Herd": What happens when Quickwit momentarily slows down to merge splits? Does the Wazuh buffer fill up and crash the service?
OpenSearch’s scaling characteristics are battle-scarred and well-documented (usually involving adding more RAM until the problem goes away). This Quickwit connector is currently a "functional prototype." If you deploy this into a massive production environment without testing, you are braver than I am.
5. Overall conclusion: The Morning After
So, did we kill OpenSearch? No. Did we create a monster? Maybe just a little one.
After diving into the C++ abyss and emerging with a binary that actually compiles, here is the sober reality of making Wazuh talk to Quickwit.
Where is your Dashboard?
Let’s address the elephant in the server room: There is no GUI (hence the lack of images to illustrate what I did). The Wazuh Dashboard (a fork of Kibana/OpenSearch Dashboards) is tightly coupled to the OpenSearch API. It expects specific aggregations, mappings, and behaviors that Quickwit simply doesn't emulate yet.
Theory vs. Reality
This project proved that just because an architectural idea looks beautiful on a whiteboard (or in a chat with Claude), it doesn't mean it's practical for immediate production use.
The True Use Case: The Immutable Vault
However, this wasn't just an exercise. While Quickwit isn't a drop-in replacement for the "Hot" data layer of Wazuh yet, it shines as a WORM (Write-Once-Read-Many) solution.
In the security world, compliance is king and evidence is sacred. OpenSearch creates mutable indices—great for updating documents, terrible for proving chain of custody. Quickwit, with its append-only nature and S3-native storage, is the perfect candidate for:
- Compliance Archival: Storing 5 years of logs on S3 for pennies, legally immutable.
- Evidence Preservation: Imagine a feature in Wazuh where you finish a Threat Hunting exercise and click "Archive to Evidence." Instead of leaving those logs in a mutable OpenSearch index (that rotates out in 30 days), the system migrates that specific slice of time to a Quickwit index. It becomes a frozen, searchable artifact for future correlation or legal discovery.
Final Thoughts
This journey highlights the absolute beauty of Open Source. If Wazuh were a proprietary black box, this blog post would have been a feature request ticket sitting in a vendor's queue labeled "WontFix" until the heat death of the universe.
Because it's open source, I was able to rip open the hood, misuse C++ headers, rely too heavily on AI, and actually make the thing work in my own lab. And that, despite the headaches, is why we do this to ourselves.
The repo is here: https://github.com/Baptiste-Leterrier/wazuh-quickwit
Happy Christmas! 🎄
