Analyze your {A|N}LB logs with ClickHouse
Written by: Andrei Tserakhau, DoubleCloud Tech Lead
Consolidating your logs with ClickHouse is a game-changing approach: it’s cost-effective, lightning-fast, and scales effortlessly to accommodate your needs.
AWS stores Load Balancer logs in S3 and can announce each newly delivered log file through a queue, enabling a seamless CDC (Change Data Capture) style delivery. This synergy between S3’s notification mechanism and ClickHouse lays a robust foundation for efficient log analysis, and DoubleCloud’s Transfer service makes it even easier.
With Terraform at your disposal, setting up log aggregation becomes a straightforward process.
Lay the groundwork
So, how do you build the essential components for a scalable and fast observability stack, dedicated to dissecting and understanding AWS NLB or ALB access logs?
To set the stage, envision a baseline scenario — a typical application setup incorporating an AWS Load Balancer. As a fundamental part of AWS services, Load Balancers funnel logs into an S3 bucket. Now, let’s take a guided tour through the step-by-step process, unveiling how these essential load-balancer logs seamlessly become accessible for real-time analysis in ClickHouse, all orchestrated through Terraform.
But before we venture into the Terraform magic, it’s essential to picture the existing infrastructure and the pivotal Terraform code that configures the seamless interplay between S3 and SQS for the ALB.
S3 bucket where the logs are stored by AWS
At the outset, our priority is setting up log storage for your ALB, a crucial step that involves binding an S3 bucket to your existing ALB.
Let’s kick off with the foundational element: the creation of an S3 Bucket.
Click to view an example on Github
resource "aws_s3_bucket" "nlb_logs" {
bucket = var.bucket_name
}
This snippet lays the groundwork by creating an S3 bucket that will serve as the repository for your AWS load-balancer logs. Next, point your existing load balancer at that bucket via its access_logs block:
resource "aws_lb" "alb" {
/* your config
*/
dynamic "access_logs" {
for_each = var.access_logs_bucket != null ? { enabled = true } : {}
content {
enabled = true
bucket = var.bucket_name
prefix = var.access_logs_bucket_prefix
}
}
}
Now, let’s move forward with setting up the necessary configuration for an SQS queue to complement this S3 bucket.
Click to view an example on Github
resource "aws_sqs_queue" "nlb_logs_queue" {
name = var.sqs_name
policy = <<POLICY
{
"Version": "2012-10-17",
"Id": "sqspolicy",
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "sqs:SendMessage",
"Resource": "arn:aws:sqs:*:*:${var.sqs_name}",
"Condition": {
"ArnEquals": { "aws:SourceArn": "${aws_s3_bucket.nlb_logs.arn}" }
}
}
]
}
POLICY
}
This code creates an SQS queue together with a policy that allows the S3 bucket to send it a message whenever a new log object is delivered.
As logs are delivered, AWS organizes them under a dedicated prefix inside the bucket, following a layout like <prefix>/AWSLogs/<account-id>/elasticloadbalancing/<region>/yyyy/mm/dd/.
New log files are generated regularly, so we need a streamlined way to be notified about them and process them. Following the guidelines in Amazon S3’s notification configuration documentation, we’ll configure the bucket to push a notification to the SQS queue for every newly created log file.
This linkage is established through an S3 bucket notification:
Click to view an example on Github
resource "aws_s3_bucket_notification" "nlb_logs_bucket_notification" {
bucket = aws_s3_bucket.nlb_logs.id
queue {
queue_arn = aws_sqs_queue.nlb_logs_queue.arn
events = ["s3:ObjectCreated:*"]
}
}
All the configurations implemented so far serve as the foundational infrastructure for streamlined log storage. From setting up the S3 bucket to configuring the SQS queue and establishing the intricate linkage between them, this orchestrated setup forms the backbone for efficient log management and processing:
The illustration above showcases the composed architecture, where the S3 bucket, SQS queue, and their interconnection stand as pivotal components for storing and managing logs effectively within the AWS environment.
Logs are now in your S3 bucket, but reading them in their raw form is challenging. Let’s take a look at a data sample:
tls 2.0 2024-01-02T23:58:58 net/preprod-public-api-dt-tls/9f8794be28ab2534 4d9af2ddde90eb82 84.247.112.144:33342 10.0.223.207:443 244 121 0 15 - arn:aws:acm:eu-central-1:840525340941:certificate/5240a1e4-c7fe-44c1-9d89-c256213c5d23 - ECDHE-RSA-AES128-GCM-SHA256 tlsv12 - 18.193.17.109 - - "%ef%b5%bd%8" 2024-01-02T23:58:58
The snippet above is a sample of the log data residing in the S3 bucket. Understanding the data’s format and content will help us build an efficient strategy for parsing and storing it.
Let’s move this data to DoubleCloud’s Managed ClickHouse.
Set up VPC and ClickHouse at DoubleCloud
Next, DoubleCloud comes to our rescue. We’ll add a VPC (Virtual Private Cloud) and a Managed ClickHouse cluster to serve as the dedicated storage for our logs.
Click to view an example on Github
resource "doublecloud_network" "nlb-network" {
project_id = var.project_id
name = var.network_name
region_id = var.region
cloud_type = var.cloud_type
ipv4_cidr_block = var.ipv4_cidr
}
With the network in place, let’s create a ClickHouse cluster inside that VPC:
Click to view an example on Github
resource "doublecloud_clickhouse_cluster" "nlb-logs-clickhouse-cluster" {
project_id = var.project_id
name = var.clickhouse_cluster_name
region_id = var.region
cloud_type = var.cloud_type
network_id = resource.doublecloud_network.nlb-network.id
resources {
clickhouse {
resource_preset_id = var.clickhouse_cluster_resource_preset
disk_size = 34359738368
replica_count = 1
}
}
config {
log_level = "LOG_LEVEL_INFORMATION"
max_connections = 120
}
access {
data_services = ["transfer"]
ipv4_cidr_blocks = [
{
value = var.ipv4_cidr
description = "VPC CIDR"
}
]
}
}
data "doublecloud_clickhouse" "nlb-logs-clickhouse" {
project_id = var.project_id
id = doublecloud_clickhouse_cluster.nlb-logs-clickhouse-cluster.id
}
Connect ClickHouse with S3 Logs via Transfer at DoubleCloud
To connect S3 with ClickHouse, we’ll use DoubleCloud’s Transfer service, which is a powerful ELT tool.
Establishing a DoubleCloud Transfer (ELT pipeline) involves setting up a source and target endpoint. The Terraform code snippet below outlines the source endpoint:
Click to view an example on Github
resource "doublecloud_transfer_endpoint" "nlb-s3-s32ch-source" {
name = var.transfer_source_name
project_id = var.project_id
settings {
object_storage_source {
provider {
bucket = var.bucket_name
path_prefix = var.bucket_prefix
aws_access_key_id = var.aws_access_key_id
aws_secret_access_key = var.aws_access_key_secret
region = var.region
endpoint = var.endpoint
use_ssl = true
verify_ssl_cert = true
}
format {
csv {
delimiter = " " // space as delimiter
advanced_options {
}
additional_options {
}
}
}
event_source {
sqs {
queue_name = var.sqs_name
}
}
result_table {
add_system_cols = true
table_name = var.transfer_source_table_name
table_namespace = var.transfer_source_table_namespace
}
result_schema {
data_schema {
fields {
field {
name = "type"
type = "string"
required = false
key = false
path = "0"
}
field {
name = "version"
type = "string"
required = false
key = false
path = "1"
}
/*
Rest of Fields
*/
field {
name = "tls_connection_creation_time"
type = "datetime"
required = false
key = false
path = "21"
}
}
}
}
}
}
}
This Terraform code snippet delineates the source endpoint configuration, encompassing S3 connection details, data format, event notifications using an SQS queue, and the schema definition for data stored within the S3 bucket.
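Each field block maps a position in the space-delimited log line (the path attribute) to a named, typed column. To make the outcome concrete, here is a rough sketch of the table the transfer ends up populating, assuming its name resolves to logs_alb as in the queries later in this post; the exact engine, ordering key, and column types are chosen by Transfer and may differ:
CREATE TABLE logs_alb
(
    type String,
    version String,
    -- ... the remaining positional fields (time, elb, listener, client_port,
    -- destination_port, received_bytes, sent_bytes, and so on) ...
    tls_connection_creation_time DateTime,
    -- system columns produced by add_system_cols = true
    __file_name String,
    __row_index UInt64,
    __data_transfer_commit_time UInt64,
    __data_transfer_delete_time UInt64
)
ENGINE = MergeTree
ORDER BY tls_connection_creation_time
The system columns at the end record which S3 object and row each entry came from, which comes in handy when debugging ingestion.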
Once we have a source, let’s make a target. It’s easier, since it’s just ClickHouse:
Click to view an example on Github
resource "doublecloud_transfer_endpoint" "nlb-ch-s32ch-target" {
name = var.transfer_target_name
project_id = var.project_id
settings {
clickhouse_target {
clickhouse_cleanup_policy = "DROP"
connection {
address {
cluster_id = doublecloud_clickhouse_cluster.nlb-logs-clickhouse-cluster.id
}
database = "default"
password = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.password
user = data.doublecloud_clickhouse.nlb-logs-clickhouse.connection_info.user
}
}
}
}
These two pieces can then be composed into a transfer:
Click to view an example on Github
resource "doublecloud_transfer" "nlb-logs-s32ch" {
name = var.transfer_name
project_id = var.project_id
source = doublecloud_transfer_endpoint.nlb-s3-s32ch-source.id
target = doublecloud_transfer_endpoint.nlb-ch-s32ch-target.id
type = "INCREMENT_ONLY"
activated = false
}
Once the transfer has been established, a comprehensive delivery pipeline takes shape:
The illustration above represents the culmination of our efforts — a complete delivery pipeline primed for seamless data flow. This integrated system, incorporating S3, SQS, VPC, ClickHouse, and the orchestrated configurations, stands ready to handle, process, and analyze log data efficiently and effectively at any scale.
Looking into logs
Now, it’s time to explore your data in ClickHouse. Let’s dive into querying your structured logs and uncover insights within this robustly organized dataset. To connect to your newly created database, you can use clickhouse-client:
clickhouse-client \
  --host $CH_HOST \
  --port 9440 \
  --secure \
  --user admin \
  --password $CH_PASSWORD
Let’s start by getting an overview of the total count of logs. This simple query gives us an initial grasp of the volume of data we’re working with in our ClickHouse dataset:
SELECT count(*)
FROM logs_alb
Query id: 6cf59405-2a61-451b-9579-a7d340c8fd5c
┌──count()─┐
│ 15935887 │
└──────────┘
1 row in set. Elapsed: 0.457 sec.
Now, let’s zoom in and retrieve a specific row from our dataset. This targeted query will fetch a single record, allowing us to closely examine the details encapsulated within this individual log entry.
SELECT *
FROM logs_alb
LIMIT 1
FORMAT Vertical
Query id: 44fc6045-a5be-47e2-8482-3033efb58206
Row 1:
──────
type: tls
version: 2.0
time: 2023-11-20 21:05:01
elb: net/*****/*****
listener: 92143215dc51bb35
client_port: 10.0.246.57:55534
destination_port: 10.0.39.32:443
connection_time: 1
tls_handshake_time: -
received_bytes: 0
sent_bytes: 0
incoming_tls_alert: -
chosen_cert_arn: -
chosen_cert_serial: -
tls_cipher: -
tls_protocol_version: -
tls_named_group: -
domain_name: -
alpn_fe_protocol: -
alpn_be_protocol: -
alpn_client_preference_list: -
tls_connection_creation_time: 2023-11-20 21:05:01
__file_name: api/AWSLogs/******/elasticloadbalancing/eu-central-1/2023/11/20/****-central-1_net.****.log.gz
__row_index: 1
__data_transfer_commit_time: 1700514476000000000
__data_transfer_delete_time: 0
1 row in set. Elapsed: 0.598 sec.
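With the column names in hand, you can pull out targeted slices of the data. For example, here’s a hypothetical query that fetches the latest entries for one of the destinations seen above (the IP:port literal is just an illustration taken from this dataset):
SELECT
    time,
    client_port,
    destination_port,
    received_bytes,
    sent_bytes
FROM logs_alb
WHERE destination_port = '10.0.39.32:443'
ORDER BY time DESC
LIMIT 10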
Time for a quick analysis: let’s execute a straightforward group by query to determine the most frequently used destination ports.
SELECT
    destination_port,
    count(*)
FROM logs_alb
GROUP BY destination_port
Query id: a4ab55db-9208-484f-b019-a5c13d779063
┌─destination_port──┬─count()─┐
│ 10.0.234.156:443 │ 10148 │
│ 10.0.205.254:443 │ 12639 │
│ 10.0.209.51:443 │ 13586 │
│ 10.0.223.207:443 │ 10125 │
│ 10.0.39.32:443 │ 4860701 │
│ 10.0.198.39:443 │ 13837 │
│ 10.0.224.240:443 │ 9546 │
│ 10.10.162.244:443 │ 416893 │
│ 10.0.212.130:443 │ 9955 │
│ 10.0.106.172:443 │ 4860359 │
│ 10.10.111.92:443 │ 416908 │
│ 10.0.204.18:443 │ 9789 │
│ 10.10.24.126:443 │ 416881 │
│ 10.0.232.19:443 │ 13603 │
│ 10.0.146.100:443 │ 4862200 │
└───────────────────┴─────────┘
15 rows in set. Elapsed: 1.101 sec. Processed 15.94 million rows, 405.01 MB (14.48 million rows/s., 368.01 MB/s.)
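The byte counters open the door to traffic analysis as well. As a sketch, assuming received_bytes and sent_bytes were ingested as numeric columns (if they arrived as strings, wrap them in toUInt64OrZero), you could chart hourly traffic like this:
SELECT
    toStartOfHour(time) AS hour,
    sum(received_bytes) AS bytes_in,
    sum(sent_bytes) AS bytes_out
FROM logs_alb
GROUP BY hour
ORDER BY hour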
Explore the complete example in our Terraform project for a hands-on experience with log querying in ClickHouse.