Cheap and dirty: how to collect logs from remote clients over HTTP

Published on 2025-11-12

Do you have an application spread across hundreds of client devices? Or a fleet of IoT sensors sending telemetry? Sooner or later the question arises: “What’s actually happening over there?” And right after it — “How do I collect logs without bankrupting myself on Splunk or Datadog?”

If your clients can send HTTP requests, you already have ninety percent of the solution. HTTP(S) is a universal and firewall-friendly protocol. All we need is a listener (endpoint) that will accept these logs.

In this article we’ll examine three budget ways to organize log collection over HTTP — from “fast, simple, and works” to “scalable and almost free”.


Why HTTP

  • Universality — every programming language and platform has an HTTP client.
  • Simplicity — a POST request with a JSON body is intuitive.
  • Availability — port 443 (HTTPS) is almost never blocked, unlike specific ports for syslog or GELF.

Golden rule: always use HTTPS. Free Let’s Encrypt certificates can be set up in ten minutes and protect your data from interception.


Option 1. Script on a VPS (fast and very cheap)

The simplest, “hacky”, but also the fastest to deploy method. Suitable for pet projects or collecting logs from a small number of clients.

What you’ll need

  • A cheap VPS (Vultr, DigitalOcean, Hetzner, etc.) costing around five dollars a month.
  • Stack: Nginx + any backend script (Python/Flask, Node.js/Express, PHP).

How it works

  1. Rent a virtual server.

  2. Bring up Nginx as a reverse proxy and for SSL termination.

  3. Write a tiny web application (twenty lines of code) that:

    • accepts POST requests to /log;
    • if necessary, checks a secret key in the X-API-Key header;
    • writes the JSON log to a file.

Example in Python (Flask)

from flask import Flask, request, abort
import os
import json

app = Flask(__name__)

# Shared secret; override via environment variable in production.
API_KEY = os.environ.get("LOG_API_KEY", "super-secret-key")

LOG_PATH = "/var/log/my-app/events.log"

@app.route('/log', methods=['POST'])
def receive_log():
    # Reject requests that don't carry the correct API key.
    if request.headers.get('X-API-Key') != API_KEY:
        abort(401)

    # silent=True returns None instead of raising on malformed JSON.
    data = request.get_json(silent=True)
    if not data:
        abort(400)

    try:
        # Append one JSON object per line (JSON Lines format).
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(data) + "\n")
    except OSError as e:
        print(f"Failed to write log: {e}")
        abort(500)

    return "OK", 200

if __name__ == '__main__':
    # Bind to localhost only; Nginx terminates TLS and proxies here.
    app.run(host='127.0.0.1', port=5000)
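
On the client side, the matching sender is a few lines. A minimal sketch: `API_URL` and `API_KEY` are placeholders, the stdlib `urllib` is used to avoid extra dependencies, and request construction is separated out so it can be reused or tested without a network.

```python
import json
import urllib.request

API_URL = "https://logs.example.com/log"  # placeholder endpoint
API_KEY = "super-secret-key"              # must match the server's key

def build_log_request(event: dict) -> urllib.request.Request:
    """Prepare an authenticated POST request for one log event."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )

def send_log(event: dict, timeout: float = 5.0) -> bool:
    """Ship one event; returns True on HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(build_log_request(event),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```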

Advantages

  • Minimal costs (around five dollars a month).
  • Simplicity and control over storage.
  • Deployment in thirty minutes.

Disadvantages

  • Poor scalability: writing to a single file quickly becomes a bottleneck.
  • Maintenance required (log rotation, disk monitoring, security).
  • Analysis — manually via SSH and grep.

Option 2. Server-side collector (reliable and flexible)

An evolution of the first approach — a ready-made, optimized open-source stack instead of a hand-written script.

What you’ll need

  • VPS for $5–10/month.
  • Stack: Vector, Loki, Grafana.

How it works

  1. Vector — a high-performance log pipeline whose HTTP source accepts incoming logs.
  2. Loki — Grafana’s log database; it indexes only labels, which keeps storage cheap.
  3. Grafana — a convenient interface for searching and analyzing logs with LogQL.

Example Vector configuration (vector.yaml)

sources:
  http_logs:
    type: "http_server"        # HTTP listener source
    address: "0.0.0.0:8080"
    decoding:
      codec: "json"
    auth:                      # basic auth; terminate TLS in front via Nginx
      username: "log-client"
      password: "super-secret-key"

transforms:
  my_transform:
    type: "remap"
    inputs: ["http_logs"]
    source: |
      .app = "my_mobile_app"
      if !exists(.level) { .level = "info" }

sinks:
  loki:
    type: "loki"
    inputs: ["my_transform"]
    endpoint: "http://localhost:3100"
    labels:
      app: "{{ app }}"
      level: "{{ level }}"
    encoding:
      codec: "json"
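
Once logs land in Loki, you query them in Grafana with LogQL. For example, to see only error-level events from the app labeled above that mention a particular substring (label names match the `labels` block of the config; "payment" is a hypothetical search term):

```
{app="my_mobile_app", level="error"} |= "payment"
```

The label matchers select the log stream cheaply (labels are the only thing Loki indexes), and the `|=` line filter then greps inside the selected lines.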

Advantages

  • High performance (thousands of logs per second).
  • Convenient search and filtering via Grafana.
  • Vector can send logs not only to Loki, but also to S3, ClickHouse, and other storages.

Disadvantages

  • More complex setup: three components that require Docker or systemd.

Option 3. Serverless (almost free and virtually infinitely scalable)

The most modern and flexible option. No servers — you pay only for the number of requests and the amount of data.

What you’ll need

  • A cloud account (AWS, Google Cloud, or Yandex Cloud).
  • Stack (example on AWS): API Gateway + Lambda + S3 or CloudWatch Logs.

How it works

  1. The client sends a POST request.

  2. API Gateway receives the request as a managed HTTP endpoint.

  3. It invokes Lambda.

  4. Lambda:

    • saves the JSON to S3 by date;
    • sends the log to CloudWatch Logs.

Example Lambda in Python (writing to S3)

import json
import base64
import boto3
import time
import os

s3 = boto3.client('s3')
BUCKET_NAME = os.environ['LOG_BUCKET_NAME']

def lambda_handler(event, context):
    body = event.get('body')
    if not body:
        return {'statusCode': 400, 'body': 'No data'}

    # API Gateway may deliver the request body base64-encoded.
    if event.get('isBase64Encoded'):
        body = base64.b64decode(body).decode('utf-8')

    try:
        log_data = json.loads(body)
    except json.JSONDecodeError:
        return {'statusCode': 400, 'body': 'Invalid JSON'}

    # Partition objects by date and hour: logs/YYYY/MM/DD/HH/<request-id>.json
    now = time.strftime('%Y/%m/%d/%H', time.gmtime())
    file_name = f"{context.aws_request_id}.json"
    s3_key = f"logs/{now}/{file_name}"

    try:
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=s3_key,
            Body=json.dumps(log_data),
            ContentType='application/json'
        )
        return {'statusCode': 200, 'body': 'OK'}
    except Exception as e:
        print(e)
        return {'statusCode': 500, 'body': 'Error saving log'}

Advantages

  • The free tiers of AWS/Google/Yandex can cover millions of log events per month.
  • Automatic scalability.
  • No servers or updates — minimal maintenance.

Disadvantages

  • Analysis requires setup (for example, AWS Athena or BigQuery).
  • Vendor lock-in to a specific cloud provider.

Client-side best practices

  1. Asynchronous sending — don’t block the main thread of the application.
  2. Batch transfer — collect logs into an array and send in batches (every 30 seconds or after 50 events).
  3. Retries — store offline logs locally (for example, in SQLite) and send them later.
  4. Filtering — send only important levels (INFO, WARN, ERROR).
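
Practices 1–3 can be combined in a small buffering helper. A minimal sketch, where `flush_fn` is a placeholder for whatever actually ships the batch (for example, an HTTP POST to one of the endpoints above):

```python
import threading
import time

class LogBatcher:
    """Buffer log events and flush them in batches.

    Flushes when the buffer reaches max_batch events or when
    interval seconds have passed since the last flush.
    """

    def __init__(self, flush_fn, max_batch=50, interval=30.0):
        self._flush_fn = flush_fn
        self._max_batch = max_batch
        self._interval = interval
        self._buffer = []
        self._lock = threading.Lock()
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        with self._lock:
            self._buffer.append(event)
            due = (len(self._buffer) >= self._max_batch
                   or time.monotonic() - self._last_flush >= self._interval)
            if due:
                self._flush_locked()

    def flush(self) -> None:
        """Force a flush, e.g. on application shutdown."""
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        # Caller must hold self._lock.
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []
        self._last_flush = time.monotonic()
```

In a real client, `add` would be called from the logging path and `flush_fn` would run the network send on a background thread so the main thread never blocks.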

Conclusion

Collecting logs from remote clients cheaply is entirely feasible. The main thing is to choose the approach that fits your needs.

Scenario                                   Recommendation
Starting a project                         Serverless
You already have a VPS and need grep       Script on a VPS
Need a powerful self-hosted solution       Vector + Loki

Need help?

Get in touch with me and I'll help you solve the problem.
