Cheap and dirty: how to collect logs from remote clients over HTTP

Published on 2025-11-12

Do you have an application spread across hundreds of client devices? Or a fleet of IoT sensors sending telemetry? Sooner or later the question arises: “What’s actually happening over there?” And right after it — “How do I collect logs without bankrupting myself on Splunk or Datadog?”

If your clients can send HTTP requests, you already have ninety percent of the solution. HTTP(S) is a universal and firewall-friendly protocol. All we need is a listener (endpoint) that will accept these logs.

In this article we’ll examine three budget ways to organize log collection over HTTP — from “fast, simple, and works” to “scalable and almost free”.


Why HTTP

  • Universality — every programming language and platform has an HTTP client.
  • Simplicity — a POST request with a JSON body is intuitive.
  • Availability — port 443 (HTTPS) is almost never blocked, unlike specific ports for syslog or GELF.

Golden rule: always use HTTPS. Free Let’s Encrypt certificates can be set up in ten minutes and protect your data from interception.


Option 1. Script on a VPS (fast and very cheap)

The simplest, “hacky”, but also the fastest to deploy method. Suitable for pet projects or collecting logs from a small number of clients.

What you’ll need

  • A cheap VPS (Vultr, DigitalOcean, Hetzner, etc.) costing around five dollars a month.
  • Stack: Nginx + any backend script (Python/Flask, Node.js/Express, PHP).

How it works

  1. Rent a virtual server.

  2. Bring up Nginx as a reverse proxy and for SSL termination.

  3. Write a tiny web application (twenty lines of code) that:

    • accepts POST requests to /log;
    • if necessary, checks a secret key in the X-API-Key header;
    • writes the JSON log to a file.

Example in Python (Flask)

from flask import Flask, request, abort
import os
import json

app = Flask(__name__)

# Shared secret; override via environment variable in production.
API_KEY = os.environ.get("LOG_API_KEY", "super-secret-key")

LOG_PATH = "/var/log/my-app/events.log"

@app.route('/log', methods=['POST'])
def receive_log():
    # Reject requests that don't carry the correct API key.
    if request.headers.get('X-API-Key') != API_KEY:
        abort(401)

    # silent=True returns None instead of raising on malformed JSON.
    data = request.get_json(silent=True)
    if not data:
        abort(400)

    try:
        # Append one JSON object per line (JSON Lines format).
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(data) + "\n")
    except OSError as e:
        print(f"Failed to write log: {e}")
        abort(500)

    return "OK", 200

if __name__ == '__main__':
    # Bind to localhost only; Nginx terminates TLS and proxies here.
    app.run(host='127.0.0.1', port=5000)
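
On the client side, the matching sender is a few lines. A minimal sketch: `API_URL` and `API_KEY` are placeholders, the stdlib `urllib` is used to avoid extra dependencies, and request construction is separated out so it can be reused or tested without a network.

```python
import json
import urllib.request

API_URL = "https://logs.example.com/log"  # placeholder endpoint
API_KEY = "super-secret-key"              # must match the server's key

def build_log_request(event: dict) -> urllib.request.Request:
    """Prepare an authenticated POST request for one log event."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )

def send_log(event: dict, timeout: float = 5.0) -> bool:
    """Ship one event; returns True on HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(build_log_request(event),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```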

Advantages

  • Minimal costs (around five dollars a month).
  • Simplicity and control over storage.
  • Deployment in thirty minutes.

Disadvantages

  • Poor scalability: writing to a single file quickly becomes a bottleneck.
  • Maintenance required (log rotation, disk monitoring, security).
  • Analysis — manually via SSH and grep.

Option 2. Server-side collector (reliable and flexible)

An evolution of the first approach — a ready-made, optimized open-source stack instead of a hand-written script.

What you’ll need

  • VPS for $5–10/month.
  • Stack: Vector, Loki, Grafana.

How it works

  1. Vector — a high-performance log pipeline whose HTTP source accepts incoming logs.
  2. Loki — Grafana’s log database; it indexes only labels, which keeps storage cheap.
  3. Grafana — a convenient interface for searching and analyzing logs with LogQL.

Example Vector configuration (vector.yaml)

sources:
  http_logs:
    type: "http_server"        # HTTP listener source
    address: "0.0.0.0:8080"
    decoding:
      codec: "json"
    auth:                      # basic auth; terminate TLS in front via Nginx
      username: "log-client"
      password: "super-secret-key"

transforms:
  my_transform:
    type: "remap"
    inputs: ["http_logs"]
    source: |
      .app = "my_mobile_app"
      if !exists(.level) { .level = "info" }

sinks:
  loki:
    type: "loki"
    inputs: ["my_transform"]
    endpoint: "http://localhost:3100"
    labels:
      app: "{{ app }}"
      level: "{{ level }}"
    encoding:
      codec: "json"
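
Once logs land in Loki, you query them in Grafana with LogQL. For example, to see only error-level events from the app labeled above that mention a particular substring (label names match the `labels` block of the config; "payment" is a hypothetical search term):

```
{app="my_mobile_app", level="error"} |= "payment"
```

The label matchers select the log stream cheaply (labels are the only thing Loki indexes), and the `|=` line filter then greps inside the selected lines.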

Advantages

  • High performance (thousands of logs per second).
  • Convenient search and filtering via Grafana.
  • Vector can send logs not only to Loki, but also to S3, ClickHouse, and other storages.

Disadvantages

  • More complex setup: three components that require Docker or systemd.

Option 3. Serverless (almost free and virtually infinitely scalable)

The most modern and flexible option. No servers — you pay only for the number of requests and the amount of data.

What you’ll need

  • A cloud account (AWS, Google Cloud, or Yandex Cloud).
  • Stack (example on AWS): API Gateway + Lambda + S3 or CloudWatch Logs.

How it works

  1. The client sends a POST request.

  2. API Gateway receives the request as a managed HTTP endpoint.

  3. It invokes Lambda.

  4. Lambda:

    • saves the JSON to S3 by date;
    • sends the log to CloudWatch Logs.

Example Lambda in Python (writing to S3)

import json
import base64
import boto3
import time
import os

s3 = boto3.client('s3')
BUCKET_NAME = os.environ['LOG_BUCKET_NAME']

def lambda_handler(event, context):
    body = event.get('body')
    if not body:
        return {'statusCode': 400, 'body': 'No data'}

    # API Gateway may deliver the request body base64-encoded.
    if event.get('isBase64Encoded'):
        body = base64.b64decode(body).decode('utf-8')

    try:
        log_data = json.loads(body)
    except json.JSONDecodeError:
        return {'statusCode': 400, 'body': 'Invalid JSON'}

    # Partition objects by date and hour: logs/YYYY/MM/DD/HH/<request-id>.json
    now = time.strftime('%Y/%m/%d/%H', time.gmtime())
    file_name = f"{context.aws_request_id}.json"
    s3_key = f"logs/{now}/{file_name}"

    try:
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=s3_key,
            Body=json.dumps(log_data),
            ContentType='application/json'
        )
        return {'statusCode': 200, 'body': 'OK'}
    except Exception as e:
        print(e)
        return {'statusCode': 500, 'body': 'Error saving log'}

Advantages

  • The free tiers of AWS/Google/Yandex can cover millions of log events per month.
  • Automatic scalability.
  • No servers or updates — minimal maintenance.

Disadvantages

  • Analysis requires setup (for example, AWS Athena or BigQuery).
  • Vendor lock-in to a specific cloud provider.

Client-side best practices

  1. Asynchronous sending — don’t block the main thread of the application.
  2. Batch transfer — collect logs into an array and send in batches (every 30 seconds or after 50 events).
  3. Retries — store offline logs locally (for example, in SQLite) and send them later.
  4. Filtering — send only important levels (INFO, WARN, ERROR).
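
Practices 1–3 can be combined in a small buffering helper. A minimal sketch, where `flush_fn` is a placeholder for whatever actually ships the batch (for example, an HTTP POST to one of the endpoints above):

```python
import threading
import time

class LogBatcher:
    """Buffer log events and flush them in batches.

    Flushes when the buffer reaches max_batch events or when
    interval seconds have passed since the last flush.
    """

    def __init__(self, flush_fn, max_batch=50, interval=30.0):
        self._flush_fn = flush_fn
        self._max_batch = max_batch
        self._interval = interval
        self._buffer = []
        self._lock = threading.Lock()
        self._last_flush = time.monotonic()

    def add(self, event: dict) -> None:
        with self._lock:
            self._buffer.append(event)
            due = (len(self._buffer) >= self._max_batch
                   or time.monotonic() - self._last_flush >= self._interval)
            if due:
                self._flush_locked()

    def flush(self) -> None:
        """Force a flush, e.g. on application shutdown."""
        with self._lock:
            self._flush_locked()

    def _flush_locked(self) -> None:
        # Caller must hold self._lock.
        if self._buffer:
            self._flush_fn(self._buffer)
            self._buffer = []
        self._last_flush = time.monotonic()
```

In a real client, `add` would be called from the logging path and `flush_fn` would run the network send on a background thread so the main thread never blocks.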

Conclusion

Collecting logs from remote clients cheaply is entirely feasible. The main thing is to choose the approach that fits your needs.

Scenario                                   Recommendation
Starting a project                         Serverless
You already have a VPS and need grep       Script on a VPS
Need a powerful self-hosted solution       Vector + Loki

Need help?

Get in touch with me and I'll help you solve the problem.
