Half the Internet isn’t human: understanding bot traffic in 2025

Below is a practical, operator-level breakdown focused on (a) what percentages typically look like by site type and (b) how to measure and filter bot traffic using server logs and Cloudflare.


Typical bot traffic by site type (real-world ranges)

These are rough ranges observed in practice, not marketing numbers. Individual sites can fall outside them in either direction.

  • Small personal or hobby sites:
    40–70% bot traffic
    Low human traffic makes crawlers and scanners dominate logs.
  • Forums and community sites (read-heavy):
    50–75% bot traffic
    Search engines, archive bots, AI crawlers, and scrapers are constant.
  • Content sites (blogs, how-tos, reference):
    60–85% bot traffic
    Especially high for older, well-linked content.
  • E-commerce:
    30–55% bot traffic
    Price scrapers, inventory bots, and card-testing bots are common.
  • APIs / feeds:
    70–95% automated traffic
    Often intentional, but still “non-human.”

If you are seeing numbers in the 60–80% range, that is now normal for publicly visible content sites.
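Once you have counted how many of your requests look automated, checking yourself against these ranges is simple arithmetic. A minimal sketch, assuming you measure by requests rather than by visits. The short dictionary keys and the example counts are hypothetical.

```python
# Ranges from the list above, as percent of traffic that is automated.
# The short keys are hypothetical labels for those site types.
TYPICAL_BOT_SHARE = {
    "personal": (40, 70),
    "forum": (50, 75),
    "content": (60, 85),
    "ecommerce": (30, 55),
    "api": (70, 95),
}


def bot_share_percent(bot_requests, total_requests):
    """Percentage of requests you classified as automated."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * bot_requests / total_requests


def compare_to_typical(site_type, share):
    """Say whether a measured share falls inside the typical range."""
    low, high = TYPICAL_BOT_SHARE[site_type]
    if share < low:
        return "below typical range"
    if share > high:
        return "above typical range"
    return "within typical range"


# Hypothetical counts for illustration only.
share = bot_share_percent(bot_requests=7_200, total_requests=10_000)
print(f"{share:.1f}% -> {compare_to_typical('content', share)}")  # prints: 72.0% -> within typical range
```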


How to measure bot traffic accurately (without databases)

If you prefer working from plain log files rather than a database or an analytics service, this is the most reliable approach.

1. Start with access logs, not analytics

JavaScript-based analytics tools under-report bots by design: most bots never run the tracking script, so they never appear in the reports.

From Apache or Nginx logs:

  • Count total requests
  • Separate by:
    • User-Agent
    • Request rate per IP
    • Request depth (pages hit per visit)
    • Time between requests

Bots reveal themselves through behavior, not labels.
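A minimal sketch of those first counts, assuming your Apache or Nginx logs use the default "combined" format (a custom log format needs a different regular expression). It produces total requests, requests per User-Agent, requests per IP, and each IP's busiest clock minute as a simple stand-in for request rate. Lines that do not match the pattern are skipped.

```python
import re
from collections import Counter
from datetime import datetime

# Default "combined" format shared by Apache and Nginx:
# ip - user [time] "METHOD path PROTOCOL" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse(lines):
    """Yield (ip, timestamp, path, user_agent) for each matching line."""
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # malformed or non-combined line
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield m["ip"], when, m["path"], m["ua"]


def summarize(lines):
    """Return (total, per-UA counts, per-IP counts, per-IP peak per minute)."""
    total = 0
    by_ua, by_ip, per_minute = Counter(), Counter(), Counter()
    for ip, when, _path, ua in parse(lines):
        total += 1
        by_ua[ua] += 1
        by_ip[ip] += 1
        per_minute[(ip, when.replace(second=0))] += 1
    # Highest number of requests each IP made within a single clock minute.
    peak = Counter()
    for (ip, _minute), n in per_minute.items():
        peak[ip] = max(peak[ip], n)
    return total, by_ua, by_ip, peak
```

For a real log you could pass an open file, for example summarize(open("access.log", errors="replace")) with your own log path, then look at by_ua.most_common(20) and at the IPs with the highest peak values.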


2. Strong indicators of non-human traffic

Treat traffic as automated if you see any combination of:

  • Request intervals under ~1 second, sustained
  • Hundreds of pages fetched with no assets (CSS, JS, images)
  • Perfect crawl order (page1 → page2 → page3)
  • HEAD requests or excessive 304s
  • No referrer on deep content
  • User-Agent claiming Windows 10 + Chrome but:
    • No cookies
    • No JS execution
    • No asset loading

User-Agent strings are not trustworthy.
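Three of these indicators (sustained sub-second intervals, many pages with no assets, and HEAD/304-heavy traffic) can be computed per IP from parsed log entries. A minimal sketch, assuming each request is already a (ip, unix_timestamp, method, path, status) tuple. The thresholds (at least 10 gaps, 100 pages, a 50% HEAD/304 share) and the rule that two or more flags means "likely automated" are assumptions for illustration, not standard values.

```python
from collections import defaultdict
from statistics import median

# Illustrative list of static-asset extensions; extend it for your site.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                    ".svg", ".webp", ".ico", ".woff", ".woff2")


def is_asset(path):
    # Drop any query string before checking the extension.
    return path.split("?", 1)[0].lower().endswith(ASSET_EXTENSIONS)


def client_flags(requests):
    """requests: iterable of (ip, unix_ts, method, path, status) tuples.

    Returns {ip: [flag, ...]} using the assumed thresholds below.
    """
    by_ip = defaultdict(list)
    for ip, ts, method, path, status in requests:
        by_ip[ip].append((ts, method, path, status))

    report = {}
    for ip, reqs in by_ip.items():
        reqs.sort()  # chronological order (timestamp is the first field)
        times = [r[0] for r in reqs]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        pages = sum(1 for r in reqs if not is_asset(r[2]))
        assets = len(reqs) - pages
        head_or_304 = sum(1 for r in reqs if r[1] == "HEAD" or r[3] == 304)

        flags = []
        if len(gaps) >= 10 and median(gaps) < 1.0:
            flags.append("fast-sustained")     # sub-second intervals, sustained
        if pages >= 100 and assets == 0:
            flags.append("no-assets")          # many pages, no CSS/JS/images
        if head_or_304 / len(reqs) > 0.5:
            flags.append("head-or-304-heavy")  # mostly HEAD or 304 responses
        report[ip] = flags
    return report


def likely_automated(report):
    # Assumption: two or more independent signals => treat as automated.
    return sorted(ip for ip, flags in report.items() if len(flags) >= 2)
```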


3. Cloudflare: separating humans from bots

At the CDN level, Cloudflare already classifies traffic (what you can see depends on your plan):

  • Verified bots (Googlebot, Bingbot, etc.)
  • Likely automated
  • Definitely automated

What to check:

  • Bot score distribution
  • Request volume by bot category
  • Crawl spikes after publishing new content

Cloudflare will often show:

  • Humans: low request count, high asset usage
  • Bots: high request count, low asset usage

This matches what you’ll see in raw logs.
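If your plan exposes per-request bot scores (for example through exported logs), grouping them into bands is a quick way to read the distribution. A minimal sketch, assuming you already have (bot_score, verified_bot) pairs; the band boundaries follow Cloudflare's documented ranges at the time of writing (1 = automated, 2–29 = likely automated, 30–99 = likely human), so check the current documentation before relying on them. Treating out-of-range scores as "not scored" is an assumption.

```python
from collections import Counter


def score_band(score, verified_bot=False):
    """Map a Cloudflare-style bot score (1-99) to a named band.

    Bands follow Cloudflare's documented ranges at the time of writing:
    1 = automated, 2-29 = likely automated, 30-99 = likely human.
    """
    if verified_bot:
        return "verified bot"   # report separately from other automation
    if not 1 <= score <= 99:
        return "not scored"     # assumption: out-of-range means missing
    if score == 1:
        return "automated"
    if score < 30:
        return "likely automated"
    return "likely human"


def score_distribution(records):
    """records: iterable of (bot_score, verified_bot) pairs."""
    return Counter(score_band(score, verified) for score, verified in records)
```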


Filtering strategy (without breaking legitimate crawlers)

Allow:

  • Verified search engine bots
  • Known archive bots (Internet Archive, etc.)
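Verifying a search engine bot is usually done with a reverse DNS lookup on the IP, a check that the hostname belongs to the engine's domain, and a forward lookup confirming that hostname resolves back to the same IP; Google and Bing both document this approach. A minimal sketch using Python's standard socket module. The suffix list is an example only, and the resolver parameters exist just so the logic can be tested without network access.

```python
import socket

# Example hostname suffixes only; check each search engine's own
# documentation for the current list before relying on this.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for a claimed search engine crawler.

    Note: gethostbyname_ex only resolves IPv4 addresses.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)        # PTR lookup
    except OSError:
        return False                                    # no PTR record
    hostname = hostname.lower().rstrip(".")
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False                                    # not the engine's domain
    try:
        _name, _aliases, addresses = forward(hostname)  # A lookup
    except OSError:
        return False
    # The forward lookup must point back to the original IP; otherwise
    # anyone who controls their own PTR record could fake the hostname.
    return ip in addresses
```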

Rate-limit or challenge:

  • High-rate crawlers
  • AI training bots that ignore crawl-delay
  • Scrapers pulling entire directories

Block outright:

  • IPs hitting hundreds of URLs per minute
  • Bots requesting only HTML, never assets
  • IP ranges belonging to ASNs known for abuse

You do not want to block all bots—only unbounded or abusive ones.
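The allow / challenge / block tiers can be expressed as a small decision function. This is a sketch under stated assumptions: ClientStats and every threshold are hypothetical, chosen for illustration, and the inputs would come from your own log analysis, allowlists and blocklists. In practice the same logic would more likely be expressed as Cloudflare rules or web server configuration than as a script.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them against your own logs.
BLOCK_PER_MINUTE = 300       # "hundreds of URLs per minute"
CHALLENGE_PER_MINUTE = 60    # "high-rate" but not yet abusive
HTML_ONLY_MIN_PAGES = 100    # enough pages that "never loads assets" means something


@dataclass
class ClientStats:
    verified_crawler: bool    # e.g. passed a reverse/forward DNS check
    known_archive_bot: bool   # on your own allowlist (Internet Archive, etc.)
    abusive_asn: bool         # source ASN is on your own blocklist
    peak_per_minute: int      # most requests seen in any single minute
    html_requests: int
    asset_requests: int


def decide(s: ClientStats) -> str:
    """Return "allow", "challenge" or "block" following the tiers above."""
    # Allowlisted crawlers are checked first so they are never rate-limited.
    if s.verified_crawler or s.known_archive_bot:
        return "allow"
    if s.abusive_asn or s.peak_per_minute >= BLOCK_PER_MINUTE:
        return "block"
    if s.html_requests >= HTML_ONLY_MIN_PAGES and s.asset_requests == 0:
        return "block"
    if s.peak_per_minute >= CHALLENGE_PER_MINUTE:
        return "challenge"
    return "allow"
```

Checking the allowlist before any rate rule is the design choice that keeps busy but legitimate crawlers from being caught by the thresholds.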


Why Windows 10 appears so often in bot traffic

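Windows 10 + Chrome is one of the most commonly spoofed user-agents:

  • Bots use it to bypass naïve filters
  • It does not indicate real Windows 10 users or servers
  • Genuine browsers on Windows 11 also report “Windows NT 10.0”, so the string cannot reliably identify the OS version anyway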

If you filter by behavior instead of OS strings, this noise disappears.
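To see how much of your “Windows 10 + Chrome” traffic actually behaves like a browser, you can split those clients by whether they ever loaded an asset. A minimal sketch, assuming (ip, user_agent, path) tuples from your parsed logs. The regular expression and extension list are illustrative, and grouping by IP alone will merge different clients behind the same NAT or proxy.

```python
import re

# Matches user-agents that report "Windows NT 10.0" (Windows 10 or 11) plus a
# Chrome version. Edge and other Chromium browsers also match, because their
# user-agents include "Chrome/" too.
WIN10_CHROME = re.compile(r"Windows NT 10\.0.*Chrome/\d+")
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                    ".svg", ".webp", ".ico", ".woff", ".woff2")


def split_win10_chrome(requests):
    """requests: iterable of (ip, user_agent, path) tuples.

    Returns (loaded_assets, html_only): sets of IPs that claimed
    Windows 10 + Chrome, split by whether they ever fetched an asset.
    """
    claimed, with_assets = set(), set()
    for ip, ua, path in requests:
        if not WIN10_CHROME.search(ua):
            continue
        claimed.add(ip)
        if path.split("?", 1)[0].lower().endswith(ASSET_EXTENSIONS):
            with_assets.add(ip)
    return with_assets, claimed - with_assets
```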


Bottom line

  • ~50% of global web traffic is non-human
  • Content and forum sites often see 60–80% bot traffic
  • Windows 10 user-agents are frequently spoofed
  • Log-based behavioral analysis is still the most accurate method
  • Cloudflare already gives you most of the signal—you just need to interpret it correctly

Licensed under CC BY-NC 4.0

DevOps viewpoints are those of its owner. You may share and adapt this article for non-commercial purposes, provided proper attribution is given. Attribution should include:

Title: Half the Internet isn’t human: understanding bot traffic in 2025
Author: peter arthur martin
Original URL: https://www.woodcentral.com/-/peter/half-the-internet-isnt-human-understanding-bot-traffic-in-2025/
License: CC BY-NC 4.0
