Below is a practical, operator-level breakdown focused on (a) what percentages typically look like by site type and (b) how to measure and filter bot traffic using server logs and Cloudflare.
Typical bot traffic by site type (real-world ranges)
These are rough ranges of the kind operators report from their own logs, not marketing numbers. Individual sites can exceed them.
- Small personal or hobby sites:
40–70% bot traffic
Low human traffic makes crawlers and scanners dominate logs.
- Forums and community sites (read-heavy):
50–75% bot traffic
Search engines, archive bots, AI crawlers, and scrapers are constant.
- Content sites (blogs, how-tos, reference):
60–85% bot traffic
Especially high for older, well-linked content.
- E-commerce:
30–55% bot traffic
Price scrapers, inventory bots, and card-testing bots are common.
- APIs / feeds:
70–95% automated traffic
Often intentional, but still “non-human.”
If you are seeing numbers in the 60–80% range, that is now normal for publicly visible content sites.
How to measure bot traffic accurately (without databases)
Since you prefer plain log files, this is the most reliable approach.
1. Start with access logs, not analytics
Analytics tools under-report bots by design.
From Apache or Nginx logs:
- Count total requests
- Separate by:
  - User-Agent
  - Request rate per IP
  - Request depth (pages hit per visit)
  - Time between requests
Bots reveal themselves through behavior, not labels.
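The counting step above can be sketched in a few lines of Python. This is a minimal sketch, assuming your Apache or Nginx logs use the common "combined" format; the names `LOG_RE`, `parse_line`, and `summarize` are illustrative, and the regex will need adjusting if your log format is customized.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

def summarize(lines):
    """Count total requests, requests per User-Agent, and requests per IP.

    Also collects request timestamps per IP so that the time between
    requests can be examined afterwards.
    """
    total = 0
    by_agent = Counter()
    by_ip = Counter()
    times_by_ip = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip malformed or non-matching lines
        total += 1
        by_agent[rec["agent"]] += 1
        by_ip[rec["ip"]] += 1
        times_by_ip[rec["ip"]].append(rec["time"])
    return total, by_agent, by_ip, times_by_ip
```

To run it against a file, pass the open file object directly, for example `summarize(open("access.log"))`, then inspect `by_ip.most_common(20)` to see which clients dominate the log.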
2. Strong indicators of non-human traffic
Treat traffic as automated if you see any combination of:
- Request intervals under ~1 second, sustained
- Hundreds of pages fetched with no assets (CSS, JS, images)
- Perfect crawl order (page1 → page2 → page3)
- HEAD requests or excessive 304s
- No referrer on deep content
- User-Agent claiming Windows 10 + Chrome but:
  - No cookies
  - No JS execution
  - No asset loading
User-Agent strings are not trustworthy.
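Two of the indicators above (sustained sub-second intervals, and many pages fetched with no assets) are easy to check per client. The following is a rough sketch, not a tuned detector: the function names, the asset suffix list, and the thresholds (`min_pages`, `max_median_gap`, `min_gaps`) are all assumptions chosen for illustration. Gaps are measured between page requests only, because a real browser also fires many asset requests within a second of each page load.

```python
from statistics import median

# Assumed list of static-asset extensions; extend it for your site.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                  ".svg", ".webp", ".ico", ".woff2")

def is_asset(path):
    """True if the request path looks like a static asset."""
    return path.split("?", 1)[0].lower().endswith(ASSET_SUFFIXES)

def indicators(requests, min_pages=100, max_median_gap=1.0, min_gaps=10):
    """Return the behavioral indicators matched by one client's traffic.

    requests: list of (unix_timestamp, path) tuples for a single IP.
    All thresholds are illustrative defaults, not recommended values.
    """
    page_times = sorted(t for t, p in requests if not is_asset(p))
    asset_count = len(requests) - len(page_times)
    gaps = [b - a for a, b in zip(page_times, page_times[1:])]

    found = []
    # Sustained fast paging: enough samples, and the typical gap is under the limit.
    if len(gaps) >= min_gaps and median(gaps) < max_median_gap:
        found.append("sustained sub-second page requests")
    # Many pages fetched without a single CSS, JS, or image request.
    if len(page_times) >= min_pages and asset_count == 0:
        found.append("many pages, no assets")
    return found
```

An empty list does not prove a client is human; it only means neither of these two checks fired.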
3. Cloudflare: separating humans from bots
At the CDN level, Cloudflare already classifies traffic:
- Verified bots (Googlebot, Bingbot, etc.)
- Likely automated
- Definitely automated
What to check:
- Bot score distribution
- Request volume by bot category
- Crawl spikes after publishing new content
Cloudflare will often show:
- Humans: low request count, high asset usage
- Bots: high request count, low asset usage
This matches what you’ll see in raw logs.
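If you can export Cloudflare request logs, the bot score distribution can be tallied offline. This is a sketch under several assumptions: that the export is newline-delimited JSON, that each record carries a `BotScore` field (the name used here is an assumption about your export and is passed as a parameter), and that the bands follow Cloudflare's documented grouping of 1 as automated, 2–29 as likely automated, and 30–99 as likely human. Check all three against your own account before relying on the output.

```python
import json
from collections import Counter

def band(score):
    """Map a bot score to a coarse band; missing or 0 is treated as unscored."""
    if not score:
        return "unscored"
    if score == 1:
        return "automated"
    if score < 30:
        return "likely automated"
    return "likely human"

def score_distribution(ndjson_lines, field="BotScore"):
    """Count requests per band from newline-delimited JSON log records.

    field: name of the score field in each record (assumed, adjust as needed).
    """
    counts = Counter()
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rec = json.loads(line)
        counts[band(rec.get(field))] += 1
    return counts
```

Comparing these counts with the per-IP behavior from your raw server logs is a quick way to see whether the two views agree.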
Filtering strategy (without breaking legitimate crawlers)
Allow:
- Verified search engine bots
- Known archive bots (Internet Archive, etc.)
Rate-limit or challenge:
- High-rate crawlers
- AI training bots that ignore crawl-delay
- Scrapers pulling entire directories
Block outright:
- IPs hitting hundreds of URLs per minute
- Bots requesting only HTML, never assets
- IP ranges belonging to known-abusive ASNs
You do not want to block all bots—only unbounded or abusive ones.
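The allow / challenge / block tiers above can be written as a small decision function. This is only a sketch: `ClientStats`, `triage`, and every threshold are hypothetical, and `verified_bot` is assumed to come from a check you trust (for example Cloudflare's verified-bot classification), never from the User-Agent string alone.

```python
from dataclasses import dataclass

@dataclass
class ClientStats:
    verified_bot: bool      # confirmed by a trusted source, not by User-Agent
    urls_per_minute: float  # peak distinct URLs requested per minute
    page_requests: int      # non-asset requests seen
    asset_requests: int     # CSS, JS, and image requests seen

def triage(stats, block_rate=200, challenge_rate=60, min_pages=100):
    """Return "allow", "challenge", or "block" for one client.

    Thresholds are illustrative; tune them against your own traffic.
    """
    # Verified crawlers are allowed first so later rules cannot block them.
    if stats.verified_bot:
        return "allow"
    # Hundreds of URLs per minute: block outright.
    if stats.urls_per_minute >= block_rate:
        return "block"
    # Many HTML pages and never a single asset: block outright.
    if stats.page_requests >= min_pages and stats.asset_requests == 0:
        return "block"
    # High-rate but not extreme: rate-limit or challenge.
    if stats.urls_per_minute >= challenge_rate:
        return "challenge"
    return "allow"
```

Checking `verified_bot` before anything else is the design choice that keeps legitimate search and archive crawlers from being caught by the rate rules.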
Why Windows 10 appears so often in bot traffic
- Windows 10 + Chrome is among the most commonly spoofed user-agents
- Bots use it to bypass naïve filters
- It does not indicate real Windows 10 users or servers
If you filter by behavior instead of OS strings, this noise disappears.
Bottom line
- Roughly half of global web traffic is non-human, by common industry estimates
- Content and forum sites often see 60–80% bot traffic
- Windows 10 user-agents are frequently spoofed
- Log-based behavioral analysis is still the most accurate method
- Cloudflare already gives you most of the signal—you just need to interpret it correctly