{"id":1486,"date":"2025-12-21T04:43:19","date_gmt":"2025-12-21T04:43:19","guid":{"rendered":"https:\/\/www.woodcentral.com\/-\/peter\/?p=1486"},"modified":"2026-05-24T11:28:10","modified_gmt":"2026-05-24T11:28:10","slug":"half-the-internet-isnt-human-understanding-bot-traffic-in-2025","status":"publish","type":"post","link":"https:\/\/www.woodcentral.com\/-\/peter\/half-the-internet-isnt-human-understanding-bot-traffic-in-2025\/","title":{"rendered":"Half the Internet isn\u2019t human: understanding bot traffic in 2025"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Below is a <strong>practical, operator-level breakdown<\/strong> focused on (a) what percentages typically look like by site type and (b) how to <strong>measure and filter bot traffic using server logs and Cloudflare<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Typical bot traffic by site type (real-world ranges)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">These are <strong>observed ranges<\/strong>, not marketing numbers. 
Individual sites can exceed them.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Small personal or hobby sites:<\/strong><br>40\u201370% bot traffic<br>Low human traffic makes crawlers and scanners dominate logs.<\/li>\n\n\n\n<li><strong>Forums and community sites (read-heavy):<\/strong><br>50\u201375% bot traffic<br>Search engines, archive bots, AI crawlers, and scrapers are constant.<\/li>\n\n\n\n<li><strong>Content sites (blogs, how-tos, reference):<\/strong><br>60\u201385% bot traffic<br>Especially high for older, well-linked content.<\/li>\n\n\n\n<li><strong>E-commerce:<\/strong><br>30\u201355% bot traffic<br>Price scrapers, inventory bots, and card-testing bots are common.<\/li>\n\n\n\n<li><strong>APIs \/ feeds:<\/strong><br>70\u201395% automated traffic<br>Often intentional, but still \u201cnon-human.\u201d<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">If you are seeing numbers in the <strong>60\u201380% range<\/strong>, that is now normal for publicly visible content sites.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">How to measure bot traffic accurately (without databases)<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">If you work from <strong>plain log files<\/strong> rather than a database, this is the most reliable approach.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. 
Start with access logs, not analytics<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Analytics tools under-report bots by design.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From Apache or Nginx logs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Count total requests<\/li>\n\n\n\n<li>Separate by:\n<ul class=\"wp-block-list\">\n<li>User-Agent<\/li>\n\n\n\n<li>Request rate per IP<\/li>\n\n\n\n<li>Request depth (pages hit per visit)<\/li>\n\n\n\n<li>Time between requests<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Bots reveal themselves through <strong>behavior<\/strong>, not labels.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">2. Strong indicators of non-human traffic<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Treat traffic as automated if you see any combination of:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Request intervals under ~1 second, sustained<\/li>\n\n\n\n<li>Hundreds of pages fetched with no assets (CSS, JS, images)<\/li>\n\n\n\n<li>Perfect crawl order (page1 \u2192 page2 \u2192 page3)<\/li>\n\n\n\n<li>HEAD requests or excessive 304s<\/li>\n\n\n\n<li>No referrer on deep content<\/li>\n\n\n\n<li>User-Agent claiming Windows 10 + Chrome but:\n<ul class=\"wp-block-list\">\n<li>No cookies<\/li>\n\n\n\n<li>No JS execution<\/li>\n\n\n\n<li>No asset loading<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">User-Agent strings are <strong>not trustworthy<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Cloudflare: separating humans from bots<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At the CDN level, Cloudflare already classifies traffic:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Verified bots<\/strong> (Googlebot, Bingbot, etc.)<\/li>\n\n\n\n<li><strong>Likely automated<\/strong> (bot score 2\u201329)<\/li>\n\n\n\n<li><strong>Automated<\/strong> (bot score 1)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">What to check:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bot score distribution<\/li>\n\n\n\n<li>Request volume by bot category<\/li>\n\n\n\n<li>Crawl spikes after publishing new content<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Cloudflare will often show:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Humans: low request count, high asset usage<\/li>\n\n\n\n<li>Bots: high request count, low asset usage<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This matches what you\u2019ll see in raw logs.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Filtering strategy (without breaking legitimate crawlers)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Allow:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Verified search engine bots<\/li>\n\n\n\n<li>Known archive bots (Internet Archive, etc.)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Rate-limit or challenge:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>High-rate crawlers<\/li>\n\n\n\n<li>AI training bots that ignore the robots.txt Crawl-delay directive<\/li>\n\n\n\n<li>Scrapers pulling entire directories<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Block outright:<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>IPs hitting hundreds of URLs per minute<\/li>\n\n\n\n<li>Bots requesting only HTML, never assets<\/li>\n\n\n\n<li>IP ranges belonging to known malicious ASNs<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">You do <strong>not<\/strong> want to block all bots, only <strong>unbounded or abusive ones<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Why Windows 10 appears so often in bot traffic<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Windows 10 + Chrome is the <strong>most common spoofed user-agent<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Bots use it to bypass na\u00efve filters<\/li>\n\n\n\n<li>It does <strong>not<\/strong> indicate real Windows 10 users or servers<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">If you filter by behavior instead of OS strings, most of this noise disappears.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Bottom line<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>~50% of global web traffic is non-human<\/strong><\/li>\n\n\n\n<li>Content and forum sites often see <strong>60\u201380% bot traffic<\/strong><\/li>\n\n\n\n<li>Windows 10 user-agents are frequently spoofed<\/li>\n\n\n\n<li>Log-based behavioral analysis is still the most accurate method<\/li>\n\n\n\n<li>Cloudflare already gives you most of the signal; you just need to interpret it correctly<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Below is a practical, operator-level breakdown focused on (a) what percentages typically look like by site type and (b) how to measure and filter bot traffic using server logs and Cloudflare. Typical bot traffic by site type (real-world ranges) These are observed ranges, not marketing numbers. Individual sites can exceed them. 
If you are seeing &#8230; <a title=\"Half the Internet isn\u2019t human: understanding bot traffic in 2025\" class=\"read-more\" href=\"https:\/\/www.woodcentral.com\/-\/peter\/half-the-internet-isnt-human-understanding-bot-traffic-in-2025\/\" aria-label=\"Read more about Half the Internet isn\u2019t human: understanding bot traffic in 2025\">Read more<\/a><\/p>\n","protected":false},"author":7,"featured_media":1487,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-1486","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/posts\/1486","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/comments?post=1486"}],"version-history":[{"count":0,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/posts\/1486\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/media\/1487"}],"wp:attachment":[{"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/media?parent=1486"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/categories?post=1486"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.woodcentral.com\/-\/peter\/wp-json\/wp\/v2\/tags?post=1486"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}