Why categorizing information is hard — and smarter alternatives

Why strict categories are difficult

  1. Boundaries aren’t clear
    Many concepts don’t have sharp edges. For example, if you’re categorizing animals: is a whale in “fish” (it lives in water), “mammals” (it nurses its young), or both? Categories assume clear rules, but reality is fuzzy, and two reasonable rules (habitat versus biology) can point to different answers.
  2. Overlapping fits
    Information often fits in multiple categories at once. An article about “woodworking with recycled plastic” could belong to woodworking, plastics, sustainability, and recycling. Forcing it into only one feels reductive.
  3. Outliers and edge cases
    Some items fit into none of the available categories. You then have to either create a new category (which leads to category sprawl) or put it into a “miscellaneous” bin (which isn’t useful to users).
  4. Category drift
    Categories change meaning over time. For instance, “computers” in the 1970s meant something very different from what it means today. Maintaining categories requires constant revision.
  5. User perspective differences
    What feels like the “right” category depends on the user’s purpose. A chemist, an environmentalist, and an industrial engineer might categorize the same article differently.

Alternatives to strict categories

  1. Tags (folksonomy or metadata labels)
    Instead of one “correct” category, you apply multiple descriptive labels (tags) to an item.
    • A whale might get tags like ocean, mammal, large animal, endangered.
    • Tags are non-hierarchical and flexible, so users can search or filter by any combination.
    • Downside: tags can become messy or inconsistent without some guidelines.
  2. Hierarchies (taxonomies)
    A tree structure allows broader-to-narrower categories.
    • Example: Animals → Mammals → Marine mammals → Whales.
    • This works well for structured domains but still struggles when something belongs in multiple branches.
  3. Faceted classification
    Instead of one category tree, you classify along multiple facets (dimensions).
    • For a book: Genre: Science Fiction, Time: 19th century, Place: France.
    • Users can combine facets (e.g., Science Fiction + 19th century).
    • This is essentially how modern e-commerce filters work.
  4. Search-driven discovery
    With good full-text search, categorization becomes less critical. People can simply search keywords across content. Categories or tags can still improve results, but aren’t the only access path.
  5. Recommendation & similarity systems
    Instead of predefining categories, algorithms suggest related items (“people who read this also liked…”). This bypasses rigid classification altogether.
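To make the tags approach (item 1) concrete, here is a minimal Python sketch of a tag index, assuming items are identified by simple string IDs. It normalizes tags (lowercasing, collapsing whitespace, and mapping a few synonyms) as one possible form of the “guidelines” that keep tags consistent, then filters by any combination of tags. The `SYNONYMS` map, the `TagIndex` class, and the sample items are hypothetical illustrations, not any particular product’s API.

```python
from collections import defaultdict

# Hypothetical synonym map: a lightweight "tagging guideline" that folds
# variant spellings into one canonical tag. Entries are illustrative only.
SYNONYMS = {"whales": "whale", "sea": "ocean"}


def normalize_tag(tag: str) -> str:
    """Lowercase, trim, collapse inner whitespace, then apply SYNONYMS."""
    cleaned = " ".join(tag.strip().lower().split())
    return SYNONYMS.get(cleaned, cleaned)


class TagIndex:
    """Maps each normalized tag to the set of item IDs that carry it."""

    def __init__(self) -> None:
        self._items_by_tag = defaultdict(set)

    def add(self, item_id: str, tags: list[str]) -> None:
        for tag in tags:
            self._items_by_tag[normalize_tag(tag)].add(item_id)

    def filter(self, *tags: str) -> set[str]:
        """Return the items that carry ALL of the given tags (AND filter)."""
        if not tags:
            return set()
        matches = [self._items_by_tag.get(normalize_tag(t), set()) for t in tags]
        return set.intersection(*matches)


# Usage with hypothetical items; tags are free-form and may overlap.
index = TagIndex()
index.add("whale", ["Ocean", "mammal", "large animal", "endangered"])
index.add("shark", ["ocean", "fish", "large  animal"])
index.add("otter", ["mammal", "river"])

print(sorted(index.filter("ocean", "large animal")))  # ['shark', 'whale']
print(sorted(index.filter("Mammal", "sea")))          # ['whale']
```

Because an item can carry any number of tags, the whale shows up under both “ocean” and “mammal” without anyone having to pick one “correct” bucket.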
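For hierarchies (item 2), a taxonomy can be sketched as a parent map, with breadcrumbs and drill-down browsing derived from it. This is a minimal illustration assuming category names are unique strings; `PARENT`, `ITEM_CATEGORY`, and the helper functions are hypothetical names chosen for the example, and the tree mirrors the Animals → Mammals → Marine mammals → Whales path from the text.

```python
# Hypothetical taxonomy: each category maps to its parent (the root maps to None).
PARENT = {
    "Animals": None,
    "Mammals": "Animals",
    "Fish": "Animals",
    "Marine mammals": "Mammals",
    "Whales": "Marine mammals",
}

# In a strict hierarchy each item sits at exactly one node.
ITEM_CATEGORY = {"blue whale": "Whales", "orca": "Whales", "salmon": "Fish"}


def path_to(category: str) -> list[str]:
    """Breadcrumb from the root down to `category`."""
    path = []
    current = category
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path[::-1]


def is_under(category: str, ancestor: str) -> bool:
    """True if `ancestor` lies on the path from the root to `category`."""
    return ancestor in path_to(category)


def items_under(ancestor: str) -> list[str]:
    """Drill-down browsing: every item placed anywhere inside `ancestor`'s branch."""
    return sorted(item for item, cat in ITEM_CATEGORY.items() if is_under(cat, ancestor))


print(" → ".join(path_to("Whales")))  # Animals → Mammals → Marine mammals → Whales
print(items_under("Mammals"))        # ['blue whale', 'orca']
```

The single category string per item in `ITEM_CATEGORY` is exactly the limitation noted above: listing the blue whale under a second branch (say, a hypothetical “Endangered species” node) would require multiple placements per item, which moves the design toward tags or facets.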
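Faceted classification (item 3) can be sketched as records with one value per facet, filtered by whatever combination of facets the user selects. The sketch below also counts the remaining values of a facet, the numbers filter sidebars typically show next to each option. The `BOOKS` records (“Book A” and so on) are placeholder data invented for the example, and `filter_items` and `facet_counts` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical catalog: each book has one value per facet.
BOOKS = [
    {"title": "Book A", "genre": "Science Fiction", "time": "19th century", "place": "France"},
    {"title": "Book B", "genre": "Science Fiction", "time": "21st century", "place": "Japan"},
    {"title": "Book C", "genre": "Mystery", "time": "19th century", "place": "England"},
    {"title": "Book D", "genre": "Science Fiction", "time": "19th century", "place": "England"},
]


def filter_items(items: list[dict], **selected: str) -> list[dict]:
    """Keep items matching every selected facet value (facets combine with AND)."""
    return [it for it in items if all(it.get(f) == v for f, v in selected.items())]


def facet_counts(items: list[dict], facet: str) -> dict[str, int]:
    """Count how many of `items` have each value of `facet`."""
    return dict(Counter(it[facet] for it in items if facet in it))


hits = filter_items(BOOKS, genre="Science Fiction", time="19th century")
print([b["title"] for b in hits])   # ['Book A', 'Book D']
print(facet_counts(hits, "place"))  # {'France': 1, 'England': 1}
```

Recomputing `facet_counts` on the current result set, rather than on the whole catalog, is what lets an interface show how many items each remaining option would return.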
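Search-driven discovery (item 4) usually rests on an inverted index that maps each word to the documents containing it. Here is a deliberately small Python sketch, assuming whitespace-and-punctuation tokenization, AND semantics for multi-word queries, and ranking by raw term frequency; real search engines use considerably richer ranking. The `SearchIndex` class and the sample article titles are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Inverted index: term -> Counter mapping doc ID to term frequency."""

    def __init__(self) -> None:
        self._postings = defaultdict(Counter)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term][doc_id] += 1

    def search(self, query: str) -> list[str]:
        """Return docs containing every query term, most term occurrences first."""
        terms = tokenize(query)
        if not terms:
            return []
        docs = set.intersection(*(set(self._postings.get(t, ())) for t in terms))
        # Rank by summed term frequency; break ties by doc ID for stable output.
        return sorted(docs, key=lambda d: (-sum(self._postings[t][d] for t in terms), d))


search_index = SearchIndex()
search_index.add("a1", "Woodworking with recycled plastic")
search_index.add("a2", "Recycling plastic at home: a sustainability guide")
search_index.add("a3", "Hand-cut dovetails: classic woodworking joinery for woodworking beginners")

print(search_index.search("woodworking"))       # ['a3', 'a1']
print(search_index.search("recycled plastic"))  # ['a1']
```

Note that “Recycling” does not match “recycled”: this sketch does no stemming or fuzzy matching, which is one concrete reason search results vary in quality and why many search engines add stemming.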
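The “people who read this also liked…” idea (item 5) can be illustrated with simple item-to-item co-occurrence counting over user histories. This is one basic technique among many, not how any specific service works; the `HISTORIES` data and the `build_cooccurrence` and `also_read` functions are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical reading histories: user -> articles they read.
HISTORIES = {
    "user1": ["dovetails", "hand planes", "sharpening"],
    "user2": ["dovetails", "sharpening"],
    "user3": ["dovetails", "finishing"],
    "user4": ["hand planes", "sharpening"],
}


def build_cooccurrence(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """Count, for every pair of items, how many users read both."""
    co = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_read(co: dict[str, Counter], item: str, k: int = 3) -> list[str]:
    """Top-k items most often read alongside `item`; ties broken alphabetically."""
    ranked = sorted(co.get(item, {}).items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


co = build_cooccurrence(HISTORIES)
print(also_read(co, "dovetails"))  # ['sharpening', 'finishing', 'hand planes']
```

No category is defined anywhere: the suggestions come purely from behavior. That is also why the approach needs plenty of data (with four users every count here is 1 or 2) and can feel like a black box.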

The key tradeoff:

  • Categories = structured, clean, but rigid.
  • Tags/facets = flexible, overlapping, user-driven, but potentially messy.
  • Search/recommendation = fluid and adaptive, but less predictable.

A side-by-side comparison:

| Approach | How it works | Pros | Cons | Good for |
|---|---|---|---|---|
| Strict categories (taxonomy) | Each item belongs to one predefined category (or subcategory). | Easy to browse; structured and predictable; works well for well-defined domains | Doesn’t handle overlap well; edge cases end up in “misc”; needs constant maintenance | Libraries, formal archives, tightly defined collections |
| Tags (folksonomy / labels) | Multiple descriptive labels assigned to each item. | Flexible and lightweight; allows overlap naturally; user-driven and adaptable | Can get messy or inconsistent; no inherent structure; tag sprawl without moderation | Blogs, forums, photo/video sites, dynamic collections |
| Hierarchical categories | Nested tree (broad → narrow). | Intuitive drill-down browsing; shows relationships | Still forces single placement; hard when an item belongs in multiple branches | Biological classification, product catalogs |
| Faceted classification | Multiple dimensions (facets) such as genre, time, place, format. | Handles multi-dimensionality well; lets users filter dynamically; good for large or complex data | More complex to design; requires structured metadata | E-commerce, databases, academic resources |
| Search-driven | Users type queries across all content. | Very flexible; no need for rigid structure; handles long-tail and rare items well | Results vary in quality; users may not know what to search for; hard to browse serendipitously | Large content repositories, modern websites |
| Recommendation / similarity | Algorithms suggest “related” items based on content or behavior. | Adapts to user behavior; can surface unexpected but relevant items | Black-box feel; requires lots of data; less predictable | Streaming services, news feeds, e-commerce personalization |

In practice, most modern systems combine these approaches. For example:

  • YouTube uses tags (metadata), search, and recommendations.
  • Amazon uses hierarchical categories, facets (price, brand, rating), search, and recommendations.
  • Wikipedia uses categories (an article can belong to many at once, so they behave much like tags) together with search.
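To show how these approaches can work together, here is a hypothetical product lookup in Python that combines a category path (hierarchy), exact-match and price-range facets, and keyword search in a single query. The `PRODUCTS` records, the brand names, and the `find` function with its `path` and `max_price` parameters are illustrative assumptions; this is not how YouTube, Amazon, or Wikipedia actually implement their systems.

```python
import re
from typing import Optional

# Hypothetical products: a category path (hierarchy), facet values (brand, price),
# and a title used for keyword search.
PRODUCTS = [
    {"title": "Block plane, low angle", "path": ("Tools", "Hand tools", "Planes"), "brand": "BrandA", "price": 95.0},
    {"title": "Jack plane No. 5", "path": ("Tools", "Hand tools", "Planes"), "brand": "BrandB", "price": 140.0},
    {"title": "Random orbit sander", "path": ("Tools", "Power tools", "Sanders"), "brand": "BrandA", "price": 120.0},
    {"title": "Plane iron sharpening guide", "path": ("Tools", "Sharpening"), "brand": "BrandB", "price": 35.0},
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find(query: str = "", path: tuple = (), max_price: Optional[float] = None, **facets: str) -> list[str]:
    """Filter by category branch and facets, then rank keyword matches."""
    terms = tokenize(query)
    results = []
    for p in PRODUCTS:
        if p["path"][: len(path)] != tuple(path):
            continue  # hierarchy: stay inside the chosen branch
        if max_price is not None and p["price"] > max_price:
            continue  # range facet
        if any(p.get(name) != value for name, value in facets.items()):
            continue  # exact-match facets such as brand
        words = tokenize(p["title"])
        if not all(t in words for t in terms):
            continue  # search: every query term must appear in the title
        score = sum(words.count(t) for t in terms)
        results.append((score, p["title"]))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]


print(find("plane"))  # ['Block plane, low angle', 'Jack plane No. 5', 'Plane iron sharpening guide']
print(find("plane", path=("Tools", "Hand tools"), max_price=100))  # ['Block plane, low angle']
print(find(path=("Tools",), brand="BrandA"))  # ['Block plane, low angle', 'Random orbit sander']
```

Each filter is optional, so the same function supports pure browsing (only `path`), pure filtering (only facets), pure search (only `query`), or any mix, which is the practical appeal of combining approaches.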

Licensed under CC BY-NC 4.0

Viewpoints are those of the author. You may share and adapt this article for non-commercial purposes, provided proper attribution is given. Attribution should include:

Title: Why categorizing information is hard — and smarter alternatives
Author: peter arthur martin
Original URL: https://www.woodcentral.com/-/peter/why-categorizing-information-is-hard-and-smarter-alternatives/
License: CC BY-NC 4.0
