Challenge

A clean, classified dataset isn’t useful unless people can interact with it. I needed a way to turn raw militaria data into a product that both I and other collectors could actually use — a site where you can search, filter, and compare items across 100+ dealers in one place.

Solution: A Full Web Application (Django + Render)

I built a Django-based application, hosted on Render, backed by PostgreSQL on AWS RDS and images in S3. The site currently makes 350,000+ militaria products searchable in seconds.
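
For context, a minimal sketch of how the Django settings could wire up RDS and S3 is below. The environment variable names and the django-storages backend are illustrative assumptions, not the project's exact configuration:

```python
# settings.py (illustrative excerpt; env var names and the django-storages
# backend are assumptions, not the production configuration)
import os

# PostgreSQL on AWS RDS
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ["RDS_DB_NAME"],
        "USER": os.environ["RDS_USERNAME"],
        "PASSWORD": os.environ["RDS_PASSWORD"],
        "HOST": os.environ["RDS_HOSTNAME"],  # e.g. xxxx.rds.amazonaws.com
        "PORT": os.environ.get("RDS_PORT", "5432"),
    }
}

# Product images served from S3 via django-storages
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = os.environ["AWS_STORAGE_BUCKET_NAME"]
AWS_S3_REGION_NAME = os.environ.get("AWS_S3_REGION_NAME", "us-east-1")
```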

Key features:

  • Search & Filter: Products can be filtered by conflict, nation, and item type or queried directly by keyword.

  • User Accounts: Visitors can register, save products, and build their own collections.

  • AI-Assisted Item Creation: Users can add their own items just by writing a description (e.g. “Purchased from vet’s estate, 82nd Airborne helmet, Normandy”). The system parses the text and fills in structured fields automatically; a rough sketch of this parsing step appears after this list.

  • Product Comparison (in progress): A side-by-side comparison tool for evaluating multiple items at once.

  • Scalability: Optimized queries and caching ensure that browsing stays fast even at scale.
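
As a rough illustration of the AI-assisted creation step, the sketch below sends the user's free-text description to the OpenAI API and asks for structured fields back. The prompt wording, model choice, and field names are assumptions rather than the production code:

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def parse_item_description(text: str) -> dict:
    """Turn a collector's free-text description into structured fields.

    Field names and prompt wording are illustrative; the production
    pipeline may differ.
    """
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract militaria item fields from the user's text. "
                    "Return JSON with keys: title, conflict, nation, "
                    "item_type, notes."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Example:
# parse_item_description("Purchased from vet's estate, 82nd Airborne helmet, Normandy")
# -> {"title": "82nd Airborne Helmet", "conflict": "World War Two", ...}
```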

Selecting a Conflict. Here we select World War One.

Selecting a Nation. Here we select Hungary.

Finally, we select an Item Type. Here we select Medals & Awards.

Now you can see all the products that match the categories you selected. We will select the first product.
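
Under the hood, a walkthrough like the one above maps onto a straightforward Django queryset. The model and field names here are illustrative, not the actual schema:

```python
from django.db.models import Q
from products.models import Product  # hypothetical app/model name

# Conflict -> Nation -> Item Type, mirroring the walkthrough above
results = Product.objects.filter(
    conflict="World War One",
    nation="Hungary",
    item_type="Medals & Awards",
)

# Keyword search across title and description
keyword = "gallantry"
results = results.filter(
    Q(title__icontains=keyword) | Q(description__icontains=keyword)
)

# db_index=True on conflict/nation/item_type keeps these lookups fast at 350k+ rows
```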

How to Read a Product Page

  1. Dealer Logo – Source site (here: Aberdeen Medals).

  2. Title – Original dealer title, kept intact.

  3. Price – Shown in dealer’s currency.

  4. Availability – Updated automatically (Available/Sold).

  5. Structured Metadata – Conflict, Nation, Item Type, Source Site, Product ID, Currency.

  6. Shipping Info – Location when available.

  7. View on Original Site – Opens the dealer’s product page.

  8. Save – Add to your Milivault collection.

  9. Description – Dealer text, cleaned and standardized.
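
Most of these fields map directly onto the underlying product record. A simplified model along these lines (field names are illustrative, not the real schema) would look roughly like:

```python
from django.conf import settings
from django.db import models

class Product(models.Model):
    """Simplified sketch of a product record; not the actual schema."""
    title = models.CharField(max_length=500)           # original dealer title
    description = models.TextField(blank=True)         # cleaned dealer text
    price = models.DecimalField(max_digits=12, decimal_places=2, null=True)
    currency = models.CharField(max_length=3, blank=True)
    available = models.BooleanField(default=True)      # Available / Sold
    conflict = models.CharField(max_length=100, db_index=True)
    nation = models.CharField(max_length=100, db_index=True)
    item_type = models.CharField(max_length=100, db_index=True)
    source_site = models.CharField(max_length=200)     # dealer, e.g. Aberdeen Medals
    original_url = models.URLField()                    # "View on Original Site" link
    shipping_location = models.CharField(max_length=200, blank=True)
    saved_by = models.ManyToManyField(                  # backs the "Save" button
        settings.AUTH_USER_MODEL, related_name="saved_products", blank=True
    )
```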

Supporting Tools (Streamlit)

Alongside the site, I built internal tools in Streamlit to keep the dataset clean and usable:

  • Labeling Tool: A batch-confirm interface for quickly validating or overriding ML/OpenAI classifications.

  • SQL Explorer: A lightweight dashboard for querying and validating trends in the data.
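
A minimal version of the SQL Explorer can be sketched in a few lines of Streamlit; the connection string and table names below are placeholders, not the real setup:

```python
import pandas as pd
import streamlit as st
from sqlalchemy import create_engine

st.title("Milivault SQL Explorer")

# Placeholder connection string; real credentials would come from environment config
engine = create_engine("postgresql+psycopg2://user:password@host:5432/milivault")

query = st.text_area("SQL", "SELECT conflict, COUNT(*) FROM products GROUP BY conflict;")
if st.button("Run"):
    st.dataframe(pd.read_sql(query, engine))
```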

These tools aren’t public-facing yet, but they show I can design internal ops tools as well as external products.

This Streamlit tool lets me validate products in bulk. Here, I’m confirming that these items belong to 19th Century, France, Headwear. The second product is actually a helmet, so I leave it unselected. With one action, I can batch-confirm the rest — updating the user_confirmed column in PostgreSQL. This human-in-the-loop workflow is what keeps 350,000+ classifications accurate while still scaling efficiently.
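
The batch-confirm action itself reduces to a single UPDATE on the user_confirmed column. A rough sketch of that step follows; the table name, sample rows, and connection string are placeholders:

```python
import streamlit as st
from sqlalchemy import create_engine, text

# Placeholder connection string; real credentials would come from config
engine = create_engine("postgresql+psycopg2://user:password@host:5432/milivault")

# In the real tool this batch would come from a query for unconfirmed rows;
# hard-coded here so the sketch stays self-contained.
batch = [(101, "French officer's kepi, c. 1880"), (102, "Adrian helmet, WW1")]

selected_ids = []
for product_id, title in batch:
    # One checkbox per product; unchecked items are left for manual review
    if st.checkbox(title, value=True, key=str(product_id)):
        selected_ids.append(product_id)

if st.button("Confirm selected") and selected_ids:
    with engine.begin() as conn:
        conn.execute(
            text("UPDATE products SET user_confirmed = TRUE WHERE id = ANY(:ids)"),
            {"ids": selected_ids},
        )
    st.success(f"Confirmed {len(selected_ids)} products")
```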

Results

  • A fully functional end-to-end system: from scraping to ML to a usable web application.

  • Hundreds of thousands of products pre-classified and searchable in one place.

  • Fast, responsive filtering across 100+ dealer sites — something no collector could do manually.

  • AI features integrated into both data ingestion (classification pipeline) and user input (AI-assisted item creation).

  • Early insights already emerging — e.g., an analysis of helmets shows that the sales platform is the strongest predictor of price.

Why it matters

This section closes the loop: I can take messy, unstructured dealer listings and turn them into a production-grade application. For a prospective employer, it demonstrates:

  • Full-stack capability: Django + PostgreSQL + AWS + Render.

  • Data engineering: handling 350k+ records with real-time search and filter.

  • ML/AI integration: hybrid classification pipeline + AI-assisted user workflows.

  • Product sense: building something real people could use, not just backend scripts.

Conclusion

What started as messy, inconsistent product listings across 100+ sites is now a structured, searchable dataset of 350,000+ items — backed by a machine learning pipeline and delivered through a live web application. This project shows how I can take a vague, unstructured problem and build an end-to-end system: scraping, cleaning, classifying, and presenting data at scale.