Milivault

By Keenan Nilson

A Data Platform Tracking and Sorting 350,000+ Military Antiques Across 100+ Dealer Sites

Tech at a glance

How does the site function?

  • I host the site on Render, which connects to my AWS backend services.

  • The site is built with Django templates, HTML, and CSS.

How is the data hosted?

  • Product data lives in AWS RDS (PostgreSQL).

  • Image files are stored in AWS S3.

  • The live site itself runs on Render + Django.

How is the data collected?

  • I built a custom Python scraper that runs on an AWS EC2 instance.

  • Right now the system tracks 350,000+ products, each with 50+ data fields and 5–15 images.
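Each dealer site is described by a JSON config rather than hard-coded scraping logic. As a minimal sketch of that idea, the snippet below loads and validates one site's config and generates the catalog URLs a scraper would visit. The field names (`site`, `base_url`, `selectors`, `pagination`) are illustrative assumptions, not Milivault's actual config schema.

```python
import json

# Hypothetical config shape -- keys are illustrative, not the real Milivault schema.
EXAMPLE_CONFIG = """
{
  "site": "example-dealer",
  "base_url": "https://example.com/catalog?page={page}",
  "selectors": {
    "title": "h2.product-title",
    "price": "span.price",
    "images": "div.gallery img"
  },
  "pagination": {"start": 1, "max_pages": 50}
}
"""

REQUIRED_KEYS = {"site", "base_url", "selectors"}

def load_site_config(raw):
    """Parse and validate one dealer's JSON config before the scraper uses it."""
    cfg = json.loads(raw)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"config missing keys: {sorted(missing)}")
    return cfg

def page_urls(cfg):
    """Yield the catalog URLs the scraper would visit for this site."""
    pg = cfg.get("pagination", {"start": 1, "max_pages": 1})
    for page in range(pg["start"], pg["start"] + pg["max_pages"]):
        yield cfg["base_url"].format(page=page)
```

Keeping per-site details in configs like this is what lets one scraper framework cover 100+ dealer sites: adding a site means writing a JSON file, not new code.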

How is the data kept fresh?

  • A lightweight job runs every 10 minutes to check for updates, plus a full refresh every 12 hours.

  • This balance keeps the data current without overloading sites or servers.
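The two-tier schedule above can be sketched as a small decision function: given the current time and when each job last ran, decide whether a lightweight check or a full refresh is due. The function and interval names are my own illustration of the stated 10-minute / 12-hour cadence, not Milivault's actual scheduler.

```python
INCREMENTAL_EVERY = 10 * 60        # lightweight update check: every 10 minutes
FULL_REFRESH_EVERY = 12 * 60 * 60  # full refresh: every 12 hours

def due_jobs(now, last_incremental, last_full):
    """Return which jobs are due at `now` (all times in epoch seconds).

    A full refresh supersedes the lightweight check, so at most one
    job is returned per tick -- this keeps load on dealer sites bounded.
    """
    if now - last_full >= FULL_REFRESH_EVERY:
        return ["full_refresh"]
    if now - last_incremental >= INCREMENTAL_EVERY:
        return ["incremental_check"]
    return []
```

In production this kind of logic would typically live behind cron or a task queue; the point of the sketch is the priority rule, not the runner.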

How is the data classified?

  • I trained a scikit-learn model to classify products using a taxonomy I designed specifically for militaria.

  • The model is powered by a dataset I built by manually labeling 67,000 products with the help of a Streamlit app I created.

  • If the model’s confidence is too low, I send the product through OpenAI’s API as a fallback.

  • Anything OpenAI can’t classify with confidence is flagged for human review.
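The three-tier routing described above (local model, then OpenAI fallback, then human review) can be sketched as a single dispatch function. This is a minimal illustration: the 0.8 threshold, function names, and return shape are assumptions; it only requires that the local model expose scikit-learn's standard `predict_proba` and `classes_` interface.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff -- the real threshold would be tuned

def classify(item_text, model, openai_classify=None):
    """Route one listing: local model -> OpenAI fallback -> human review queue."""
    # Tier 1: the scikit-learn model, accepted only when confident enough.
    probs = list(model.predict_proba([item_text])[0])
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= CONFIDENCE_THRESHOLD:
        return model.classes_[best], "local_model"

    # Tier 2: hypothetical OpenAI fallback; returns None when it is unsure too.
    if openai_classify is not None:
        label = openai_classify(item_text)
        if label is not None:
            return label, "openai_fallback"

    # Tier 3: flag for manual review (handled via the Streamlit app).
    return None, "human_review"
```

Returning the tier alongside the label makes it easy to audit how much of the catalog each tier handles and where the model needs more training data.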

Stack: Python, Django, AWS (RDS, S3, EC2), Render, Streamlit, scikit-learn, OpenAI API.

Main Objective

Build a scalable platform to collect, clean, classify, and host militaria product data, laying the foundation for future market insights.

Core Achievements

  • 350k+ products tracked across 100+ dealer sites, each with 50+ fields and images.

  • Scraper framework (Python + AWS EC2) with JSON configs for flexible site integration.

  • Database & hosting – PostgreSQL (AWS RDS) + S3 for images, served via Django + Render.

  • Classification pipeline – Custom ML model trained on 67k labeled items, with OpenAI fallback + Streamlit review app.

Item Distribution by Site

Distribution of items collected by site (top dealers dominate the dataset)

The Main Components

  • 1. Gather the Data

    Custom Python scrapers with JSON configs pull listings from 100+ dealer sites. Each record is cleaned, deduplicated, and enriched (hidden prices, availability, normalized fields). The data is stored in AWS RDS with product images in S3.

  • 2. Classify the Data

    I trained a scikit-learn model on 67k manually labeled items (via a Streamlit app) to classify products into a taxonomy I designed. When confidence is low, products are sent to OpenAI for backup classification. Any remaining edge cases are flagged for human review.

  • 3. Present the Data

    The cleaned and classified dataset powers a Django site (milivault.com) and Streamlit tools. Users can search, filter, and confirm classifications, making the data both useful and transparent.
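The cleaning and deduplication work in component 1 can be sketched with two small helpers: one that normalizes a scraped price string (including the "hidden price" case), and one that builds a stable key for recognizing the same listing across refreshes. The phrasing matched ("Price on request") and the site+URL key basis are illustrative assumptions.

```python
import hashlib

def normalize_price(raw):
    """Normalize a scraped price string to a float, or None if hidden/absent."""
    if not raw or "request" in raw.lower():  # e.g. "Price on request"
        return None
    cleaned = raw.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None  # unparseable price -> treat as unknown

def dedupe_key(record):
    """Stable key for spotting the same listing across scraper runs.

    Hashing site + URL (an assumed key basis) means a re-scraped listing
    updates the existing row instead of creating a duplicate.
    """
    basis = f"{record['site']}|{record['url']}"
    return hashlib.sha256(basis.encode()).hexdigest()
```

Normalizing fields like price at ingestion time is what makes the downstream search, filtering, and eventual market analysis comparable across 100+ sites with different formats.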

Why I built this

I have always had a deep love for my local antique shop, which was unique in that it exclusively sold military antiques. The shop also had its own museum and a vibrant, one-of-a-kind atmosphere. I volunteered there every weekend for years, learning about the history behind each piece and connecting with the militaria community. Unfortunately, the shop is no longer in business, but that doesn’t mean my passion for antiques has faded. While I may now explore antiques and follow trends in a less traditional way, my love for the craft and my appreciation for the stories these objects hold remains strong.