
The messy reality of building a ticket resale platform

April 5, 2026

#web-scraping #data-pipelines

Marketplaces sound simple. Sellers list things. Buyers buy things. You take a cut. What could go wrong? As it turns out, a lot — especially when the inventory is live event tickets with prices that change by the minute, listings that expire without notice, and data that comes from scraping other platforms rather than direct seller input.

The data problem

The platform aggregates ticket inventory from multiple resale exchanges. This means scraping. A lot of scraping. Thousands of events, each with hundreds of listings, prices updating constantly.

The first thing we learned: scraping at scale is an engineering discipline, not a hack. You need:

- Distributed scraper workers that can be scaled independently
- Rate limiting and rotation to avoid getting blocked
- Change detection so you're not reprocessing unchanged data
- Error handling that distinguishes between "this listing is gone" and "we got rate limited"
- A staging layer for raw scraped data before it hits the production database
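
Two of the items above, change detection and error classification, can be sketched in a few lines. This is a minimal illustration rather than our production code: the status-code mapping is an assumption (real exchanges signal blocks in their own ways), and the names `ScrapeOutcome`, `classify_status`, `fingerprint`, and `has_changed` are hypothetical.

```python
import hashlib
import json
from enum import Enum


class ScrapeOutcome(Enum):
    OK = "ok"
    LISTING_GONE = "listing_gone"
    RATE_LIMITED = "rate_limited"
    TRANSIENT_ERROR = "transient_error"


def classify_status(status_code: int) -> ScrapeOutcome:
    """Map an HTTP status code to an outcome (assumed mapping, tuned per exchange in practice)."""
    if 200 <= status_code < 300:
        return ScrapeOutcome.OK
    if status_code in (404, 410):
        # The listing itself is gone: safe to remove it from inventory.
        return ScrapeOutcome.LISTING_GONE
    if status_code == 429:
        # We were throttled: the listing may still exist, so back off and retry.
        return ScrapeOutcome.RATE_LIMITED
    return ScrapeOutcome.TRANSIENT_ERROR


def fingerprint(listing: dict) -> str:
    """Stable hash of a listing's content; key order does not affect the result."""
    canonical = json.dumps(listing, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(listing_id: str, listing: dict, seen: dict) -> bool:
    """Return True (and record the new fingerprint) if the listing differs from the last scrape."""
    digest = fingerprint(listing)
    if seen.get(listing_id) == digest:
        return False
    seen[listing_id] = digest
    return True
```

Only listings for which `has_changed` returns True need to move past the staging layer, and only a `LISTING_GONE` outcome should remove inventory. A `RATE_LIMITED` outcome says nothing about whether the listing still exists.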

We built the scraper infrastructure on containerized workers with a job queue. Each worker handles one exchange. If an exchange changes its markup, we update one worker and the rest keep running. When volume spikes before a major event, we spin up more workers.
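
The one-worker-per-exchange idea can be illustrated in a single process, with an in-memory queue standing in for the real job queue and plain functions standing in for containerized workers. The exchange names, raw formats, and function names below are all hypothetical. The sketch shows only the isolation property: a parser that breaks does not stop the others.

```python
import queue
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: exchange name -> parser for that exchange's markup.
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(exchange: str):
    """Decorator that registers a parser function for one exchange."""
    def decorator(fn):
        PARSERS[exchange] = fn
        return fn
    return decorator


@register("exchange_a")
def parse_exchange_a(raw: str) -> List[dict]:
    # Assumed raw format: one "section,price" pair per line.
    rows = [line.split(",") for line in raw.splitlines() if line.strip()]
    return [{"section": section, "price": float(price)} for section, price in rows]


@register("exchange_b")
def parse_exchange_b(raw: str) -> List[dict]:
    # Assumed raw format: "price|section" items separated by semicolons.
    listings = []
    for item in raw.split(";"):
        price, section = item.split("|")
        listings.append({"section": section, "price": float(price)})
    return listings


def drain(jobs: "queue.Queue") -> Tuple[List[dict], List[str]]:
    """Process every queued (exchange, raw) job; collect failures instead of stopping."""
    results: List[dict] = []
    failed: List[str] = []
    while True:
        try:
            exchange, raw = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            parsed = PARSERS[exchange](raw)
        except Exception:
            # A markup change (or unknown exchange) breaks only this job.
            failed.append(exchange)
            continue
        results.extend({"exchange": exchange, **listing} for listing in parsed)
    return results, failed
```

If `exchange_b` changes its layout, its jobs land in `failed` while `exchange_a` results keep flowing. Fixing the break means touching `parse_exchange_b` and nothing else.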

Price synchronization

Tickets exist on multiple exchanges simultaneously. The same pair of seats might be listed on three platforms at different prices. Our platform needs to show the best price, but "best price" changes minute to minute.

We built a price sync engine that:

- Ingests price updates from all exchanges in near real-time
- Normalizes pricing (some include fees, some don't)
- Detects and removes stale listings: if a scrape fails, we don't show yesterday's prices as current
- Handles currency and fee calculation per exchange
- Maintains a price history for analytics
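
To make the fee normalization step concrete, here is a small sketch that converts each exchange's listed price into an all-in price before comparing. The fee policies, rates, and exchange names are invented for illustration, and currency conversion is left out entirely.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Tuple


@dataclass(frozen=True)
class FeePolicy:
    fees_included: bool  # True if the listed price is already all-in
    fee_rate: Decimal    # proportional buyer fee, e.g. Decimal("0.15")
    flat_fee: Decimal    # fixed per-ticket fee


# Hypothetical policies; real values would come from per-exchange configuration.
FEE_POLICIES = {
    "exchange_a": FeePolicy(True, Decimal("0"), Decimal("0")),
    "exchange_b": FeePolicy(False, Decimal("0.15"), Decimal("2.50")),
}


def all_in_price(exchange: str, listed: Decimal) -> Decimal:
    """Return what the buyer actually pays, rounded to cents."""
    policy = FEE_POLICIES[exchange]
    if policy.fees_included:
        total = listed
    else:
        total = listed * (1 + policy.fee_rate) + policy.flat_fee
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def best_offer(offers: List[Tuple[str, Decimal]]) -> Tuple[str, Decimal]:
    """Pick the cheapest offer by all-in price, not by listed price."""
    priced = [(exchange, all_in_price(exchange, listed)) for exchange, listed in offers]
    return min(priced, key=lambda offer: offer[1])
```

With these assumed policies, a 100.00 listing on `exchange_b` costs more all-in than a 110.00 listing on `exchange_a`, which is why the comparison has to happen after normalization.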

The hardest part was staleness. A listing might sell on another exchange, and we may not find out for 2-3 minutes. During that window, our platform shows inventory that no longer exists. This is the kind of problem that doesn't show up in development but makes or breaks the user experience. We solved it with aggressive TTLs and real-time validation at checkout: before a purchase completes, we verify the listing is still live.
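
The combination of TTLs and checkout-time validation might look roughly like the sketch below. The 120-second TTL is an assumed value, and `verify_live` is a hypothetical callback standing in for a live request to the source exchange.

```python
from dataclasses import dataclass
from typing import Callable, List

LISTING_TTL_SECONDS = 120.0  # assumed value; tune per exchange


@dataclass
class CachedListing:
    listing_id: str
    price_cents: int
    last_seen: float  # Unix timestamp of the last successful scrape


class ListingUnavailable(Exception):
    """Raised when a listing fails the live check at checkout."""


def is_fresh(listing: CachedListing, now: float, ttl: float = LISTING_TTL_SECONDS) -> bool:
    return now - listing.last_seen <= ttl


def visible(listings: List[CachedListing], now: float) -> List[CachedListing]:
    """Hide anything we have not re-confirmed within the TTL."""
    return [listing for listing in listings if is_fresh(listing, now)]


def checkout(listing: CachedListing, verify_live: Callable[[str], bool]) -> dict:
    """Re-check the source exchange before completing a purchase."""
    if not verify_live(listing.listing_id):
        raise ListingUnavailable(listing.listing_id)
    return {"listing_id": listing.listing_id, "price_cents": listing.price_cents}
```

The TTL keeps the browse experience honest most of the time. The live check at checkout covers the remaining window, at the cost of one extra round trip per purchase.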

Search and discovery

Users need to find events quickly. Sounds like a straightforward search problem, but event data is messy:

- "Madison Square Garden" vs "MSG" vs "The Garden"
- Artists who go by different names
- Events that get rescheduled, renamed, or split into multiple dates
- Venue sections and rows that use different naming schemes across exchanges

We built a normalization layer that maps all variations to canonical entities. When a user searches for "MSG," they get results for Madison Square Garden. When two exchanges list the same event under slightly different names, they merge into one listing.
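
At its simplest, the mapping described above is an alias table plus a merge keyed on canonical identifiers. The sketch below assumes a tiny hand-written alias table and hypothetical field names (`venue`, `event`, `date`). A real system would need many more rules than this.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: normalized name -> canonical venue id.
VENUE_ALIASES = {
    "madison square garden": "madison-square-garden",
    "msg": "madison-square-garden",
    "the garden": "madison-square-garden",
}


def normalize_key(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def canonical_venue(name: str) -> Optional[str]:
    """Return the canonical venue id, or None if the name is not a known alias."""
    return VENUE_ALIASES.get(normalize_key(name))


def merge_listings(listings: List[dict]) -> Dict[Tuple[str, str, str], List[dict]]:
    """Group listings from different exchanges that refer to the same event."""
    merged: Dict[Tuple[str, str, str], List[dict]] = {}
    for item in listings:
        venue = canonical_venue(item["venue"]) or normalize_key(item["venue"])
        key = (venue, normalize_key(item["event"]), item["date"])
        merged.setdefault(key, []).append(item)
    return merged
```

Unknown venue names fall back to their normalized form, so they still group consistently until someone adds them to the alias table.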

This normalization work isn't glamorous. It's a lot of mapping tables and edge case handling. But it's the difference between a platform that feels broken and one that feels reliable.

The daily operations

Running a ticket marketplace means dealing with:

- Exchanges changing their markup without warning (we break, scramble to fix)
- Sudden traffic spikes when a popular artist announces a tour
- Price wars where inventory changes every few seconds
- Listings that appear valid but reference events that have been cancelled
- Edge cases in every possible direction

We built extensive monitoring — not just "is the server up" but "are we getting fresh data from each exchange?" and "is our inventory count within expected ranges?" An alert fires if scrape freshness drops below our threshold for any exchange, which lets us catch issues before users notice stale data.
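
Both checks reduce to small predicates that an alerting system can evaluate on a schedule. In this sketch the five-minute threshold and the function names are assumptions, and wiring the results into an actual alerting tool is left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_SCRAPE_AGE = timedelta(minutes=5)  # assumed freshness threshold


def stale_exchanges(
    last_success: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_SCRAPE_AGE,
) -> List[str]:
    """Return exchanges whose last successful scrape is older than max_age."""
    return sorted(name for name, ts in last_success.items() if now - ts > max_age)


def inventory_out_of_range(count: int, expected_low: int, expected_high: int) -> bool:
    """True if the listing count falls outside the expected band."""
    return not (expected_low <= count <= expected_high)
```

A sudden drop in inventory count often means a parser is silently returning nothing, which a plain uptime check would never catch.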

Architecture decisions we're glad we made

Looking back, a few early decisions paid off:

1. Separating scraping from serving: the scraper infrastructure and the user-facing platform are decoupled. Scrapers can crash without affecting the site.
2. Event sourcing for price changes: we store every price update, not just the current price. This makes debugging easy and analytics powerful.
3. Exchange-agnostic data model: adding a new exchange means writing one new scraper worker. The core platform doesn't change.
4. Aggressive caching with smart invalidation: the user-facing site is fast because we cache heavily, but stale-while-revalidate patterns keep things fresh.
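
As an illustration of the second decision, here is a minimal in-memory event log for price changes. The class and field names are hypothetical, and a real store would be a durable append-only table rather than a Python list. The point is that current price, history, and "what did we show at time T" all fall out of the same events.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PriceChanged:
    listing_id: str
    price_cents: int
    observed_at: float  # Unix timestamp of the scrape that saw this price


class PriceLog:
    """Append-only log of price events; nothing is updated in place."""

    def __init__(self) -> None:
        self._events: List[PriceChanged] = []

    def append(self, event: PriceChanged) -> None:
        self._events.append(event)

    def history(self, listing_id: str) -> List[PriceChanged]:
        """All events for one listing, oldest first."""
        events = [e for e in self._events if e.listing_id == listing_id]
        return sorted(events, key=lambda e: e.observed_at)

    def price_at(self, listing_id: str, at: float) -> Optional[int]:
        """Replay events to find the price we would have shown at time `at`."""
        price = None
        for event in self.history(listing_id):
            if event.observed_at > at:
                break
            price = event.price_cents
        return price
```

When a user reports a wrong price, `price_at` answers what the platform believed at that moment, which is the debugging benefit described above.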

The lesson from this project: the hard part of a marketplace isn't the buying and selling. It's the data quality. If your inventory data is unreliable, nothing else matters.
