- Complete exploration of automated data sources (q5vd): PokeDB.org identified as the ideal single source of truth, with JSON data export
- Add bean for PokeDB.org data import tool (bs05)
- Add bean for improving encounter rate display with time/weather variants (oqfo)
- Mark branding cleanup bean (xvaw) as completed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| title | status | type | priority | created_at | updated_at | parent |
|---|---|---|---|---|---|---|
| Explore automated data sources for encounter data | completed | task | normal | 2026-02-10T08:58:47Z | 2026-02-10T14:10:50Z | nuzlocke-tracker-rzu4 |
Research and evaluate automated or semi-automated options for populating encounter data, especially for games where PokeAPI has no encounter data (Let's Go and Gen 8+).
Games needing data
These games currently have empty/placeholder seed data:
- Let's Go Pikachu / Eevee
- Sword / Shield
- Brilliant Diamond / Shining Pearl
- Legends: Arceus
- Scarlet / Violet
- Legends: Z-A (released Oct 2025)
Research findings
Ruled out
- PokeAPI — encounter data last updated ~9 years ago (Sun/Moon era). No Gen 8+ encounter data.
- veekun/pokedex — last commit 5 years ago, similar dataset to PokeAPI. No advantage.
- PokemonDB (pokemondb.net): covers all gens, but encounter rates are NOT exact percentages (only rarity labels such as "common" and "uncommon"). Not suitable for our seed format, which uses exact percentages.
Viable sources
PokeDB (pokedb.org) — RECOMMENDED
- Covers all generations including Galar (Sw/Sh), Hisui (Legends Arceus), Paldea (Sc/Vi), and Let's Go
- Percentage encounter rates that sum to 100% per method, exactly what our seed format needs
- Rich data model with 60+ fields per encounter (documented at /editors/docs/data-model/tables/encounters/)
- Supports 6 rate variants: overall-only, weather-percentages, time-and-weather-checks, seasons, time-of-day, probability-weights
- Sub-area support for complex locations (e.g., Mount Coronet has floor-by-floor breakdown)
- Version-specific rates (e.g., different rates for Sword vs Shield, HeartGold vs SoulSilver)
- Includes trades, swarm encounters, special methods (headbutt, honey trees, pokeradar, etc.)
- robots.txt: very permissive; only disallows `/private/` and allows everything else
- URL pattern: `/locations/{region}/{location-name}/`
- Region index pages list all locations: `/locations/{region}/`
- Game version abbreviations: SW/SH, BD/SP, D/P/PL, S/V, LGP/LGE, etc.
Bulbapedia (backup)
- MediaWiki-based, covers all generations
- 5-second crawl delay in robots.txt
- Inconsistent table format across generations
Serebii (backup)
- Very permissive robots.txt
- Mixes version data on same page, harder to parse
pkNX (alternative approach)
- Most accurate data (from game files), but requires ROM dumps
- Legal gray area, FlatBuffer conversion needed
Recommendation
PokeDB.org is the ideal scraping target:
- Percentage encounter rates matching our seed format
- Covers all games we need data for
- Very permissive robots.txt (only `/private/` is disallowed)
- Consistent, well-documented data model
- Location index pages make discovery easy
- Sub-areas and version-specific data handled cleanly
- Rate-limited scraping acceptable (user confirmed)
Implementation approach
Build a scraper (Go, to match existing tooling) that:
- Fetches the region index page for each game/region needing data
- Discovers all location URLs from the index
- Scrapes each location page for encounter tables
- Parses encounter method, pokemon name, game version, rate, level range
- Maps Pokemon names to PokeAPI IDs (from our existing `pokemon.json`)
- Handles sub-areas (either flatten them or use children in the seed format)
- Outputs data in the existing `{game}.json` seed format
- Respects rate limiting (2+ second delay between requests) and caches responses on disk
Notes
- The existing Go tool (`tools/fetch-pokeapi/`) has a good HTTP client with caching and rate limiting that can be reused
- Output format must match the existing `{game}.json` structure (routes with encounters)
- Pokemon name → `pokeapi_id` mapping can use the existing `pokemon.json` as a lookup table
- PokeDB uses probability weights for Sc/Vi instead of percentages; these will need conversion