nuzlocke-tracker/.beans/nuzlocke-tracker-q5vd--explore-automated-data-sources-for-encounter-data.md
Julian Tabel 00dead68f7 Add PokeDB.org data import bean, encounter display bean, complete data source research
- Complete exploration of automated data sources (q5vd): PokeDB.org
  identified as ideal single source of truth with JSON data export
- Add bean for PokeDB.org data import tool (bs05)
- Add bean for improving encounter rate display with time/weather
  variants (oqfo)
- Mark branding cleanup bean (xvaw) as completed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 15:16:26 +01:00


---
# nuzlocke-tracker-q5vd
title: Explore automated data sources for encounter data
status: completed
type: task
priority: normal
created_at: 2026-02-10T08:58:47Z
updated_at: 2026-02-10T14:10:50Z
parent: nuzlocke-tracker-rzu4
---
Research and evaluate automated or semi-automated options for populating encounter data, especially for games where PokeAPI has no encounter data (Let's Go and Gen 8+).
## Games needing data
These games currently have empty/placeholder seed data:
- Let's Go Pikachu / Eevee
- Sword / Shield
- Brilliant Diamond / Shining Pearl
- Legends: Arceus
- Scarlet / Violet
- Legends: Z-A (released Oct 2025)
## Research findings
### Ruled out
- **PokeAPI** — encounter data last updated ~9 years ago (Sun/Moon era). No Gen 8+ encounter data.
- **veekun/pokedex** — last commit 5 years ago, similar dataset to PokeAPI. No advantage.
- **PokemonDB (pokemondb.net)** — covers all gens, but encounter rates are given only as rarity labels ("common", "uncommon"), not percentages. Not suitable for our seed format, which uses exact percentages.
### Viable sources
#### PokeDB (pokedb.org) — RECOMMENDED
- Covers **all generations** including Galar (Sw/Sh), Hisui (Legends Arceus), Paldea (Sc/Vi), and Let's Go
- **Percentage encounter rates** that sum to 100% per method, which is exactly what our seed format needs
- Rich data model with 60+ fields per encounter (documented at /editors/docs/data-model/tables/encounters/)
- Supports 6 rate variants: overall-only, weather-percentages, time-and-weather-checks, seasons, time-of-day, probability-weights
- Sub-area support for complex locations (e.g., Mount Coronet has floor-by-floor breakdown)
- Version-specific rates (e.g., different rates for Sword vs Shield, HeartGold vs SoulSilver)
- Includes trades, swarm encounters, special methods (headbutt, honey trees, pokeradar, etc.)
- `robots.txt`: very permissive — only disallows `/private/`, allows everything else
- URL pattern: `/locations/{region}/{location-name}/`
- Region index pages list all locations: `/locations/{region}/`
- Game version abbreviations: SW/SH, BD/SP, D/P/PL, S/V, LGP/LGE, etc.
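
The URL patterns above can be sketched as two small Go helpers. This is a minimal illustration, not code from the existing tooling: the `https://pokedb.org` base URL, the helper names, and the `galar` / `route-1` slugs are assumptions and should be checked against the live site.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// baseURL is an assumption: the site root with an https scheme.
const baseURL = "https://pokedb.org"

// regionIndexURL builds the index page that lists every location in a region,
// following the /locations/{region}/ pattern.
func regionIndexURL(region string) string {
	return baseURL + path.Join("/locations", url.PathEscape(region)) + "/"
}

// locationURL builds a single location page,
// following the /locations/{region}/{location-name}/ pattern.
func locationURL(region, location string) string {
	return baseURL + path.Join("/locations", url.PathEscape(region), url.PathEscape(location)) + "/"
}

func main() {
	// "galar" and "route-1" are hypothetical slugs used only for illustration.
	fmt.Println(regionIndexURL("galar"))
	fmt.Println(locationURL("galar", "route-1"))
}
```

Escaping each path segment separately keeps an unexpected character in a location slug from changing the path structure.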
#### Bulbapedia (backup)
- MediaWiki-based, covers all generations
- 5-second crawl delay in robots.txt
- Inconsistent table format across generations
#### Serebii (backup)
- Very permissive robots.txt
- Mixes version data on same page, harder to parse
#### pkNX (alternative approach)
- Most accurate data (from game files), but requires ROM dumps
- Legal gray area, FlatBuffer conversion needed
## Recommendation
**PokeDB.org** is the ideal scraping target:
1. Percentage encounter rates matching our seed format
2. Covers all games we need data for
3. Very permissive robots.txt (only `/private/` disallowed)
4. Consistent, well-documented data model
5. Location index pages make discovery easy
6. Sub-areas and version-specific data handled cleanly
7. Rate-limited scraping acceptable (user confirmed)
## Implementation approach
Build a scraper (Go, to match existing tooling) that:
1. Fetches the region index page for each game/region needing data
2. Discovers all location URLs from the index
3. Scrapes each location page for encounter tables
4. Parses encounter method, pokemon name, game version, rate, level range
5. Maps Pokemon names to pokeapi IDs (from our existing `pokemon.json`)
6. Handles sub-areas (either flatten or use children in seed format)
7. Outputs data in the existing `{game}.json` seed format
8. Respects rate limiting (2+ second delay between requests, disk-cache responses)
## Notes
- The existing Go tool (`tools/fetch-pokeapi/`) has a good HTTP client with caching and rate limiting that can be reused
- Output format must match the existing `{game}.json` structure (routes with encounters)
- Pokemon name → pokeapi_id mapping can use the existing `pokemon.json` as a lookup table
- PokeDB uses probability weights for Sc/Vi instead of percentages — will need conversion
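
The weight conversion mentioned in the last note could be sketched as below, assuming each weight is relative to the other entries in the same encounter table, so that a rate is `weight / total * 100`. The type names, field names, and example weights are hypothetical; how PokeDB actually scopes its Sc/Vi weights still needs to be checked against its data model docs.

```go
package main

import (
	"fmt"
	"math"
)

// WeightedEncounter is a hypothetical parsed row carrying a raw probability weight.
type WeightedEncounter struct {
	Pokemon string
	Weight  float64
}

// RatedEncounter is a hypothetical output row carrying a percentage rate.
type RatedEncounter struct {
	Pokemon string
	Rate    float64 // percent, rounded to one decimal place
}

// weightsToPercentages normalises the weights of one encounter table so that
// each entry becomes its share of the total, expressed as a percentage.
// Because each rate is rounded independently, the sum can drift slightly
// from 100 (three equal weights give 33.3 each).
func weightsToPercentages(in []WeightedEncounter) []RatedEncounter {
	total := 0.0
	for _, e := range in {
		total += e.Weight
	}
	out := make([]RatedEncounter, 0, len(in))
	if total <= 0 {
		return out // nothing to normalise against
	}
	for _, e := range in {
		pct := math.Round(e.Weight/total*1000) / 10
		out = append(out, RatedEncounter{Pokemon: e.Pokemon, Rate: pct})
	}
	return out
}

func main() {
	// Illustrative weights only, not real game data.
	table := []WeightedEncounter{
		{Pokemon: "lechonk", Weight: 30},
		{Pokemon: "tarountula", Weight: 15},
		{Pokemon: "pawmi", Weight: 5},
	}
	for _, e := range weightsToPercentages(table) {
		fmt.Printf("%s: %.1f%%\n", e.Pokemon, e.Rate)
	}
}
```

If the seed format requires rates that sum to exactly 100, the rounding remainder would need to be assigned to one entry (for example the largest) as an extra step.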