2026-02-10 10:00:14 +01:00
---
# nuzlocke-tracker-q5vd
title: Explore automated data sources for encounter data
2026-02-10 15:16:26 +01:00
status: completed
type: task
priority: normal
created_at: 2026-02-10T08:58:47Z
updated_at: 2026-02-10T14:10:50Z
parent: nuzlocke-tracker-rzu4
---
Research and evaluate automated or semi-automated options for populating encounter data, especially for games where PokeAPI has no data (Gen 8+).
## Games needing data
These games currently have empty/placeholder seed data:
- Let's Go Pikachu / Eevee
- Sword / Shield
- Brilliant Diamond / Shining Pearl
- Legends: Arceus
- Scarlet / Violet
- Legends: Z-A (released Oct 2025)
## Research findings
### Ruled out
- **PokeAPI** — encounter data last updated ~9 years ago (Sun/Moon era). No Gen 8+ encounter data.
- **veekun/pokedex** — last commit 5 years ago, similar dataset to PokeAPI. No advantage.
- **PokemonDB (pokemondb.net)** covers all gens, but its encounter rates are rarity labels ("common", "uncommon") rather than percentages. Not suitable for our seed format, which uses exact percentages.
### Viable sources
#### PokeDB (pokedb.org) — RECOMMENDED
- Covers **all generations**, including Galar (Sw/Sh), Hisui (Legends Arceus), Paldea (Sc/Vi), and Let's Go
- **Percentage-based encounter rates** that sum to 100% per method, which is exactly what our seed format needs
- Rich data model with 60+ fields per encounter (documented at /editors/docs/data-model/tables/encounters/)
- Supports 6 rate variants: overall-only, weather-percentages, time-and-weather-checks, seasons, time-of-day, probability-weights
- Sub-area support for complex locations (e.g., Mount Coronet has floor-by-floor breakdown)
- Version-specific rates (e.g., different rates for Sword vs Shield, HeartGold vs SoulSilver)
- Includes trades, swarm encounters, special methods (headbutt, honey trees, pokeradar, etc.)
- `robots.txt`: very permissive; only disallows `/private/`, allows everything else
- URL pattern: `/locations/{region}/{location-name}/`
- Region index pages list all locations: `/locations/{region}/`
- Game version abbreviations: SW/SH, BD/SP, D/P/PL, S/V, LGP/LGE, etc.
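
Because the rates are expected to sum to 100% per method, a scraped location can be sanity-checked before it is written to a seed file. The following is a minimal sketch, assuming a hypothetical trimmed-down `Encounter` record (the real PokeDB model has far more fields) and assuming rates are grouped per version and method; the sample rates are invented for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Encounter is a hypothetical, trimmed-down record used only for this sketch.
type Encounter struct {
	Method  string  // e.g. "walking", "surfing"
	Version string  // e.g. "SW", "SH"
	Pokemon string  // display name as scraped
	Rate    float64 // percentage in the range 0-100
}

// RateSums adds up the rates for each "version/method" group.
func RateSums(encs []Encounter) map[string]float64 {
	sums := make(map[string]float64)
	for _, e := range encs {
		sums[e.Version+"/"+e.Method] += e.Rate
	}
	return sums
}

// BadGroups returns, in sorted order, the groups whose rates do not
// sum to 100 within the given tolerance.
func BadGroups(encs []Encounter, tol float64) []string {
	var bad []string
	for key, sum := range RateSums(encs) {
		if math.Abs(sum-100) > tol {
			bad = append(bad, key)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Invented sample data, not real encounter rates.
	encs := []Encounter{
		{"walking", "SW", "Skwovet", 60},
		{"walking", "SW", "Rookidee", 40},
		{"walking", "SH", "Skwovet", 60},
		{"walking", "SH", "Rookidee", 30},
	}
	fmt.Println(BadGroups(encs, 0.01)) // prints [SH/walking]
}
```

A check like this would flag pages where the parser missed a row or mixed two rate variants, without needing to know which variant the page uses.
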
#### Bulbapedia (backup)
- MediaWiki-based, covers all generations
- 5-second crawl delay in robots.txt
- Inconsistent table format across generations
#### Serebii (backup)
- Very permissive robots.txt
- Mixes version data on same page, harder to parse
#### pkNX (alternative approach)
- Most accurate data (from game files), but requires ROM dumps
- Legal gray area, FlatBuffer conversion needed
## Recommendation
**PokeDB.org** is the ideal scraping target:
1. Percentage-based encounter rates matching our seed format
2. Covers all games we need data for
3. Very permissive robots.txt (only `/private/` disallowed)
4. Consistent, well-documented data model
5. Location index pages make discovery easy
6. Sub-areas and version-specific data handled cleanly
7. Rate-limited scraping acceptable (user confirmed)
## Implementation approach
Build a scraper (Go, to match existing tooling) that:
1. Fetches the region index page for each game/region needing data
2. Discovers all location URLs from the index
3. Scrapes each location page for encounter tables
4. Parses encounter method, pokemon name, game version, rate, level range
5. Maps Pokemon names to pokeapi IDs (from our existing `pokemon.json`)
6. Handles sub-areas (either flatten or use children in seed format)
7. Outputs data in the existing `{game}.json` seed format
8. Respects rate limiting (2+ second delay between requests, disk-cache responses)
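
Step 5 is the one most likely to fail quietly, since scraped display names rarely match stored names character for character. The sketch below shows one way to do the lookup. It assumes `pokemon.json` is a JSON array of objects with `pokeapi_id` and `name` fields, which is a guess at the file's shape rather than its confirmed structure; `BuildIndex`, `LookupID`, and `normalizeName` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// pokemonEntry mirrors an ASSUMED shape for entries in pokemon.json.
type pokemonEntry struct {
	PokeAPIID int    `json:"pokeapi_id"`
	Name      string `json:"name"`
}

// normalizeName lowercases a name and keeps only letters and digits, so that
// "Mr. Mime" and "mr-mime" compare equal. The gender symbols are mapped to
// "f"/"m" so the two Nidoran species stay distinct. Accented letters are kept
// as-is, so names like "Flabébé" may still need a manual override.
func normalizeName(name string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(name) {
		switch {
		case r == '♀':
			b.WriteRune('f')
		case r == '♂':
			b.WriteRune('m')
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		}
	}
	return b.String()
}

// BuildIndex parses the JSON list and returns normalized name -> ID.
func BuildIndex(data []byte) (map[string]int, error) {
	var entries []pokemonEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, fmt.Errorf("parse pokemon list: %w", err)
	}
	index := make(map[string]int, len(entries))
	for _, e := range entries {
		index[normalizeName(e.Name)] = e.PokeAPIID
	}
	return index, nil
}

// LookupID resolves a scraped display name; ok is false when it is unknown.
func LookupID(index map[string]int, scrapedName string) (id int, ok bool) {
	id, ok = index[normalizeName(scrapedName)]
	return id, ok
}

func main() {
	sample := []byte(`[{"pokeapi_id":122,"name":"mr-mime"},{"pokeapi_id":29,"name":"nidoran-f"}]`)
	index, err := BuildIndex(sample)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"Mr. Mime", "Nidoran♀", "Missingno"} {
		id, ok := LookupID(index, name)
		fmt.Println(name, id, ok)
	}
}
```

Returning `ok` instead of a zero ID lets the scraper collect unmatched names into a report rather than writing bad IDs into the seed file.
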
## Notes
- The existing Go tool (`tools/fetch-pokeapi/`) has a good HTTP client with caching and rate limiting that can be reused
- Output format must match the existing `{game}.json` structure (routes with encounters)
- Pokemon name → pokeapi_id mapping can use the existing `pokemon.json` as a lookup table
- PokeDB uses probability weights for Sc/Vi instead of percentages — will need conversion
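
The weight conversion mentioned in the last note could look roughly like this. It is a sketch under the assumption that the weights are relative within a single encounter group, so dividing each by the group total yields a percentage; how PokeDB actually scopes its weights has not been verified here, and `WeightsToPercentages` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// WeightsToPercentages scales non-negative weights so that they sum to 100.
// ASSUMPTION: all weights belong to the same encounter group.
func WeightsToPercentages(weights []float64) ([]float64, error) {
	total := 0.0
	for _, w := range weights {
		if w < 0 {
			return nil, errors.New("negative weight")
		}
		total += w
	}
	if total == 0 {
		return nil, errors.New("weights sum to zero")
	}
	out := make([]float64, len(weights))
	for i, w := range weights {
		out[i] = w / total * 100
	}
	return out, nil
}

func main() {
	// Invented weights for illustration only.
	pcts, err := WeightsToPercentages([]float64{6, 3, 1})
	if err != nil {
		panic(err)
	}
	for _, p := range pcts {
		fmt.Printf("%.1f%%\n", p)
	}
}
```

The values are left unrounded on purpose: rounding each entry independently can make a group sum to 99.9 or 100.1, so any rounding is better done once, at output time, with the remainder assigned to one entry.
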