- Complete exploration of automated data sources (q5vd): PokeDB.org identified as ideal single source of truth with JSON data export
- Add bean for PokeDB.org data import tool (bs05)
- Add bean for improving encounter rate display with time/weather variants (oqfo)
- Mark branding cleanup bean (xvaw) as completed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
86 lines
3.8 KiB
Markdown
---
# nuzlocke-tracker-q5vd
title: Explore automated data sources for encounter data
status: completed
type: task
priority: normal
created_at: 2026-02-10T08:58:47Z
updated_at: 2026-02-10T14:10:50Z
parent: nuzlocke-tracker-rzu4
---

Research and evaluate automated or semi-automated options for populating encounter data, especially for games where PokeAPI has no data (Gen 8+).

## Games needing data

These games currently have empty/placeholder seed data:
- Let's Go Pikachu / Eevee
- Sword / Shield
- Brilliant Diamond / Shining Pearl
- Legends: Arceus
- Scarlet / Violet
- Legends: Z-A (released Oct 2025)

## Research findings

### Ruled out
- **PokeAPI**: encounter data last updated ~9 years ago (Sun/Moon era). No Gen 8+ encounter data.
- **veekun/pokedex**: last commit 5 years ago, similar dataset to PokeAPI. No advantage.
- **PokemonDB (pokemondb.net)**: covers all generations, but encounter rates are given as rarity labels ("common", "uncommon") rather than percentages. Not suitable for our seed format, which uses exact percentages.

### Viable sources

#### PokeDB (pokedb.org), recommended
- Covers **all generations** including Galar (Sw/Sh), Hisui (Legends Arceus), Paldea (Sc/Vi), and Let's Go
- **Percentage encounter rates** that sum to 100% per method, which is exactly what our seed format needs
- Rich data model with 60+ fields per encounter (documented at /editors/docs/data-model/tables/encounters/)
- Supports 6 rate variants: overall-only, weather-percentages, time-and-weather-checks, seasons, time-of-day, probability-weights
- Sub-area support for complex locations (e.g., Mount Coronet has floor-by-floor breakdown)
- Version-specific rates (e.g., different rates for Sword vs Shield, HeartGold vs SoulSilver)
- Includes trades, swarm encounters, special methods (headbutt, honey trees, pokeradar, etc.)
- `robots.txt`: very permissive; only disallows `/private/`, allows everything else
- URL pattern: `/locations/{region}/{location-name}/`
- Region index pages list all locations: `/locations/{region}/`
- Game version abbreviations: SW/SH, BD/SP, D/P/PL, S/V, LGP/LGE, etc.
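
A minimal sketch of how the two URL patterns above could be assembled in Go. The `baseURL` constant, the function names, and the example slugs (`galar`, `route-1`) are assumptions for illustration; the real slugs should be taken from the region index pages rather than guessed.

```go
package main

import (
	"fmt"
	"net/url"
)

// baseURL is an assumption: the site root implied by the pokedb.org domain.
const baseURL = "https://pokedb.org"

// regionIndexURL follows the /locations/{region}/ pattern.
func regionIndexURL(region string) string {
	return fmt.Sprintf("%s/locations/%s/", baseURL, url.PathEscape(region))
}

// locationURL follows the /locations/{region}/{location-name}/ pattern.
func locationURL(region, location string) string {
	return fmt.Sprintf("%s/locations/%s/%s/",
		baseURL, url.PathEscape(region), url.PathEscape(location))
}

func main() {
	// "galar" and "route-1" are hypothetical slugs used only as examples.
	fmt.Println(regionIndexURL("galar"))
	fmt.Println(locationURL("galar", "route-1"))
}
```

Escaping each path segment keeps the builder safe if a slug ever contains a character that is not URL-safe.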

#### Bulbapedia (backup)
- MediaWiki-based, covers all generations
- 5-second crawl delay in robots.txt
- Inconsistent table format across generations

#### Serebii (backup)
- Very permissive robots.txt
- Mixes version data on same page, harder to parse

#### pkNX (alternative approach)
- Most accurate data (from game files), but requires ROM dumps
- Legal gray area, FlatBuffer conversion needed

## Recommendation

**PokeDB.org** is the ideal scraping target:
1. Percentage encounter rates matching our seed format
2. Covers all games we need data for
3. Very permissive robots.txt (only `/private/` disallowed)
4. Consistent, well-documented data model
5. Location index pages make discovery easy
6. Sub-areas and version-specific data handled cleanly
7. Rate-limited scraping acceptable (user confirmed)
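
Point 1 can be checked mechanically once data is imported: rates for each method should total 100%. The sketch below assumes a simplified, hypothetical `Encounter` record (not the actual seed schema). In practice the grouping key would probably also need the location, game version, and rate variant.

```go
package main

import (
	"fmt"
	"math"
)

// Encounter is a hypothetical, simplified record used only for this check.
type Encounter struct {
	Method  string
	Pokemon string
	Rate    float64 // percentage, e.g. 40 means 40%
}

// methodsNotSummingTo100 returns the rate total for every method whose
// total deviates from 100 by more than tol.
func methodsNotSummingTo100(encs []Encounter, tol float64) map[string]float64 {
	totals := make(map[string]float64)
	for _, e := range encs {
		totals[e.Method] += e.Rate
	}
	bad := make(map[string]float64)
	for method, total := range totals {
		if math.Abs(total-100) > tol {
			bad[method] = total
		}
	}
	return bad
}

func main() {
	// Made-up sample data: "walk" is complete, "surf" is missing 10%.
	encs := []Encounter{
		{"walk", "A", 60}, {"walk", "B", 40},
		{"surf", "C", 70}, {"surf", "D", 20},
	}
	for method, total := range methodsNotSummingTo100(encs, 0.01) {
		fmt.Printf("%s sums to %.2f%%\n", method, total)
	}
}
```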

## Implementation approach

Build a scraper (Go, to match existing tooling) that:
1. Fetches the region index page for each game/region needing data
2. Discovers all location URLs from the index
3. Scrapes each location page for encounter tables
4. Parses encounter method, pokemon name, game version, rate, level range
5. Maps Pokemon names to pokeapi IDs (from our existing `pokemon.json`)
6. Handles sub-areas (either flatten or use children in seed format)
7. Outputs data in the existing `{game}.json` seed format
8. Respects rate limiting (2+ second delay between requests, disk-cache responses)

## Notes
- The existing Go tool (`tools/fetch-pokeapi/`) has a good HTTP client with caching and rate limiting that can be reused
- Output format must match the existing `{game}.json` structure (routes with encounters)
- Pokemon name → pokeapi_id mapping can use the existing `pokemon.json` as a lookup table
- PokeDB uses probability weights for Sc/Vi instead of percentages, so these will need conversion
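
The weight conversion mentioned in the last note is plain normalization: each weight divided by the group total, times 100. The sketch below assumes the weights being normalized all belong to one spawn group; how PokeDB groups Sc/Vi weights is not confirmed here, and the function name is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// weightsToPercent converts probability weights into percentages that
// sum to 100. It assumes all weights belong to the same spawn group.
func weightsToPercent(weights []float64) ([]float64, error) {
	total := 0.0
	for _, w := range weights {
		if w < 0 {
			return nil, errors.New("negative weight")
		}
		total += w
	}
	if total == 0 {
		return nil, errors.New("weights sum to zero")
	}
	out := make([]float64, len(weights))
	for i, w := range weights {
		out[i] = w / total * 100
	}
	return out, nil
}

func main() {
	// Made-up weights for illustration only.
	pct, err := weightsToPercent([]float64{30, 20, 10})
	if err != nil {
		panic(err)
	}
	for _, p := range pct {
		fmt.Printf("%.2f%%\n", p)
	}
}
```

If the seed format expects rounded values, rounding should happen after normalization, with a check that the rounded rates still total 100.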