---
# nuzlocke-tracker-bs05
title: Build PokeDB.org data import tool
status: draft
type: task
priority: normal
created_at: 2026-02-10T14:04:11Z
updated_at: 2026-02-10T14:11:06Z
parent: nuzlocke-tracker-rzu4
---
Build a Go tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
## Data source
PokeDB.org provides a full data export at https://pokedb.org/data-export with JSON downloads:
- `encounters.json` (69MB, 37,724 records) — all encounter data across all games
- `locations.json` — 839 locations
- `location_areas.json` — 2,672 location areas
- `encounter_methods.json` — 73 encounter methods
- `versions.json` — 82 game versions
- `pokemon_forms.json` — Pokemon forms with identifiers
**No scraping required.** Just download the JSON files and process them locally.
**Terms of use:** "Data is provided for educational, research, and non-commercial purposes." Attribution to PokeDB requested.
## Encounter data coverage
Encounter counts by version:
- Sword: 10,160 / Shield: 10,144
- Scarlet: 4,135 / Violet: 4,101
- SoulSilver: 2,492 / HeartGold: 2,475
- Shining Pearl: 2,021 / Brilliant Diamond: 2,013
- Legends Arceus: 1,756
- Black 2: 1,418 / White 2: 1,418
- Crystal: 1,375 / Alpha Sapphire: 1,338 / Platinum: 1,337
- Diamond: 1,292 / Pearl: 1,289 / Silver: 1,284 / Gold: 1,282
- LeafGreen: 987 / FireRed: 985 / White: 981 / Black: 947
- Ultra Moon: 886 / Ultra Sun: 885 / X: 880 / Y: 879
- Emerald: 763 / Let's Go Eevee: 710 / Sun: 709 / Moon: 707
- Sapphire: 707 / Ruby: 707 / Let's Go Pikachu: 690
- Blue: 528 / Red: 526 / Yellow: 496
## Data format details
Each encounter record has:
- `pokemon_form_identifier` — e.g. "pidgey-default", "mr-mime-default"
- `version_identifiers` — array of game version IDs (e.g. ["sword", "shield"])
- `location_area_identifier` — e.g. "route-01-kanto", "axews-eye"
- `encounter_method_identifier` — e.g. "walking-tall-grass", "surfing", "npc-trade"
- `levels` — string like "2 - 4" or "67"
- Rate fields vary by game generation:
  - Gen 1/3/6: `rate_overall` (single percentage)
  - Gen 2/4: `rate_morning`, `rate_day`, `rate_night` (time-of-day percentages)
  - Gen 5: `rate_spring`, `rate_summer`, `rate_autumn`, `rate_winter` (seasonal)
  - Gen 8 Sw/Sh: `weather_*_rate` fields (per-weather percentages, e.g. "40%")
  - Gen 8 Legends Arceus: `during_*` and `while_*` booleans (time+weather conditions)
  - Gen 9 Sc/Vi: `probability_*` fields (overworld probability weights)
- `trade_for` — Pokemon form identifier for NPC trades
- `alpha_levels` — for Legends Arceus alpha encounters
- `visible` — overworld vs hidden encounter
- Max Raid and Tera Raid fields for special encounters
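
As a rough sketch of how these records might be read in Go: the struct below decodes only the five fields listed for every record and keeps the full record as raw JSON, so generation-specific rate fields can be inspected later without committing to their types. It assumes `encounters.json` is a top-level JSON array of objects, which has not been verified against the export. The names `Encounter`, `decodeEncounters`, and `InVersion` are hypothetical, and the sample record is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Encounter holds the fields the task description lists for every record.
// All other fields (rate_*, weather_*_rate, probability_*, ...) stay in Raw.
type Encounter struct {
	PokemonForm  string   `json:"pokemon_form_identifier"`
	Versions     []string `json:"version_identifiers"`
	LocationArea string   `json:"location_area_identifier"`
	Method       string   `json:"encounter_method_identifier"`
	Levels       string   `json:"levels"`

	// Raw keeps every field of the record as undecoded JSON.
	Raw map[string]json.RawMessage `json:"-"`
}

// UnmarshalJSON fills the typed fields first, then stores the whole record in Raw.
func (e *Encounter) UnmarshalJSON(data []byte) error {
	type plain Encounter // same fields, no methods: avoids infinite recursion
	if err := json.Unmarshal(data, (*plain)(e)); err != nil {
		return err
	}
	return json.Unmarshal(data, &e.Raw)
}

// InVersion reports whether the record applies to the given version identifier.
func (e *Encounter) InVersion(version string) bool {
	for _, v := range e.Versions {
		if v == version {
			return true
		}
	}
	return false
}

// decodeEncounters parses a JSON array of encounter records (assumed layout).
func decodeEncounters(data []byte) ([]Encounter, error) {
	var out []Encounter
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Illustrative record only; not taken from the real export.
	sample := []byte(`[{
		"pokemon_form_identifier": "pidgey-default",
		"version_identifiers": ["sword", "shield"],
		"location_area_identifier": "route-01-kanto",
		"encounter_method_identifier": "walking-tall-grass",
		"levels": "2 - 4",
		"rate_overall": "50%"
	}]`)
	encs, err := decodeEncounters(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, e := range encs {
		fmt.Println(e.PokemonForm, e.LocationArea, e.Levels, e.InVersion("sword"))
	}
}
```

For the real 69MB file, streaming with `json.Decoder` may be preferable to loading everything into memory at once; that trade-off is left open here.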
## Implementation approach
### Checklist
- [ ] Set up project structure in `tools/import-pokedb/`
- [ ] Download and cache PokeDB JSON export files
- [ ] Parse PokeDB encounters, locations, location_areas, versions, pokemon_forms
- [ ] Build lookup maps: pokemon_form_identifier → pokeapi_id (using existing `pokemon.json`)
- [ ] Build lookup maps: location_area_identifier → location name + region
- [ ] Filter encounters by target game version
- [ ] Map PokeDB encounter methods to our seed format methods (73 → simplified set)
- [ ] Parse level strings ("2 - 4" → min_level: 2, max_level: 4)
- [ ] Handle rate variants per game generation:
  - For now, flatten time/weather/season rates into `encounter_rate` (use the max or average)
  - Preserve raw variant data for future use (see nuzlocke-tracker-oqfo)
- [ ] Group encounters by location area → route output
- [ ] Apply route ordering (use existing route_order.json or generate from location data)
- [ ] Output in existing `{game}.json` seed format
- [ ] Generate seed data for ALL games, replacing PokeAPI as the single source of truth
- [ ] Compare output against existing PokeAPI-sourced data to validate accuracy
- [ ] Run for all games and verify output
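
The level-parsing and rate-flattening items in the checklist could look roughly like the sketch below. It assumes level strings are either a single integer or two integers separated by a hyphen, and that percentage strings look like "40%". The flattening step uses the maximum, which is only one of the two options the checklist mentions. The function names `parseLevels`, `parsePercent`, and `flattenRate` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLevels converts "2 - 4" into (2, 4) and "67" into (67, 67).
func parseLevels(s string) (lo, hi int, err error) {
	parts := strings.SplitN(s, "-", 2)
	lo, err = strconv.Atoi(strings.TrimSpace(parts[0]))
	if err != nil {
		return 0, 0, fmt.Errorf("parse min level in %q: %w", s, err)
	}
	if len(parts) == 1 {
		return lo, lo, nil // single level: min and max are the same
	}
	hi, err = strconv.Atoi(strings.TrimSpace(parts[1]))
	if err != nil {
		return 0, 0, fmt.Errorf("parse max level in %q: %w", s, err)
	}
	if hi < lo {
		return 0, 0, fmt.Errorf("max level below min level in %q", s)
	}
	return lo, hi, nil
}

// parsePercent converts a string such as "40%" into 40.
func parsePercent(s string) (float64, error) {
	trimmed := strings.TrimSuffix(strings.TrimSpace(s), "%")
	return strconv.ParseFloat(strings.TrimSpace(trimmed), 64)
}

// flattenRate collapses per-variant rates (time of day, season, weather)
// into a single encounter_rate by taking the largest value.
func flattenRate(variants map[string]float64) float64 {
	best := 0.0
	for _, rate := range variants {
		if rate > best {
			best = rate
		}
	}
	return best
}

func main() {
	lo, hi, err := parseLevels("2 - 4")
	fmt.Println(lo, hi, err)

	rate := flattenRate(map[string]float64{
		"rate_morning": 10,
		"rate_day":     20,
		"rate_night":   5,
	})
	fmt.Println(rate)
}
```

Taking the maximum means a Pokemon that appears only at night still shows a non-zero rate; averaging would understate it. Either way, the raw variants should be preserved alongside the flattened value.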
## Encounter method mapping (draft)
PokeDB method → Our seed method:
- `walking-tall-grass`, `walking-*` → "walk"
- `surfing`, `surfing-*` → "surf"
- `fishing-old-rod` → "old-rod"
- `fishing-good-rod` → "good-rod"
- `fishing-super-rod` → "super-rod"
- `fishing` → "fishing"
- `rock-smash` → "rock-smash"
- `headbutt-*` → "headbutt"
- `npc-gift`, `egg`, `revive` → "gift"
- `npc-trade` → "trade"
- `symbol-encounter` → "walk" (overworld, Gen 8+)
- `wanderer` → "walk" (overworld visible)
- `fixed-encounter`, `static-encounter` → "static"
- `swarm` → "swarm"
- `poke-radar` → "pokeradar"
- `dual-slot-mode` → "dual-slot"
- Others: TBD based on relevance
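
One possible way to encode the draft mapping above is an exact-match table plus a short list of prefix rules for the wildcard entries. This is a sketch of the draft only: `mapMethod` and the table names are hypothetical, and methods not listed above return `false` so they can be reviewed rather than silently dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// exactMethods covers PokeDB methods that map one-to-one by full identifier.
var exactMethods = map[string]string{
	"surfing":           "surf",
	"fishing-old-rod":   "old-rod",
	"fishing-good-rod":  "good-rod",
	"fishing-super-rod": "super-rod",
	"fishing":           "fishing",
	"rock-smash":        "rock-smash",
	"npc-gift":          "gift",
	"egg":               "gift",
	"revive":            "gift",
	"npc-trade":         "trade",
	"symbol-encounter":  "walk",
	"wanderer":          "walk",
	"fixed-encounter":   "static",
	"static-encounter":  "static",
	"swarm":             "swarm",
	"poke-radar":        "pokeradar",
	"dual-slot-mode":    "dual-slot",
}

// prefixMethods covers the wildcard entries (walking-*, surfing-*, headbutt-*).
var prefixMethods = []struct{ prefix, seed string }{
	{"walking-", "walk"},
	{"surfing-", "surf"},
	{"headbutt-", "headbutt"},
}

// mapMethod returns the seed-format method for a PokeDB method identifier.
// The boolean is false for methods the draft mapping does not cover yet.
func mapMethod(pokedbMethod string) (string, bool) {
	if seed, ok := exactMethods[pokedbMethod]; ok {
		return seed, true
	}
	for _, rule := range prefixMethods {
		if strings.HasPrefix(pokedbMethod, rule.prefix) {
			return rule.seed, true
		}
	}
	return "", false
}

func main() {
	for _, m := range []string{"walking-tall-grass", "fishing-old-rod", "some-unmapped-method"} {
		seed, ok := mapMethod(m)
		fmt.Printf("%s -> %q (mapped: %v)\n", m, seed, ok)
	}
}
```

Checking exact matches before prefixes keeps a specific entry from being swallowed by a broader wildcard rule if one is added later.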
## Notes
- This tool replaces `tools/fetch-pokeapi/` as the primary source of seed data for all games
- Pokemon form identifiers need mapping to pokeapi IDs — may need a fuzzy match since naming conventions differ
- The existing `pokemon.json` has names and pokeapi IDs we can use as a lookup
- Sc/Vi probability weights are not percentages; they represent relative spawn weights
- Legends Arceus uses boolean conditions (`during_night` + `while_clear`) rather than rates
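
If the relative weights ever need to be shown next to percentage-based rates from other games, one option is to normalize them within a group. The sketch below assumes the right group is "all encounters sharing the same location area and method", which is a guess and not something the export documents. `weightsToPercent` is a hypothetical helper.

```go
package main

import "fmt"

// weightsToPercent converts relative spawn weights into percentage shares
// of their group total. A group with zero total weight yields all zeros.
func weightsToPercent(weights []float64) []float64 {
	total := 0.0
	for _, w := range weights {
		total += w
	}
	out := make([]float64, len(weights))
	if total == 0 {
		return out
	}
	for i, w := range weights {
		out[i] = w / total * 100
	}
	return out
}

func main() {
	// Two encounters in the same (assumed) group with weights 30 and 10.
	fmt.Println(weightsToPercent([]float64{30, 10}))
}
```

For example, weights of 30 and 10 in the same group become shares of 75 and 25.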