nuzlocke-tracker/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md
Julian Tabel 1aa67665ff Add Python tool scaffold for PokeDB data import
Set up tools/import-pokedb/ with CLI, JSON loader, and output models.
Replaces the Go/PokeAPI approach with local PokeDB.org JSON processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 09:49:51 +01:00


title: Build PokeDB.org data import tool
status: in-progress
type: feature
priority: normal
created_at: 2026-02-10T14:04:11Z
updated_at: 2026-02-11T08:44:03Z
parent: nuzlocke-tracker-rzu4
blocking: nuzlocke-tracker-spx3

Build a standalone Python tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).

Python was chosen over Go because:

  • The backend is already Python, so the team is familiar with it
  • We're processing local JSON files — no need for Go's concurrency
  • It remains a standalone tool in tools/import-pokedb/, not part of the backend

Data source

PokeDB.org provides a full data export at https://pokedb.org/data-export with JSON downloads:

  • encounters.json (69MB, 37,724 records) — all encounter data across all games
  • locations.json — 839 locations
  • location_areas.json — 2,672 location areas
  • encounter_methods.json — 73 encounter methods
  • versions.json — 82 game versions
  • pokemon_forms.json — Pokemon forms with identifiers

No scraping required. Just download the JSON files and process them locally.
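
A minimal sketch of the local loading step, assuming all six export files have been downloaded into one directory and that each parses as ordinary JSON. The function name load_export and the directory layout are hypothetical; only the file names come from the list above.

```python
import json
from pathlib import Path

# File names as listed in the data export; where they live on disk is an assumption.
EXPORT_FILES = (
    "encounters.json",
    "locations.json",
    "location_areas.json",
    "encounter_methods.json",
    "versions.json",
    "pokemon_forms.json",
)


def load_export(data_dir):
    """Load every export file from data_dir into a dict keyed by file stem."""
    data_dir = Path(data_dir)
    missing = [name for name in EXPORT_FILES if not (data_dir / name).is_file()]
    if missing:
        # Fail early with one message instead of halfway through processing.
        raise FileNotFoundError(
            f"missing export files in {data_dir}: {', '.join(missing)}"
        )
    tables = {}
    for name in EXPORT_FILES:
        with open(data_dir / name, encoding="utf-8") as fh:
            tables[Path(name).stem] = json.load(fh)
    return tables
```

Checking for all files up front keeps a partial download from surfacing later as a confusing lookup error.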

Terms of use: "Data is provided for educational, research, and non-commercial purposes." Attribution to PokeDB requested.

Encounter data coverage

Encounter counts by version:

  • Sword: 10,160 / Shield: 10,144
  • Scarlet: 4,135 / Violet: 4,101
  • SoulSilver: 2,492 / HeartGold: 2,475
  • Shining Pearl: 2,021 / Brilliant Diamond: 2,013
  • Legends Arceus: 1,756
  • Black 2: 1,418 / White 2: 1,418
  • Crystal: 1,375 / Alpha Sapphire: 1,338 / Platinum: 1,337
  • Diamond: 1,292 / Pearl: 1,289 / Silver: 1,284 / Gold: 1,282
  • LeafGreen: 987 / FireRed: 985 / White: 981 / Black: 947
  • Ultra Moon: 886 / Ultra Sun: 885 / X: 880 / Y: 879
  • Emerald: 763 / Let's Go Eevee: 710 / Sun: 709 / Moon: 707
  • Sapphire: 707 / Ruby: 707 / Let's Go Pikachu: 690
  • Blue: 528 / Red: 526 / Yellow: 496
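
Counts like the ones above could be reproduced with a short tally over version_identifiers. This is a sketch that assumes encounters.json parses to a list of record dicts; count_by_version is a hypothetical helper name.

```python
from collections import Counter


def count_by_version(encounters):
    """Tally how many encounter records apply to each game version."""
    counts = Counter()
    for record in encounters:
        # A record that lists several versions is counted once for each of them.
        counts.update(record.get("version_identifiers", []))
    return counts
```

Because a record that lists several versions is counted once per version, totals computed this way can add up to more than the 37,724 records in the file, as the figures above do.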

Data format details

Each encounter record has:

  • pokemon_form_identifier — e.g. "pidgey-default", "mr-mime-default"
  • version_identifiers — array of game version IDs (e.g. ["sword", "shield"])
  • location_area_identifier — e.g. "route-01-kanto", "axews-eye"
  • encounter_method_identifier — e.g. "walking-tall-grass", "surfing", "npc-trade"
  • levels — string like "2 - 4" or "67"
  • Rate fields vary by game generation:
    • Gen 1/3/6: rate_overall (single percentage)
    • Gen 2/4: rate_morning, rate_day, rate_night (time-of-day percentages)
    • Gen 5: rate_spring, rate_summer, rate_autumn, rate_winter (seasonal)
    • Gen 8 Sw/Sh: weather_*_rate fields (per-weather percentages, e.g. "40%")
    • Gen 8 Legends Arceus: during_* and while_* booleans (time+weather conditions)
    • Gen 9 Sc/Vi: probability_* fields (overworld probability weights)
  • trade_for — Pokemon form identifier for NPC trades
  • alpha_levels — for Legends Arceus alpha encounters
  • visible — overworld vs hidden encounter
  • Max Raid and Tera Raid fields for special encounters
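
The levels string and the rate variants could be handled along these lines. This is a sketch, not the tool's actual code: the helper names and the returned labels are hypothetical, and it assumes that rate fields a record does not use are absent, null, or empty.

```python
import re

_LEVELS = re.compile(r"^\s*(\d+)\s*(?:-\s*(\d+))?\s*$")

TIME_FIELDS = {"rate_morning", "rate_day", "rate_night"}
SEASON_FIELDS = {"rate_spring", "rate_summer", "rate_autumn", "rate_winter"}


def parse_levels(levels):
    """Turn "2 - 4" into (2, 4) and "67" into (67, 67)."""
    match = _LEVELS.match(levels)
    if match is None:
        raise ValueError(f"unrecognised levels string: {levels!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return low, high


def rate_scheme(record):
    """Name the rate variant a record uses, judged by which fields are set."""
    # Assumption: unused fields are missing, None, or "" rather than zero.
    present = {key for key, value in record.items() if value not in (None, "")}
    if "rate_overall" in present:
        return "overall"
    if present & TIME_FIELDS:
        return "time-of-day"
    if present & SEASON_FIELDS:
        return "season"
    if any(k.startswith("weather_") and k.endswith("_rate") for k in present):
        return "weather"
    if any(k.startswith(("during_", "while_")) for k in present):
        return "conditions"
    if any(k.startswith("probability_") for k in present):
        return "weight"
    return "none"
```

Detecting the variant from the fields that are set, rather than from the game version, keeps the parser independent of the version list; whether that holds for every record would need checking against the real export.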

Subtasks

Work is broken into child task beans:

  • Set up Python tool scaffold — project structure, CLI entry point, PokeDB JSON file loading
  • Build reference data mappings — pokemon_form → pokeapi_id, location_area → name/region, encounter method mapping
  • Core encounter processing — filter by game version, parse levels, handle rate variants, group by location area
  • Output seed JSON — produce per-game JSON in existing format, integrate route ordering + special encounters
  • Validation & full generation — compare against existing data, run for all games, fix discrepancies
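
The filtering and grouping in the core processing subtask might look roughly like this. It is a sketch under the assumption that records are plain dicts carrying the fields described under "Data format details"; both function names are hypothetical.

```python
from collections import defaultdict


def encounters_for_version(encounters, version):
    """Keep only the records that apply to the given game version."""
    return [
        record
        for record in encounters
        if version in record.get("version_identifiers", [])
    ]


def group_by_location_area(records):
    """Group records by location_area_identifier, preserving input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["location_area_identifier"]].append(record)
    return dict(grouped)
```

For example, group_by_location_area(encounters_for_version(encounters, "sword")) would give one list of Sword encounters per location area.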

Encounter method mapping (draft)

PokeDB method → Our seed method:

  • walking-tall-grass, walking-* → "walk"
  • surfing, surfing-* → "surf"
  • fishing-old-rod → "old-rod"
  • fishing-good-rod → "good-rod"
  • fishing-super-rod → "super-rod"
  • fishing → "fishing"
  • rock-smash → "rock-smash"
  • headbutt-* → "headbutt"
  • npc-gift, egg, revive → "gift"
  • npc-trade → "trade"
  • symbol-encounter → "walk" (overworld, Gen 8+)
  • wanderer → "walk" (overworld visible)
  • fixed-encounter, static-encounter → "static"
  • swarm → "swarm"
  • poke-radar → "pokeradar"
  • dual-slot-mode → "dual-slot"
  • Others: TBD based on relevance
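
The draft mapping above can be expressed as an exact-match table plus a few prefix rules. This is a sketch of one possible encoding; the names EXACT_METHODS, PREFIX_METHODS and map_method are hypothetical, and unmapped methods return None because the list leaves them TBD.

```python
# Exact PokeDB method identifiers and the seed method they map to (draft list).
EXACT_METHODS = {
    "surfing": "surf",
    "fishing-old-rod": "old-rod",
    "fishing-good-rod": "good-rod",
    "fishing-super-rod": "super-rod",
    "fishing": "fishing",
    "rock-smash": "rock-smash",
    "npc-gift": "gift",
    "egg": "gift",
    "revive": "gift",
    "npc-trade": "trade",
    "symbol-encounter": "walk",
    "wanderer": "walk",
    "fixed-encounter": "static",
    "static-encounter": "static",
    "swarm": "swarm",
    "poke-radar": "pokeradar",
    "dual-slot-mode": "dual-slot",
}

# Wildcard entries from the draft (walking-*, surfing-*, headbutt-*).
PREFIX_METHODS = (
    ("walking-", "walk"),
    ("surfing-", "surf"),
    ("headbutt-", "headbutt"),
)


def map_method(pokedb_method):
    """Return the seed method for a PokeDB method, or None if not mapped yet."""
    if pokedb_method in EXACT_METHODS:
        return EXACT_METHODS[pokedb_method]
    for prefix, seed_method in PREFIX_METHODS:
        if pokedb_method.startswith(prefix):
            return seed_method
    return None
```

Returning None instead of raising lets a first pass collect every unmapped method, which helps decide what to do with the "Others".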

Notes

  • This tool replaces tools/fetch-pokeapi/ as the way seed data is generated for all games, with PokeDB.org as the primary data source
  • Pokemon form identifiers need mapping to pokeapi IDs — may need a fuzzy match since naming conventions differ
  • The existing pokemon.json has names and pokeapi IDs we can use as a lookup
  • Scarlet/Violet probability weights are not percentages; they are relative spawn weights
  • Legends Arceus uses boolean conditions (during_night + while_clear) rather than rates
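
The form-identifier lookup mentioned in the notes could start from something like the sketch below. It assumes pokemon.json is a list of entries with "name" and "pokeapi_id" keys, which is a guess at its layout, and it only strips the "-default" suffix; other forms would need their own rules. The fuzzy fallback uses difflib from the standard library, and the 0.8 cutoff is an arbitrary starting point.

```python
import re
from difflib import get_close_matches

DEFAULT_SUFFIX = "-default"


def slug(name):
    """Lowercase a display name and collapse punctuation and spaces to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_lookup(pokemon):
    """Map slugged names to pokeapi IDs (entry keys are assumed, see above)."""
    return {slug(entry["name"]): entry["pokeapi_id"] for entry in pokemon}


def resolve_form(form_identifier, lookup, cutoff=0.8):
    """Resolve a PokeDB form identifier to a pokeapi ID, or None if unmatched."""
    key = form_identifier
    if key.endswith(DEFAULT_SUFFIX):
        key = key[: -len(DEFAULT_SUFFIX)]
    if key in lookup:
        return lookup[key]
    # Fall back to the closest name; anything below the cutoff stays unresolved.
    close = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None
```

Fuzzy hits are worth logging and reviewing by hand, since a near match between two different species would silently produce a wrong ID.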