Add Python tool scaffold for PokeDB data import

Set up tools/import-pokedb/ with CLI, JSON loader, and output models.
Replaces the Go/PokeAPI approach with local PokeDB.org JSON processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Julian Tabel
2026-02-11 09:49:51 +01:00
parent 5151be785b
commit 1aa67665ff
11 changed files with 522 additions and 23 deletions

@@ -1,17 +1,22 @@
---
# nuzlocke-tracker-bs05
title: Build PokeDB.org data import tool
status: draft
type: task
status: in-progress
type: feature
priority: normal
created_at: 2026-02-10T14:04:11Z
updated_at: 2026-02-10T14:31:08Z
updated_at: 2026-02-11T08:44:03Z
parent: nuzlocke-tracker-rzu4
blocking:
- nuzlocke-tracker-spx3
---
Build a Go tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
Build a standalone Python tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
Python was chosen over Go because:
- The backend is already Python, so the team is familiar with it
- We're processing local JSON files — no need for Go's concurrency
- Remains a standalone tool in `tools/import-pokedb/`, not part of the backend
## Data source
@@ -64,26 +69,15 @@ Each encounter record has:
- `visible` — overworld vs hidden encounter
- Max Raid and Tera Raid fields for special encounters
## Implementation approach
## Subtasks
### Checklist
- [ ] Set up project structure in `tools/import-pokedb/`
- [ ] Download and cache PokeDB JSON export files
- [ ] Parse PokeDB encounters, locations, location_areas, versions, pokemon_forms
- [ ] Build lookup maps: pokemon_form_identifier → pokeapi_id (using existing `pokemon.json`)
- [ ] Build lookup maps: location_area_identifier → location name + region
- [ ] Filter encounters by target game version
- [ ] Map PokeDB encounter methods to our seed format methods (73 → simplified set)
- [ ] Parse level strings ("2 - 4" → min_level: 2, max_level: 4)
- [ ] Handle rate variants per game generation:
  - For now, flatten time/weather/season rates into `encounter_rate` (use the max or average)
  - Preserve raw variant data for future use (see nuzlocke-tracker-oqfo)
- [ ] Group encounters by location area → route output
- [ ] Apply route ordering (use existing route_order.json or generate from location data)
- [ ] Output in existing `{game}.json` seed format
- [ ] Generate seed data for ALL games, replacing PokeAPI as the single source of truth
- [ ] Compare output against existing PokeAPI-sourced data to validate accuracy
- [ ] Run for all games and verify output
Work is broken into child task beans:
- [ ] **Set up Python tool scaffold** — project structure, CLI entry point, PokeDB JSON file loading
- [ ] **Build reference data mappings** — pokemon_form → pokeapi_id, location_area → name/region, encounter method mapping
- [ ] **Core encounter processing** — filter by game version, parse levels, handle rate variants, group by location area
- [ ] **Output seed JSON** — produce per-game JSON in existing format, integrate route ordering + special encounters
- [ ] **Validation & full generation** — compare against existing data, run for all games, fix discrepancies
## Encounter method mapping (draft)

@@ -0,0 +1,30 @@
---
# nuzlocke-tracker-dqyb
title: Set up Python tool scaffold
status: in-progress
type: task
priority: normal
created_at: 2026-02-11T08:42:58Z
updated_at: 2026-02-11T08:44:03Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-zno2
---
Set up the standalone Python tool project in `tools/import-pokedb/`.
## Checklist
- [x] Create `tools/import-pokedb/` directory structure
- [x] Set up `pyproject.toml` with dependencies (just stdlib should suffice for JSON processing, maybe `click` for CLI)
- [x] Create CLI entry point (`__main__.py` or similar) that accepts:
  - Path to directory containing PokeDB JSON export files
  - Target output directory (default: `backend/src/app/seeds/data/`)
  - Optional: specific game version to generate (default: all)
- [x] Load and parse all PokeDB JSON files: `encounters.json`, `locations.json`, `location_areas.json`, `encounter_methods.json`, `versions.json`, `pokemon_forms.json`
- [x] Basic validation that all expected files are present and parseable
## Notes
- Keep it as a standalone tool, not part of the backend
- The PokeDB JSON files are downloaded manually from https://pokedb.org/data-export — no need to automate the download
- Model the CLI similarly to how `tools/fetch-pokeapi/` works (cd into dir, run the tool)

@@ -0,0 +1,31 @@
---
# nuzlocke-tracker-gkcy
title: Output seed JSON
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:21Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-vdks
---
Generate the final per-game JSON files in the existing seed format.
## Checklist
- [ ] **Apply route ordering**: Use the existing `backend/src/app/seeds/route_order.json` to assign `order` values to routes. Handle aliases (e.g. "red-blue" → "firered-leafgreen"). Log warnings for routes not in the order file.
- [ ] **Merge special encounters**: Integrate starters, gifts, fossils, and trades from `backend/src/app/seeds/special_encounters.json` into the appropriate routes.
- [ ] **Output per-game JSON**: Write `{game-slug}.json` files matching the existing format:
```json
[{"name": "Route 1", "order": 3, "encounters": [...], "children": []}]
```
- [ ] **Output games.json**: Generate the global games list from `version_groups.json` (this may already be handled by existing config, verify).
- [ ] **Output pokemon.json**: Generate the global pokemon list including all pokemon referenced in any encounter. Include pokeapi_id, national_dex, name, types, sprite_url.
- [ ] **Handle version exclusives**: Ensure encounters specific to one version in a version group only appear in that game's JSON file (e.g. FireRed exclusives vs LeafGreen exclusives).
## Notes
- The output must be a drop-in replacement for the existing files in `backend/src/app/seeds/data/`
- Boss data (`{game}-bosses.json`) is NOT generated by this tool — it's manually curated
- Evolutions data is also separate (currently from PokeAPI) — out of scope for this task
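
The route-ordering step from the checklist above might look roughly like the following. This is only a sketch under stated assumptions: `assign_route_order` is a hypothetical helper name, order values are assumed to be 1-based positions in the order list, and routes missing from the order file are assumed to be appended after the known ones with a warning.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("import_pokedb")


def assign_route_order(route_names: list[str], ordered_names: list[str]) -> dict[str, int]:
    """Return {route name: order}; unknown routes are placed after all known ones."""
    # Assumption: the order value is the 1-based position in the order list.
    position = {name: index + 1 for index, name in enumerate(ordered_names)}
    result: dict[str, int] = {}
    next_order = len(ordered_names) + 1
    for name in route_names:
        if name in position:
            result[name] = position[name]
        else:
            # Log a warning so the route can be added to route_order.json later.
            logger.warning("Route %r is not listed in route_order.json", name)
            result[name] = next_order
            next_order += 1
    return result
```

Appending unknown routes keeps the output usable while the order file is curated; dropping them instead would silently lose encounters.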

@@ -0,0 +1,34 @@
---
# nuzlocke-tracker-rfg0
title: Core encounter processing
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:12Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-gkcy
---
Implement the core logic that transforms raw PokeDB encounter records into our internal format.
## Checklist
- [ ] **Filter by game version**: Given a target game slug, select only encounters where `version_identifiers` includes that game
- [ ] **Parse level strings**: Convert "2 - 4" → min_level=2, max_level=4; "67" → min_level=67, max_level=67
- [ ] **Handle rate variants per generation**:
  - Gen 1/3/6: use `rate_overall` directly as `encounter_rate`
  - Gen 2/4: `rate_morning`, `rate_day`, `rate_night`; flatten to max or average for `encounter_rate`
  - Gen 5: `rate_spring` through `rate_winter`; flatten similarly
  - Gen 8 Sw/Sh: `weather_*_rate` fields; flatten to max
  - Gen 8 Legends Arceus: `during_*` / `while_*` booleans; convert to a presence-based rate
  - Gen 9 Sc/Vi: `probability_*` fields (spawn weights, not percentages); normalize to percentages
  - Preserve raw variant data in a way that nuzlocke-tracker-oqfo can use later
- [ ] **Aggregate encounters**: Group by (pokemon, method, location_area) and merge level ranges / rates where appropriate (same logic as the Go tool's aggregation)
- [ ] **Group by location area**: Collect all encounters for a location area into a route structure
- [ ] **Handle parent/child routes**: Multi-area locations (e.g. Safari Zone) should produce parent routes with children, matching the existing hierarchical format
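
One possible shape for the aggregation step is sketched below. It is not the Go tool's logic, which is not reproduced here: the record keys (`pokemon`, `method`, `location_area`, `min_level`, `max_level`, `encounter_rate`) and the function name are hypothetical, and summing rates with a cap of 100 is an assumption that should be checked against the existing aggregation.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def aggregate_encounters(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge records that share (pokemon, method, location_area) into one entry."""
    groups: dict[tuple[str, str, str], list[dict[str, Any]]] = defaultdict(list)
    for rec in records:
        groups[(rec["pokemon"], rec["method"], rec["location_area"])].append(rec)

    merged: list[dict[str, Any]] = []
    for (pokemon, method, area), recs in groups.items():
        merged.append(
            {
                "pokemon": pokemon,
                "method": method,
                "location_area": area,
                # The merged level range spans every contributing record.
                "min_level": min(r["min_level"] for r in recs),
                "max_level": max(r["max_level"] for r in recs),
                # Assumption: rates of merged slots add up, capped at 100.
                "encounter_rate": min(100, sum(r["encounter_rate"] for r in recs)),
            }
        )
    return merged
```

Because dictionaries preserve insertion order, merged entries come out in the order their first record appeared, which keeps diffs against existing seed files stable.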
## Notes
- Rate parsing needs to handle percentage strings like "40%" as well as bare numbers
- The Go tool aggregates encounters with the same pokemon+method at a location into a single entry with merged level ranges — replicate this
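
The level and rate parsing described above could be sketched as follows. This is a minimal sketch rather than the tool's implementation: `parse_level_range` and `parse_rate` are hypothetical names, and it assumes level strings use a hyphen as the separator and that rates are either bare numbers or carry a trailing `%`.

```python
from __future__ import annotations


def parse_level_range(raw: str) -> tuple[int, int]:
    """Parse "2 - 4" or "67" into (min_level, max_level)."""
    parts = [part.strip() for part in raw.split("-")]
    if len(parts) == 1:
        level = int(parts[0])
        return level, level
    if len(parts) == 2:
        low, high = int(parts[0]), int(parts[1])
        # Tolerate reversed ranges such as "4 - 2".
        return min(low, high), max(low, high)
    raise ValueError(f"Unrecognised level string: {raw!r}")


def parse_rate(raw: str | int | float) -> float:
    """Parse "40%", "40" or 40 into a float percentage."""
    if isinstance(raw, (int, float)):
        return float(raw)
    return float(raw.strip().rstrip("%").strip())
```

Raising on unexpected level strings, instead of guessing, makes unusual PokeDB values show up early during validation.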

@@ -0,0 +1,29 @@
---
# nuzlocke-tracker-vdks
title: Validation and full generation
status: todo
type: task
created_at: 2026-02-11T08:43:29Z
updated_at: 2026-02-11T08:43:29Z
parent: nuzlocke-tracker-bs05
---
Validate the new tool's output against existing data and generate seed data for all games.
## Checklist
- [ ] **Diff against existing data**: For games we already have PokeAPI-sourced data for, compare the PokeDB output. Identify and investigate discrepancies:
  - Missing routes or encounters
  - Different encounter rates
  - Different level ranges
  - Missing or extra pokemon
- [ ] **Fix discrepancies**: Adjust mappings, parsing, or aggregation logic to resolve legitimate differences. Document cases where PokeDB provides better/different data than PokeAPI.
- [ ] **Generate for all games**: Run the tool for every game version in `version_groups.json`. Verify output is valid JSON and structurally correct.
- [ ] **New game coverage**: For games not previously supported (or with incomplete PokeAPI data), verify the output looks reasonable by spot-checking a few routes.
- [ ] **Update route_order.json**: Add route orderings for any new games that didn't have entries. This may require manual curation.
- [ ] **Update special_encounters.json**: Add special encounters for any new games. This may require manual curation.
## Notes
- This is the final validation step before we can replace PokeAPI as the data source
- Some discrepancies are expected — PokeDB may have more complete data than PokeAPI
- Route ordering for new games will likely need manual work
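
A first pass at the diff step could compare the two outputs as sets of encounter keys. The sketch below assumes both files follow the seed format shown in this commit (`name`, `encounters`, optional `children`, and `pokeapi_id` / `method` per encounter); `encounter_keys` and `diff_routes` are hypothetical helper names, and rate or level differences would need a separate comparison.

```python
from __future__ import annotations

from typing import Any

Key = tuple[str, int, str]


def encounter_keys(routes: list[dict[str, Any]]) -> set[Key]:
    """Flatten routes, including children, into (route, pokeapi_id, method) keys."""
    keys: set[Key] = set()
    for route in routes:
        for enc in route.get("encounters", []):
            keys.add((route["name"], enc["pokeapi_id"], enc["method"]))
        # Child routes are compared under their own names.
        keys |= encounter_keys(route.get("children", []))
    return keys


def diff_routes(old: list[dict[str, Any]], new: list[dict[str, Any]]) -> dict[str, list[Key]]:
    """Report encounters present in only one of the two data sets."""
    old_keys, new_keys = encounter_keys(old), encounter_keys(new)
    return {
        "missing": sorted(old_keys - new_keys),  # in the old data, absent from the new
        "extra": sorted(new_keys - old_keys),  # new in the PokeDB output
    }
```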

@@ -0,0 +1,26 @@
---
# nuzlocke-tracker-zno2
title: Build reference data mappings
status: todo
type: task
priority: normal
created_at: 2026-02-11T08:43:02Z
updated_at: 2026-02-11T08:43:33Z
parent: nuzlocke-tracker-bs05
blocking:
- nuzlocke-tracker-rfg0
---
Build the lookup maps needed to translate PokeDB identifiers into our seed format.
## Checklist
- [ ] **Pokemon form mapping**: Map `pokemon_form_identifier` (e.g. "pidgey-default", "mr-mime-default") to `pokeapi_id` using the existing `backend/src/app/seeds/data/pokemon.json` as reference. Handle naming convention differences between PokeDB and PokeAPI (may need fuzzy matching or a manual override table).
- [ ] **Location area mapping**: Map `location_area_identifier` to human-readable location names and regions using `locations.json` and `location_areas.json`. Produce names matching our existing format (e.g. "Route 1", "Viridian Forest").
- [ ] **Encounter method mapping**: Map PokeDB's 73 encounter methods to our simplified set. See the draft mapping in the parent bean. Implement as a dictionary/config that's easy to extend.
- [ ] **Version mapping**: Map PokeDB `version_identifiers` to our game slugs (should mostly be 1:1 but verify).
## Notes
- The pokemon form mapping is the trickiest part — PokeDB uses identifiers like "mr-mime-default" while our pokemon.json uses names like "Mr. Mime" and pokeapi IDs
- Log warnings for any unmapped identifiers so we can add overrides
- The `pokemon_forms.json` from PokeDB may help bridge the gap
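
As an illustration of the pokemon form mapping, the sketch below normalizes both naming schemes to a shared key and falls back to a manual override table. All names here (`FORM_OVERRIDES`, `normalize`, `build_form_lookup`, `resolve_form`) are hypothetical, and it assumes that default forms carry a `-default` suffix as in the examples above; non-default forms and names with accented characters would likely need override entries.

```python
from __future__ import annotations

import re

# Hypothetical manual override table: PokeDB identifier -> pokeapi_id.
FORM_OVERRIDES: dict[str, int] = {}


def normalize(name: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_form_lookup(pokemon: list[dict]) -> dict[str, int]:
    """Index pokemon.json entries by their normalized display name."""
    return {normalize(p["name"]): p["pokeapi_id"] for p in pokemon}


def resolve_form(identifier: str, lookup: dict[str, int]) -> int | None:
    """Resolve a PokeDB form identifier to a pokeapi_id, or None if unmapped."""
    if identifier in FORM_OVERRIDES:
        return FORM_OVERRIDES[identifier]
    # Assumption: default forms are named "<species>-default".
    base = identifier.removesuffix("-default")
    return lookup.get(normalize(base))
```

Returning `None` for unmapped identifiers leaves it to the caller to log the warning and decide whether to skip the encounter.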

@@ -0,0 +1,115 @@
"""CLI entry point for the PokeDB import tool.

Usage:
    # From repo root:
    python -m import_pokedb ./pokedb-export/

    # With options:
    python -m import_pokedb ./pokedb-export/ --output backend/src/app/seeds/data/ --game firered
"""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

from .loader import load_pokedb_data, load_seed_config

SEEDS_DIR_CANDIDATES = [
    Path("backend/src/app/seeds"),  # from repo root
    Path("../../backend/src/app/seeds"),  # from tools/import-pokedb/
]


def find_seeds_dir() -> Path:
    """Locate the backend seeds directory."""
    for candidate in SEEDS_DIR_CANDIDATES:
        if (candidate / "version_groups.json").exists():
            return candidate.resolve()
    # Fallback
    return Path("backend/src/app/seeds").resolve()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="import-pokedb",
        description="Convert PokeDB.org JSON data exports into nuzlocke-tracker seed format.",
    )
    parser.add_argument(
        "pokedb_dir",
        type=Path,
        help="Path to directory containing PokeDB JSON export files",
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=None,
        help="Output directory for seed JSON files (default: backend/src/app/seeds/data/)",
    )
    parser.add_argument(
        "--game",
        type=str,
        default=None,
        help="Generate data for a specific game slug only (default: all games)",
    )
    return parser


def main(argv: list[str] | None = None) -> None:
    parser = build_parser()
    args = parser.parse_args(argv)

    pokedb_dir: Path = args.pokedb_dir
    if not pokedb_dir.is_dir():
        print(f"Error: {pokedb_dir} is not a directory", file=sys.stderr)
        sys.exit(1)

    seeds_dir = find_seeds_dir()
    output_dir: Path = args.output or (seeds_dir / "data")
    output_dir.mkdir(parents=True, exist_ok=True)

    print(f"PokeDB data: {pokedb_dir.resolve()}")
    print(f"Seeds config: {seeds_dir}")
    print(f"Output: {output_dir.resolve()}")
    print()

    # Load PokeDB export data
    pokedb = load_pokedb_data(pokedb_dir)
    print(pokedb.summary())
    print()

    # Load existing seed configuration
    config = load_seed_config(seeds_dir)
    print(f"Loaded {len(config.version_groups)} version groups")
    print(f"Loaded route order for {len(config.route_order)} version groups")
    if config.special_encounters:
        se_count = len(config.special_encounters.get("encounters", {}))
        print(f"Loaded special encounters for {se_count} version groups")
    print()

    # Determine which games to process
    target_game = args.game
    if target_game:
        found = False
        for vg_info in config.version_groups.values():
            if target_game in vg_info.get("versions", []):
                found = True
                break
        if not found:
            print(f"Error: Game '{target_game}' not found in version_groups.json", file=sys.stderr)
            sys.exit(1)
        print(f"Target: {target_game}")
    else:
        total_games = sum(
            len(vg.get("versions", []))
            for vg in config.version_groups.values()
        )
        print(f"Target: all {total_games} games")

    # TODO: Processing pipeline (subtasks zno2, rfg0, gkcy)
    print("\nScaffold loaded successfully. Processing pipeline not yet implemented.")


if __name__ == "__main__":
    main()

@@ -0,0 +1,150 @@
"""Load and validate PokeDB JSON export files."""

from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Any

REQUIRED_FILES = [
    "encounters.json",
    "locations.json",
    "location_areas.json",
    "encounter_methods.json",
    "versions.json",
    "pokemon_forms.json",
]


class PokeDBData:
    """Container for all loaded PokeDB export data."""

    def __init__(
        self,
        encounters: list[dict[str, Any]],
        locations: list[dict[str, Any]],
        location_areas: list[dict[str, Any]],
        encounter_methods: list[dict[str, Any]],
        versions: list[dict[str, Any]],
        pokemon_forms: list[dict[str, Any]],
    ) -> None:
        self.encounters = encounters
        self.locations = locations
        self.location_areas = location_areas
        self.encounter_methods = encounter_methods
        self.versions = versions
        self.pokemon_forms = pokemon_forms

    def summary(self) -> str:
        return (
            f"PokeDB data loaded:\n"
            f"  encounters: {len(self.encounters):,}\n"
            f"  locations: {len(self.locations):,}\n"
            f"  location_areas: {len(self.location_areas):,}\n"
            f"  encounter_methods: {len(self.encounter_methods):,}\n"
            f"  versions: {len(self.versions):,}\n"
            f"  pokemon_forms: {len(self.pokemon_forms):,}"
        )


def load_pokedb_data(data_dir: Path) -> PokeDBData:
    """Load all PokeDB JSON export files from a directory.

    Exits with an error message if any required files are missing or unparseable.
    """
    missing = [f for f in REQUIRED_FILES if not (data_dir / f).exists()]
    if missing:
        print(
            f"Error: Missing required PokeDB files in {data_dir}:",
            file=sys.stderr,
        )
        for f in missing:
            print(f"  - {f}", file=sys.stderr)
        print(
            "\nDownload the JSON export from https://pokedb.org/data-export",
            file=sys.stderr,
        )
        sys.exit(1)

    def _load(filename: str) -> list[dict[str, Any]]:
        path = data_dir / filename
        try:
            # Read as UTF-8 explicitly so parsing does not depend on the platform locale.
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
        except json.JSONDecodeError as e:
            print(f"Error: Failed to parse {path}: {e}", file=sys.stderr)
            sys.exit(1)
        if not isinstance(data, list):
            print(
                f"Error: Expected a JSON array in {path}, got {type(data).__name__}",
                file=sys.stderr,
            )
            sys.exit(1)
        return data

    return PokeDBData(
        encounters=_load("encounters.json"),
        locations=_load("locations.json"),
        location_areas=_load("location_areas.json"),
        encounter_methods=_load("encounter_methods.json"),
        versions=_load("versions.json"),
        pokemon_forms=_load("pokemon_forms.json"),
    )


class SeedConfig:
    """Container for existing seed configuration files."""

    def __init__(
        self,
        version_groups: dict[str, Any],
        route_order: dict[str, list[str]],
        special_encounters: dict[str, Any] | None,
    ) -> None:
        self.version_groups = version_groups
        self.route_order = route_order
        self.special_encounters = special_encounters


def load_seed_config(seeds_dir: Path) -> SeedConfig:
    """Load existing seed configuration files (version_groups, route_order, etc.).

    Exits with an error message if required config files are missing.
    """
    vg_path = seeds_dir / "version_groups.json"
    if not vg_path.exists():
        print(f"Error: version_groups.json not found at {vg_path}", file=sys.stderr)
        sys.exit(1)
    with open(vg_path, encoding="utf-8") as f:
        version_groups = json.load(f)

    # Load route_order.json and resolve aliases
    ro_path = seeds_dir / "route_order.json"
    if not ro_path.exists():
        print(f"Error: route_order.json not found at {ro_path}", file=sys.stderr)
        sys.exit(1)
    with open(ro_path, encoding="utf-8") as f:
        ro_raw = json.load(f)
    route_order: dict[str, list[str]] = dict(ro_raw.get("routes", {}))
    # An alias shares the route list of its target; aliases with unknown targets are skipped.
    for alias, target in ro_raw.get("aliases", {}).items():
        if target in route_order:
            route_order[alias] = route_order[target]

    # Load special_encounters.json (optional)
    se_path = seeds_dir / "special_encounters.json"
    special_encounters = None
    if se_path.exists():
        with open(se_path, encoding="utf-8") as f:
            special_encounters = json.load(f)

    return SeedConfig(
        version_groups=version_groups,
        route_order=route_order,
        special_encounters=special_encounters,
    )

@@ -0,0 +1,81 @@
"""Output data models matching the existing seed JSON format."""

from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Encounter:
    pokeapi_id: int
    pokemon_name: str
    method: str
    encounter_rate: int
    min_level: int
    max_level: int

    def to_dict(self) -> dict:
        return {
            "pokeapi_id": self.pokeapi_id,
            "pokemon_name": self.pokemon_name,
            "method": self.method,
            "encounter_rate": self.encounter_rate,
            "min_level": self.min_level,
            "max_level": self.max_level,
        }


@dataclass
class Route:
    name: str
    order: int
    encounters: list[Encounter] = field(default_factory=list)
    children: list[Route] = field(default_factory=list)

    def to_dict(self) -> dict:
        d: dict = {
            "name": self.name,
            "order": self.order,
            "encounters": [e.to_dict() for e in self.encounters],
        }
        if self.children:
            d["children"] = [c.to_dict() for c in self.children]
        return d


@dataclass
class Game:
    name: str
    slug: str
    generation: int
    region: str
    release_year: int
    color: str | None = None

    def to_dict(self) -> dict:
        return {
            "name": self.name,
            "slug": self.slug,
            "generation": self.generation,
            "region": self.region,
            "release_year": self.release_year,
            "color": self.color,
        }


@dataclass
class Pokemon:
    pokeapi_id: int
    national_dex: int
    name: str
    types: list[str]
    sprite_url: str

    def to_dict(self) -> dict:
        return {
            "pokeapi_id": self.pokeapi_id,
            "national_dex": self.national_dex,
            "name": self.name,
            "types": self.types,
            "sprite_url": self.sprite_url,
        }

@@ -0,0 +1,9 @@
[project]
name = "import-pokedb"
version = "0.1.0"
description = "Convert PokeDB.org JSON data exports into nuzlocke-tracker seed format"
requires-python = ">=3.12"
dependencies = []
[project.scripts]
import-pokedb = "import_pokedb.__main__:main"