diff --git a/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md b/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md
index 292148c..c2bc34f 100644
--- a/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md
+++ b/.beans/nuzlocke-tracker-bs05--build-pokedborg-encounter-data-scraper.md
@@ -1,17 +1,22 @@
 ---
 # nuzlocke-tracker-bs05
 title: Build PokeDB.org data import tool
-status: draft
-type: task
+status: in-progress
+type: feature
 priority: normal
 created_at: 2026-02-10T14:04:11Z
-updated_at: 2026-02-10T14:31:08Z
+updated_at: 2026-02-11T08:44:03Z
 parent: nuzlocke-tracker-rzu4
 blocking:
     - nuzlocke-tracker-spx3
 ---

-Build a Go tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
+Build a standalone Python tool that converts PokeDB.org's JSON data export into our existing seed JSON format. This replaces PokeAPI as the single source of truth for ALL games (Gen 1-9).
+
+Python was chosen over Go because:
+- The backend is already Python, so the team is familiar with it
+- We're processing local JSON files — no need for Go's concurrency
+- Remains a standalone tool in `tools/import-pokedb/`, not part of the backend

 ## Data source

@@ -64,26 +69,15 @@ Each encounter record has:
 - `visible` — overworld vs hidden encounter
 - Max Raid and Tera Raid fields for special encounters

-## Implementation approach
+## Subtasks

-### Checklist
-- [ ] Set up project structure in `tools/import-pokedb/`
-- [ ] Download and cache PokeDB JSON export files
-- [ ] Parse PokeDB encounters, locations, location_areas, versions, pokemon_forms
-- [ ] Build lookup maps: pokemon_form_identifier → pokeapi_id (using existing `pokemon.json`)
-- [ ] Build lookup maps: location_area_identifier → location name + region
-- [ ] Filter encounters by target game version
-- [ ] Map PokeDB encounter methods to our seed format methods (73 → simplified set)
-- [ ] Parse level strings ("2 - 4" → min_level: 2, max_level: 4)
-- [ ] Handle rate variants per game generation:
-  - For now, flatten time/weather/season rates into `encounter_rate` (use the max or average)
-  - Preserve raw variant data for future use (see nuzlocke-tracker-oqfo)
-- [ ] Group encounters by location area → route output
-- [ ] Apply route ordering (use existing route_order.json or generate from location data)
-- [ ] Output in existing `{game}.json` seed format
-- [ ] Generate seed data for ALL games, replacing PokeAPI as the single source of truth
-- [ ] Compare output against existing PokeAPI-sourced data to validate accuracy
-- [ ] Run for all games and verify output
+Work is broken into child task beans:
+
+- [ ] **Set up Python tool scaffold** — project structure, CLI entry point, PokeDB JSON file loading
+- [ ] **Build reference data mappings** — pokemon_form → pokeapi_id, location_area → name/region, encounter method mapping
+- [ ] **Core encounter processing** — filter by game version, parse levels, handle rate variants, group by location area
+- [ ] **Output seed JSON** — produce per-game JSON in existing format, integrate route ordering + special encounters
+- [ ] **Validation & full generation** — compare against existing data, run for all games, fix discrepancies

 ## Encounter method mapping (draft)

diff --git a/.beans/nuzlocke-tracker-dqyb--set-up-python-tool-scaffold.md b/.beans/nuzlocke-tracker-dqyb--set-up-python-tool-scaffold.md
new file mode 100644
index 0000000..6d9d3fb
--- /dev/null
+++ b/.beans/nuzlocke-tracker-dqyb--set-up-python-tool-scaffold.md
@@ -0,0 +1,30 @@
+---
+# nuzlocke-tracker-dqyb
+title: Set up Python tool scaffold
+status: in-progress
+type: task
+priority: normal
+created_at: 2026-02-11T08:42:58Z
+updated_at: 2026-02-11T08:44:03Z
+parent: nuzlocke-tracker-bs05
+blocking:
+    - nuzlocke-tracker-zno2
+---
+
+Set up the standalone Python tool project in `tools/import-pokedb/`.
+
+## Checklist
+
+- [x] Create `tools/import-pokedb/` directory structure
+- [x] Set up `pyproject.toml` with dependencies (just stdlib should suffice for JSON processing, maybe `click` for CLI)
+- [x] Create CLI entry point (`__main__.py` or similar) that accepts:
+  - Path to directory containing PokeDB JSON export files
+  - Target output directory (default: `backend/src/app/seeds/data/`)
+  - Optional: specific game version to generate (default: all)
+- [x] Load and parse all PokeDB JSON files: `encounters.json`, `locations.json`, `location_areas.json`, `encounter_methods.json`, `versions.json`, `pokemon_forms.json`
+- [x] Basic validation that all expected files are present and parseable
+
+## Notes
+- Keep it as a standalone tool, not part of the backend
+- The PokeDB JSON files are downloaded manually from https://pokedb.org/data-export — no need to automate the download
+- Model the CLI similarly to how `tools/fetch-pokeapi/` works (cd into dir, run the tool)
\ No newline at end of file
diff --git a/.beans/nuzlocke-tracker-gkcy--output-seed-json.md b/.beans/nuzlocke-tracker-gkcy--output-seed-json.md
new file mode 100644
index 0000000..54ec3fc
--- /dev/null
+++ b/.beans/nuzlocke-tracker-gkcy--output-seed-json.md
@@ -0,0 +1,31 @@
+---
+# nuzlocke-tracker-gkcy
+title: Output seed JSON
+status: todo
+type: task
+priority: normal
+created_at: 2026-02-11T08:43:21Z
+updated_at: 2026-02-11T08:43:33Z
+parent: nuzlocke-tracker-bs05
+blocking:
+    - nuzlocke-tracker-vdks
+---
+
+Generate the final per-game JSON files in the existing seed format.
+
+## Checklist
+
+- [ ] **Apply route ordering**: Use the existing `backend/src/app/seeds/route_order.json` to assign `order` values to routes. Handle aliases (e.g. "red-blue" → "firered-leafgreen"). Log warnings for routes not in the order file.
+- [ ] **Merge special encounters**: Integrate starters, gifts, fossils, and trades from `backend/src/app/seeds/special_encounters.json` into the appropriate routes.
+- [ ] **Output per-game JSON**: Write `{game-slug}.json` files matching the existing format:
+  ```json
+  [{"name": "Route 1", "order": 3, "encounters": [...], "children": []}]
+  ```
+- [ ] **Output games.json**: Generate the global games list from `version_groups.json` (this may already be handled by existing config, verify).
+- [ ] **Output pokemon.json**: Generate the global pokemon list including all pokemon referenced in any encounter. Include pokeapi_id, national_dex, name, types, sprite_url.
+- [ ] **Handle version exclusives**: Ensure encounters specific to one version in a version group only appear in that game's JSON file (e.g. FireRed exclusives vs LeafGreen exclusives).
+
+## Notes
+- The output must be a drop-in replacement for the existing files in `backend/src/app/seeds/data/`
+- Boss data (`{game}-bosses.json`) is NOT generated by this tool — it's manually curated
+- Evolutions data is also separate (currently from PokeAPI) — out of scope for this task
\ No newline at end of file
diff --git a/.beans/nuzlocke-tracker-rfg0--core-encounter-processing.md b/.beans/nuzlocke-tracker-rfg0--core-encounter-processing.md
new file mode 100644
index 0000000..176d6f1
--- /dev/null
+++ b/.beans/nuzlocke-tracker-rfg0--core-encounter-processing.md
@@ -0,0 +1,34 @@
+---
+# nuzlocke-tracker-rfg0
+title: Core encounter processing
+status: todo
+type: task
+priority: normal
+created_at: 2026-02-11T08:43:12Z
+updated_at: 2026-02-11T08:43:33Z
+parent: nuzlocke-tracker-bs05
+blocking:
+    - nuzlocke-tracker-gkcy
+---
+
+Implement the core logic that transforms raw PokeDB encounter records into our internal format.
+
+## Checklist
+
+- [ ] **Filter by game version**: Given a target game slug, select only encounters where `version_identifiers` includes that game
+- [ ] **Parse level strings**: Convert "2 - 4" → min_level=2, max_level=4; "67" → min_level=67, max_level=67
+- [ ] **Handle rate variants per generation**:
+  - Gen 1/3/6: use `rate_overall` directly as `encounter_rate`
+  - Gen 2/4: `rate_morning`, `rate_day`, `rate_night` — flatten to max or average for `encounter_rate`
+  - Gen 5: `rate_spring` through `rate_winter` — flatten similarly
+  - Gen 8 Sw/Sh: `weather_*_rate` fields — flatten to max
+  - Gen 8 Legends Arceus: `during_*` / `while_*` booleans — convert to a presence-based rate
+  - Gen 9 Sc/Vi: `probability_*` fields (spawn weights, not percentages) — normalize to percentages
+  - Preserve raw variant data in a way that nuzlocke-tracker-oqfo can use later
+- [ ] **Aggregate encounters**: Group by (pokemon, method, location_area) and merge level ranges / rates where appropriate (same logic as the Go tool's aggregation)
+- [ ] **Group by location area**: Collect all encounters for a location area into a route structure
+- [ ] **Handle parent/child routes**: Multi-area locations (e.g. Safari Zone) should produce parent routes with children, matching the existing hierarchical format
+
+## Notes
+- Rate parsing needs to handle percentage strings like "40%" as well as bare numbers
+- The Go tool aggregates encounters with the same pokemon+method at a location into a single entry with merged level ranges — replicate this
\ No newline at end of file
diff --git a/.beans/nuzlocke-tracker-vdks--validation-and-full-generation.md b/.beans/nuzlocke-tracker-vdks--validation-and-full-generation.md
new file mode 100644
index 0000000..4d95bed
--- /dev/null
+++ b/.beans/nuzlocke-tracker-vdks--validation-and-full-generation.md
@@ -0,0 +1,29 @@
+---
+# nuzlocke-tracker-vdks
+title: Validation and full generation
+status: todo
+type: task
+created_at: 2026-02-11T08:43:29Z
+updated_at: 2026-02-11T08:43:29Z
+parent: nuzlocke-tracker-bs05
+---
+
+Validate the new tool's output against existing data and generate seed data for all games.
+
+## Checklist
+
+- [ ] **Diff against existing data**: For games we already have PokeAPI-sourced data for, compare the PokeDB output. Identify and investigate discrepancies:
+  - Missing routes or encounters
+  - Different encounter rates
+  - Different level ranges
+  - Missing or extra pokemon
+- [ ] **Fix discrepancies**: Adjust mappings, parsing, or aggregation logic to resolve legitimate differences. Document cases where PokeDB provides better/different data than PokeAPI.
+- [ ] **Generate for all games**: Run the tool for every game version in `version_groups.json`. Verify output is valid JSON and structurally correct.
+- [ ] **New game coverage**: For games not previously supported (or with incomplete PokeAPI data), verify the output looks reasonable by spot-checking a few routes.
+- [ ] **Update route_order.json**: Add route orderings for any new games that didn't have entries. This may require manual curation.
+- [ ] **Update special_encounters.json**: Add special encounters for any new games. This may require manual curation.
+
+## Notes
+- This is the final validation step before we can replace PokeAPI as the data source
+- Some discrepancies are expected — PokeDB may have more complete data than PokeAPI
+- Route ordering for new games will likely need manual work
\ No newline at end of file
diff --git a/.beans/nuzlocke-tracker-zno2--build-reference-data-mappings.md b/.beans/nuzlocke-tracker-zno2--build-reference-data-mappings.md
new file mode 100644
index 0000000..e5ac9cd
--- /dev/null
+++ b/.beans/nuzlocke-tracker-zno2--build-reference-data-mappings.md
@@ -0,0 +1,26 @@
+---
+# nuzlocke-tracker-zno2
+title: Build reference data mappings
+status: todo
+type: task
+priority: normal
+created_at: 2026-02-11T08:43:02Z
+updated_at: 2026-02-11T08:43:33Z
+parent: nuzlocke-tracker-bs05
+blocking:
+    - nuzlocke-tracker-rfg0
+---
+
+Build the lookup maps needed to translate PokeDB identifiers into our seed format.
+
+## Checklist
+
+- [ ] **Pokemon form mapping**: Map `pokemon_form_identifier` (e.g. "pidgey-default", "mr-mime-default") to `pokeapi_id` using the existing `backend/src/app/seeds/data/pokemon.json` as reference. Handle naming convention differences between PokeDB and PokeAPI (may need fuzzy matching or a manual override table).
+- [ ] **Location area mapping**: Map `location_area_identifier` to human-readable location names and regions using `locations.json` and `location_areas.json`. Produce names matching our existing format (e.g. "Route 1", "Viridian Forest").
+- [ ] **Encounter method mapping**: Map PokeDB's 73 encounter methods to our simplified set. See the draft mapping in the parent bean. Implement as a dictionary/config that's easy to extend.
+- [ ] **Version mapping**: Map PokeDB `version_identifiers` to our game slugs (should mostly be 1:1 but verify).
+
+## Notes
+- The pokemon form mapping is the trickiest part — PokeDB uses identifiers like "mr-mime-default" while our pokemon.json uses names like "Mr. Mime" and pokeapi IDs
+- Log warnings for any unmapped identifiers so we can add overrides
+- The `pokemon_forms.json` from PokeDB may help bridge the gap
\ No newline at end of file
diff --git a/tools/import-pokedb/import_pokedb/__init__.py b/tools/import-pokedb/import_pokedb/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tools/import-pokedb/import_pokedb/__main__.py b/tools/import-pokedb/import_pokedb/__main__.py
new file mode 100644
index 0000000..603daf3
--- /dev/null
+++ b/tools/import-pokedb/import_pokedb/__main__.py
@@ -0,0 +1,115 @@
+"""CLI entry point for the PokeDB import tool.
+
+Usage:
+    # From repo root:
+    python -m import_pokedb ./pokedb-export/
+
+    # With options:
+    python -m import_pokedb ./pokedb-export/ --output backend/src/app/seeds/data/ --game firered
+"""
+
+from __future__ import annotations
+
+import argparse
+import sys
+from pathlib import Path
+
+from .loader import load_pokedb_data, load_seed_config
+
+SEEDS_DIR_CANDIDATES = [
+    Path("backend/src/app/seeds"),  # from repo root
+    Path("../../backend/src/app/seeds"),  # from tools/import-pokedb/
+]
+
+
+def find_seeds_dir() -> Path:
+    """Locate the backend seeds directory."""
+    for candidate in SEEDS_DIR_CANDIDATES:
+        if (candidate / "version_groups.json").exists():
+            return candidate.resolve()
+    # Fallback
+    return Path("backend/src/app/seeds").resolve()
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="import-pokedb",
+        description="Convert PokeDB.org JSON data exports into nuzlocke-tracker seed format.",
+    )
+    parser.add_argument(
+        "pokedb_dir",
+        type=Path,
+        help="Path to directory containing PokeDB JSON export files",
+    )
+    parser.add_argument(
+        "--output",
+        type=Path,
+        default=None,
+        help="Output directory for seed JSON files (default: backend/src/app/seeds/data/)",
+    )
+    parser.add_argument(
+        "--game",
+        type=str,
+        default=None,
+        help="Generate data for a specific game slug only (default: all games)",
+    )
+    return parser
+
+
+def main(argv: list[str] | None = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+
+    pokedb_dir: Path = args.pokedb_dir
+    if not pokedb_dir.is_dir():
+        print(f"Error: {pokedb_dir} is not a directory", file=sys.stderr)
+        sys.exit(1)
+
+    seeds_dir = find_seeds_dir()
+    output_dir: Path = args.output or (seeds_dir / "data")
+    output_dir.mkdir(parents=True, exist_ok=True)
+
+    print(f"PokeDB data: {pokedb_dir.resolve()}")
+    print(f"Seeds config: {seeds_dir}")
+    print(f"Output: {output_dir.resolve()}")
+    print()
+
+    # Load PokeDB export data
+    pokedb = load_pokedb_data(pokedb_dir)
+    print(pokedb.summary())
+    print()
+
+    # Load existing seed configuration
+    config = load_seed_config(seeds_dir)
+    print(f"Loaded {len(config.version_groups)} version groups")
+    print(f"Loaded route order for {len(config.route_order)} version groups")
+    if config.special_encounters:
+        se_count = len(config.special_encounters.get("encounters", {}))
+        print(f"Loaded special encounters for {se_count} version groups")
+    print()
+
+    # Determine which games to process
+    target_game = args.game
+    if target_game:
+        found = False
+        for vg_info in config.version_groups.values():
+            if target_game in vg_info.get("versions", []):
+                found = True
+                break
+        if not found:
+            print(f"Error: Game '{target_game}' not found in version_groups.json", file=sys.stderr)
+            sys.exit(1)
+        print(f"Target: {target_game}")
+    else:
+        total_games = sum(
+            len(vg.get("versions", []))
+            for vg in config.version_groups.values()
+        )
+        print(f"Target: all {total_games} games")
+
+    # TODO: Processing pipeline (subtasks zno2, rfg0, gkcy)
+    print("\nScaffold loaded successfully. Processing pipeline not yet implemented.")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tools/import-pokedb/import_pokedb/loader.py b/tools/import-pokedb/import_pokedb/loader.py
new file mode 100644
index 0000000..22632a8
--- /dev/null
+++ b/tools/import-pokedb/import_pokedb/loader.py
@@ -0,0 +1,150 @@
+"""Load and validate PokeDB JSON export files."""
+
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+from typing import Any
+
+REQUIRED_FILES = [
+    "encounters.json",
+    "locations.json",
+    "location_areas.json",
+    "encounter_methods.json",
+    "versions.json",
+    "pokemon_forms.json",
+]
+
+
+class PokeDBData:
+    """Container for all loaded PokeDB export data."""
+
+    def __init__(
+        self,
+        encounters: list[dict[str, Any]],
+        locations: list[dict[str, Any]],
+        location_areas: list[dict[str, Any]],
+        encounter_methods: list[dict[str, Any]],
+        versions: list[dict[str, Any]],
+        pokemon_forms: list[dict[str, Any]],
+    ) -> None:
+        self.encounters = encounters
+        self.locations = locations
+        self.location_areas = location_areas
+        self.encounter_methods = encounter_methods
+        self.versions = versions
+        self.pokemon_forms = pokemon_forms
+
+    def summary(self) -> str:
+        return (
+            f"PokeDB data loaded:\n"
+            f" encounters: {len(self.encounters):,}\n"
+            f" locations: {len(self.locations):,}\n"
+            f" location_areas: {len(self.location_areas):,}\n"
+            f" encounter_methods: {len(self.encounter_methods):,}\n"
+            f" versions: {len(self.versions):,}\n"
+            f" pokemon_forms: {len(self.pokemon_forms):,}"
+        )
+
+
+def load_pokedb_data(data_dir: Path) -> PokeDBData:
+    """Load all PokeDB JSON export files from a directory.
+
+    Exits with an error message if any required files are missing or unparseable.
+    """
+    missing = [f for f in REQUIRED_FILES if not (data_dir / f).exists()]
+    if missing:
+        print(
+            f"Error: Missing required PokeDB files in {data_dir}:",
+            file=sys.stderr,
+        )
+        for f in missing:
+            print(f" - {f}", file=sys.stderr)
+        print(
+            "\nDownload the JSON export from https://pokedb.org/data-export",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+
+    def _load(filename: str) -> list[dict[str, Any]]:
+        path = data_dir / filename
+        try:
+            with open(path) as f:
+                data = json.load(f)
+        except json.JSONDecodeError as e:
+            print(f"Error: Failed to parse {path}: {e}", file=sys.stderr)
+            sys.exit(1)
+
+        if not isinstance(data, list):
+            print(
+                f"Error: Expected a JSON array in {path}, got {type(data).__name__}",
+                file=sys.stderr,
+            )
+            sys.exit(1)
+
+        return data
+
+    return PokeDBData(
+        encounters=_load("encounters.json"),
+        locations=_load("locations.json"),
+        location_areas=_load("location_areas.json"),
+        encounter_methods=_load("encounter_methods.json"),
+        versions=_load("versions.json"),
+        pokemon_forms=_load("pokemon_forms.json"),
+    )
+
+
+class SeedConfig:
+    """Container for existing seed configuration files."""
+
+    def __init__(
+        self,
+        version_groups: dict[str, Any],
+        route_order: dict[str, list[str]],
+        special_encounters: dict[str, Any] | None,
+    ) -> None:
+        self.version_groups = version_groups
+        self.route_order = route_order
+        self.special_encounters = special_encounters
+
+
+def load_seed_config(seeds_dir: Path) -> SeedConfig:
+    """Load existing seed configuration files (version_groups, route_order, etc.).
+
+    Exits with an error message if required config files are missing.
+    """
+    vg_path = seeds_dir / "version_groups.json"
+    if not vg_path.exists():
+        print(f"Error: version_groups.json not found at {vg_path}", file=sys.stderr)
+        sys.exit(1)
+
+    with open(vg_path) as f:
+        version_groups = json.load(f)
+
+    # Load route_order.json and resolve aliases
+    ro_path = seeds_dir / "route_order.json"
+    if not ro_path.exists():
+        print(f"Error: route_order.json not found at {ro_path}", file=sys.stderr)
+        sys.exit(1)
+
+    with open(ro_path) as f:
+        ro_raw = json.load(f)
+
+    route_order: dict[str, list[str]] = dict(ro_raw.get("routes", {}))
+    for alias, target in ro_raw.get("aliases", {}).items():
+        if target in route_order:
+            route_order[alias] = route_order[target]
+
+    # Load special_encounters.json (optional)
+    se_path = seeds_dir / "special_encounters.json"
+    special_encounters = None
+    if se_path.exists():
+        with open(se_path) as f:
+            special_encounters = json.load(f)
+
+    return SeedConfig(
+        version_groups=version_groups,
+        route_order=route_order,
+        special_encounters=special_encounters,
+    )
diff --git a/tools/import-pokedb/import_pokedb/models.py b/tools/import-pokedb/import_pokedb/models.py
new file mode 100644
index 0000000..9083c7d
--- /dev/null
+++ b/tools/import-pokedb/import_pokedb/models.py
@@ -0,0 +1,81 @@
+"""Output data models matching the existing seed JSON format."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+
+
+@dataclass
+class Encounter:
+    pokeapi_id: int
+    pokemon_name: str
+    method: str
+    encounter_rate: int
+    min_level: int
+    max_level: int
+
+    def to_dict(self) -> dict:
+        return {
+            "pokeapi_id": self.pokeapi_id,
+            "pokemon_name": self.pokemon_name,
+            "method": self.method,
+            "encounter_rate": self.encounter_rate,
+            "min_level": self.min_level,
+            "max_level": self.max_level,
+        }
+
+
+@dataclass
+class Route:
+    name: str
+    order: int
+    encounters: list[Encounter] = field(default_factory=list)
+    children: list[Route] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        d: dict = {
+            "name": self.name,
+            "order": self.order,
+            "encounters": [e.to_dict() for e in self.encounters],
+        }
+        if self.children:
+            d["children"] = [c.to_dict() for c in self.children]
+        return d
+
+
+@dataclass
+class Game:
+    name: str
+    slug: str
+    generation: int
+    region: str
+    release_year: int
+    color: str | None = None
+
+    def to_dict(self) -> dict:
+        return {
+            "name": self.name,
+            "slug": self.slug,
+            "generation": self.generation,
+            "region": self.region,
+            "release_year": self.release_year,
+            "color": self.color,
+        }
+
+
+@dataclass
+class Pokemon:
+    pokeapi_id: int
+    national_dex: int
+    name: str
+    types: list[str]
+    sprite_url: str
+
+    def to_dict(self) -> dict:
+        return {
+            "pokeapi_id": self.pokeapi_id,
+            "national_dex": self.national_dex,
+            "name": self.name,
+            "types": self.types,
+            "sprite_url": self.sprite_url,
+        }
diff --git a/tools/import-pokedb/pyproject.toml b/tools/import-pokedb/pyproject.toml
new file mode 100644
index 0000000..e4af761
--- /dev/null
+++ b/tools/import-pokedb/pyproject.toml
@@ -0,0 +1,9 @@
+[project]
+name = "import-pokedb"
+version = "0.1.0"
+description = "Convert PokeDB.org JSON data exports into nuzlocke-tracker seed format"
+requires-python = ">=3.12"
+dependencies = []
+
+[project.scripts]
+import-pokedb = "import_pokedb.__main__:main"
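
Reviewer note, not part of the patch: the rfg0 bean asks for level strings such as "2 - 4" and "67" to become `min_level`/`max_level`, and for rates given either as "40%" or as bare numbers. The following is a minimal sketch of that parsing, assuming those are the only shapes in the export. The helper names `parse_level_range` and `parse_rate` are hypothetical and do not appear in this diff, and the real PokeDB data may need more cases.

```python
from __future__ import annotations


def parse_level_range(raw: str) -> tuple[int, int]:
    """Parse a level string such as "2 - 4" or "67" into (min_level, max_level)."""
    # Split on the dash and drop empty fragments; a single level yields one part.
    levels = [int(part) for part in (p.strip() for p in raw.split("-")) if part]
    if not levels:
        raise ValueError(f"no level found in {raw!r}")
    return min(levels), max(levels)


def parse_rate(raw: str | int | float) -> float:
    """Parse a rate given as "40%", "40", or a bare number into a float."""
    if isinstance(raw, (int, float)):
        return float(raw)
    # Strip surrounding whitespace and an optional trailing percent sign.
    return float(raw.strip().rstrip("%").strip())
```

Both helpers raise `ValueError` on input they cannot parse, so a caller could log the offending record instead of silently writing a wrong value into the seed files.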