# stash-nfo-sync
A Python service that watches a media directory for .nfo files written by Pinchflat and syncs their metadata into Stash via its GraphQL API.
Includes a one-shot backfill script for existing libraries and optional AI-assisted performer extraction powered by Claude.
## Table of Contents
- How it works
- Requirements
- Installation
- Configuration
- Usage
- Running as a systemd service
- AI Performer Extraction
- Development
- Architecture
- V2 / Future Work
## How it works

Pinchflat downloads YouTube videos and writes Kodi-format `.nfo` sidecar files alongside them. Stash does not read these automatically. stash-nfo-sync bridges that gap:

- **Watcher** — monitors your media directory recursively using `watchdog`. When a new or modified `.nfo` is detected, it debounces the event (waits for the file to stop changing), then processes it.
- **Backfill** — a one-shot script that walks your full media directory and processes every `.nfo` file it finds. Safe to re-run; scenes that are already up to date are skipped.
- **Processor** — for each `.nfo`, triggers a Stash library scan, waits for the video to appear in Stash, then writes the title, studio, release date, description, and YouTube URL to the scene. Scenes are keyed on the YouTube ID to prevent duplicate entries.
### NFO field mapping

| NFO field | Stash field | Notes |
|---|---|---|
| `<title>` | Scene title | |
| `<showtitle>` | Studio name | Leading/trailing whitespace trimmed |
| `<uniqueid>` | Studio code + scene URL | Full YouTube URL built from the ID |
| `<plot>` | Scene details | Optionally fed to AI extractor |
| `<aired>` | Release date | Normalized to YYYY-MM-DD |
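As an illustration of the mapping above, here is a minimal sketch of extracting these fields with the standard library. The real parser lives in `nfo_parser.py` and returns an `NFORecord`; the `<episodedetails>` root element and attributes below are assumptions based on Kodi's episode NFO format:

```python
import xml.etree.ElementTree as ET

SAMPLE_NFO = """<?xml version="1.0" encoding="utf-8"?>
<episodedetails>
  <title>Example Video</title>
  <showtitle> Example Channel </showtitle>
  <uniqueid type="youtube" default="true">dQw4w9WgXcQ</uniqueid>
  <plot>A sample description.</plot>
  <aired>2024-01-15</aired>
</episodedetails>"""


def parse_nfo(text: str) -> dict:
    root = ET.fromstring(text)

    def field(tag: str) -> str:
        # Missing elements become empty strings; whitespace is trimmed.
        return (root.findtext(tag) or "").strip()

    video_id = field("uniqueid")
    return {
        "title": field("title"),
        "studio": field("showtitle"),
        "details": field("plot"),
        "date": field("aired"),  # already YYYY-MM-DD in this sample
        "url": f"https://www.youtube.com/watch?v={video_id}",
    }
```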
## Requirements

- Python 3.11+
- A running Stash instance with API access
- Pinchflat (or any tool that writes Kodi-format `.nfo` files)
- (Optional) An Anthropic API key for AI performer extraction
## Installation

```bash
# Clone the repository
git clone <repo-url>
cd StashEnrichmentScript

# Install the package (use a virtual environment)
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install .

# Copy and edit the example config
cp config.example.yaml config.yaml
```
## Configuration

All settings live in config.yaml. No credentials are hardcoded.

```yaml
stash_url: http://localhost:9999   # URL of your Stash instance
stash_api_key: YOUR_KEY_HERE       # Settings → Security → API Key
media_dir: /path/to/media          # Root of your Pinchflat media library

scan_wait_timeout_seconds: 30      # How long to wait for a scan to pick up a new file
debounce_seconds: 3.0              # Wait this long after the last file event before processing

logging:
  log_file: /var/log/stash-nfo-sync.log
  max_bytes: 10485760              # 10 MB per log file
  backup_count: 5                  # Number of rotated log files to keep

ai_extraction:
  enabled: false                   # Set to true to enable AI performer extraction
  anthropic_api_key: YOUR_KEY_HERE

retry_queue_path: /var/lib/stash-nfo-sync/retry_queue.json
```
### Getting your Stash API key
In Stash: Settings → Security → Authentication → API Key. Generate one if you haven't already.
## Usage

### Watch mode (daemon)

Start the filesystem watcher. It will drain any previously failed paths from the retry queue, then begin monitoring for new or changed .nfo files.

```bash
stash-nfo-sync watch
stash-nfo-sync watch --config /etc/stash-nfo-sync/config.yaml
```
### Backfill

Process every .nfo file in your media directory. Idempotent — scenes already in sync are skipped.

```bash
stash-nfo-sync backfill
```

Preview what would happen without writing anything to Stash:

```bash
stash-nfo-sync backfill --dry-run
```

Reprocess only the paths that previously failed (i.e. paths currently in the retry queue):

```bash
stash-nfo-sync backfill --retry-failed
```

All commands accept `--config PATH` to specify a non-default config file.
### Typical first-run workflow

```bash
# 1. Dry run to see what would change
stash-nfo-sync backfill --dry-run

# 2. Run the full backfill
stash-nfo-sync backfill

# 3. If anything failed (e.g. Stash was busy), retry just those
stash-nfo-sync backfill --retry-failed

# 4. Start the watcher for ongoing sync
stash-nfo-sync watch
```
## Running as a systemd service

A .service unit file is included. To install it:

```bash
# Copy the unit file (adjust its ExecStart path if using a venv)
sudo cp stash-nfo-sync.service /etc/systemd/system/

# Create the config directory and place your config
sudo mkdir -p /etc/stash-nfo-sync
sudo cp config.yaml /etc/stash-nfo-sync/config.yaml

# Create the log and queue directories and hand them to the service user
sudo mkdir -p /var/log /var/lib/stash-nfo-sync
sudo chown stash:stash /var/lib/stash-nfo-sync

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable stash-nfo-sync
sudo systemctl start stash-nfo-sync

# Check status
sudo systemctl status stash-nfo-sync
sudo journalctl -u stash-nfo-sync -f
```

The service runs as the stash user, restarts on failure, and logs to the system journal (which is also captured to the rotating log file configured in config.yaml).
## AI Performer Extraction
When `ai_extraction.enabled: true`, the `<plot>` text from each `.nfo` is sent to Claude (Haiku model) to extract performer and director names as structured JSON. Extracted names are upserted in Stash and linked to the scene.
The prompt instructs the model to return empty lists rather than guess when names are not clearly stated — no hallucinated credits.
To enable:

```yaml
ai_extraction:
  enabled: true
  anthropic_api_key: sk-ant-...
```
AI extraction failures are non-blocking: if the Claude API is unavailable or returns an unexpected response, the scene is still synced with all other metadata; only the performer/director fields are left empty.
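The defensive-parsing side of that contract can be sketched as follows. The `performers`/`directors` field names are assumptions about the JSON schema the prompt requests, not necessarily what `performer_extractor.py` uses:

```python
import json


def _empty() -> dict[str, list[str]]:
    return {"performers": [], "directors": []}


def parse_extraction(raw: str) -> dict[str, list[str]]:
    """Parse the model's JSON reply defensively.

    Bad JSON or an unexpected shape degrades to empty lists, so a flaky
    AI response never blocks the rest of the metadata sync.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return _empty()
    if not isinstance(data, dict):
        return _empty()
    result: dict[str, list[str]] = {}
    for key in ("performers", "directors"):
        value = data.get(key, [])
        if not isinstance(value, list):
            value = []
        # Keep only non-empty strings; drop anything else the model emitted.
        result[key] = [v.strip() for v in value if isinstance(v, str) and v.strip()]
    return result
```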
## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=stash_nfo_sync --cov-report=term-missing
```
### Project structure

```
src/stash_nfo_sync/
├── cli.py                  # Entrypoint; argparse subcommands
├── config.py               # Config loader + validation
├── models.py               # Dataclasses: NFORecord, AppConfig, ProcessResult, …
├── nfo_parser.py           # XML → NFORecord
├── stash_client.py         # Stash GraphQL API wrapper
├── processor.py            # Orchestration: parse → scan → upsert → update
├── performer_extractor.py  # Claude API integration (optional)
├── retry_queue.py          # Atomic JSON file queue
├── watcher.py              # watchdog filesystem observer + debounce
├── backfill.py             # One-shot directory walker
└── logging_setup.py        # Rotating file + stderr log handlers

tests/
├── fixtures/
│   ├── sample.nfo          # Real-world Pinchflat NFO fixture
│   └── config.test.yaml
├── test_nfo_parser.py
├── test_stash_client.py
├── test_retry_queue.py
├── test_processor.py
├── test_performer_extractor.py
└── test_backfill.py
```
### Adding a V2 post-process hook

The `SceneProcessor.process()` method accepts an optional `post_process_hooks` list. Each hook receives `(ProcessResult, NFORecord)` after a successful sync. Hook exceptions are caught and logged — they never block a sync.

```python
def my_hook(result: ProcessResult, record: NFORecord) -> None:
    print(f"Synced: {record.youtube_url} → scene {result.scene_id}")

processor.process(path, post_process_hooks=[my_hook])
```

This is the intended extension point for the planned Tea Leaves connector (see V2 / Future Work).
## Architecture

```
                    ┌──────────┐
 .nfo file ────────▶│ watcher  │──────┐
 (new/modified)     └──────────┘      │
                                      ▼
 stash-nfo-sync backfill ───────▶ processor
                                      │
                     ┌────────────────┼─────────────────────┐
                     ▼                ▼                     ▼
                nfo_parser       stash_client      performer_extractor
                     │                │                 (optional)
                     │                ▼                     │
                     │          [GraphQL API]               │
                     │                                      │
                     └────────────────┬─────────────────────┘
                                      │
                                 (on failure)
                                      ▼
                                 retry_queue
```
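The retry queue is an atomic JSON file (see `retry_queue.py`); atomicity typically comes from the classic write-temp-then-rename pattern, sketched here with illustrative function names:

```python
import json
import os
import tempfile


def save_queue(paths: list[str], queue_path: str) -> None:
    """Atomically persist the retry queue.

    Write to a temp file in the same directory, then os.replace() it over
    the target, so a crash mid-write never leaves a truncated JSON file.
    """
    dir_name = os.path.dirname(queue_path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(paths, f)
        os.replace(tmp, queue_path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise


def load_queue(queue_path: str) -> list[str]:
    """Return the queued paths, or an empty list if no queue exists yet."""
    try:
        with open(queue_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```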
Deduplication: The YouTube video ID from `<uniqueid>` is the canonical key. `update_scene` always overwrites metadata, so the backfill is safe to re-run. A quick pre-check compares the existing scene's title, date, and studio before triggering a scan — scenes already in sync are skipped without touching Stash.
Scan-then-poll: Stash's `metadataScan` mutation is asynchronous. After triggering it, `poll_for_scene` queries `findScenes` with the YouTube URL filter every 2 seconds, up to `scan_wait_timeout_seconds`. If the video file hasn't been downloaded yet (Pinchflat sometimes writes the NFO before the video finishes downloading), the poll times out and the path is added to the retry queue for the next startup.
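That scan-then-poll loop amounts to a bounded retry around the scene lookup. A sketch, where the `find_scene` callable stands in for the `findScenes` GraphQL query and the signature is illustrative:

```python
import time


def poll_for_scene(find_scene, youtube_url: str,
                   timeout: float, interval: float = 2.0):
    """Poll `find_scene(youtube_url)` until it returns a scene or the
    timeout elapses. Returns the scene, or None on timeout (at which
    point the caller enqueues the path for retry)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        scene = find_scene(youtube_url)
        if scene is not None:
            return scene
        time.sleep(interval)
    return None
```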
## V2 / Future Work

After metadata is written to Stash, optionally call a Tea Leaves API endpoint with the YouTube URL, title, studio, performers, and Stash scene ID. The exact API shape is TBD. Implementation will use the `post_process_hooks` extension point — no changes to the core processor needed.