Tea Leaves

A self-hosted research tool for organizing, connecting, and publishing observations about themes and motifs across media. Notes scattered across Twitter, Bluesky, phone notes, and conversations come together into one structured, searchable, connected workspace.


Overview

Tea Leaves is a mobile-first PWA designed for a single researcher. It is entirely self-hosted with no cloud dependencies — all storage, AI processing, and serving happen on your own infrastructure.

Key principles:

  • Your data is irreplaceable. Every decision protects years of research.
  • Mobile-first, but fully usable on desktop.
  • No cloud. Everything runs on your homelab.
  • Settings are live-editable in the UI — no restarts required.
  • Export everything, at any time, in standard formats.

Architecture

[Browser / Phone]
      │
      ▼
[Reverse Proxy]          ← Tailscale tunnel from remote server (external, not in this stack)
      │
      ▼
   app:80                ← nginx (React frontend + reverse proxy)
      │
      ├── /api/*  ──────► api:3000   ← Node.js / Express (internal only)
      ├── /auth/* ──────► api:3000
      └── /*      ──────► React app (static files)
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
                 db:5432  redis:6379  [AI Server]
           (PostgreSQL    (sessions,  (external,
           + pgvector)    job queues)  HTTP calls)

Docker Compose services:

| Service | Image | Purpose |
|---|---|---|
| app | Built from ./app | React frontend, served by nginx |
| api | Built from ./api | Node.js + Express backend |
| db | pgvector/pgvector:pg16 | PostgreSQL with vector search |
| redis | redis:7-alpine | Sessions, rate limiting, background jobs |

The AI server and reverse proxy (Tailscale + whatever proxy you use) are external — this stack makes HTTP calls out to AI and receives traffic in from the proxy.


Prerequisites

  • Docker and Docker Compose (v2)
  • An Authentik instance with an OAuth2/OIDC provider configured for Tea Leaves
  • (Optional) A self-hosted AI server — Ollama, any OpenAI-compatible server (llama.cpp, LM Studio, vLLM), or an Anthropic API key. AI features stay disabled until a provider is configured, but the app runs fully without one.

Setup

1. Clone the repository

git clone <your-repo-url> tea-leaves
cd tea-leaves

2. Create your environment file

cp .env.example .env

Then edit .env and fill in every value. See Configuration for details.

3. Configure Authentik

In your Authentik instance, create a new OAuth2/OpenID Connect Provider:

  • Name: Tea Leaves
  • Client type: Confidential
  • Redirect URIs: https://yourdomain.com/auth/callback
  • Scopes: openid, email, profile

Then create an Application backed by that provider. Copy the Client ID and Client Secret into your .env.

Set AUTHENTIK_ISSUER to the provider's issuer URL — typically:

https://auth.yourdomain.com/application/o/tea-leaves/
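You can sanity-check the issuer before first login — every OIDC provider publishes a discovery document at a well-known path. A quick sketch (the hostname is the placeholder from above):

```shell
# Fetch the OIDC discovery document for the configured issuer.
# A valid issuer returns JSON listing authorization_endpoint,
# token_endpoint, and jwks_uri.
curl https://auth.yourdomain.com/application/o/tea-leaves/.well-known/openid-configuration
```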

4. Build and start

docker compose up --build -d

Database migrations run automatically on API startup. On first boot, logs will show each migration being applied:

Applying migration: 001_users.sql
  ✓ 001_users.sql
Tea Leaves API listening on port 3000 (production)

5. Verify

curl http://localhost:3000/health
# {"status":"ok"}

Navigate to http://localhost (or whatever APP_PORT you set) and you should see the Tea Leaves login screen.


Configuration

All configuration lives in .env. Copy .env.example as your starting point.

Database

| Variable | Description |
|---|---|
| POSTGRES_DB | Database name (default: tealeaves) |
| POSTGRES_USER | Database user (default: tealeaves) |
| POSTGRES_PASSWORD | Required. Database password — use a strong random value |
| DATABASE_URL | Full connection string — must match the three vars above |

DATABASE_URL uses the internal Docker hostname db, not localhost.
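For example, a consistent set of values might look like this (the password is a placeholder — generate your own):

```shell
# Illustrative .env fragment — keep DATABASE_URL in sync with the
# three variables above, and use the Docker hostname "db".
POSTGRES_DB=tealeaves
POSTGRES_USER=tealeaves
POSTGRES_PASSWORD=s3cret-example-only
DATABASE_URL=postgresql://tealeaves:s3cret-example-only@db:5432/tealeaves
```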

Redis

| Variable | Description |
|---|---|
| REDIS_URL | Redis connection string (default: redis://redis:6379) |

Authentik OIDC

| Variable | Description |
|---|---|
| AUTHENTIK_ISSUER | OIDC issuer URL from your Authentik provider |
| AUTHENTIK_CLIENT_ID | OAuth2 client ID |
| AUTHENTIK_CLIENT_SECRET | OAuth2 client secret |
| AUTHENTIK_REDIRECT_URI | Must exactly match the redirect URI registered in Authentik |

Session

| Variable | Description |
|---|---|
| SESSION_SECRET | Required. Long random string (32+ characters) used to sign session cookies |

Generate a suitable value with:

openssl rand -base64 48

AI Server

All AI settings can be changed live in the Settings UI without a restart. The env vars below serve as boot-time defaults that seed the settings table on first run.

| Variable | Default | Description |
|---|---|---|
| AI_PROVIDER | ollama | ollama · openai · anthropic |
| AI_BASE_URL | http://localhost:11434 | Base URL without path suffix |
| AI_API_KEY | (blank) | Required for OpenAI and Anthropic; leave blank for Ollama |
| AI_VISION_MODEL | llava | Model used for image description and tag suggestions |
| AI_EMBEDDING_MODEL | nomic-embed-text | Model used for semantic search and duplicate detection |
| AI_TIMEOUT_MS | 60000 | Per-request timeout in milliseconds |

All variables are optional — the app starts and runs without them. AI-dependent features (image descriptions, tag suggestions, semantic search, duplicate detection) are simply unavailable until configured.
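As a sketch, a minimal Ollama setup might look like this (the host address is a placeholder for wherever your AI server lives):

```shell
# Hypothetical .env fragment for an Ollama server on your LAN.
# No AI_API_KEY is needed for Ollama.
AI_PROVIDER=ollama
AI_BASE_URL=http://ai-server.local:11434
AI_VISION_MODEL=llava
AI_EMBEDDING_MODEL=nomic-embed-text
```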

Media Storage

| Variable | Description |
|---|---|
| MEDIA_DIR | Path inside the api container where uploads are stored (default: /data/media) |

Uploaded files are stored here, outside the web root, and served only through the authenticated API.

App & Ports

| Variable | Default | Description |
|---|---|---|
| APP_URL | | Public-facing base URL, used to construct the OIDC redirect URI |
| NODE_ENV | production | Set to development for verbose logging |
| APP_PORT | 80 | Host port the app is served on — all traffic (UI, API, auth) goes through here |
| PORT | 3000 | Internal port the API listens on inside its container (rarely needs changing) |

The API is not exposed directly to the host. All requests go through nginx on APP_PORT, which proxies /api/* and /auth/* to the API container internally.

Change APP_PORT if port 80 is already in use on your host:

APP_PORT=8080

Running the App

Start

docker compose up -d

Stop

docker compose down

Stop and remove all data (destructive)

docker compose down -v

View logs

# All services
docker compose logs -f

# API only
docker compose logs -f api

Rebuild after code changes

docker compose up --build -d

Development

Use the dev override file, which mounts source directories into the containers for hot-reload:

docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

The frontend dev server (Vite) runs on port 5173 and proxies /api and /auth to the API container. Open http://localhost:5173 during development.

Running services individually

If you prefer to run the API and frontend outside Docker during development:

# Start only the infrastructure
docker compose up -d db redis

# API (from /api)
cd api
npm install
npm run dev

# Frontend (from /app)
cd app
npm install
npm run dev

Ensure your .env uses localhost hostnames for DATABASE_URL and REDIS_URL when running outside Docker.
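For example, hypothetical overrides for this host-based setup (ports assume the defaults above, and the password is a placeholder):

```shell
# .env overrides when the API runs on the host instead of in Docker —
# point at the published ports on localhost rather than Docker hostnames.
DATABASE_URL=postgresql://tealeaves:s3cret-example-only@localhost:5432/tealeaves
REDIS_URL=redis://localhost:6379
```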


Authentication

Tea Leaves uses Authentik as its sole authentication provider via OIDC. There are no local usernames or passwords — all login is handled through the SSO flow.

  • /auth/login — redirects to Authentik
  • /auth/callback — handles the OIDC return, creates or updates the user record, establishes a session
  • /auth/logout — destroys the session
  • /auth/me — returns the current user (used by the frontend to check auth state)

All API routes except /auth/* and /health require a valid session. Unauthenticated requests receive a 401.


AI Integration

Tea Leaves calls an external AI server over HTTP. No AI services run inside the Docker Compose stack. AI is optional — the app runs fully without it; AI-dependent features (image descriptions, tag suggestions, semantic search) are simply unavailable until configured.

Supported providers

| Provider | Value | Notes |
|---|---|---|
| Ollama | ollama | Default. Uses /api/chat and /api/embed native endpoints. |
| OpenAI-compatible | openai | Covers OpenAI, llama.cpp, LM Studio, vLLM, Jan, Kobold, and any server with /v1/chat/completions + /v1/embeddings. |
| Anthropic | anthropic | Uses /v1/messages. No embedding API — configure a separate Ollama or OpenAI embedding provider in the Settings UI (ai.embeddingProvider, ai.embeddingBaseUrl, ai.embeddingApiKey). |

Set AI_PROVIDER in .env to select the provider, or change it any time in the Settings UI. See the Configuration section for all AI variables.

Model types

| Type | Purpose |
|---|---|
| Vision model (AI_VISION_MODEL) | Describe uploaded images, suggest tags from images |
| Embedding model (AI_EMBEDDING_MODEL) | Semantic search, duplicate detection; not used by Anthropic |

Example models by provider:

| Provider | Vision | Embedding |
|---|---|---|
| Ollama | llava, llava:13b, llama3.2-vision, qwen2-vl:7b | nomic-embed-text (768-dim), mxbai-embed-large |
| OpenAI | gpt-4o, gpt-4o-mini | text-embedding-3-small (1536-dim) |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | (none — use separate provider) |

Embedding dimensions

The database column defaults to 768 dimensions (matching nomic-embed-text). If you use a model with a different output dimension, adjust the column after the migrations have created it and before any embeddings are stored:

ALTER TABLE entries ALTER COLUMN text_embedding TYPE vector(1536);

Replace 1536 with the actual output dimension of your chosen model.
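One way to apply that statement is through psql inside the db container — this sketch assumes the default tealeaves user and database names from the Configuration section:

```shell
# Run the ALTER inside the running db container (assumes default
# POSTGRES_USER and POSTGRES_DB values of "tealeaves").
docker compose exec db psql -U tealeaves -d tealeaves \
  -c "ALTER TABLE entries ALTER COLUMN text_embedding TYPE vector(1536);"
```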

Settings panel

All AI settings (provider, base URL, API key, model names, timeout, and a separate embedding provider for Anthropic users) are live-editable from the Settings UI with no restart required. The env vars serve as boot-time defaults that seed the settings table on first run.

All AI suggestions require your approval — nothing is applied to your data automatically.


Data & Backups

Volumes

| Volume | Contents |
|---|---|
| pg_data | All PostgreSQL data |
| redis_data | Redis persistence |
| media_files | Uploaded images and screenshots |

Export

Export any time from the Settings → Export section:

| Format | Contents |
|---|---|
| entries.json | All entries with tags and media refs |
| entries.csv | Flat CSV, suitable for spreadsheets |
| motifs.json | All motifs with entries and connections |
| motif/<id>.json | Single motif (entries + connections) |
| motif/<id>.md | Single motif as readable Markdown essay |
| full.json | Everything in one file |

Backups

Restic-based automated backups cover the PostgreSQL database and media files. Configure from Settings → Backup:

  • Repository — any restic backend: local path, Backblaze B2 (s3:s3.us-west-004.backblazeb2.com/bucket), S3-compatible, SFTP, rclone
  • Schedule — cron expression (e.g. 0 3 * * * for 3am daily); leave blank to disable
  • Retention — number of snapshots to keep (older ones are pruned automatically)
  • Verification — restic check runs after every backup; result shown in job history

A status badge in the top bar shows when the last backup ran and warns if it failed or is overdue.


Import

Tea Leaves can import your existing posts from Twitter and Bluesky. Imports run as background jobs (BullMQ, Redis-backed) so large archives don't time out. Progress is shown live on the Import page while the job runs.

Twitter

Export your data from Twitter/X (Settings → Your account → Download an archive of your data). You will receive a .zip file. Upload it directly — no unpacking needed.

The importer reads data/tweet.js inside the archive. That file uses a JavaScript assignment format (window.YTD.tweet.part0 = [...]); the importer strips this prefix automatically. Each tweet's full_text (or text fallback), id_str, created_at, and first expanded URL are imported.
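The prefix-stripping step can be reproduced by hand if you want to inspect an archive before uploading. This sketch fabricates a one-tweet stand-in for data/tweet.js and strips the assignment prefix, leaving plain JSON:

```shell
# Create a minimal stand-in for data/tweet.js (fabricated sample data)
printf 'window.YTD.tweet.part0 = [ {"tweet": {"id_str": "1", "full_text": "hello"}} ]\n' > tweet.js

# Strip the JavaScript assignment prefix from the first line,
# leaving a plain JSON array on stdout
sed '1s/^window\.YTD\.tweet\.part0 = //' tweet.js
```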

Bluesky

Bluesky does not currently offer a first-party data export. You can use a community tool such as bsky-export or similar to produce a JSON export.

The importer accepts a .json file containing an array of post objects. Two formats are supported:

  • Flat: [{ "text": "…", "createdAt": "…" }, …]
  • AT Protocol: [{ "uri": "at://did:plc:…/app.bsky.feed.post/…", "value": { "text": "…", "createdAt": "…" } }, …]

Duplicate detection

Pass 1 (source ID): Any post whose source_id (e.g. twitter:1234567890) already exists in your entries is silently skipped — no duplicate is created.

Pass 2 (semantic similarity): When AI is configured, posts that are semantically similar to existing entries above the configured threshold are flagged for review rather than imported automatically.

Flagged duplicates appear in the Duplicate review queue on the Import page. For each pair you can see the existing entry alongside the incoming post and choose to Skip (discard the incoming post) or Import anyway (create a new entry regardless).


Keyboard Shortcuts

| Key | Action |
|---|---|
| n | New entry (on Entries page) |
| s | Focus search (on Search page) |
| c | Open Quick Capture |
| ? | Show all shortcuts |

Quick Capture

The + floating button (bottom-right corner on both mobile and desktop) opens a quick-capture sheet for fast note entry. Paste a URL, type a note, or drop in an image. Select a certainty level and save with ⌘↵.

If you're offline, captures are saved to an IndexedDB queue and synced automatically when the network returns. A badge on the button shows how many items are queued.


Build Phases

All 10 phases are complete at v1.0.0. See CHANGELOG.md for the full history and ROADMAP.md for what's planned next.