Tea Leaves

A self-hosted research tool for organizing, connecting, and publishing observations about themes and motifs across media. Notes scattered across Twitter, Bluesky, phone notes, and conversations come together into one structured, searchable, connected workspace.


Overview

Tea Leaves is a mobile-first PWA designed for a single researcher. It is entirely self-hosted with no cloud dependencies — all storage, AI processing, and serving happen on your own infrastructure.

Key principles:

  • Your data is irreplaceable. Every decision protects years of research.
  • Mobile-first, but fully usable on desktop.
  • No cloud. Everything runs on your homelab.
  • Settings are live-editable in the UI — no restarts required.
  • Export everything, at any time, in standard formats.

Architecture

[Browser / Phone]
      │
      ▼
[Reverse Proxy]          ← Tailscale tunnel from remote server (external, not in this stack)
      │
      ▼
   app:80                ← nginx (React frontend + reverse proxy)
      │
      ├── /api/*  ──────► api:3000   ← Node.js / Express (internal only)
      ├── /auth/* ──────► api:3000
      └── /*      ──────► React app (static files)
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
                 db:5432  redis:6379  [AI Server]
           (PostgreSQL    (sessions,  (external,
           + pgvector)    job queues)  HTTP calls)

Docker Compose services:

| Service | Image | Purpose |
|---|---|---|
| app | Built from ./app | React frontend, served by nginx |
| api | Built from ./api | Node.js + Express backend |
| db | pgvector/pgvector:pg16 | PostgreSQL with vector search |
| redis | redis:7-alpine | Sessions, rate limiting, background jobs |

The AI server and reverse proxy (Tailscale + whatever proxy you use) are external — this stack makes HTTP calls out to AI and receives traffic in from the proxy.


Prerequisites

  • Docker and Docker Compose (v2)
  • An Authentik instance with an OAuth2/OIDC provider configured for Tea Leaves
  • (Optional) A self-hosted AI server — Ollama, any OpenAI-compatible server (llama.cpp, LM Studio, vLLM), or an Anthropic API key. AI features stay disabled until a provider is configured, but the app runs fully without one.

Setup

1. Clone the repository

git clone <your-repo-url> tea-leaves
cd tea-leaves

2. Create your environment file

cp .env.example .env

Then edit .env and fill in every value. See Configuration for details.

3. Configure Authentik

In your Authentik instance, create a new OAuth2/OpenID Connect Provider:

  • Name: Tea Leaves
  • Client type: Confidential
  • Redirect URIs: https://yourdomain.com/auth/callback
  • Scopes: openid, email, profile

Then create an Application backed by that provider. Copy the Client ID and Client Secret into your .env.

Set AUTHENTIK_ISSUER to the provider's issuer URL — typically:

https://auth.yourdomain.com/application/o/tea-leaves/
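You can sanity-check the issuer before first login — every OIDC provider publishes a discovery document at a well-known path. A quick sketch (the hostname is the placeholder from above):

```shell
# Fetch the OIDC discovery document for the configured issuer.
# A valid issuer returns JSON listing authorization_endpoint,
# token_endpoint, and jwks_uri.
curl https://auth.yourdomain.com/application/o/tea-leaves/.well-known/openid-configuration
```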

4. Build and start

docker compose up --build -d

Database migrations run automatically on API startup. On first boot, logs will show each migration being applied:

Applying migration: 001_users.sql
  ✓ 001_users.sql
Tea Leaves API listening on port 3000 (production)

5. Verify

curl http://localhost:3000/health
# {"status":"ok"}

Navigate to http://localhost (or whatever APP_PORT you set) and you should see the Tea Leaves login screen.


Configuration

All configuration lives in .env. Copy .env.example as your starting point.

Database

| Variable | Description |
|---|---|
| POSTGRES_DB | Database name (default: tealeaves) |
| POSTGRES_USER | Database user (default: tealeaves) |
| POSTGRES_PASSWORD | Required. Database password — use a strong random value |
| DATABASE_URL | Full connection string — must match the three vars above |

DATABASE_URL uses the internal Docker hostname db, not localhost.
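For example, a consistent set of values might look like this (the password is a placeholder — generate your own):

```shell
# Illustrative .env fragment — keep DATABASE_URL in sync with the
# three variables above, and use the Docker hostname "db".
POSTGRES_DB=tealeaves
POSTGRES_USER=tealeaves
POSTGRES_PASSWORD=s3cret-example-only
DATABASE_URL=postgresql://tealeaves:s3cret-example-only@db:5432/tealeaves
```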

Redis

| Variable | Description |
|---|---|
| REDIS_URL | Redis connection string (default: redis://redis:6379) |

Authentik OIDC

| Variable | Description |
|---|---|
| AUTHENTIK_ISSUER | OIDC issuer URL from your Authentik provider |
| AUTHENTIK_CLIENT_ID | OAuth2 client ID |
| AUTHENTIK_CLIENT_SECRET | OAuth2 client secret |
| AUTHENTIK_REDIRECT_URI | Must exactly match the redirect URI registered in Authentik |

Session

| Variable | Description |
|---|---|
| SESSION_SECRET | Required. Long random string (32+ characters) used to sign session cookies |

Generate a suitable value with:

openssl rand -base64 48

AI Server

All AI settings can be changed live in the Settings UI without a restart. The env vars below serve as boot-time defaults that seed the settings table on first run.

| Variable | Default | Description |
|---|---|---|
| AI_PROVIDER | ollama | ollama · openai · anthropic |
| AI_BASE_URL | http://localhost:11434 | Base URL without path suffix |
| AI_API_KEY | (blank) | Required for OpenAI and Anthropic; leave blank for Ollama |
| AI_VISION_MODEL | llava | Model used for image description and tag suggestions |
| AI_EMBEDDING_MODEL | nomic-embed-text | Model used for semantic search and duplicate detection |
| AI_TIMEOUT_MS | 60000 | Per-request timeout in milliseconds |

All variables are optional — the app starts and runs without them. AI-dependent features (image descriptions, tag suggestions, semantic search, duplicate detection) are simply unavailable until configured.
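As a sketch, a minimal Ollama setup might look like this (the host address is a placeholder for wherever your AI server lives):

```shell
# Hypothetical .env fragment for an Ollama server on your LAN.
# No AI_API_KEY is needed for Ollama.
AI_PROVIDER=ollama
AI_BASE_URL=http://ai-server.local:11434
AI_VISION_MODEL=llava
AI_EMBEDDING_MODEL=nomic-embed-text
```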

Media Storage

| Variable | Description |
|---|---|
| MEDIA_DIR | Path inside the api container where uploads are stored (default: /data/media) |

Uploaded files are stored here, outside the web root, and served only through the authenticated API.

App & Ports

| Variable | Default | Description |
|---|---|---|
| APP_URL | | Public-facing base URL, used to construct the OIDC redirect URI |
| NODE_ENV | production | Set to development for verbose logging |
| APP_PORT | 80 | Host port the app is served on — all traffic (UI, API, auth) goes through here |
| PORT | 3000 | Internal port the API listens on inside its container (rarely needs changing) |

The API is not exposed directly to the host. All requests go through nginx on APP_PORT, which proxies /api/* and /auth/* to the API container internally.

Change APP_PORT if port 80 is already in use on your host:

APP_PORT=8080

Running the App

Start

docker compose up -d

Stop

docker compose down

Stop and remove all data (destructive)

docker compose down -v

View logs

# All services
docker compose logs -f

# API only
docker compose logs -f api

Rebuild after code changes

docker compose up --build -d

Development

Use the dev override file, which mounts source directories into the containers for hot-reload:

docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

The frontend dev server (Vite) runs on port 5173 and proxies /api and /auth to the API container. Open http://localhost:5173 during development.

Running services individually

If you prefer to run the API and frontend outside Docker during development:

# Start only the infrastructure
docker compose up -d db redis

# API (from /api)
cd api
npm install
npm run dev

# Frontend (from /app)
cd app
npm install
npm run dev

Ensure your .env uses localhost hostnames for DATABASE_URL and REDIS_URL when running outside Docker.
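For example, hypothetical overrides for this host-based setup (ports assume the defaults above, and the password is a placeholder):

```shell
# .env overrides when the API runs on the host instead of in Docker —
# point at the published ports on localhost rather than Docker hostnames.
DATABASE_URL=postgresql://tealeaves:s3cret-example-only@localhost:5432/tealeaves
REDIS_URL=redis://localhost:6379
```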


Authentication

Tea Leaves uses Authentik as its sole authentication provider via OIDC. There are no local usernames or passwords — all login is handled through the SSO flow.

  • /auth/login — redirects to Authentik
  • /auth/callback — handles the OIDC return, creates or updates the user record, establishes a session
  • /auth/logout — destroys the session
  • /auth/me — returns the current user (used by the frontend to check auth state)

All API routes except /auth/* and /health require a valid session. Unauthenticated requests receive a 401.


AI Integration

Tea Leaves calls an external AI server over HTTP. No AI services run inside the Docker Compose stack. AI is optional — the app runs fully without it; AI-dependent features (image descriptions, tag suggestions, semantic search) are simply unavailable until configured.

Supported providers

| Provider | Value | Notes |
|---|---|---|
| Ollama | ollama | Default. Uses /api/chat and /api/embed native endpoints. |
| OpenAI-compatible | openai | Covers OpenAI, llama.cpp, LM Studio, vLLM, Jan, Kobold, and any server with /v1/chat/completions + /v1/embeddings. |
| Anthropic | anthropic | Uses /v1/messages. No embedding API — configure a separate Ollama or OpenAI embedding provider in the Settings UI (ai.embeddingProvider, ai.embeddingBaseUrl, ai.embeddingApiKey). |

Set AI_PROVIDER in .env to select the provider, or change it any time in the Settings UI. See the Configuration section for all AI variables.

Model types

| Type | Purpose |
|---|---|
| Vision model (AI_VISION_MODEL) | Describe uploaded images, suggest tags from images |
| Embedding model (AI_EMBEDDING_MODEL) | Semantic search, duplicate detection; not used by Anthropic |

Example models by provider:

| Provider | Vision | Embedding |
|---|---|---|
| Ollama | llava, llava:13b, llama3.2-vision, qwen2-vl:7b | nomic-embed-text (768-dim), mxbai-embed-large |
| OpenAI | gpt-4o, gpt-4o-mini | text-embedding-3-small (1536-dim) |
| Anthropic | claude-sonnet-4-6, claude-opus-4-6 | (none — use separate provider) |

Embedding dimensions

The database column defaults to 768 dimensions (matching nomic-embed-text). If you use a model with a different output dimension, adjust the column after the migrations have created it and before any embeddings are stored:

ALTER TABLE entries ALTER COLUMN text_embedding TYPE vector(1536);

Replace 1536 with the actual output dimension of your chosen model.
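One way to apply that statement is through psql inside the db container — this sketch assumes the default tealeaves user and database names from the Configuration section:

```shell
# Run the ALTER inside the running db container (assumes default
# POSTGRES_USER and POSTGRES_DB values of "tealeaves").
docker compose exec db psql -U tealeaves -d tealeaves \
  -c "ALTER TABLE entries ALTER COLUMN text_embedding TYPE vector(1536);"
```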

Settings panel

All AI settings (provider, base URL, API key, model names, timeout, and a separate embedding provider for Anthropic users) are live-editable from the Settings UI with no restart required. The env vars serve as boot-time defaults that seed the settings table on first run.

All AI suggestions require your approval — nothing is applied to your data automatically.


Data & Backups

Volumes

| Volume | Contents |
|---|---|
| pg_data | All PostgreSQL data |
| redis_data | Redis persistence |
| media_files | Uploaded images and screenshots |

Export

Export any time from the Settings → Export section:

| Format | Contents |
|---|---|
| entries.json | All entries with tags and media refs |
| entries.csv | Flat CSV, suitable for spreadsheets |
| motifs.json | All motifs with entries and connections |
| motif/<id>.json | Single motif (entries + connections) |
| motif/<id>.md | Single motif as readable Markdown essay |
| full.json | Everything in one file |

Backups

Restic-based automated backups cover the PostgreSQL database and media files. Configure from Settings → Backup:

  • Repository — any restic backend: local path, Backblaze B2 (s3:s3.us-west-004.backblazeb2.com/bucket), S3-compatible, SFTP, rclone
  • Schedule — cron expression (e.g. 0 3 * * * for 3am daily); leave blank to disable
  • Retention — number of snapshots to keep (older ones are pruned automatically)
  • Verification — restic check runs after every backup; result shown in job history

A status badge in the top bar shows when the last backup ran and warns if it failed or is overdue.


Import

Tea Leaves can import your existing posts from Twitter and Bluesky. Imports run as background jobs (BullMQ, Redis-backed) so large archives don't time out. Progress is shown live on the Import page while the job runs.

Twitter

Export your data from Twitter/X (Settings → Your account → Download an archive of your data). You will receive a .zip file. Upload it directly — no unpacking needed.

The importer reads data/tweet.js inside the archive. That file uses a JavaScript assignment format (window.YTD.tweet.part0 = [...]); the importer strips this prefix automatically. Each tweet's full_text (or text fallback), id_str, created_at, and first expanded URL are imported.
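The prefix-stripping step can be reproduced by hand if you want to inspect an archive before uploading. This sketch fabricates a one-tweet stand-in for data/tweet.js and strips the assignment prefix, leaving plain JSON:

```shell
# Create a minimal stand-in for data/tweet.js (fabricated sample data)
printf 'window.YTD.tweet.part0 = [ {"tweet": {"id_str": "1", "full_text": "hello"}} ]\n' > tweet.js

# Strip the JavaScript assignment prefix from the first line,
# leaving a plain JSON array on stdout
sed '1s/^window\.YTD\.tweet\.part0 = //' tweet.js
```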

Bluesky

Bluesky does not currently offer a first-party data export. You can use a community tool such as bsky-export or similar to produce a JSON export.

The importer accepts a .json file containing an array of post objects. Two formats are supported:

  • Flat: [{ "text": "…", "createdAt": "…" }, …]
  • AT Protocol: [{ "uri": "at://did:plc:…/app.bsky.feed.post/…", "value": { "text": "…", "createdAt": "…" } }, …]

Duplicate detection

Pass 1 (source ID): Any post whose source_id (e.g. twitter:1234567890) already exists in your entries is silently skipped — no duplicate is created.

Pass 2 (semantic similarity): When AI is configured, posts that are semantically similar to existing entries above the configured threshold are flagged for review rather than imported automatically.

Flagged duplicates appear in the Duplicate review queue on the Import page. For each pair you can see the existing entry alongside the incoming post and choose to Skip (discard the incoming post) or Import anyway (create a new entry regardless).


Keyboard Shortcuts

| Key | Action |
|---|---|
| n | New entry (on Entries page) |
| s | Focus search (on Search page) |
| c | Open Quick Capture |
| ? | Show all shortcuts |

Quick Capture

The + floating button (bottom-right corner on both mobile and desktop) opens a quick-capture sheet for fast note entry. Paste a URL, type a note, or drop in an image. Select a certainty level and save with ⌘↵.

If you're offline, captures are saved to an IndexedDB queue and synced automatically when the network returns. A badge on the button shows how many items are queued.


Build Phases

All 10 phases are complete at v1.0.0. See CHANGELOG.md for the full history and ROADMAP.md for what's planned next.