Tea Leaves
A self-hosted research tool for organizing, connecting, and publishing observations about themes and motifs across media. Notes scattered across Twitter, Bluesky, phone notes, and conversations come together into one structured, searchable, connected workspace.
Table of Contents
- Overview
- Architecture
- Prerequisites
- Setup
- Configuration
- Running the App
- Development
- Authentication
- AI Integration
- Data & Backups
- Import
- Keyboard Shortcuts
- Quick Capture
- Build Phases · Roadmap · Changelog
Overview
Tea Leaves is a mobile-first PWA designed for a single researcher. It is entirely self-hosted with no cloud dependencies — all storage, AI processing, and serving happen on your own infrastructure.
Key principles:
- Your data is irreplaceable. Every decision protects years of research.
- Mobile-first, but fully usable on desktop.
- No cloud. Everything runs on your homelab.
- Settings are live-editable in the UI — no restarts required.
- Export everything, at any time, in standard formats.
Architecture
[Browser / Phone]
│
▼
[Reverse Proxy] ← Tailscale tunnel from remote server (external, not in this stack)
│
▼
app:80 ← nginx (React frontend + reverse proxy)
│
├── /api/* ──────► api:3000 ← Node.js / Express (internal only)
├── /auth/* ──────► api:3000
└── /* ──────► React app (static files)
│
┌─────────┼─────────┐
▼ ▼ ▼
db:5432 redis:6379 [AI Server]
(PostgreSQL (sessions, (external,
+ pgvector) job queues) HTTP calls)
Docker Compose services:
| Service | Image | Purpose |
|---|---|---|
| app | Built from `./app` | React frontend, served by nginx |
| api | Built from `./api` | Node.js + Express backend |
| db | `pgvector/pgvector:pg16` | PostgreSQL with vector search |
| redis | `redis:7-alpine` | Sessions, rate limiting, background jobs |
The AI server and reverse proxy (Tailscale + whatever proxy you use) are external — this stack makes HTTP calls out to AI and receives traffic in from the proxy.
Prerequisites
- Docker and Docker Compose (v2)
- An Authentik instance with an OAuth2/OIDC provider configured for Tea Leaves
- (Optional) A self-hosted AI server — Ollama, any OpenAI-compatible server (llama.cpp, LM Studio, vLLM), or an Anthropic API key. AI features are disabled until configured but the app runs fully without it.
Setup
1. Clone the repository
git clone <your-repo-url> tea-leaves
cd tea-leaves
2. Create your environment file
cp .env.example .env
Then edit .env and fill in every value. See Configuration for details.
3. Configure Authentik
In your Authentik instance, create a new OAuth2/OpenID Connect Provider:
- Name: Tea Leaves
- Client type: Confidential
- Redirect URIs: `https://yourdomain.com/auth/callback`
- Scopes: `openid`, `email`, `profile`
Then create an Application backed by that provider. Copy the Client ID and Client Secret into your .env.
Set AUTHENTIK_ISSUER to the provider's issuer URL — typically:
https://auth.yourdomain.com/application/o/tea-leaves/
4. Build and start
docker compose up --build -d
Database migrations run automatically on API startup. On first boot, logs will show each migration being applied:
Applying migration: 001_users.sql
✓ 001_users.sql
Tea Leaves API listening on port 3000 (production)
5. Verify
curl http://localhost:3000/health
# {"status":"ok"}
Navigate to http://localhost (or whatever APP_PORT you set) and you should see the Tea Leaves login screen.
Configuration
All configuration lives in .env. Copy .env.example as your starting point.
Database
| Variable | Description |
|---|---|
| `POSTGRES_DB` | Database name (default: `tealeaves`) |
| `POSTGRES_USER` | Database user (default: `tealeaves`) |
| `POSTGRES_PASSWORD` | Required. Database password — use a strong random value |
| `DATABASE_URL` | Full connection string — must match the three vars above |

`DATABASE_URL` uses the internal Docker hostname `db`, not `localhost`.
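For example, with the default names above, a matching pair of values could look like this (the password is a placeholder; use your own):

```
POSTGRES_PASSWORD=change-me
DATABASE_URL=postgresql://tealeaves:change-me@db:5432/tealeaves
```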
Redis
| Variable | Description |
|---|---|
| `REDIS_URL` | Redis connection string (default: `redis://redis:6379`) |
Authentik OIDC
| Variable | Description |
|---|---|
| `AUTHENTIK_ISSUER` | OIDC issuer URL from your Authentik provider |
| `AUTHENTIK_CLIENT_ID` | OAuth2 client ID |
| `AUTHENTIK_CLIENT_SECRET` | OAuth2 client secret |
| `AUTHENTIK_REDIRECT_URI` | Must exactly match the redirect URI registered in Authentik |
Session
| Variable | Description |
|---|---|
| `SESSION_SECRET` | Required. Long random string (32+ characters) used to sign session cookies |
Generate a suitable value with:
openssl rand -base64 48
AI Server
All AI settings can be changed live in the Settings UI without a restart. The env vars below serve as boot-time defaults that seed the settings table on first run.
| Variable | Default | Description |
|---|---|---|
| `AI_PROVIDER` | `ollama` | `ollama` · `openai` · `anthropic` |
| `AI_BASE_URL` | `http://localhost:11434` | Base URL without path suffix |
| `AI_API_KEY` | (blank) | Required for OpenAI and Anthropic; leave blank for Ollama |
| `AI_VISION_MODEL` | `llava` | Model used for image description and tag suggestions |
| `AI_EMBEDDING_MODEL` | `nomic-embed-text` | Model used for semantic search and duplicate detection |
| `AI_TIMEOUT_MS` | `60000` | Per-request timeout in milliseconds |
All variables are optional — the app starts and runs without them. AI-dependent features (image descriptions, tag suggestions, semantic search, duplicate detection) are simply unavailable until configured.
Media Storage
| Variable | Description |
|---|---|
| `MEDIA_DIR` | Path inside the api container where uploads are stored (default: `/data/media`) |
Uploaded files are stored here, outside the web root, and served only through the authenticated API.
App & Ports
| Variable | Default | Description |
|---|---|---|
| `APP_URL` | — | Public-facing base URL, used to construct the OIDC redirect URI |
| `NODE_ENV` | `production` | Set to `development` for verbose logging |
| `APP_PORT` | `80` | Host port the app is served on — all traffic (UI, API, auth) goes through here |
| `PORT` | `3000` | Internal port the API listens on inside its container (rarely needs changing) |
The API is not exposed directly to the host. All requests go through nginx on APP_PORT, which proxies /api/* and /auth/* to the API container internally.
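That proxy split could be sketched roughly as an nginx config like the following. This is an illustrative sketch, not the project's actual nginx.conf; the static root path is an assumption:

```nginx
server {
    listen 80;

    # API and auth requests go to the internal api container
    location /api/  { proxy_pass http://api:3000; }
    location /auth/ { proxy_pass http://api:3000; }

    # Everything else serves the built React app
    location / {
        root /usr/share/nginx/html;   # assumed build output location
        try_files $uri /index.html;   # SPA fallback
    }
}
```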
Change APP_PORT if port 80 is already in use on your host:
APP_PORT=8080
Running the App
Start
docker compose up -d
Stop
docker compose down
Stop and remove all data (destructive)
docker compose down -v
View logs
# All services
docker compose logs -f
# API only
docker compose logs -f api
Rebuild after code changes
docker compose up --build -d
Development
Use the dev override file, which mounts source directories into the containers for hot-reload:
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
The frontend dev server (Vite) runs on port 5173 and proxies /api and /auth to the API container. Open http://localhost:5173 during development.
Running services individually
If you prefer to run the API and frontend outside Docker during development:
# Start only the infrastructure
docker compose up -d db redis
# API (from /api)
cd api
npm install
npm run dev
# Frontend (from /app)
cd app
npm install
npm run dev
Ensure your .env uses localhost hostnames for DATABASE_URL and REDIS_URL when running outside Docker.
Authentication
Tea Leaves uses Authentik as its sole authentication provider via OIDC. There are no local usernames or passwords — all login is handled through the SSO flow.
- `/auth/login` — redirects to Authentik
- `/auth/callback` — handles the OIDC return, creates or updates the user record, establishes a session
- `/auth/logout` — destroys the session
- `/auth/me` — returns the current user (used by the frontend to check auth state)
All API routes except /auth/* and /health require a valid session. Unauthenticated requests receive a 401.
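The session gate can be sketched as a small Express-style middleware. This is a sketch under stated assumptions, not the project's actual code; the `requireSession` name and the shape of the session object are illustrative:

```typescript
// Minimal request/response shapes so the sketch is self-contained.
type Req = { path: string; session?: { userId?: string } };
type Res = { status: (code: number) => { json: (body: object) => void } };

// Routes that never require a session.
const PUBLIC_PREFIXES = ["/auth/", "/health"];

// Let public routes and authenticated sessions through; 401 everything else.
function requireSession(req: Req, res: Res, next: () => void): void {
  const isPublic = PUBLIC_PREFIXES.some((p) => req.path.startsWith(p));
  if (isPublic || req.session?.userId) {
    next();
    return;
  }
  res.status(401).json({ error: "Unauthorized" });
}
```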
AI Integration
Tea Leaves calls an external AI server over HTTP. No AI services run inside the Docker Compose stack. AI is optional — the app runs fully without it; AI-dependent features (image descriptions, tag suggestions, semantic search) are simply unavailable until configured.
Supported providers
| Provider | Value | Notes |
|---|---|---|
| Ollama | `ollama` | Default. Uses `/api/chat` and `/api/embed` native endpoints. |
| OpenAI-compatible | `openai` | Covers OpenAI, llama.cpp, LM Studio, vLLM, Jan, Kobold, and any server with `/v1/chat/completions` + `/v1/embeddings`. |
| Anthropic | `anthropic` | Uses `/v1/messages`. No embedding API — configure a separate Ollama or OpenAI embedding provider in the Settings UI (`ai.embeddingProvider`, `ai.embeddingBaseUrl`, `ai.embeddingApiKey`). |
Set AI_PROVIDER in .env to select the provider, or change it any time in the Settings UI. See the Configuration section for all AI variables.
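The endpoint differences in the table above can be sketched as a tiny resolver. This is illustrative only; the real client code is more involved, and `chatPath` is not an actual function in the codebase:

```typescript
type Provider = "ollama" | "openai" | "anthropic";

// Chat endpoint exposed by each provider, per the table above.
const CHAT_PATHS: Record<Provider, string> = {
  ollama: "/api/chat",             // Ollama native API
  openai: "/v1/chat/completions",  // OpenAI-compatible servers
  anthropic: "/v1/messages",       // Anthropic Messages API
};

function chatPath(provider: Provider): string {
  return CHAT_PATHS[provider];
}
```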
Model types
| Type | Purpose |
|---|---|
| Vision model (`AI_VISION_MODEL`) | Describe uploaded images, suggest tags from images |
| Embedding model (`AI_EMBEDDING_MODEL`) | Semantic search, duplicate detection; not used by Anthropic |
Example models by provider:
| Provider | Vision | Embedding |
|---|---|---|
| Ollama | `llava`, `llava:13b`, `llama3.2-vision`, `qwen2-vl:7b` | `nomic-embed-text` (768-dim), `mxbai-embed-large` |
| OpenAI | `gpt-4o`, `gpt-4o-mini` | `text-embedding-3-small` (1536-dim) |
| Anthropic | `claude-sonnet-4-6`, `claude-opus-4-6` | (none — use separate provider) |
Embedding dimensions
The database column defaults to 768 dimensions (matching nomic-embed-text). If you use a model with different dimensions, adjust the column before first boot:
ALTER TABLE entries ALTER COLUMN text_embedding TYPE vector(1536);
Replace 1536 with the actual output dimension of your chosen model.
Settings panel
All AI settings (provider, base URL, API key, model names, timeout, and a separate embedding provider for Anthropic users) are live-editable from the Settings UI with no restart required. The env vars serve as boot-time defaults that seed the settings table on first run.
All AI suggestions require your approval — nothing is applied to your data automatically.
Data & Backups
Volumes
| Volume | Contents |
|---|---|
| `pg_data` | All PostgreSQL data |
| `redis_data` | Redis persistence |
| `media_files` | Uploaded images and screenshots |
Export
Export any time from the Settings → Export section:
| Format | Contents |
|---|---|
| `entries.json` | All entries with tags and media refs |
| `entries.csv` | Flat CSV, suitable for spreadsheets |
| `motifs.json` | All motifs with entries and connections |
| `motif/<id>.json` | Single motif (entries + connections) |
| `motif/<id>.md` | Single motif as readable Markdown essay |
| `full.json` | Everything in one file |
Backups
Restic-based automated backups cover the PostgreSQL database and media files. Configure from Settings → Backup:
- Repository — any restic backend: local path, Backblaze B2 (`s3:s3.us-west-004.backblazeb2.com/bucket`), S3-compatible, SFTP, rclone
- Schedule — cron expression (e.g. `0 3 * * *` for 3am daily); leave blank to disable
- Retention — number of snapshots to keep (older ones are pruned automatically)
- Verification — `restic check` runs after every backup; result shown in job history
A status badge in the top bar shows when the last backup ran and warns if it failed or is overdue.
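The overdue warning could be computed along these lines. This is a sketch; `isBackupOverdue`, the interval, and the grace period are illustrative assumptions, not the app's actual logic:

```typescript
// A backup is considered overdue when the last run is older than the
// expected interval plus a grace period (both assumed values).
function isBackupOverdue(
  lastRun: Date | null,
  intervalMs: number,
  graceMs: number,
  now: Date = new Date()
): boolean {
  if (!lastRun) return true; // never ran at all
  return now.getTime() - lastRun.getTime() > intervalMs + graceMs;
}
```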
Import
Tea Leaves can import your existing posts from Twitter and Bluesky. Imports run as background jobs (BullMQ, Redis-backed) so large archives don't time out. Progress is shown live on the Import page while the job runs.
Twitter / X
Export your data from Twitter/X (Settings → Your account → Download an archive of your data). You will receive a .zip file. Upload it directly — no unpacking needed.
The importer reads data/tweet.js inside the archive. That file uses a JavaScript assignment format (window.YTD.tweet.part0 = [...]); the importer strips this prefix automatically. Each tweet's full_text (or text fallback), id_str, created_at, and first expanded URL are imported.
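The prefix stripping can be sketched like this. A sketch only: `parseTweetJs` is an illustrative name, not the importer's actual function, and the field types mirror the fields named above:

```typescript
interface RawTweet {
  id_str: string;
  full_text?: string;
  text?: string;
  created_at: string;
}

// tweet.js is a JS assignment, not JSON: drop everything before the
// first "[" so the remainder parses as a plain JSON array.
function parseTweetJs(source: string): RawTweet[] {
  const start = source.indexOf("[");
  if (start === -1) throw new Error("no JSON array found in tweet.js");
  return JSON.parse(source.slice(start)) as RawTweet[];
}
```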
Bluesky
Bluesky does not currently offer a first-party data export. You can use a community tool such as bsky-export to produce a JSON export.
The importer accepts a .json file containing an array of post objects. Two formats are supported:
- Flat: `[{ "text": "…", "createdAt": "…" }, …]`
- AT Protocol: `[{ "uri": "at://did:plc:…/app.bsky.feed.post/…", "value": { "text": "…", "createdAt": "…" } }, …]`
Duplicate detection
Pass 1 (source ID): Any post whose source_id (e.g. twitter:1234567890) already exists in your entries is silently skipped — no duplicate is created.
Pass 2 (semantic similarity): When AI is configured, posts that are semantically similar to existing entries above the configured threshold are flagged for review rather than imported automatically.
Flagged duplicates appear in the Duplicate review queue on the Import page. For each pair you can see the existing entry alongside the incoming post and choose to Skip (discard the incoming post) or Import anyway (create a new entry regardless).
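Pass 2's similarity check can be sketched with a plain cosine similarity over embedding vectors. A sketch only: in the app the comparison runs in pgvector, and the 0.9 default threshold here is an assumption, not the configured value:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Flag the incoming post for review instead of importing it automatically.
function shouldFlag(incoming: number[], existing: number[], threshold = 0.9): boolean {
  return cosine(incoming, existing) >= threshold;
}
```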
Keyboard Shortcuts
| Key | Action |
|---|---|
| `n` | New entry (on Entries page) |
| `s` | Focus search (on Search page) |
| `c` | Open Quick Capture |
| `?` | Show all shortcuts |
Quick Capture
The + floating button (bottom-right corner on both mobile and desktop) opens a quick-capture sheet for fast note entry. Paste a URL, type a note, or drop an image. Select certainty and save with ⌘↵.
If you're offline, captures are saved to an IndexedDB queue and synced automatically when the network returns. A badge on the button shows how many items are queued.
Build Phases
All 10 phases are complete at v1.0.0. See CHANGELOG.md for the full history and ROADMAP.md for what's planned next.