# Audio Classifier - Technical Implementation TODO

## Phase 1: Project Structure & Dependencies

### 1.1 Root structure
- [ ] Create root `.gitignore`
- [ ] Create root `README.md` with setup instructions
- [ ] Create `docker-compose.yml` (PostgreSQL + pgvector)
- [ ] Create `.env.example`

### 1.2 Backend structure (Python/FastAPI)
- [ ] Create `backend/` directory
- [ ] Create `backend/requirements.txt`:
  - fastapi==0.109.0
  - uvicorn[standard]==0.27.0
  - sqlalchemy==2.0.25
  - psycopg2-binary==2.9.9
  - pgvector==0.2.4
  - librosa==0.10.1
  - essentia-tensorflow==2.1b6.dev1110
  - pydantic==2.5.3
  - pydantic-settings==2.1.0
  - python-multipart==0.0.6
  - mutagen==1.47.0
  - numpy==1.24.3
  - scipy==1.11.4
- [ ] Create `backend/pyproject.toml` (optional, for poetry users)
- [ ] Create `backend/.env.example`
- [ ] Create `backend/Dockerfile`
- [ ] Create `backend/src/__init__.py`

### 1.3 Backend core modules structure
- [ ] `backend/src/core/__init__.py`
- [ ] `backend/src/core/audio_processor.py` - librosa feature extraction
- [ ] `backend/src/core/essentia_classifier.py` - Essentia models (genre/mood/instruments)
- [ ] `backend/src/core/analyzer.py` - Main orchestrator
- [ ] `backend/src/core/file_scanner.py` - Recursive folder scanning
- [ ] `backend/src/core/waveform_generator.py` - Peaks extraction for visualization

### 1.4 Backend database modules
- [ ] `backend/src/models/__init__.py`
- [ ] `backend/src/models/database.py` - SQLAlchemy engine + session
- [ ] `backend/src/models/schema.py` - SQLAlchemy models (AudioTrack)
- [ ] `backend/src/models/crud.py` - CRUD operations
- [ ] `backend/src/alembic/` - Migration setup
- [ ] `backend/src/alembic/versions/001_initial_schema.py` - CREATE TABLE + pgvector extension

### 1.5 Backend API structure
- [ ] `backend/src/api/__init__.py`
- [ ] `backend/src/api/main.py` - FastAPI app + CORS + startup/shutdown events
- [ ] `backend/src/api/routes/__init__.py`
- [ ] `backend/src/api/routes/tracks.py` - GET /tracks, GET /tracks/{id}, DELETE /tracks/{id}
- [ ] `backend/src/api/routes/search.py` - GET /search?q=...&genre=...&mood=...
- [ ] `backend/src/api/routes/analyze.py` - POST /analyze/folder, GET /analyze/status/{job_id}
- [ ] `backend/src/api/routes/audio.py` - GET /audio/stream/{id}, GET /audio/download/{id}, GET /audio/waveform/{id}
- [ ] `backend/src/api/routes/similar.py` - GET /tracks/{id}/similar
- [ ] `backend/src/api/routes/stats.py` - GET /stats (total tracks, genres distribution)
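Each file under `routes/` can expose its own FastAPI `APIRouter`, which `main.py` then includes. A minimal wiring sketch under that assumption (the two files are collapsed into one block for brevity, and the endpoint body stays a placeholder until the CRUD layer from Phase 4 exists):

```python
# Hypothetical wiring sketch for the layout above; real route logic lands in Phase 5.
from fastapi import APIRouter, FastAPI
from fastapi.middleware.cors import CORSMiddleware

# backend/src/api/routes/tracks.py
router = APIRouter(prefix="/api/tracks", tags=["tracks"])


@router.get("")
async def list_tracks(skip: int = 0, limit: int = 50):
    # Placeholder response until the CRUD layer (Phase 4) is wired in.
    return {"tracks": [], "total": 0, "skip": skip, "limit": limit}


# backend/src/api/main.py
app = FastAPI(title="Audio Classifier API", version="0.1.0")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # the Next.js dev server
    allow_methods=["*"],
    allow_headers=["*"],
)
app.include_router(router)  # one include_router() call per module in routes/
```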
### 1.6 Backend utils
- [ ] `backend/src/utils/__init__.py`
- [ ] `backend/src/utils/config.py` - Pydantic Settings for env vars
- [ ] `backend/src/utils/logging.py` - Logging setup
- [ ] `backend/src/utils/validators.py` - Audio file validation

### 1.7 Frontend structure (Next.js 14)
- [ ] `npx create-next-app@latest frontend --typescript --tailwind --app --no-src-dir`
- [ ] `cd frontend && npm install`
- [ ] Install deps: `shadcn-ui`, `@tanstack/react-query`, `zustand`, `axios`, `lucide-react`, `recharts`
- [ ] `npx shadcn-ui@latest init`
- [ ] Add shadcn components: button, input, slider, select, card, dialog, progress, toast

### 1.8 Frontend structure details
- [ ] `frontend/app/layout.tsx` - Root layout with QueryClientProvider
- [ ] `frontend/app/page.tsx` - Main library view
- [ ] `frontend/app/tracks/[id]/page.tsx` - Track detail page
- [ ] `frontend/components/SearchBar.tsx`
- [ ] `frontend/components/FilterPanel.tsx`
- [ ] `frontend/components/TrackCard.tsx`
- [ ] `frontend/components/TrackDetails.tsx`
- [ ] `frontend/components/AudioPlayer.tsx`
- [ ] `frontend/components/WaveformDisplay.tsx`
- [ ] `frontend/components/BatchScanner.tsx`
- [ ] `frontend/components/SimilarTracks.tsx`
- [ ] `frontend/lib/api.ts` - Axios client with base URL
- [ ] `frontend/lib/types.ts` - TypeScript interfaces
- [ ] `frontend/hooks/useSearch.ts`
- [ ] `frontend/hooks/useTracks.ts`
- [ ] `frontend/hooks/useAudioPlayer.ts`
- [ ] `frontend/.env.local.example`

---

## Phase 2: Database Schema & Migrations

### 2.1 PostgreSQL setup
- [ ] `docker-compose.yml`: service postgres with pgvector image `pgvector/pgvector:pg16`
- [ ] Expose port 5432
- [ ] Volume for persistence: `postgres_data:/var/lib/postgresql/data`
- [ ] Init script: `backend/init-db.sql` with CREATE EXTENSION vector

### 2.2 SQLAlchemy models
- [ ] Define `AudioTrack` model in `schema.py`:
  - id: UUID (PK)
  - filepath: String (unique, indexed)
  - filename: String
  - duration_seconds: Float
  - file_size_bytes: Integer
  - format: String (mp3/wav)
  - analyzed_at: DateTime
  - tempo_bpm: Float
  - key: String
  - time_signature: String
  - energy: Float
  - danceability: Float
  - valence: Float
  - loudness_lufs: Float
  - spectral_centroid: Float
  - zero_crossing_rate: Float
  - genre_primary: String (indexed)
  - genre_secondary: ARRAY[String]
  - genre_confidence: Float
  - mood_primary: String (indexed)
  - mood_secondary: ARRAY[String]
  - mood_arousal: Float
  - mood_valence: Float
  - instruments: ARRAY[String]
  - has_vocals: Boolean
  - vocal_gender: String (nullable)
  - embedding: Vector(512) (nullable, for future CLAP)
  - embedding_model: String (nullable)
  - metadata: JSON (note: `metadata` is a reserved attribute name on SQLAlchemy declarative models, so expose this column through a differently named attribute)
- [ ] Create indexes: filepath, genre_primary, mood_primary, tempo_bpm
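A minimal sketch of that model, assuming SQLAlchemy 2.0 typed declarative mappings and the `pgvector` Python package; only a representative subset of the columns is shown, and the remaining fields follow the same pattern.

```python
# Partial sketch of backend/src/models/schema.py under the assumptions stated above.
import uuid
from datetime import datetime

from pgvector.sqlalchemy import Vector
from sqlalchemy import BigInteger, Boolean, DateTime, Float, String
from sqlalchemy.dialects.postgresql import ARRAY, JSONB, UUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class AudioTrack(Base):
    __tablename__ = "audio_tracks"

    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    filepath: Mapped[str] = mapped_column(String, unique=True, index=True)
    filename: Mapped[str] = mapped_column(String)
    duration_seconds: Mapped[float] = mapped_column(Float)
    file_size_bytes: Mapped[int] = mapped_column(BigInteger)  # BigInteger avoids overflow on very large files
    analyzed_at: Mapped[datetime] = mapped_column(DateTime)
    tempo_bpm: Mapped[float] = mapped_column(Float, index=True)
    genre_primary: Mapped[str] = mapped_column(String, index=True)
    genre_secondary: Mapped[list[str]] = mapped_column(ARRAY(String), default=list)
    mood_primary: Mapped[str] = mapped_column(String, index=True)
    instruments: Mapped[list[str]] = mapped_column(ARRAY(String), default=list)
    has_vocals: Mapped[bool] = mapped_column(Boolean, default=False)
    embedding = mapped_column(Vector(512), nullable=True)  # reserved for future CLAP embeddings
    # "metadata" is reserved on declarative classes, so map the column under another attribute name
    track_metadata = mapped_column("metadata", JSONB, nullable=True)
```

The `Vector(512)` column is also what backs the similar-tracks query in Phase 4: pgvector's SQLAlchemy integration adds comparators such as `AudioTrack.embedding.cosine_distance(...)` that can be used in an `ORDER BY`.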
### 2.3 Alembic migrations
- [ ] `alembic init backend/src/alembic`
- [ ] Configure `alembic.ini` with DB URL
- [ ] Create initial migration with schema above
- [ ] Add pgvector extension in migration

---

## Phase 3: Core Audio Processing

### 3.1 audio_processor.py - Librosa feature extraction
- [ ] Function `load_audio(filepath: str) -> Tuple[np.ndarray, int]`
- [ ] Function `extract_tempo(y, sr) -> float` - librosa.beat.tempo (deprecated since librosa 0.10; `librosa.feature.rhythm.tempo` or `librosa.beat.beat_track` are the current entry points)
- [ ] Function `extract_key(y, sr) -> str` - librosa.feature.chroma_cqt + key detection
- [ ] Function `extract_spectral_features(y, sr) -> dict`:
  - spectral_centroid
  - zero_crossing_rate
  - spectral_rolloff
  - spectral_bandwidth
- [ ] Function `extract_mfcc(y, sr) -> np.ndarray`
- [ ] Function `extract_chroma(y, sr) -> np.ndarray`
- [ ] Function `extract_energy(y, sr) -> float` - RMS energy
- [ ] Function `extract_all_features(filepath: str) -> dict` - orchestrator
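A minimal sketch of these functions against librosa 0.10. The key detection is a simple Krumhansl-style chroma-template correlation (major keys only), one lightweight heuristic among several; `extract_mfcc`, `extract_chroma`, and the `extract_all_features` orchestrator follow the same pattern and are omitted for brevity.

```python
# Partial sketch of backend/src/core/audio_processor.py against librosa 0.10.
from typing import Tuple

import librosa
import numpy as np

KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Schmuckler major-key profile
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])


def load_audio(filepath: str, sr: int = 22050) -> Tuple[np.ndarray, int]:
    y, sr = librosa.load(filepath, sr=sr, mono=True)
    return y, sr


def extract_tempo(y: np.ndarray, sr: int) -> float:
    # librosa.beat.tempo is deprecated in 0.10; beat_track also returns the tempo estimate
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    return float(np.atleast_1d(tempo)[0])


def extract_key(y: np.ndarray, sr: int) -> str:
    # Correlate averaged chroma with the major profile rotated to each possible tonic
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    scores = [np.corrcoef(np.roll(MAJOR_PROFILE, shift), chroma)[0, 1] for shift in range(12)]
    return f"{KEY_NAMES[int(np.argmax(scores))]} major"


def extract_spectral_features(y: np.ndarray, sr: int) -> dict:
    return {
        "spectral_centroid": float(librosa.feature.spectral_centroid(y=y, sr=sr).mean()),
        "zero_crossing_rate": float(librosa.feature.zero_crossing_rate(y).mean()),
        "spectral_rolloff": float(librosa.feature.spectral_rolloff(y=y, sr=sr).mean()),
        "spectral_bandwidth": float(librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()),
    }


def extract_energy(y: np.ndarray, sr: int) -> float:
    # Mean RMS energy over all frames
    return float(librosa.feature.rms(y=y).mean())
```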
### 3.2 essentia_classifier.py - Essentia TensorFlow models
- [ ] Download Essentia models (mtg-jamendo):
  - genre: https://essentia.upf.edu/models/classification-heads/mtg_jamendo_genre/mtg_jamendo_genre-discogs-effnet-1.pb
  - mood: https://essentia.upf.edu/models/classification-heads/mtg_jamendo_moodtheme/mtg_jamendo_moodtheme-discogs-effnet-1.pb
  - instrument: https://essentia.upf.edu/models/classification-heads/mtg_jamendo_instrument/mtg_jamendo_instrument-discogs-effnet-1.pb
- [ ] Store models in `backend/models/` directory
- [ ] Class `EssentiaClassifier` (a hedged usage sketch follows at the end of this phase):
  - `__init__()`: load models
  - `predict_genre(audio_path: str) -> dict`: returns {primary, secondary[], confidence}
  - `predict_mood(audio_path: str) -> dict`: returns {primary, secondary[], arousal, valence}
  - `predict_instruments(audio_path: str) -> List[dict]`: returns [{name, confidence}, ...]
- [ ] Add model metadata files (class labels) in JSON

### 3.3 waveform_generator.py
- [ ] Function `generate_peaks(filepath: str, num_peaks: int = 800) -> List[float]`
  - Load audio with librosa
  - Downsample to num_peaks points
  - Return normalized amplitude values
- [ ] Cache peaks in JSON file next to audio (optional)

### 3.4 file_scanner.py
- [ ] Function `scan_folder(path: str, recursive: bool = True) -> List[str]`
  - Walk directory tree
  - Filter by extensions: .mp3, .wav, .flac, .m4a, .ogg
  - Return list of absolute paths
- [ ] Function `get_file_metadata(filepath: str) -> dict`
  - Use mutagen for ID3 tags
  - Return: filename, size, format

### 3.5 analyzer.py - Main orchestrator
- [ ] Class `AudioAnalyzer`:
  - `__init__()`
  - `analyze_file(filepath: str) -> AudioAnalysis`:
    1. Validate file exists and is audio
    2. Extract features (audio_processor)
    3. Classify genre/mood/instruments (essentia_classifier)
    4. Get file metadata (file_scanner)
    5. Return structured AudioAnalysis object
  - `analyze_folder(path: str, recursive: bool, progress_callback) -> List[AudioAnalysis]`:
    - Scan folder
    - Parallel processing with ThreadPoolExecutor (num_workers=4)
    - Progress updates
- [ ] Pydantic model `AudioAnalysis` matching JSON schema from architecture
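For the `EssentiaClassifier` in 3.2, a hedged sketch of how the MTG-Jamendo classification heads are typically driven with essentia-tensorflow, following the usage pattern published alongside the models. Two assumptions to verify: the heads consume embeddings from the Discogs-EffNet backbone, so `discogs-effnet-bs64-1.pb` would need to be downloaded in addition to the three heads listed above, and the exact output node names and label lists should be taken from each model's accompanying JSON metadata. Only the genre path is shown; mood and instruments follow the same shape.

```python
# Hedged sketch of backend/src/core/essentia_classifier.py; verify graph/output names and
# the metadata JSON layout against the files published at essentia.upf.edu/models.
import json

import numpy as np
from essentia.standard import MonoLoader, TensorflowPredict2D, TensorflowPredictEffnetDiscogs


class EssentiaClassifier:
    def __init__(self, models_dir: str = "backend/models"):
        # Backbone producing embeddings that the MTG-Jamendo heads consume (assumption noted above)
        self.embedding_model = TensorflowPredictEffnetDiscogs(
            graphFilename=f"{models_dir}/discogs-effnet-bs64-1.pb", output="PartitionedCall:1"
        )
        self.genre_model = TensorflowPredict2D(
            graphFilename=f"{models_dir}/mtg_jamendo_genre-discogs-effnet-1.pb"
        )
        with open(f"{models_dir}/mtg_jamendo_genre-discogs-effnet-1.json") as fh:
            self.genre_labels = json.load(fh)["classes"]

    def predict_genre(self, audio_path: str) -> dict:
        audio = MonoLoader(filename=audio_path, sampleRate=16000, resampleQuality=4)()
        embeddings = self.embedding_model(audio)
        activations = self.genre_model(embeddings).mean(axis=0)  # average over analysis patches
        order = np.argsort(activations)[::-1]
        return {
            "primary": self.genre_labels[order[0]],
            "secondary": [self.genre_labels[i] for i in order[1:4]],
            "confidence": float(activations[order[0]]),
        }
```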
---

## Phase 4: Database CRUD Operations

### 4.1 crud.py - CRUD functions
- [ ] `create_track(session, analysis: AudioAnalysis) -> AudioTrack`
- [ ] `get_track_by_id(session, track_id: UUID) -> Optional[AudioTrack]`
- [ ] `get_track_by_filepath(session, filepath: str) -> Optional[AudioTrack]`
- [ ] `get_tracks(session, skip: int, limit: int, filters: dict) -> List[AudioTrack]`
  - Support filters: genre, mood, bpm_min, bpm_max, energy_min, energy_max, has_vocals
- [ ] `search_tracks(session, query: str, filters: dict, limit: int) -> List[AudioTrack]`
  - Full-text search on: genre_primary, mood_primary, instruments, filename
  - Combined with filters
- [ ] `get_similar_tracks(session, track_id: UUID, limit: int) -> List[AudioTrack]`
  - If embeddings exist: vector similarity with pgvector
  - Fallback: similar genre + mood + BPM range
- [ ] `delete_track(session, track_id: UUID) -> bool`
- [ ] `get_stats(session) -> dict`
  - Total tracks
  - Genres distribution
  - Moods distribution
  - Average BPM
  - Total duration

---

## Phase 5: FastAPI Backend Implementation

### 5.1 config.py - Settings
- [ ] `class Settings(BaseSettings)`:
  - DATABASE_URL: str
  - CORS_ORIGINS: List[str]
  - ANALYSIS_USE_CLAP: bool = False
  - ANALYSIS_NUM_WORKERS: int = 4
  - ESSENTIA_MODELS_PATH: str
  - AUDIO_LIBRARY_PATH: str (optional default scan path)
- [ ] Load from `.env`

### 5.2 main.py - FastAPI app
- [ ] Create FastAPI app with metadata (title, version, description)
- [ ] Add CORS middleware (allow frontend origin)
- [ ] Add startup event: init DB engine, load Essentia models
- [ ] Add shutdown event: cleanup
- [ ] Include routers from routes/
- [ ] Health check endpoint: GET /health

### 5.3 routes/tracks.py
- [ ] `GET /api/tracks`:
  - Query params: skip, limit, genre, mood, bpm_min, bpm_max, energy_min, energy_max, has_vocals, sort_by
  - Return paginated list of tracks
  - Include total count
- [ ] `GET /api/tracks/{track_id}`:
  - Return full track details
  - 404 if not found
- [ ] `DELETE /api/tracks/{track_id}`:
  - Soft delete or hard delete (remove from DB only, keep file)
  - Return success

### 5.4 routes/search.py
- [ ] `GET /api/search`:
  - Query params: q (search query), genre, mood, bpm_min, bpm_max, limit
  - Full-text search + filters
  - Return matching tracks

### 5.5 routes/audio.py
- [ ] `GET /api/audio/stream/{track_id}`:
  - Get track from DB
  - Return FileResponse with media_type audio/mpeg
  - Support Range requests for seeking (Accept-Ranges: bytes)
  - Headers: Content-Disposition: inline
- [ ] `GET /api/audio/download/{track_id}`:
  - Same as stream but Content-Disposition: attachment
- [ ] `GET /api/audio/waveform/{track_id}`:
  - Get track from DB
  - Generate or load cached peaks (waveform_generator)
  - Return JSON: {peaks: [], duration: float}

### 5.6 routes/analyze.py
- [ ] `POST /api/analyze/folder`:
  - Body: {path: str, recursive: bool}
  - Validate path exists
  - Start background job (asyncio Task or Celery)
  - Return job_id
- [ ] `GET /api/analyze/status/{job_id}`:
  - Return job status: {status: "pending|running|completed|failed", progress: int, total: int, errors: []}
- [ ] Background worker implementation:
  - Scan folder
  - For each file: analyze, save to DB (skip if already exists by filepath)
  - Update job status
  - Store job state in an in-memory dict or Redis
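A minimal sketch of the analyze routes and background worker above, using FastAPI's `BackgroundTasks` with an in-process job dict (the Redis variant would replace that dict when running multiple workers). The `src.core` imports and the `analyze_and_store` helper are assumptions standing in for the Phase 3/4 pieces, not a defined API.

```python
# Sketch of backend/src/api/routes/analyze.py under the assumptions stated above.
import uuid
from pathlib import Path

from fastapi import APIRouter, BackgroundTasks, HTTPException
from pydantic import BaseModel

from src.core.file_scanner import scan_folder      # assumed import path (Phase 3.4)
from src.core.analyzer import analyze_and_store    # hypothetical helper built on Phases 3.5/4.1

router = APIRouter(prefix="/api/analyze", tags=["analyze"])

# job_id -> {"status", "progress", "total", "errors"}; swap for Redis with multiple workers
JOBS: dict[str, dict] = {}


class AnalyzeFolderRequest(BaseModel):
    path: str
    recursive: bool = True


def run_analysis_job(job_id: str, path: str, recursive: bool) -> None:
    job = JOBS[job_id]
    job["status"] = "running"
    try:
        files = scan_folder(path, recursive=recursive)
        job["total"] = len(files)
        for i, filepath in enumerate(files, start=1):
            try:
                analyze_and_store(filepath)          # analyze + upsert, skipping known filepaths
            except Exception as exc:                 # keep going on per-file failures
                job["errors"].append(f"{filepath}: {exc}")
            job["progress"] = i
        job["status"] = "completed"
    except Exception as exc:
        job["status"] = "failed"
        job["errors"].append(str(exc))


@router.post("/folder")
async def analyze_folder(body: AnalyzeFolderRequest, background_tasks: BackgroundTasks):
    if not Path(body.path).is_dir():
        raise HTTPException(status_code=400, detail="Path does not exist or is not a directory")
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "progress": 0, "total": 0, "errors": []}
    # Sync callables passed to BackgroundTasks run in a worker thread after the response is sent
    background_tasks.add_task(run_analysis_job, job_id, body.path, body.recursive)
    return {"job_id": job_id}


@router.get("/status/{job_id}")
async def analyze_status(job_id: str):
    if job_id not in JOBS:
        raise HTTPException(status_code=404, detail="Unknown job")
    return JOBS[job_id]
```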
### 5.7 routes/similar.py
- [ ] `GET /api/tracks/{track_id}/similar`:
  - Query params: limit (default 10)
  - Get similar tracks (CRUD function)
  - Return list of tracks

### 5.8 routes/stats.py
- [ ] `GET /api/stats`:
  - Get stats (CRUD function)
  - Return JSON with counts, distributions

---

## Phase 6: Frontend Implementation

### 6.1 API client (lib/api.ts)
- [ ] Create axios instance with baseURL from env var (NEXT_PUBLIC_API_URL)
- [ ] API functions:
  - `getTracks(params: FilterParams): Promise<{tracks: Track[], total: number}>`
  - `getTrack(id: string): Promise<Track>`
  - `deleteTrack(id: string): Promise<void>`
  - `searchTracks(query: string, filters: FilterParams): Promise<Track[]>`
  - `getSimilarTracks(id: string, limit: number): Promise<Track[]>`
  - `analyzeFolder(path: string, recursive: boolean): Promise<{jobId: string}>`
  - `getAnalyzeStatus(jobId: string): Promise<JobStatus>`
  - `getStats(): Promise<Stats>`

### 6.2 TypeScript types (lib/types.ts)
- [ ] `interface Track` matching AudioTrack model
- [ ] `interface FilterParams`
- [ ] `interface JobStatus`
- [ ] `interface Stats`

### 6.3 Hooks
- [ ] `hooks/useTracks.ts`:
  - useQuery for fetching tracks with filters
  - Pagination state
  - Mutation for delete
- [ ] `hooks/useSearch.ts`:
  - Debounced search query
  - Combined filters state
- [ ] `hooks/useAudioPlayer.ts`:
  - Current track state
  - Play/pause/seek controls
  - Volume control
  - Queue management (optional)

### 6.4 Components - UI primitives (shadcn)
- [ ] Install shadcn components: button, input, slider, select, card, dialog, badge, progress, toast, dropdown-menu, tabs

### 6.5 SearchBar.tsx
- [ ] Input with search icon
- [ ] Debounced onChange (300ms)
- [ ] Clear button
- [ ] Optional: suggestions dropdown

### 6.6 FilterPanel.tsx
- [ ] Genre multi-select (fetch available genres from API or hardcode)
- [ ] Mood multi-select
- [ ] BPM range slider (min/max)
- [ ] Energy range slider
- [ ] Has vocals checkbox
- [ ] Sort by dropdown (Latest, BPM, Duration, Name)
- [ ] Clear all filters button

### 6.7 TrackCard.tsx
- [ ] Props: track: Track, onPlay, onDelete
- [ ] Display: filename, duration, BPM, genre, mood, instruments (badges)
- [ ] Inline AudioPlayer component
- [ ] Buttons: Play, Download, Similar, Details
- [ ] Hover effects

### 6.8 AudioPlayer.tsx
- [ ] Props: trackId, filename, duration
- [ ] HTML5 audio element with ref
- [ ] WaveformDisplay child component
- [ ] Progress slider (seek support)
- [ ] Play/Pause button
- [ ] Volume slider with icon
- [ ] Time display (current / total)
- [ ] Download button (calls /api/audio/download/{id})

### 6.9 WaveformDisplay.tsx
- [ ] Props: trackId, currentTime, duration
- [ ] Fetch peaks from /api/audio/waveform/{id}
- [ ] Canvas rendering:
  - Draw bars for each peak
  - Color played portion differently (blue vs gray)
  - Click to seek
- [ ] Loading state while fetching peaks

### 6.10 TrackDetails.tsx (Modal/Dialog)
- [ ] Props: trackId, open, onClose
- [ ] Fetch full track details
- [ ] Display all metadata in organized sections:
  - Audio info: duration, format, file size
  - Musical features: tempo, key, time signature, energy, danceability, valence
  - Classification: genre (primary + secondary), mood (primary + secondary + arousal/valence), instruments
  - Spectral features: spectral centroid, zero crossing rate, loudness
- [ ] Similar tracks section (preview)
- [ ] Download button

### 6.11 SimilarTracks.tsx
- [ ] Props: trackId, limit
- [ ] Fetch similar tracks
- [ ] Display as list of mini TrackCards
- [ ] Click to navigate or play

### 6.12 BatchScanner.tsx
- [ ] Input for folder path
- [ ] Recursive checkbox
- [ ] Scan button
- [ ] Progress bar (poll /api/analyze/status/{jobId})
- [ ] Status messages (pending, running X/Y, completed, errors)
- [ ] Error list if any

### 6.13 Main page (app/page.tsx)
- [ ] SearchBar at top
- [ ] FilterPanel in sidebar or collapsible
- [ ] BatchScanner in header or dedicated section
- [ ] TrackCard grid/list
- [ ] Pagination controls (Load More or page numbers)
- [ ] Total tracks count
- [ ] Loading states
- [ ] Empty state if no tracks

### 6.14 Track detail page (app/tracks/[id]/page.tsx)
- [ ] Fetch track by ID
- [ ] Large AudioPlayer
- [ ] Full metadata display (similar to TrackDetails modal)
- [ ] SimilarTracks section
- [ ] Back to library button

### 6.15 Layout (app/layout.tsx)
- [ ] QueryClientProvider setup
- [ ] Toast provider (for notifications)
- [ ] Global styles
- [ ] Header with app title and nav

---

## Phase 7: Docker & Deployment

### 7.1 docker-compose.yml
- [ ] Service: postgres
  - image: pgvector/pgvector:pg16
  - environment: POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB
  - ports: 5432:5432
  - volumes: postgres_data, init-db.sql
- [ ] Service: backend
  - build: ./backend
  - depends_on: postgres
  - environment: DATABASE_URL
  - ports: 8000:8000
  - volumes: audio files mount (read-only)
- [ ] Service: frontend (optional, or dev mode only)
  - build: ./frontend
  - ports: 3000:3000
  - environment: NEXT_PUBLIC_API_URL=http://localhost:8000

### 7.2 Backend Dockerfile
- [ ] FROM python:3.11-slim
- [ ] Install system deps: ffmpeg, libsndfile1
- [ ] COPY requirements.txt
- [ ] RUN pip install -r requirements.txt
- [ ] COPY src/
- [ ] Download Essentia models during build or on startup
- [ ] CMD: uvicorn src.api.main:app --host 0.0.0.0 --port 8000

### 7.3 Frontend Dockerfile (production build)
- [ ] FROM node:20-alpine
- [ ] COPY package.json, package-lock.json
- [ ] RUN npm ci
- [ ] COPY app/, components/, lib/, hooks/, public/
- [ ] RUN npm run build
- [ ] CMD: npm start

---

## Phase 8: Documentation & Scripts

### 8.1 Root README.md
- [ ] Project description
- [ ] Features list
- [ ] Tech stack
- [ ] Prerequisites (Docker, Node, Python)
- [ ] Quick start:
  - Clone repo
  - Copy .env.example to .env
  - docker-compose up
  - Access frontend at localhost:3000
- [ ] Development setup
- [ ] API documentation link (FastAPI /docs)
- [ ] Architecture diagram (optional)

### 8.2 Backend README.md
- [ ] Setup instructions
- [ ] Environment variables documentation
- [ ] Essentia models download instructions
- [ ] API endpoints list
- [ ] Database schema
- [ ] Running migrations

### 8.3 Frontend README.md
- [ ] Setup instructions
- [ ] Environment variables
- [ ] Available scripts (dev, build, start)
- [ ] Component structure

### 8.4 Scripts
- [ ] `scripts/download-essentia-models.sh` - Download Essentia models
- [ ] `scripts/init-db.sh` - Run migrations
- [ ] `backend/src/cli.py` - CLI for manual analysis (optional)

---

## Phase 9: Testing & Validation

### 9.1 Backend tests (optional but recommended)
- [ ] Test audio_processor.extract_all_features with sample file
- [ ] Test essentia_classifier with sample file
- [ ] Test CRUD operations
- [ ] Test API endpoints with pytest + httpx
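A minimal sketch for the "pytest + httpx" item, assuming the app is importable as `src.api.main.app`. httpx's `ASGITransport` drives the FastAPI app in-process, so no server needs to run; DB-backed endpoints would additionally need a test database or FastAPI dependency overrides.

```python
# Sketch of backend/tests/test_api.py (hypothetical path); runs the app without uvicorn.
import httpx
import pytest

from src.api.main import app  # assumed import path from Phase 1.5


@pytest.fixture
def anyio_backend():
    # The anyio pytest plugin (installed alongside FastAPI/Starlette) needs a backend fixture
    return "asyncio"


@pytest.mark.anyio
async def test_health_endpoint():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
    assert response.status_code == 200
```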
### 9.2 Frontend tests (optional)
- [ ] Test API client functions
- [ ] Test hooks
- [ ] Component tests with React Testing Library

### 9.3 Integration test
- [ ] Full flow: analyze folder -> save to DB -> search -> play -> download

---

## Phase 10: Optimizations & Polish

### 10.1 Performance
- [ ] Add database indexes
- [ ] Cache waveform peaks
- [ ] Optimize audio loading (lazy loading for large libraries)
- [ ] Add compression for API responses

### 10.2 UX improvements
- [ ] Loading skeletons
- [ ] Error boundaries
- [ ] Toast notifications for actions
- [ ] Keyboard shortcuts (space to play/pause, arrows to seek)
- [ ] Dark mode support

### 10.3 Backend improvements
- [ ] Rate limiting
- [ ] Request validation with Pydantic
- [ ] Logging (structured logs)
- [ ] Error handling middleware

---

## Implementation order priority

1. **Phase 2** (Database) - Foundation
2. **Phase 3** (Audio processing) - Core logic
3. **Phase 4** (CRUD) - Data layer
4. **Phase 5.1-5.2** (FastAPI setup) - API foundation
5. **Phase 5.3-5.8** (API routes) - Complete backend
6. **Phase 6.1-6.3** (Frontend setup + API client + hooks) - Frontend foundation
7. **Phase 6.4-6.12** (Components) - UI implementation
8. **Phase 6.13-6.15** (Pages) - Complete frontend
9. **Phase 7** (Docker) - Deployment
10. **Phase 8** (Documentation) - Final polish

---

## Notes for implementation

- Use type hints everywhere in Python
- Use TypeScript strict mode in the frontend
- Handle errors gracefully (try/catch, proper HTTP status codes)
- Add logging at key points (file analysis start/end, DB operations)
- Validate file paths (security: prevent path traversal)
- Consider file locking for concurrent analysis
- Add progress updates for long operations
- Use environment variables for all config
- Keep audio files outside Docker volumes for performance
- Consider caching Essentia predictions (they are expensive to compute)
- Add retry logic for failed analyses
- Support cancellation for long-running jobs

## Files to download/prepare before starting

1. Essentia models (3 files):
   - mtg_jamendo_genre-discogs-effnet-1.pb
   - mtg_jamendo_moodtheme-discogs-effnet-1.pb
   - mtg_jamendo_instrument-discogs-effnet-1.pb
2. Class labels JSON for each model
3. Sample audio files for testing

## External dependencies verification

- librosa: check version compatibility with numpy
- essentia-tensorflow: verify the CPU-only build works
- pgvector: verify PostgreSQL extension installation
- FFmpeg: required by librosa for audio decoding

## Security considerations

- Validate all file paths (no ../ traversal; a helper sketch follows at the end of this document)
- Sanitize user input in search queries
- Rate limit API endpoints
- CORS: whitelist the frontend origin only
- Don't expose full filesystem paths in API responses
- Consider adding authentication later (JWT)

## Future enhancements (not in current scope)

- CLAP embeddings for semantic search
- Batch export to CSV/JSON
- Playlist creation
- Audio trimming/preview segments
- Duplicate detection (audio fingerprinting)
- Tag editing (write back to files)
- Multi-user support with authentication
- WebSocket for real-time analysis progress
- Audio visualization (spectrogram, chromagram)
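Relating to the path-validation items under "Notes for implementation" and "Security considerations": a small helper sketch, assuming requests may only touch files beneath a configured library root (for example the AUDIO_LIBRARY_PATH setting from Phase 5.1). The function name and the single-root approach are assumptions, not a spec; `Path.is_relative_to` requires Python 3.9+, which the `python:3.11-slim` base image satisfies.

```python
# Sketch of a traversal-safe path check for the analyze/audio endpoints.
from pathlib import Path

from fastapi import HTTPException


def resolve_within_library(requested: str, library_root: str) -> Path:
    root = Path(library_root).resolve()
    candidate = (root / requested).resolve()  # resolve() collapses ".." segments and symlinks
    if not candidate.is_relative_to(root):
        raise HTTPException(status_code=403, detail="Path outside the audio library")
    return candidate
```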