# Soundside

> Soundside is an AI media production platform exposing 11 MCP tools for generating, editing, and analyzing media — images, video, audio, music, text, and business artifacts. Providers include Google Vertex AI, Luma, MiniMax, Runway, and Grok. Pay with credits (API key) or USDC on Base (x402, no account needed).

Important notes for AI agents:

- Soundside tools are accessed via MCP (Model Context Protocol). Connect your MCP client to `https://mcp.soundside.ai/mcp`
- All tools support provider-agnostic parameters — specify a `provider` or let the system choose the best one
- x402 USDC pay-per-call: `https://mcp.soundside.ai/mcp` (no account required). Machine-readable pricing: `https://mcp.soundside.ai/api/x402/status`
- Pricing philosophy: near-cost pass-through on provider calls (~10% margin). Editing, analysis, and library tools charge a nominal fee (typically 1 credit, or $0.01), except for model-intensive operations such as `analyze_media` with `analysis_type="vision_qa"` (3 credits). Always check `/api/x402/status` for live prices.
- For video generation, prefer `provider: vertex` (Veo 3.1) for highest quality, `minimax` for best value
- For MiniMax video, default resolution is 1080P for ≤6s clips and 768P for 10s clips (API cap) — no configuration needed
- For image generation, `luma` and `minimax` are the most cost-effective; `grok` produces the richest detail
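
The provider and pricing guidance above can be captured in a small lookup, sketched here under stated assumptions. The `priority` labels (`quality`, `value`, `detail`) and all function names are hypothetical and not part of the Soundside API. The credit rate is the nominal one from the notes, so live prices from `/api/x402/status` should still take precedence.

```python
from decimal import Decimal
from typing import Optional

# Preferences copied from the notes above. The "priority" labels are
# hypothetical names chosen for this sketch, not Soundside parameters.
PREFERRED_PROVIDER = {
    ("video", "quality"): "vertex",   # Veo 3.1
    ("video", "value"): "minimax",
    ("image", "value"): "luma",       # minimax is listed as equally cost-effective
    ("image", "detail"): "grok",
}

USD_PER_CREDIT = Decimal("0.01")  # nominal rate: 1 credit = $0.01


def pick_provider(media: str, priority: str) -> Optional[str]:
    """Return a provider hint, or None to leave the choice to auto-selection."""
    return PREFERRED_PROVIDER.get((media, priority))


def credits_to_usd(credits: int) -> Decimal:
    """Convert a credit count to US dollars at the nominal rate."""
    return credits * USD_PER_CREDIT


def minimax_default_resolution(duration_s: float) -> str:
    """Resolution MiniMax video defaults to for a given clip length.

    Assumption: every clip longer than 6 s is treated like the documented
    10 s case; the notes only describe the <=6 s and 10 s behavior.
    """
    return "1080P" if duration_s <= 6 else "768P"
```

When `pick_provider` returns `None`, the intent is to omit the `provider` argument entirely and let Soundside choose.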

## Docs

- [Getting Started](https://soundside.ai/docs/quickstart): First MCP tool call in 5 minutes
- [x402 Pay-Per-Call](https://soundside.ai/docs/x402): Pay with USDC on Base, no account required
- [Integration Guides](https://soundside.ai/docs/integrations): Setup for Claude Code, OpenAI, OpenClaw
- [Auth & Security](https://soundside.ai/docs/auth): OAuth 2.0 discovery, API keys, wallet auth
- [API Keys](https://soundside.ai/docs/api-keys): Generate and manage keys
- [Errors & Rate Limits](https://soundside.ai/docs/errors-rate-limits): Error codes, retry behavior, rate limit headers
- [Pricing](https://soundside.ai/docs/pricing-limits): Credit system and quota behavior

## Developer Docs (GitHub)

- [Tool Reference](https://github.com/soundside-design/soundside-docs/blob/main/guides/tools.md): Detailed docs for all 11 tools with examples
- [Getting Started Guide](https://github.com/soundside-design/soundside-docs/blob/main/guides/getting-started.md): Connection walkthrough
- [x402 Guide](https://github.com/soundside-design/soundside-docs/blob/main/guides/x402.md): Crypto payment setup
- [Python Example](https://github.com/soundside-design/soundside-docs/blob/main/examples/python/soundside_client.py): API key client
- [Python x402 Example](https://github.com/soundside-design/soundside-docs/blob/main/examples/python/x402_example.py): Crypto payment client
- [Film Pipeline Example](https://github.com/soundside-design/soundside-docs/blob/main/examples/python/film_pipeline.py): End-to-end media production
- [Vision QA Example](https://github.com/soundside-design/soundside-docs/blob/main/examples/python/vision_qa_example.py): AI video/image quality analysis
- [TypeScript Example](https://github.com/soundside-design/soundside-docs/blob/main/examples/typescript/soundside-client.ts): Node.js client
- [OpenClaw Skill](https://github.com/soundside-design/soundside-docs/blob/main/examples/openclaw/SKILL.md): One-line config

## Tools (11)

### Generation
| Tool | Providers | Description |
|------|-----------|-------------|
| `create_image` | vertex, grok, runway, minimax, luma | Text-to-image, character references |
| `create_video` | vertex (Veo 3.1), runway, minimax, luma, grok | Text/image-to-video, video extension, character references |
| `create_audio` | minimax, vertex, runway, creative_freedom | TTS, transcription (STT), voice cloning, sound effects, voice design |
| `create_music` | minimax | Music from lyrics + style prompt |
| `create_text` | vertex (Gemini), grok, minimax | LLM completions, structured JSON output |
| `create_artifact` | plotly, pptx, weasyprint, mermaid, gamma | Charts, presentations, documents, diagrams |
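
As an illustration of how a generation call might be assembled, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) and checks the requested provider against the table above. The `prompt` argument and the helper name `build_tool_call` are assumptions made for this example; the real argument schemas come from `tools/list`.

```python
import json
from typing import Any, Dict, Optional

# Provider lists copied from the Generation table above.
PROVIDERS = {
    "create_image": {"vertex", "grok", "runway", "minimax", "luma"},
    "create_video": {"vertex", "runway", "minimax", "luma", "grok"},
    "create_audio": {"minimax", "vertex", "runway", "creative_freedom"},
    "create_music": {"minimax"},
    "create_text": {"vertex", "grok", "minimax"},
    "create_artifact": {"plotly", "pptx", "weasyprint", "mermaid", "gamma"},
}


def build_tool_call(
    request_id: int,
    tool: str,
    arguments: Dict[str, Any],
    provider: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request body for a generation tool."""
    if tool not in PROVIDERS:
        raise ValueError(f"unknown generation tool: {tool}")
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if provider is not None:
        if provider not in PROVIDERS[tool]:
            raise ValueError(f"{provider!r} is not listed for {tool}")
        args["provider"] = provider
    # With provider=None the key is omitted, leaving auto-selection in effect.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": args},
    }


if __name__ == "__main__":
    # "prompt" is a hypothetical argument name used only for illustration.
    request = build_tool_call(1, "create_video", {"prompt": "a harbor at dawn"}, "vertex")
    print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally build and send this envelope; the sketch only shows the shape of the call and where `provider` fits.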

### Editing & Analysis
| Tool | Description |
|------|-------------|
| `edit_video` | 22 actions: trim, concat, crossfade, Ken Burns, mix/replace audio (with optional `audio_delay_sec` for timestamped placement), timed text overlays (`text_start_sec`/`text_end_sec`), CJK-safe font rendering, color grading, film grain, speed ramp, split screen, overlay, burn subtitles, custom, and more |
| `analyze_media` | Technical analysis (ffprobe, 1 credit) and AI vision+audio QA (3 credits; Gemini 2.5 Pro for video, Flash for images). Vision QA returns `score`, `passed`, `issues`, `suggestions`, and `audio_summary`, and accepts an optional `intent_checklist` for spec-driven checks (text timing, pillarboxing, audio overlap, language) |
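
A Vision QA round trip might look like the following sketch. The field names `score`, `passed`, `issues`, and `intent_checklist` come from the table above. The argument name `resource_id`, the field types (boolean `passed`, numeric `score`, list-valued `issues`), and both helper names are assumptions, so check them against the schema returned by `tools/list`.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_vision_qa_arguments(
    resource_id: str,
    intent_checklist: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Arguments for an `analyze_media` Vision QA call.

    Assumption: the media is referenced through a `resource_id` argument
    and the checklist is a list of plain-language checks.
    """
    args: Dict[str, Any] = {
        "resource_id": resource_id,
        "analysis_type": "vision_qa",
    }
    if intent_checklist:
        args["intent_checklist"] = list(intent_checklist)
    return args


def qa_verdict(
    result: Dict[str, Any],
    min_score: Optional[float] = None,
) -> Tuple[bool, List[str]]:
    """Reduce a Vision QA result to (accepted, issues).

    The score scale is not documented here, so no default threshold is
    applied; pass `min_score` only once the scale is known.
    """
    accepted = bool(result.get("passed", False))
    score = result.get("score")
    if min_score is not None and score is not None:
        accepted = accepted and score >= min_score
    return accepted, list(result.get("issues") or [])
```

A caller could feed the returned issues back into an `edit_video` pass and re-run the check, keeping in mind that each Vision QA call costs 3 credits.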

### Library
| Tool | Description |
|------|-------------|
| `lib_list` | Browse projects, collections, resources, lineage, brand kits |
| `lib_manage` | CRUD for projects, collections, resources, brand kits |
| `lib_share` | Share projects by email with permission levels |

## Instructions for AI Agents

1. **Discovery**: Call `tools/list` first to get current schemas. Don't hardcode arguments.
2. **Provider selection**: Use `provider` parameter or omit for auto-selection.
3. **Async handling**: Video and music generation are asynchronous. Listen for MCP `notifications/resources/updated` or poll with `lib_list`.
4. **Cost check**: Fetch `GET /api/x402/status` for live per-tool pricing before expensive operations.
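5. **Error handling**: Respect `Retry-After` on 429 responses. Non-retryable errors will not succeed on a plain retry; change the request (for example, different parameters or another provider) before trying again.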
6. **Resource chaining**: Every generation returns a `resource_id`. Chain into editing, analysis, or library operations.
7. **x402 zero-account access**: POST to `/mcp`, receive a 402 response, pay with USDC on Base, then retry with the payment-signature header. Use the x402 Python/JS SDK for automatic handling.
8. **Wallet link + polling token**: x402 responses include a `wallet_link` URL for human browser access and an `x402_session_token` for `/api/x402/resource*` polling.
9. **Vision QA**: Use `analyze_media(analysis_type="vision_qa")` after assembling a film. Add `intent_checklist` to verify text overlay timing, no pillarboxing, no audio overlap, and language. Costs 3 credits per call.
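
The error-handling step above can be sketched as a small retry helper. It is a minimal illustration, not Soundside's client: the `send` callable, its `(status, headers, body)` return shape, and the default delay are assumptions. `Retry-After` is parsed in both forms HTTP allows, delay in seconds or an HTTP date.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Any, Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], Any]  # hypothetical (status, headers, body)


def retry_delay_seconds(
    retry_after: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,
) -> float:
    """Turn a Retry-After header value into a delay in seconds."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():  # delay-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default delay
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send`, waiting as instructed on 429, up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Only 429 is retried; every other status is handed back to the caller.
        if status != 429 or attempt == max_attempts:
            return status, body
        sleep(retry_delay_seconds(headers.get("Retry-After")))
    raise ValueError("max_attempts must be at least 1")
```

Only 429 is retried here, which matches the guidance that other failures call for a changed request rather than a repeat of the same one. Injecting `sleep` keeps the helper easy to test without real delays.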