dialogue-audio
Multi-speaker dialogue audio creation with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and post-production. Use for: podcasts, audiobooks, explainers, character dialogue, conversational content. Triggers: dialogue audio, multi speaker, conversation audio, dia tts, two speakers, podcast audio, character voices, voice acting, dialogue generation, conversation tts, multi voice, speaker tags, dialogue recording
COMMAND
/dialogue-audio
CATEGORY
Productivity
REPOSITORY
inf-sh/skills
COMMIT
—
SKILL PROMPT
---
name: dialogue-audio
description: "Multi-speaker dialogue audio creation with Dia TTS. Covers speaker tags, emotion control, pacing, conversation flow, and post-production. Use for: podcasts, audiobooks, explainers, character dialogue, conversational content. Triggers: dialogue audio, multi speaker, conversation audio, dia tts, two speakers, podcast audio, character voices, voice acting, dialogue generation, conversation tts, multi voice, speaker tags, dialogue recording"
allowed-tools: Bash(infsh *)
---
# Dialogue Audio
Create realistic multi-speaker dialogue with Dia TTS via the [inference.sh](https://inference.sh) CLI.
## Quick Start
> Requires the inference.sh CLI (`infsh`). For installation instructions, run: `npx skills add inference-sh/skills@agent-tools`
```bash
infsh login
# Two-speaker conversation
infsh app run falai/dia-tts --input '{
"prompt": "[S1] Have you tried the new feature yet? [S2] Not yet, but I heard it saves a ton of time. [S1] It really does. I cut my workflow in half. [S2] Okay, I am definitely trying it today."
}'
```
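One practical wrinkle with the inline form above: the JSON sits inside single quotes, so a line containing an apostrophe (such as "I'm") would end the shell string early. The following is a minimal sketch of one workaround, not part of `infsh` itself. The helper names `json_escape` and `build_dia_input` are hypothetical, and the escaping only covers backslashes and double quotes in a single-line prompt.

```bash
#!/bin/sh
# Hypothetical helper: escape backslashes and double quotes so the text
# can be embedded in a JSON string. Assumes a single-line prompt.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: wrap a dialogue prompt in the {"prompt": ...}
# object used by the Quick Start example.
build_dia_input() {
  printf '{"prompt": "%s"}' "$(json_escape "$1")"
}

# Apostrophes are safe here because the prompt lives in a double-quoted
# shell variable rather than inside single-quoted JSON.
prompt="[S1] I'm ready when you are. [S2] Great, let's start."
build_dia_input "$prompt"
echo

# To generate audio, pass the result to the same command as above:
# infsh app run falai/dia-tts --input "$(build_dia_input "$prompt")"
```

For prompts with newlines or other control characters, a dedicated JSON tool would be a safer choice than this hand-rolled escaping.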
## Speaker Tags
Dia TTS uses `[S1]` and `[S2]` to distinguish two speakers.
| Tag | Role | Voice |
|-----|------|-------|
| `[S1]` | Speaker 1 | Automatically assigned voice A |
| `[S2]` | Speaker 2 | Automatically assigned voice B |
**Rules:**
- Start every speaker turn with that speaker's tag
- Tags must be uppercase: `[S1]` not `[s1]`
- Maximum 2 speakers per generation
- Each speaker keeps a consistent voice within a session
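
The rules above can be checked before spending a generation on a malformed prompt. The sketch below is illustrative only: `validate_dialogue_prompt` is a hypothetical helper, not an `infsh` command, and it checks just the tag rules (leading tag, uppercase, at most `[S1]` and `[S2]`). It assumes a `grep` that supports `-o`, as the GNU and BSD versions do.

```bash
#!/bin/sh
# Hypothetical helper: returns 0 if the prompt follows the tag rules,
# 1 otherwise (with a message on stderr).
validate_dialogue_prompt() {
  # Rule: the prompt must open with a speaker tag.
  case "$1" in
    "[S1]"*|"[S2]"*) ;;
    *) echo "error: prompt must start with [S1] or [S2]" >&2; return 1 ;;
  esac

  # Rules: tags are uppercase and only two speakers are allowed.
  # Collect every [S<number>] tag in any letter case, then keep the
  # first one that is not exactly [S1] or [S2].
  # (bad_tag is a plain global variable to stay portable across shells.)
  bad_tag=$(printf '%s\n' "$1" | grep -oiE '\[s[0-9]+\]' | grep -vE '^\[S[12]\]$' | head -n 1)
  if [ -n "$bad_tag" ]; then
    echo "error: unsupported speaker tag: $bad_tag" >&2
    return 1
  fi
  return 0
}

validate_dialogue_prompt "[S1] Ready? [S2] Ready." && echo "prompt ok"
```

A lowercase tag such as `[s2]` or a third speaker such as `[S3]` makes the function return 1, so it can gate the `infsh app run` call in a script.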
## Emotion & Expression Control
Dia TTS interprets punctuation and non-speech cues for emotional delivery.
### Punctuation Effects
| Punctuation | Effect | Example |
|-------------|--------|---------|
| `.` | Neutral, declarative, medium pause | "This is important." |
| `!` | Emphasis, excitement, energy | "This is amazing!" |
| `?` | Rising intonation, questioning | "Are you sure about that?" |
| `...` | Hesitation, trailing off, long pause | "I thought it would work... but it didn't." |
| `,` |
[... prompt truncated for preview ...]