Style Instructions Guide: Directing AI Voices With Emotion, Texture, and Performance
1. What this guide is for
Use this guide to direct Narration Box voices with style instructions of all kinds.
This is a creative guide for making AI narration sound more human, intentional, cinematic, expressive, and context-aware.
Style instructions help you tell the voice how to perform a paragraph or block of text. They can guide emotion, energy, pacing, attitude, vocal texture, delivery style, intensity, and character presence.
The best results come from experimenting. Try different emotions and expressions, see what works, and test the limits. This guide describes the possibilities we have seen and tested so far. Use it to see what currently works and as a starting point for your own experiments.
A single word can work.
A short descriptive phrase can work.
A more specific performance direction can also work.
The goal is to try different versions until the voice matches the scene, audience, and format.
2. What style instructions can do
Style instructions can help you shape:
Emotion: happy, sad, angry, anxious, romantic, fearful, excited, calm
Intensity: soft, explosive, restrained, dramatic, understated
Texture: raspy, breathless, trembling, clipped, warm, cold, growling
Delivery: persuasive, storytelling, documentary-style, conversational, intimate
Character mood: suspicious, commanding, fragile, sarcastic, heroic, sinister
Scene atmosphere: eerie, suspenseful, meditative, cinematic, haunting
Audience fit: audiobook, podcast, YouTube narration, e-learning, product demo, ad, meditation, fiction, character dialogue
The same text can feel completely different depending on the instruction.
Example:
Text: “I didn’t expect you to come back.”
Possible style instructions:
[soft]
[shocked]
[angry]
[whispering]
[cold]
[heartbroken]
[sinister]
[relieved]
Each version creates a different character, scene, and meaning.
3. The basic rule: you can go simple or super specific
Prompts can be long and describe an emotion in detail, or they can be short. If a prompt can be distilled to one or two words, shorten it. The voices also respond to very specific prompts, for example:
Parisian French accent
Mexican Spanish accent
Yorkshire accent, blunt
A single word also works:
excited
calm
angry
nervous
romantic
eerie
confident
sad
Short descriptive phrases work too:
quietly excited
barely holding back tears
speaking with calm authority
nervous but trying to sound confident
warm and reassuring
low and threatening
softly amused
tired but determined
You can also be highly specific when the performance needs a particular identity, region, accent, format, character type, or scene style.
For example, a style instruction can describe:
the kind of narrator
the kind of speaker
the emotional state
the accent or regional sound
the social setting
the level of confidence
the level of warmth
the intended audience
the genre of the content
the pace and energy of the delivery
This means users are not limited to simple emotions. They can experiment with more detailed voice direction when they want a very specific result.
Useful prompt patterns:
Speak in a calm, serious, documentary-style voice
Narrate like a warm audiobook storyteller
Use a confident startup founder tone
Speak with a soft regional British accent
Sound like a relaxed coastal American speaker
Read this like a slow-burn mystery narrator
Speak like a polished product demo narrator
Use a gentle, patient teacher voice
Narrate with a formal newsreader tone
Speak with a casual conversational accent
More specific instructions are useful when the scene needs nuance:
speaking like a detective revealing the final clue
trying to stay calm while clearly scared
narrating a dark horror scene in a low, tense voice
sounding friendly but slightly suspicious
explaining something complex with patient confidence
The best workflow is:
Start with a simple emotion.
Test the output.
Add texture, intensity, or nuance.
Add accent, region, or narrator type if needed.
Test again.
Adjust based on the content and voice.
Sometimes a specific narrator style, an accent direction, or a niche, situational description of the emotion gives the voice the exact character the project needs.
4. Style instruction vs inline tag: how they work together
There are two ways to direct performance:
A. Style instruction for the full block
This controls the overall delivery of the paragraph or text block.
Example style instruction:
calm and reassuring
Text:
You are safe now. Take a deep breath. We are going to handle this one step at a time.
This is useful when the whole paragraph should carry the same emotional tone.
B. Inline tag inside the text
Inline tags are used inside the text when you want a specific moment to shift emotion, pacing, or texture.
Example:
I thought I was ready for this, [nervous] but the moment the door opened, I forgot how to breathe.
This is useful when the sentence changes emotional direction.
C. When both are used together
If a text block has both a style instruction and an inline tag, the voice will usually follow the inline tag at that specific moment.
Example style instruction:
calm
Text:
Everything looked normal at first. [whispering] Then I heard someone breathing behind the wall.
The block begins with the calm instruction, then the inline tag shapes the specific moment.
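To make the two levels of direction concrete, here is a minimal Python sketch that separates a text block into its spoken words and its inline tags. It only illustrates the bracket convention used in this guide. The names `Direction` and `split_directions` are hypothetical, and nothing here calls a Narration Box API.

```python
import re
from dataclasses import dataclass

# Inline tags are written in square brackets, as in the examples above.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class Direction:
    tag: str         # text inside the brackets, e.g. "whispering"
    word_index: int  # number of spoken words that come before the tag


def split_directions(text: str) -> tuple[str, list[Direction]]:
    """Return the spoken text with tags removed, plus each tag and its position."""
    directions = []
    spoken_parts = []
    last_end = 0
    for match in TAG_PATTERN.finditer(text):
        spoken_parts.append(text[last_end:match.start()])
        words_so_far = len("".join(spoken_parts).split())
        directions.append(Direction(match.group(1).strip(), words_so_far))
        last_end = match.end()
    spoken_parts.append(text[last_end:])
    # Collapse the double spaces left behind where tags were removed.
    spoken = " ".join("".join(spoken_parts).split())
    return spoken, directions


# The block-level style instruction stays separate from the inline tags.
style_instruction = "calm"
spoken, directions = split_directions(
    "Everything looked normal at first. [whispering] Then I heard "
    "someone breathing behind the wall."
)
print(style_instruction, directions)
```

Keeping the block-level instruction and the inline tags as separate values makes it easier to review which direction applies to the whole paragraph and which applies to a single moment.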
5. Important inline tag placement rule
Inline tags work best when they are placed after the first word of the sentence or block.
Recommended
I [whispering] knew someone was standing behind me.
The room [eerie] felt colder than it had a minute ago.
She [angry] told him never to come back.
Avoid placing the inline tag as the first word
Avoid:
[whispering] I knew someone was standing behind me.
When the inline tag is placed right at the beginning of the block, the narrator may read it as part of the text instead of treating it as a performance direction.
Practical rule
Write at least one normal word first, then add the inline tag.
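If you prepare scripts in bulk, the placement rule can be checked before generation. The sketch below is one possible way to do it in Python. The helper names `tag_starts_block` and `move_leading_tag` are hypothetical, and the fix only handles a single leading tag followed by a normal word.

```python
import re

# A tag at the very start of the block (leading whitespace ignored).
LEADING_TAG = re.compile(r"^\s*\[[^\[\]]+\]")
# A leading tag followed by a normal (non-tag) first word.
LEADING_TAG_AND_WORD = re.compile(r"^\s*(\[[^\[\]]+\])\s*([^\s\[]\S*)\s*")


def tag_starts_block(text: str) -> bool:
    """True when the block begins with an inline tag instead of a spoken word."""
    return LEADING_TAG.match(text) is not None


def move_leading_tag(text: str) -> str:
    """Move a single leading tag so it follows the first spoken word."""
    match = LEADING_TAG_AND_WORD.match(text)
    if match is None:
        # No leading tag, or nothing this simple fix can handle safely.
        return text
    tag, first_word = match.group(1), match.group(2)
    rest = text[match.end():]
    return f"{first_word} {tag} {rest}".rstrip()


print(move_leading_tag("[whispering] I knew someone was standing behind me."))
```

Moving the tag automatically can change where the shift in delivery begins, so it is worth listening to the result rather than trusting the rewrite blindly.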
6. Style instructions and inline tags in non-English text
For best results, use English tags and English style instructions even when the transcript is in another language.
Example Hindi text:
Mujhe [nervous] lag raha tha ki koi mere peeche khada hai.
Example Spanish text:
No sabía qué decir, pero [soft] quería que ella se quedara.
Example French text:
Il faisait sombre dans la pièce, et [whispering] quelqu’un respirait près de la porte.
The spoken content can be in another language, but the performance direction should remain in English.
7. Why experimentation matters
AI voices are context-aware. This means the voice does not respond to the style instruction alone. It also reacts to the sentence itself.
For example, the instruction “happy” may sound different depending on the sentence:
I finally got the job.
You came back.
We won. We actually won.
I can’t believe this is happening.
The same style instruction can produce excitement, relief, disbelief, warmth, or joy depending on the words.
That is why testing matters.
A good prompt is not always the longest prompt. Often, the best instruction is the one that gives the voice just enough direction without over-controlling the performance.
8. How to experiment with prompts
For every emotion or delivery style, test three versions:
Version 1: Simple
sad
Version 2: Specific
quietly sad
Version 3: Performance-led
trying not to cry while speaking softly
Then compare which one fits the scene best.
For example:
Text:
I kept your letter all these years. I don’t know why. Maybe I was waiting for a reason to let go.
Prompt options:
sad
soft and heartbroken
holding back tears
fragile and reflective
quiet grief
Each one may work, but each one creates a different version of the scene.
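The three-version test can be organised as a small comparison list so that every version is generated against the same text. This is a minimal Python sketch. `PromptTest` and `build_prompt_tests` are hypothetical names, and the audio generation step itself is left out because it depends on how you use the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTest:
    label: str              # "simple", "specific", or "performance-led"
    style_instruction: str  # the instruction to try
    text: str               # the same text for every version


def build_prompt_tests(
    text: str, simple: str, specific: str, performance_led: str
) -> list[PromptTest]:
    """Pair one text block with the three instruction versions to compare."""
    versions = [
        ("simple", simple),
        ("specific", specific),
        ("performance-led", performance_led),
    ]
    return [PromptTest(label, instruction, text) for label, instruction in versions]


tests = build_prompt_tests(
    "I kept your letter all these years. I don’t know why. "
    "Maybe I was waiting for a reason to let go.",
    simple="sad",
    specific="quietly sad",
    performance_led="trying not to cry while speaking softly",
)
for test in tests:
    print(f"{test.label}: {test.style_instruction}")
```

Holding the text constant is the point of the exercise: any difference you hear then comes from the instruction, not from the writing.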
9. Recommended prompt format
Keep style instructions short and clear.
Good:
warm
serious
excited
low and tense
soft and intimate
angry but controlled
nervous and breathless
cinematic horror narrator
confident product demo voice
Too much:
Please speak in a very emotionally complex way where the narrator sounds like they are sad but also happy and confused and scared and excited at the same time while slowly shifting their tone across the paragraph.
Better:
conflicted and emotional
sad but relieved
nervous but hopeful
10. Emotion prompt library
This section gives users copyable style instructions grouped by emotional family. Use these as starting points, not fixed rules. The same prompt can behave slightly differently depending on the voice, the sentence, the language, and the emotional context of the paragraph.
For best results, test 2–4 nearby prompts before choosing the final one. You can also write your own prompts and test different words and emotions.
Happy / Positive
Use these when the scene needs joy, charm, celebration, relief, playfulness, optimism, or light emotional lift.
Style instructions:
happy
quietly smiling
relieved happiness
softly happy
light joy
excited but controlled
proud and happy
smirking
chuckling
amused
playful
teasing
relieved
surprised
cheering
optimistic
mischievous
triumphant
sentimental
enthusiastic
Voice textures:
giddy
bubbly
sparkly
airy
speak while laughing
shy
shy and laughing
shy and questioning
shy and doubtful
shy and stuttering
coughing while laughing hard
fake laughing and whispering
laughing with grunt
romantic smile while talking
hesitation
surprised
Example:
I [excited but controlled] can’t believe we actually did it. After all those late nights, all those failed attempts, this is finally real.
Good for:
celebration scenes
YouTube hooks
product wins
character banter
romantic comedy moments
founder/product update videos
Be careful with:
too much excitement across long passages
forced laughter where the text is not funny
making professional content sound too casual
Sad / Emotional
Use these when the scene needs grief, vulnerability, disappointment, regret, loneliness, memory, emotional heaviness, or quiet pain.
Style instructions:
sad
heartbroken
crying
teary
choked up
grieving
lonely
hurt
disappointed
regretful
guilty
defeated
hopeless
melancholic
nostalgic
vulnerable
mourning
wistful
Voice textures:
heavy
fragile
grunting
hollow
trembling
breathy
low
muted
shivering of anxiety
strained
thin
broken
weary
distant
slow and sad
raw hatred
empty
subdued
shaky
paranoid
breathiness
hoarse voice
dismissive
Example:
I [fragile] thought I had made peace with it. But when I saw the empty chair, everything came back at once.
Good for:
emotional audiobook passages
memoir narration
loss scenes
reflective nonfiction
quiet character moments
dramatic storytelling
Be careful with:
asking for crying too often
making long paragraphs too slow
using heavy sadness on informational content
Anger / Conflict
Use these when the scene needs irritation, rage, bitterness, confrontation, disgust, impatience, threat, or controlled intensity.
Style instructions:
angry
irritated
frustrated
furious
rage
cold
bitter
resentful
sarcastic
snapping
yelling
scolding
defensive
threatening
disgusted
impatient
seething
commanding
Voice textures:
sharp
clipped
tense
harsh
explosive
gritted
cutting
rough
boiling
staccato
metallic and piercing
Example:
You [cold] knew exactly what would happen, and you still let me walk into that room alone.
Good for:
confrontational dialogue
villain scenes
arguments
intense fiction
dramatic monologues
firm brand/product warnings
Be careful with:
overusing yelling
making every line aggressive
combining anger with too many other emotions
using rage where controlled anger would sound more natural
Fear / Anxiety
Use these when the scene needs panic, nervousness, dread, unease, hesitation, shock, paranoia, or desperation.
Style instructions:
anxious
nervous
panicked
terrified
shocked
hesitant
uneasy
paranoid
breathless
trembling
shivering
pleading
whispering fearfully
desperate
startled
Voice textures:
shaky
tight
breathy
rushed
uneven
strained
underconfident
whispery
jitter
taut
speak like you have a lump in the throat
Example:
I [trembling] tried to call out, but my voice broke before the words could leave my mouth.
Good for:
horror scenes
thriller narration
character panic
anxious confessions
high-stakes dialogue
suspenseful audiobook moments
Be careful with:
using panic for long sections
placing fear tags at the very beginning of the block
asking for extreme fear when the sentence itself is neutral
Love / Affection / Intimacy
Use these when the scene needs tenderness, attraction, warmth, affection, longing, comfort, softness, or emotional closeness.
Style instructions:
loving
tender
romantic
flirty
adoring
caring
fond
intimate
gentle
comforting
reassuring
soft smile
shy
bashful
yearning
vulnerable
breathy
sexy
lusty
delicate
Voice textures:
melting
hushed
vocal fry
passionate
playful and flirtatious
smirking
authoritative
sultry
smoky
ASMR whisper
lower pitch
Example:
I [soft smile] don’t know when it happened. I just know that every quiet moment started to feel better when you were there.
Good for:
romance scenes
intimate dialogue
emotional reassurance
relationship storytelling
soft audiobook narration
character-driven fiction
Be careful with:
making every romantic line breathy
using sexy or lusty prompts in the wrong content context
pushing intimacy where the text needs simple warmth
Confidence / Authority
Use these when the voice needs to sound clear, firm, persuasive, polished, grounded, professional, heroic, or leader-like.
Style instructions:
confident
bold
commanding
firm
serious
determined
heroic
calm authority
persuasive
professional
leader-like
grounded
strong with vocal fry
polished
serious with unhappy voice
concern
Voice textures:
laughing with confidence
laughing with grunt
talking with smirk
intimidation
disgust
Example:
We [calm authority] do not need to move faster. We need to move with discipline, clarity, and intent.
Good for:
product demos
business narration
founder videos
educational content
leadership scenes
documentary narration
sales and marketing videos
Be careful with:
making confidence sound arrogant
using intimidation in normal business content
adding vocal fry where the voice does not naturally support it
Mystery / Dark / Horror
Use these when the scene needs dread, darkness, mystery, danger, suspense, evil presence, or cinematic horror.
Style instructions:
dark
ominous
sinister
suspenseful
haunting
eerie
whispering
dangerous
menacing
mysterious
cold
brooding
grim
foreboding
growling
low and tense
Voice textures:
stammering of fear
evil laughing
sinister laugh
gravelly
whispering and laughing
shadowy
screeching
raspy
heavy
Example:
The hallway [eerie] stretched farther than it should have. At the end of it, something smiled in the dark.
Good for:
horror stories
thriller audiobooks
mystery narration
dark fantasy
villain dialogue
suspenseful YouTube narration
cinematic trailers
Be careful with:
overusing growling or screeching
asking for long non-verbal horror sounds
making the voice too theatrical for subtle suspense
Calm / Meditation / Soft Narration
Use these when the voice needs to feel peaceful, centered, slow, gentle, reassuring, or emotionally steady.
Style instructions:
calm
meditative
soothing
peaceful
grounded
slow and gentle
soft and steady
warm and relaxed
reassuring
quiet confidence
Example:
Take a slow breath in. Let your shoulders soften. For the next few moments, there is nowhere else you need to be.
Good for:
meditation audio
sleep stories
wellness narration
guided breathing
reflective audiobook sections
calm product walkthroughs
Be careful with:
making important instructional content too slow
using too much softness where clarity matters
repeating the same calm direction across every paragraph without variation
11. Prompting by use case
Audiobooks
Use style instructions to guide narration, character emotion, chapter tone, and scene intensity.
Recommended prompts:
reflective
intimate storyteller
serious and grounded
warm narrator
tense and cinematic
emotional but restrained
dark and suspenseful
Where it works well:
Fiction scenes
Character dialogue
Emotional turning points
Chapter openings
Suspense sequences
Reflective nonfiction passages
Where to be careful:
Do not over-tag every line.
Do not make every paragraph highly emotional.
Keep narration consistent unless the scene demands a shift.
YouTube narration
Use style instructions to increase retention, energy, and clarity.
Recommended prompts:
confident
energetic
curious
serious
conversational
persuasive
fast-paced
documentary-style
Where it works well:
Hooks
Explainers
Storytelling videos
Productivity content
Educational channels
Faceless YouTube narration
Where to be careful:
Too much excitement can sound forced.
Use contrast: serious moments should sound serious, not constantly energetic.
Podcasts
Use style instructions to create natural host-like delivery, relaxed conversation, and expressive storytelling.
Recommended prompts:
conversational
casually excited
reflective
curious
warm host
thoughtful
lightly amused
Where it works well:
Podcast intros
Host monologues
Interview-style narration
Commentary
Story-driven episodes
Where to be careful:
Avoid making both speakers sound equally excited all the time.
Use different energy levels for different speakers.
E-learning and training
Use style instructions to make learning content clear, patient, and easier to follow.
Recommended prompts:
clear and patient
calm teacher
confident instructor
warm and helpful
serious and professional
encouraging
Where it works well:
Course modules
Explainer lessons
Step-by-step tutorials
Safety training
Corporate learning
Where to be careful:
Avoid overly dramatic emotions unless the lesson requires storytelling.
Prioritize clarity over performance.
Product demos and ads
Use style instructions to make the voice persuasive, clean, benefit-led, and brand-safe.
Recommended prompts:
confident
crisp
persuasive
polished
warm and professional
energetic but clear
premium brand voice
Where it works well:
Product walkthroughs
Feature launches
Landing page videos
Explainer ads
App demos
Where to be careful:
Do not overdo sales energy.
Keep the delivery believable and useful.
12. What not to do
Do not over-prompt every sentence
Bad:
I [happy] opened the door. I [excited] saw the box. I [shocked] picked it up. I [curious] looked inside.
Better:
Style instruction:
excited and curious
Text:
I opened the door and saw the box waiting there. For a second, I just stared at it. Then I picked it up, wondering what could possibly be inside.
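Over-tagging can be spotted with a rough check on how many sentences carry an inline tag. The Python sketch below is only a heuristic. The function names and the 0.5 threshold are assumptions chosen for illustration, not limits defined by the product.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def tagged_sentence_ratio(text: str) -> float:
    """Fraction of sentences that contain at least one inline tag."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    if not sentences:
        return 0.0
    tagged = sum(1 for s in sentences if TAG.search(s))
    return tagged / len(sentences)


def is_over_tagged(text: str, max_ratio: float = 0.5) -> bool:
    """Flag blocks where more than max_ratio of sentences are tagged (assumed threshold)."""
    return tagged_sentence_ratio(text) > max_ratio


bad = (
    "I [happy] opened the door. I [excited] saw the box. "
    "I [shocked] picked it up. I [curious] looked inside."
)
print(tagged_sentence_ratio(bad), is_over_tagged(bad))
```

A flagged block is a prompt to consider one style instruction for the whole paragraph instead, as in the "Better" version above.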
Do not stack too many emotions in one instruction
Avoid:
happy, sad, angry, romantic, nervous, calm, excited, sarcastic
Better:
conflicted
nervous but hopeful
angry but trying to stay calm
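A similar rough check can count how many separate directions a style instruction contains. This Python sketch splits on commas and on the words "but" and "and". The helper names and the limit of three are assumptions for illustration only.

```python
import re

# Commas, "but", and "and" are treated as separators between directions.
SEPARATOR = re.compile(r",|\bbut\b|\band\b")


def count_directions(style_instruction: str) -> int:
    """Count the separate directions packed into one style instruction."""
    parts = SEPARATOR.split(style_instruction)
    return sum(1 for part in parts if part.strip())


def is_stacked(style_instruction: str, max_directions: int = 3) -> bool:
    """Flag instructions with more than max_directions parts (assumed limit)."""
    return count_directions(style_instruction) > max_directions


for instruction in [
    "happy, sad, angry, romantic, nervous, calm, excited, sarcastic",
    "nervous but hopeful",
    "conflicted",
]:
    print(instruction, "->", count_directions(instruction))
```

The count is crude, since a phrase like "angry but trying to stay calm" is really one performance idea, so treat the result as a nudge to simplify rather than a rule.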
Do not expect impossible physical actions
The voice can suggest laughter, fear, breathiness, tension, whispering, softness, or intensity.
It cannot always produce long or very extreme physical sounds. It can in some cases, so you are encouraged to test the limits of the voices.
For example, prompts like these may be inconsistent but can also produce some really creative results:
crying loudly for 10 seconds
screaming continuously
coughing repeatedly
singing a full melody
perfectly imitating a celebrity
producing exact sound effects
Use style instructions for performance direction, not full sound design.
Do not use vague creative instructions when you need precision
Vague:
make it better
more human
cinematic
Better:
low and tense
soft and reflective
energetic
serious and documentary-style
warm and reassuring
13. When style instructions may not work perfectly
Results may vary depending on:
The selected voice
The language of the transcript
The length of the paragraph
The emotional clarity of the sentence
Whether the instruction matches the text
Whether too many tags are used together
Whether the instruction asks for something too physical or unrealistic
For example, a sentence from a calm product tutorial may not become convincing horror narration just because the style instruction says “terrified.” The text and the style instruction should support each other.
Good voice direction works best when the writing, prompt, and voice choice are aligned.
14. Matching the prompt to the writing
The text itself matters.
Weak pairing:
Style instruction:
terrified
Text:
Today we will learn how to export an audio file.
Better pairing:
Style instruction:
calm instructor
Text:
Today we will learn how to export an audio file step by step.
Strong pairing:
Style instruction:
terrified
Text:
I heard the lock turn from the other side, even though I was alone in the house.
The prompt gives direction, but the sentence gives the voice something to perform.
15. Choosing the right voice for the prompt
Not every voice will respond the same way to every instruction.
Some voices are naturally better for:
Audiobook narration
Emotional fiction
Calm meditation
YouTube hooks
Corporate training
Dark cinematic narration
Romantic dialogue
Comedy or playful delivery
Product demos
A prompt can shape the voice, but the base voice still matters.
For best results:
Choose a voice that already fits the general use case.
Add a style instruction to guide the performance.
Use inline tags only where the moment needs a shift.
Test a short sample before generating long content.
16. Advanced prompting patterns
Emotion + intensity
mildly annoyed
quietly excited
deeply sad
extremely nervous
barely controlled anger
soft but intense
Emotion + contrast
angry but calm
nervous but trying to sound confident
sad but relieved
friendly but suspicious
romantic but hesitant
Delivery + use case
documentary-style narration
YouTube explainer voice
audiobook storyteller
calm meditation guide
premium brand narrator
confident product demo voice
Character + scene
old detective telling a secret
villain speaking softly
teacher explaining patiently
scared child whispering
exhausted soldier giving orders
Texture + performance
raspy and low
breathless and nervous
clipped and tense
warm and rounded
thin and distant
growling and dangerous
17. Full example: turning plain text into directed narration
Plain text
I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it was open.
Version 1: Simple style instruction
suspenseful
Version 2: More specific
quiet and tense
Version 3: Inline tag added
I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it [eerie] was open.
Version 4: More cinematic
Style instruction:
low and suspenseful
Text:
I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it [eerie] was open.
This shows how style instruction and inline tags can work together without overloading the paragraph.
18. Practical workflow before generating long audio
Before generating a full chapter, course, podcast, or video script:
Pick the right voice.
Choose the broad emotional tone.
Generate a short test paragraph.
Try one simple prompt.
Try one more specific prompt.
Add inline tags only where needed.
Compare outputs.
Save the version that works.
Apply it consistently across the larger project.
This avoids wasting time on long generations that do not match the intended style.
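The short-sample step can be prepared in a few lines of Python: take the first paragraph of the full script and pair it with each style instruction you want to audition. `first_paragraph` and `plan_test_runs` are hypothetical helper names, paragraphs are assumed to be separated by blank lines, and the actual generation step is left to whatever tool you use.

```python
def first_paragraph(script: str) -> str:
    """Return the first non-empty paragraph, with line breaks collapsed to spaces."""
    for block in script.split("\n\n"):
        paragraph = " ".join(block.split())
        if paragraph:
            return paragraph
    return ""


def plan_test_runs(script: str, style_instructions: list[str]) -> list[dict[str, str]]:
    """Pair a short sample of the script with each style instruction to try."""
    sample = first_paragraph(script)
    return [
        {"style_instruction": instruction, "text": sample}
        for instruction in style_instructions
    ]


script = (
    "I walked into the room and saw the envelope on the table.\n"
    "Nobody had touched it.\n\n"
    "The rest of the chapter continues here."
)
for run in plan_test_runs(script, ["suspenseful", "quiet and tense"]):
    print(run["style_instruction"], "|", run["text"])
```

Once one of the test runs sounds right, the same style instruction can be reused for the remaining paragraphs of the project.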
19. What these voices make possible
Narration Box voices are not limited to flat text-to-speech.
With style instructions and inline tags, creators can build:
Audiobooks with emotional chapter flow
Fiction scenes with fear, tension, warmth, romance, and conflict
YouTube narration that shifts between hook, explanation, and payoff
Podcasts with host-like delivery
Product demos that sound polished and persuasive
Meditation audio that feels calm and intentional
Training modules that sound clear and human
Horror stories with atmosphere and dread
Character dialogue with distinct emotional beats
The real power is not just generating a voice. It is directing a performance.
20. Accents and regional voice direction
Style instructions can also be used to guide accent, region, and speaking style.
This is useful when your content needs a voice that feels more local, more character-specific, or better matched to the audience. You can ask for broad accents, regional accents, or a softer hint of an accent.
Examples of accent prompt patterns:
speak with a British accent
speak with a soft British accent
speak with a Yorkshire accent
speak with a Southern English accent
speak with a London accent
speak with an Irish accent
speak with a Scottish accent
speak with a Welsh accent
speak with an American accent
speak with a Southern American accent
speak with a New York accent
speak with a Californian accent
speak with a neutral US accent
speak with a Canadian accent
speak with an Australian accent
speak with a New Zealand accent
speak with an Indian English accent
speak with a neutral Indian English accent
speak with a South African accent
speak with a French accent while speaking English
speak with a Spanish accent while speaking English
speak with an Italian accent while speaking English
speak with a German accent while speaking English
You can also combine accent with emotion or use case:
soft British accent, calm and reflective
neutral US accent, confident and polished
Indian English accent, warm and professional
Southern American accent, relaxed and conversational
Irish accent, intimate audiobook storyteller
Australian accent, friendly product narrator
London accent, sharp and sarcastic
Scottish accent, serious and grounded
Accent prompting is experimental. Some voices will respond more strongly than others. Some voices may produce only a light regional flavor instead of a perfect accent. That is normal.
For best results:
Test the accent on a short paragraph first.
Try both broad and specific versions.
Combine the accent with the emotional style.
Avoid stacking too many accent and emotion instructions together.
Write accent instructions in English even when the transcript is in another language.
Broad prompt:
British accent
More specific prompt:
soft Yorkshire accent, warm and conversational
Use accent directions when they help the listener believe the narrator, character, brand, or region. Do not use them just to make every voice sound different. The accent should support the content.
21. Realistic limits
Style instructions can dramatically improve performance, but they are not magic buttons.
They cannot guarantee:
Perfect acting in every voice
Identical output every time
Full sound effects
Long non-verbal acting sequences
Perfect results with unclear writing
Perfect control when too many emotions are stacked
The same emotional range across every voice
The best results come from pairing the right voice, the right writing, and the right instruction.
22. Best practices
Use English tags, even for non-English transcripts.
Put inline tags after the first word, not at the start of the block.
Use style instructions for the whole paragraph.
Use inline tags for specific emotional moments.
Keep prompts short unless the scene needs nuance.
Do not over-tag every sentence.
Match the emotion to the actual writing.
Test short samples before generating long audio.
Try multiple versions of the same prompt.
Pick voices that naturally fit the use case.
Now try a short sample with one voice, one paragraph, and three different style instructions.
Start simple. Then add texture. Then add one inline tag.
That is usually enough to turn plain narration into a directed performance.