Enbee V2 Style Instructions Guide

A practical guide to writing style instructions for Enbee V2 voices: emotions, accents, vocal texture, pacing, and performance direction.

Written by Gerard Smith

Style Instructions Guide: Directing AI Voices With Emotion, Texture, and Performance

1. What this guide is for

Use this guide to direct Narration Box voices with style instructions of all kinds.

This is a creative guide for making AI narration sound more human, intentional, cinematic, expressive, and context-aware.

Style instructions help you tell the voice how to perform a paragraph or block of text. They can guide emotion, energy, pacing, attitude, vocal texture, delivery style, intensity, and character presence.

The best results come from experimenting. Try different emotions and expressions, see what works, and test the limits. This guide describes the possibilities we have seen and tested so far. Use it to see what currently works and to spark your own ideas.

A single word can work.

A short descriptive phrase can work.

A more specific performance direction can also work.

The goal is to try different versions until the voice matches the scene, audience, and format.


2. What style instructions can do

Style instructions can help you shape:

  • Emotion: happy, sad, angry, anxious, romantic, fearful, excited, calm

  • Intensity: soft, explosive, restrained, dramatic, understated

  • Texture: raspy, breathless, trembling, clipped, warm, cold, growling

  • Delivery: persuasive, storytelling, documentary-style, conversational, intimate

  • Character mood: suspicious, commanding, fragile, sarcastic, heroic, sinister

  • Scene atmosphere: eerie, suspenseful, meditative, cinematic, haunting

  • Audience fit: audiobook, podcast, YouTube narration, e-learning, product demo, ad, meditation, fiction, character dialogue

The same text can feel completely different depending on the instruction.

Example:

Text: “I didn’t expect you to come back.”

Possible style instructions:

  • [soft]

  • [shocked]

  • [angry]

  • [whispering]

  • [cold]

  • [heartbroken]

  • [sinister]

  • [relieved]

Each version creates a different character, scene, and meaning.


3. The basic rule: you can go simple or super specific

Prompts can be long and describe an emotion in detail, or they can be short. If a prompt can be distilled to one or two words, shorten it. The voices also respond to very specific prompts, for example:

  • Parisian French-accent

  • Mexican Spanish-accent

  • Yorkshire accent, blunt

A single word also works:

  • excited

  • calm

  • angry

  • nervous

  • romantic

  • eerie

  • confident

  • sad

Short descriptive phrases work too:

  • quietly excited

  • barely holding back tears

  • speaking with calm authority

  • nervous but trying to sound confident

  • warm and reassuring

  • low and threatening

  • softly amused

  • tired but determined

You can also be highly specific when the performance needs a particular identity, region, accent, format, character type, or scene style.

For example, a style instruction can describe:

  • the kind of narrator

  • the kind of speaker

  • the emotional state

  • the accent or regional sound

  • the social setting

  • the level of confidence

  • the level of warmth

  • the intended audience

  • the genre of the content

  • the pace and energy of the delivery

This means users are not limited to simple emotions. They can experiment with more detailed voice direction when they want a very specific result.

Useful prompt patterns:

  • Speak in a calm, serious, documentary-style voice

  • Narrate like a warm audiobook storyteller

  • Use a confident startup founder tone

  • Speak with a soft regional British accent

  • Sound like a relaxed coastal American speaker

  • Read this like a slow-burn mystery narrator

  • Speak like a polished product demo narrator

  • Use a gentle, patient teacher voice

  • Narrate with a formal newsreader tone

  • Speak with a casual conversational accent

More specific instructions are useful when the scene needs nuance:

  • speaking like a detective revealing the final clue

  • trying to stay calm while clearly scared

  • narrating a dark horror scene in a low, tense voice

  • sounding friendly but slightly suspicious

  • explaining something complex with patient confidence

The best workflow is:

  1. Start with a simple emotion.

  2. Test the output.

  3. Add texture, intensity, or nuance.

  4. Add accent, region, or narrator type if needed.

  5. Test again.

  6. Adjust based on the content and voice.

Sometimes a specific narrator style, an accent direction, or a niche, situational description of the emotion gives the voice the exact character the project needs.


4. Style instruction vs inline tag: how they work together

There are two ways to direct performance:

A. Style instruction for the full block

This controls the overall delivery of the paragraph or text block.

Example style instruction:

calm and reassuring

Text:

You are safe now. Take a deep breath. We are going to handle this one step at a time.

This is useful when the whole paragraph should carry the same emotional tone.


B. Inline tag inside the text

Inline tags are used inside the text when you want a specific moment to shift emotion, pacing, or texture.

Example:

I thought I was ready for this, [nervous] but the moment the door opened, I forgot how to breathe.

This is useful when the sentence changes emotional direction.


C. When both are used together

If a text block has both a style instruction and an inline tag, the voice will usually follow the inline tag at that specific moment.

Example style instruction:

calm

Text:

Everything looked normal at first. [whispering] Then I heard someone breathing behind the wall.

The block begins with the calm instruction, then the inline tag shapes the specific moment.
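
To make the division of work concrete, here is a minimal Python sketch that splits a text block into spans and records which direction applies to each one. It assumes inline tags are written in square brackets as in this guide, and it assumes a tag applies from where it appears until the next tag or the end of the block, which is a simplification for illustration. The name `split_by_tags` is hypothetical and is not part of any Narration Box API.

```python
import re

# Matches an inline tag such as [whispering]; nested brackets are not supported.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

def split_by_tags(style_instruction, text):
    """Return (direction, span_text) pairs for one text block.

    Assumption: the block-level style instruction applies until the
    first inline tag, and each tag applies until the next tag.
    """
    spans = []
    direction = style_instruction
    position = 0
    for match in TAG_PATTERN.finditer(text):
        before = text[position:match.start()].strip()
        if before:
            spans.append((direction, before))
        direction = match.group(1).strip()
        position = match.end()
    rest = text[position:].strip()
    if rest:
        spans.append((direction, rest))
    return spans

example = split_by_tags(
    "calm",
    "Everything looked normal at first. [whispering] Then I heard someone breathing behind the wall.",
)
for direction, span in example:
    print(f"{direction}: {span}")
```

Running this on the example above lists the first sentence under "calm" and the second under "whispering", which is a convenient way to review a script before generating audio.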


5. Important inline tag placement rule

Inline tags work best when they are placed after the first word of the sentence or block.

Recommended

I [whispering] knew someone was standing behind me.

The room [eerie] felt colder than it had a minute ago.

She [angry] told him never to come back.

Avoid placing the inline tag as the first word

Avoid:

[whispering] I knew someone was standing behind me.

When the inline tag is placed right at the beginning of the block, the narrator may read it as part of the text instead of treating it as a performance direction.

Practical rule

Write at least one normal word first, then add the inline tag.
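
If you prepare scripts in bulk, the placement rule can be checked automatically. The following Python sketch detects a block that opens with an inline tag and moves the tag behind the first word. The function names are hypothetical helpers for illustration only, and the sketch assumes square-bracket tags with no nesting.

```python
import re

# A tag at the very start of the block, followed by the first normal word.
LEADING_TAG = re.compile(r"^\s*(\[[^\[\]]+\])\s*(\S+)\s*(.*)$", re.DOTALL)

def starts_with_tag(text):
    """True when the first non-space character of the block opens a tag."""
    return text.lstrip().startswith("[")

def move_tag_after_first_word(text):
    """Rewrite '[tag] word rest' as 'word [tag] rest'; leave other text unchanged."""
    match = LEADING_TAG.match(text)
    if not match:
        return text
    tag, first_word, rest = match.groups()
    return f"{first_word} {tag} {rest}".rstrip()

fixed = move_tag_after_first_word("[whispering] I knew someone was standing behind me.")
print(fixed)
```

The sketch only repairs the start of a block. Tags that already sit inside the text are left where they are.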


6. Style instructions and inline tags in non-English text

For best results, use English tags and English style instructions even when the transcript is in another language.

Example Hindi text:

Mujhe [nervous] lag raha tha ki koi mere peeche khada hai.

Example Spanish text:

No sabía qué decir, pero [soft] quería que ella se quedara.

Example French text:

Il faisait sombre dans la pièce, et [whispering] quelqu’un respirait près de la porte.

The spoken content can be in another language, but the performance direction should remain in English.


7. Why experimentation matters

AI voices are context-aware. This means the voice does not only read the style instruction. It also reacts to the sentence itself.

For example, the instruction “happy” may sound different depending on the sentence:

I finally got the job.

You came back.

We won. We actually won.

I can’t believe this is happening.

The same style instruction can produce excitement, relief, disbelief, warmth, or joy depending on the words.

That is why testing matters.

A good prompt is not always the longest prompt. Often, the best instruction is the one that gives the voice just enough direction without over-controlling the performance.


8. How to experiment with prompts

For every emotion or delivery style, test three versions:

Version 1: Simple

sad

Version 2: Specific

quietly sad

Version 3: Performance-led

trying not to cry while speaking softly

Then compare which one fits the scene best.

For example:

Text:

I kept your letter all these years. I don’t know why. Maybe I was waiting for a reason to let go.

Prompt options:

  • sad

  • soft and heartbroken

  • holding back tears

  • fragile and reflective

  • quiet grief

Each one may work, but each one creates a different version of the scene.
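
The three-version comparison can be organized as a small test set. This Python sketch pairs one text with a simple, a specific, and a performance-led instruction so the outputs can be compared side by side. `PromptTest` and `three_versions` are hypothetical names, and how each test is sent to a voice is left out on purpose because it depends on your own tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptTest:
    level: str              # "simple", "specific", or "performance-led"
    style_instruction: str  # the prompt to try
    text: str               # the same text for every version

def three_versions(text, simple, specific, performance_led):
    """Build the three test cases described in this section for one text."""
    return [
        PromptTest("simple", simple, text),
        PromptTest("specific", specific, text),
        PromptTest("performance-led", performance_led, text),
    ]

tests = three_versions(
    "I kept your letter all these years.",
    simple="sad",
    specific="quietly sad",
    performance_led="trying not to cry while speaking softly",
)
for t in tests:
    print(f"{t.level}: {t.style_instruction}")
```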


9. Recommended prompt format

Keep style instructions short and clear.

Good:

  • warm

  • serious

  • excited

  • low and tense

  • soft and intimate

  • angry but controlled

  • nervous and breathless

  • cinematic horror narrator

  • confident product demo voice

Too much:

  • Please speak in a very emotionally complex way where the narrator sounds like they are sad but also happy and confused and scared and excited at the same time while slowly shifting their tone across the paragraph.

Better:

  • conflicted and emotional

  • sad but relieved

  • nervous but hopeful


10. Emotion prompt library

This section gives users copyable style instructions grouped by emotional family. Use these as starting points, not fixed rules. The same prompt can behave slightly differently depending on the voice, the sentence, the language, and the emotional context of the paragraph.

For best results, test 2–4 nearby prompts before choosing the final one. You can also write your own prompts and test different words and emotions.


Happy / Positive

Use these when the scene needs joy, charm, celebration, relief, playfulness, optimism, or light emotional lift.

Style instructions:

  • happy

  • quietly smiling

  • relieved happiness

  • softly happy

  • light joy

  • excited but controlled

  • proud and happy

  • smirking

  • chuckling

  • amused

  • playful

  • teasing

  • relieved

  • surprised

  • cheering

  • optimistic

  • mischievous

  • triumphant

  • sentimental

  • enthusiastic

Voice textures:

  • giddy

  • bubbly

  • sparkly

  • airy

  • speak while laughing

  • shy

  • shy and laughing

  • shy and questioning

  • shy and doubtful

  • shy and stuttering

  • coughing while laughing hard

  • fake laughing and whispering

  • laughing with grunt

  • romantic smile while talking

  • hesitation

  • surprised

Example:

I [excited but controlled] can’t believe we actually did it. After all those late nights, all those failed attempts, this is finally real.

Good for:

  • celebration scenes

  • YouTube hooks

  • product wins

  • character banter

  • romantic comedy moments

  • founder/product update videos

Be careful with:

  • too much excitement across long passages

  • forced laughter where the text is not funny

  • making professional content sound too casual


Sad / Emotional

Use these when the scene needs grief, vulnerability, disappointment, regret, loneliness, memory, emotional heaviness, or quiet pain.

Style instructions:

  • sad

  • heartbroken

  • crying

  • teary

  • choked up

  • grieving

  • lonely

  • hurt

  • disappointed

  • regretful

  • guilty

  • defeated

  • hopeless

  • melancholic

  • nostalgic

  • vulnerable

  • mourning

  • wistful

Voice textures:

  • heavy

  • fragile

  • grunting

  • hollow

  • trembling

  • breathy

  • low

  • muted

  • shivering of anxiety

  • strained

  • thin

  • broken

  • weary

  • distant

  • slow and sad

  • raw hatred

  • empty

  • subdued

  • shaky

  • paranoid

  • breathiness

  • hoarse voice

  • dismissive

Example:

I [fragile] thought I had made peace with it. But when I saw the empty chair, everything came back at once.

Good for:

  • emotional audiobook passages

  • memoir narration

  • loss scenes

  • reflective nonfiction

  • quiet character moments

  • dramatic storytelling

Be careful with:

  • asking for crying too often

  • making long paragraphs too slow

  • using heavy sadness on informational content


Anger / Conflict

Use these when the scene needs irritation, rage, bitterness, confrontation, disgust, impatience, threat, or controlled intensity.

Style instructions:

  • angry

  • irritated

  • frustrated

  • furious

  • rage

  • cold

  • bitter

  • resentful

  • sarcastic

  • snapping

  • yelling

  • scolding

  • defensive

  • threatening

  • disgusted

  • impatient

  • seething

  • commanding

Voice textures:

  • sharp

  • clipped

  • tense

  • harsh

  • explosive

  • gritted

  • cutting

  • rough

  • boiling

  • staccato

  • metallic and piercing

Example:

You [cold] knew exactly what would happen, and you still let me walk into that room alone.

Good for:

  • confrontational dialogue

  • villain scenes

  • arguments

  • intense fiction

  • dramatic monologues

  • firm brand/product warnings

Be careful with:

  • overusing yelling

  • making every line aggressive

  • combining anger with too many other emotions

  • using rage where controlled anger would sound more natural


Fear / Anxiety

Use these when the scene needs panic, nervousness, dread, unease, hesitation, shock, paranoia, or desperation.

Style instructions:

  • anxious

  • nervous

  • panicked

  • terrified

  • shocked

  • hesitant

  • uneasy

  • paranoid

  • breathless

  • trembling

  • shivering

  • pleading

  • whispering fearfully

  • desperate

  • startled

Voice textures:

  • shaky

  • tight

  • breathy

  • rushed

  • uneven

  • strained

  • underconfident

  • whispery

  • jitter

  • taut

  • speak like you have a lump in the throat

Example:

I [trembling] tried to call out, but my voice broke before the words could leave my mouth.

Good for:

  • horror scenes

  • thriller narration

  • character panic

  • anxious confessions

  • high-stakes dialogue

  • suspenseful audiobook moments

Be careful with:

  • using panic for long sections

  • placing fear tags at the very beginning of the block

  • asking for extreme fear when the sentence itself is neutral


Love / Affection / Intimacy

Use these when the scene needs tenderness, attraction, warmth, affection, longing, comfort, softness, or emotional closeness.

Style instructions:

  • loving

  • tender

  • romantic

  • flirty

  • adoring

  • caring

  • fond

  • intimate

  • gentle

  • comforting

  • reassuring

  • soft smile

  • shy

  • bashful

  • yearning

  • vulnerable

  • breathy

  • sexy

  • lusty

  • delicate

Voice textures:

  • melting

  • hushed

  • vocal fry

  • passionate

  • playful and flirtatious

  • smirking

  • authoritative

  • sultry

  • smoky

  • ASMR whisper

  • lower pitch

Example:

I [soft smile] don’t know when it happened. I just know that every quiet moment started to feel better when you were there.

Good for:

  • romance scenes

  • intimate dialogue

  • emotional reassurance

  • relationship storytelling

  • soft audiobook narration

  • character-driven fiction

Be careful with:

  • making every romantic line breathy

  • using sexy or lusty prompts in the wrong content context

  • pushing intimacy where the text needs simple warmth


Confidence / Authority

Use these when the voice needs to sound clear, firm, persuasive, polished, grounded, professional, heroic, or leader-like.

Style instructions:

  • confident

  • bold

  • commanding

  • firm

  • serious

  • determined

  • heroic

  • calm authority

  • persuasive

  • professional

  • leader-like

  • grounded

  • strong with vocal fry

  • polished

  • serious with unhappy voice

  • concern

Voice textures:

  • laughing with confidence

  • laughing with grunt

  • talking with smirk

  • intimidation

  • disgust

Example:

We [calm authority] do not need to move faster. We need to move with discipline, clarity, and intent.

Good for:

  • product demos

  • business narration

  • founder videos

  • educational content

  • leadership scenes

  • documentary narration

  • sales and marketing videos

Be careful with:

  • making confidence sound arrogant

  • using intimidation in normal business content

  • adding vocal fry where the voice does not naturally support it


Mystery / Dark / Horror

Use these when the scene needs dread, darkness, mystery, danger, suspense, evil presence, or cinematic horror.

Style instructions:

  • dark

  • ominous

  • sinister

  • suspenseful

  • haunting

  • eerie

  • whispering

  • dangerous

  • menacing

  • mysterious

  • cold

  • brooding

  • grim

  • foreboding

  • growling

  • low and tense

Voice textures:

  • stammering of fear

  • evil laughing

  • sinister laugh

  • gravelly

  • whispering and laughing

  • shadowy

  • screeching

  • raspy

  • heavy

Example:

The hallway [eerie] stretched farther than it should have. At the end of it, something smiled in the dark.

Good for:

  • horror stories

  • thriller audiobooks

  • mystery narration

  • dark fantasy

  • villain dialogue

  • suspenseful YouTube narration

  • cinematic trailers

Be careful with:

  • overusing growling or screeching

  • asking for long non-verbal horror sounds

  • making the voice too theatrical for subtle suspense


Calm / Meditation / Soft Narration

Use these when the voice needs to feel peaceful, centered, slow, gentle, reassuring, or emotionally steady.

Style instructions:

  • calm

  • meditative

  • soothing

  • peaceful

  • grounded

  • slow and gentle

  • soft and steady

  • warm and relaxed

  • reassuring

  • quiet confidence

Example:

Take a slow breath in. Let your shoulders soften. For the next few moments, there is nowhere else you need to be.

Good for:

  • meditation audio

  • sleep stories

  • wellness narration

  • guided breathing

  • reflective audiobook sections

  • calm product walkthroughs

Be careful with:

  • making important instructional content too slow

  • using too much softness where clarity matters

  • repeating the same calm direction across every paragraph without variation


11. Prompting by use case

Audiobooks

Use style instructions to guide narration, character emotion, chapter tone, and scene intensity.

Recommended prompts:

  • reflective

  • intimate storyteller

  • serious and grounded

  • warm narrator

  • tense and cinematic

  • emotional but restrained

  • dark and suspenseful

Where it works well:

  • Fiction scenes

  • Character dialogue

  • Emotional turning points

  • Chapter openings

  • Suspense sequences

  • Reflective nonfiction passages

Where to be careful:

  • Do not over-tag every line.

  • Do not make every paragraph highly emotional.

  • Keep narration consistent unless the scene demands a shift.


YouTube narration

Use style instructions to increase retention, energy, and clarity.

Recommended prompts:

  • confident

  • energetic

  • curious

  • serious

  • conversational

  • persuasive

  • fast-paced

  • documentary-style

Where it works well:

  • Hooks

  • Explainers

  • Storytelling videos

  • Productivity content

  • Educational channels

  • Faceless YouTube narration

Where to be careful:

  • Too much excitement can sound forced.

  • Use contrast: serious moments should sound serious, not constantly energetic.


Podcasts

Use style instructions to create natural host-like delivery, relaxed conversation, and expressive storytelling.

Recommended prompts:

  • conversational

  • casually excited

  • reflective

  • curious

  • warm host

  • thoughtful

  • lightly amused

Where it works well:

  • Podcast intros

  • Host monologues

  • Interview-style narration

  • Commentary

  • Story-driven episodes

Where to be careful:

  • Avoid making both speakers sound equally excited all the time.

  • Use different energy levels for different speakers.


E-learning and training

Use style instructions to make learning content clear, patient, and easier to follow.

Recommended prompts:

  • clear and patient

  • calm teacher

  • confident instructor

  • warm and helpful

  • serious and professional

  • encouraging

Where it works well:

  • Course modules

  • Explainer lessons

  • Step-by-step tutorials

  • Safety training

  • Corporate learning

Where to be careful:

  • Avoid overly dramatic emotions unless the lesson requires storytelling.

  • Prioritize clarity over performance.


Product demos and ads

Use style instructions to make the voice persuasive, clean, benefit-led, and brand-safe.

Recommended prompts:

  • confident

  • crisp

  • persuasive

  • polished

  • warm and professional

  • energetic but clear

  • premium brand voice

Where it works well:

  • Product walkthroughs

  • Feature launches

  • Landing page videos

  • Explainer ads

  • App demos

Where to be careful:

  • Do not overdo sales energy.

  • Keep the delivery believable and useful.


12. What not to do

Do not over-prompt every sentence

Bad:

I [happy] opened the door. I [excited] saw the box. I [shocked] picked it up. I [curious] looked inside.

Better:

Style instruction:

excited and curious

Text:

I opened the door and saw the box waiting there. For a second, I just stared at it. Then I picked it up, wondering what could possibly be inside.
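
A rough way to spot over-tagging is to measure how many sentences in a block carry an inline tag. The Python sketch below does that with naive sentence splitting. The 0.5 threshold is an arbitrary assumption chosen for illustration, not a documented limit, and the function names are hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")
# Naive split: whitespace that follows ., !, or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def tag_density(text):
    """Fraction of sentences that contain at least one inline tag."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    if not sentences:
        return 0.0
    tagged = sum(1 for s in sentences if TAG_PATTERN.search(s))
    return tagged / len(sentences)

def is_over_tagged(text, max_density=0.5):
    """Flag a block when more than max_density of its sentences are tagged."""
    return tag_density(text) > max_density

bad = "I [happy] opened the door. I [excited] saw the box. I [shocked] picked it up. I [curious] looked inside."
better = "I opened the door and saw the box waiting there. For a second, I just stared at it."

print(tag_density(bad), is_over_tagged(bad))
print(tag_density(better), is_over_tagged(better))
```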


Do not stack too many emotions in one instruction

Avoid:

happy, sad, angry, romantic, nervous, calm, excited, sarcastic

Better:

conflicted

nervous but hopeful

angry but trying to stay calm
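
The same idea works for a single style instruction. This Python sketch counts the descriptors in an instruction by splitting on commas, "and", and "but", then flags instructions that stack more than a chosen number. The limit of three is an assumption for illustration, and the helper names are hypothetical.

```python
import re

# Commas and the words "and" / "but" separate descriptors.
SEPARATORS = re.compile(r",|\band\b|\bbut\b", re.IGNORECASE)

def count_descriptors(instruction):
    """Count the non-empty parts of an instruction after splitting on separators."""
    parts = [part.strip() for part in SEPARATORS.split(instruction)]
    return len([part for part in parts if part])

def is_overloaded(instruction, max_descriptors=3):
    """Flag an instruction that stacks more descriptors than max_descriptors."""
    return count_descriptors(instruction) > max_descriptors

stacked = "happy, sad, angry, romantic, nervous, calm, excited, sarcastic"
for prompt in [stacked, "conflicted", "nervous but hopeful", "angry but trying to stay calm"]:
    print(prompt, "->", count_descriptors(prompt), is_overloaded(prompt))
```

A count is only a proxy. A long instruction can still be coherent, so treat a flag as a prompt to re-read the instruction, not as a verdict.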


Do not expect impossible physical actions

The voice can suggest laughter, fear, breathiness, tension, whispering, softness, or intensity.

It cannot always produce long or very extreme physical sounds, although it can in some cases, so you are encouraged to test the limits of the voices.

For example, prompts like these may be inconsistent but can also produce some really creative results:

  • crying loudly for 10 seconds

  • screaming continuously

  • coughing repeatedly

  • singing a full melody

  • perfectly imitating a celebrity

  • producing exact sound effects

Use style instructions for performance direction, not full sound design.


Do not use vague creative instructions when you need precision

Vague:

make it better

more human

cinematic

Better:

low and tense

soft and reflective

energetic

serious and documentary-style

warm and reassuring


13. When style instructions may not work perfectly

Results may vary depending on:

  • The selected voice

  • The language of the transcript

  • The length of the paragraph

  • The emotional clarity of the sentence

  • Whether the instruction matches the text

  • Whether too many tags are used together

  • Whether the instruction asks for something too physical or unrealistic

For example, a sentence from a calm product tutorial may not become convincing horror narration just because the style instruction says “terrified.” The text and the style instruction should support each other.

Good voice direction works best when the writing, prompt, and voice choice are aligned.


14. Matching the prompt to the writing

The text itself matters.

Weak pairing:

Style instruction:

terrified

Text:

Today we will learn how to export an audio file.

Better pairing:

Style instruction:

calm instructor

Text:

Today we will learn how to export an audio file step by step.

Strong pairing:

Style instruction:

terrified

Text:

I heard the lock turn from the other side, even though I was alone in the house.

The prompt gives direction, but the sentence gives the voice something to perform.


15. Choosing the right voice for the prompt

Not every voice will respond the same way to every instruction.

Some voices are naturally better for:

  • Audiobook narration

  • Emotional fiction

  • Calm meditation

  • YouTube hooks

  • Corporate training

  • Dark cinematic narration

  • Romantic dialogue

  • Comedy or playful delivery

  • Product demos

A prompt can shape the voice, but the base voice still matters.

For best results:

  1. Choose a voice that already fits the general use case.

  2. Add a style instruction to guide the performance.

  3. Use inline tags only where the moment needs a shift.

  4. Test a short sample before generating long content.


16. Advanced prompting patterns

Emotion + intensity

  • mildly annoyed

  • quietly excited

  • deeply sad

  • extremely nervous

  • barely controlled anger

  • soft but intense

Emotion + contrast

  • angry but calm

  • nervous but trying to sound confident

  • sad but relieved

  • friendly but suspicious

  • romantic but hesitant

Delivery + use case

  • documentary-style narration

  • YouTube explainer voice

  • audiobook storyteller

  • calm meditation guide

  • premium brand narrator

  • confident product demo voice

Character + scene

  • old detective telling a secret

  • villain speaking softly

  • teacher explaining patiently

  • scared child whispering

  • exhausted soldier giving orders

Texture + performance

  • raspy and low

  • breathless and nervous

  • clipped and tense

  • warm and rounded

  • thin and distant

  • growling and dangerous
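
These patterns are simple enough to generate in bulk when you want a list of candidates to test. The Python sketch below builds intensity, contrast, and texture combinations as plain strings. All function names are hypothetical, and the output is only a list of prompts to try, with no guarantee that every combination performs well on every voice.

```python
from itertools import product

def intensity_variants(emotions, intensities):
    """Emotion + intensity: every intensity word paired with every emotion."""
    return [f"{intensity} {emotion}" for emotion, intensity in product(emotions, intensities)]

def contrast(primary, secondary):
    """Emotion + contrast: 'angry but calm'."""
    return f"{primary} but {secondary}"

def texture_pair(first, second):
    """Texture + performance: 'raspy and low'."""
    return f"{first} and {second}"

candidates = intensity_variants(["sad", "excited"], ["quietly", "deeply"])
candidates.append(contrast("sad", "relieved"))
candidates.append(texture_pair("clipped", "tense"))
for prompt in candidates:
    print(prompt)
```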


17. Full example: turning plain text into directed narration

Plain text

I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it was open.

Version 1: Simple style instruction

suspenseful

Version 2: More specific

quiet and tense

Version 3: Inline tag added

I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it [eerie] was open.

Version 4: More cinematic

Style instruction:

low and suspenseful

Text:

I walked into the room and saw the envelope on the table. Nobody had touched it. Nobody had even come near it. But somehow, it [eerie] was open.

This shows how style instruction and inline tags can work together without overloading the paragraph.


18. Practical workflow before generating long audio

Before generating a full chapter, course, podcast, or video script:

  1. Pick the right voice.

  2. Choose the broad emotional tone.

  3. Generate a short test paragraph.

  4. Try one simple prompt.

  5. Try one more specific prompt.

  6. Add inline tags only where needed.

  7. Compare outputs.

  8. Save the version that works.

  9. Apply it consistently across the larger project.

This avoids wasting time on long generations that do not match the intended style.
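
Steps 8 and 9 can be sketched as saving the chosen settings once and reusing them for every paragraph. The Python example below stores a preset as JSON and applies it to a list of paragraphs. The field names, the voice name `example-voice`, and both functions are hypothetical. The sketch does not call any Narration Box API and only prepares the blocks you would then generate.

```python
import json

def save_preset(voice, style_instruction):
    """Serialize the winning voice and style instruction as a JSON string."""
    return json.dumps({"voice": voice, "style_instruction": style_instruction})

def apply_preset(preset_json, paragraphs):
    """Attach the saved preset to every non-empty paragraph of a project."""
    preset = json.loads(preset_json)
    return [
        {
            "voice": preset["voice"],
            "style_instruction": preset["style_instruction"],
            "text": paragraph.strip(),
        }
        for paragraph in paragraphs
        if paragraph.strip()
    ]

preset = save_preset("example-voice", "low and suspenseful")
blocks = apply_preset(preset, ["I walked into the room.", "   ", "Nobody had touched it."])
for block in blocks:
    print(block)
```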


19. What these voices make possible

Narration Box voices are not limited to flat text-to-speech.

With style instructions and inline tags, creators can build:

  • Audiobooks with emotional chapter flow

  • Fiction scenes with fear, tension, warmth, romance, and conflict

  • YouTube narration that shifts between hook, explanation, and payoff

  • Podcasts with host-like delivery

  • Product demos that sound polished and persuasive

  • Meditation audio that feels calm and intentional

  • Training modules that sound clear and human

  • Horror stories with atmosphere and dread

  • Character dialogue with distinct emotional beats

The real power is not just generating a voice. It is directing a performance.


20. Accents and regional voice direction

Style instructions can also be used to guide accent, region, and speaking style.

This is useful when your content needs a voice that feels more local, more character-specific, or better matched to the audience. You can ask for broad accents, regional accents, or a softer hint of an accent.

Examples of accent prompt patterns:

  • speak with a British accent

  • speak with a soft British accent

  • speak with a Yorkshire accent

  • speak with a Southern English accent

  • speak with a London accent

  • speak with an Irish accent

  • speak with a Scottish accent

  • speak with a Welsh accent

  • speak with an American accent

  • speak with a Southern American accent

  • speak with a New York accent

  • speak with a Californian accent

  • speak with a neutral US accent

  • speak with a Canadian accent

  • speak with an Australian accent

  • speak with a New Zealand accent

  • speak with an Indian English accent

  • speak with a neutral Indian English accent

  • speak with a South African accent

  • speak with a French accent while speaking English

  • speak with a Spanish accent while speaking English

  • speak with an Italian accent while speaking English

  • speak with a German accent while speaking English

You can also combine accent with emotion or use case:

  • soft British accent, calm and reflective

  • neutral US accent, confident and polished

  • Indian English accent, warm and professional

  • Southern American accent, relaxed and conversational

  • Irish accent, intimate audiobook storyteller

  • Australian accent, friendly product narrator

  • London accent, sharp and sarcastic

  • Scottish accent, serious and grounded

Accent prompting is experimental. Some voices will respond more strongly than others. Some voices may produce only a light regional flavor instead of a perfect accent. That is normal.

For best results:

  1. Test the accent on a short paragraph first.

  2. Try both broad and specific versions.

  3. Combine the accent with the emotional style.

  4. Avoid stacking too many accent and emotion instructions together.

  5. Use English accent instructions even when the transcript is in another language.

Broad prompt:

British accent

More specific prompt:

soft Yorkshire accent, warm and conversational

Use accent directions when they help the listener believe the narrator, character, brand, or region. Do not use them just to make every voice sound different. The accent should support the content.
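
To test broad and specific versions of the same accent side by side, the prompt strings can be assembled from parts. This Python sketch follows the "accent, style" shape used in the examples above. `accent_prompt` is a hypothetical helper, and how strongly a given voice responds to the result still has to be checked by ear.

```python
def accent_prompt(accent, style=None, softness=None):
    """Build an accent instruction such as 'soft Yorkshire accent, warm and conversational'."""
    parts = [softness, accent, "accent"]
    base = " ".join(part for part in parts if part)
    return f"{base}, {style}" if style else base

broad = accent_prompt("British")
specific = accent_prompt("Yorkshire", style="warm and conversational", softness="soft")
print(broad)
print(specific)
```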


21. Realistic limits

Style instructions can dramatically improve performance, but they are not magic buttons.

They cannot guarantee:

  • Perfect acting in every voice

  • Identical output every time

  • Full sound effects

  • Long non-verbal acting sequences

  • Perfect results with unclear writing

  • Perfect control when too many emotions are stacked

  • The same emotional range across every voice

The best results come from pairing the right voice, the right writing, and the right instruction.


22. Best practices

  • Use English tags, even for non-English transcripts.

  • Put inline tags after the first word, not at the start of the block.

  • Use style instructions for the whole paragraph.

  • Use inline tags for specific emotional moments.

  • Keep prompts short unless the scene needs nuance.

  • Do not over-tag every sentence.

  • Match the emotion to the actual writing.

  • Test short samples before generating long audio.

  • Try multiple versions of the same prompt.

  • Pick voices that naturally fit the use case.

Now try a short sample with one voice, one paragraph, and three different style instructions.

Start simple. Then add texture. Then add one inline tag.

That is usually enough to turn plain narration into a directed performance.
