Enbee V2 Guide

A practical guide to using style instructions and inline tags with Enbee V2 voices for emotions, accents, vocal texture, pacing, and performance direction.

Written by Gerard Smith

Style Instructions Guide: Directing AI Voices With Emotion, Texture, and Performance

1. What this guide is for

Use this guide to direct Narration Box's Enbee V2 voices with style instructions.

This is a creative guide for making AI narration sound more human, intentional, cinematic, expressive, and context-aware.

Style instructions help you tell the voice how to perform a paragraph or block of text. They can guide emotion, energy, pacing, attitude, vocal texture, delivery style, intensity, and character presence.

The best results come from experimenting. Try different emotions and expressions, see what works, and test the limits.

This guide describes the possibilities we have seen and tested so far. Use it to see what currently works and as a starting point for your own experiments.

Difference between Style Instructions and Inline Tags

Style instructions are a prompt that you give to a whole text block. They control the overall delivery of that paragraph or block, and they are entered in a dedicated box at the top of every text block.

Style instructions guide the speaking style of the whole text block and work only with Enbee V2 voices.

Inline tags are short emotion cues, usually one word and occasionally two, written inside square brackets. Use them inside the text when you want a specific moment to shift emotion, pacing, or texture. For example:

“There, [whisper] there, everything is alright now.”

A tag usually guides the word or sentence that follows it, or the line it sits in. The narrator decides how to perform the emotion, where it begins, and how far it extends.

Important: inline tags are short expression cues, not full acting instructions.

Use the Style Instructions box when you want to describe the overall voice, tone, accent, mood, pacing, or performance style of a full block.

Use inline tags only when you want a short emotional or delivery shift inside the text.

Good inline tags are usually one word or a very short phrase.

Examples:

  • [whisper]

  • [laughs]

  • [sad]

  • [angry]

  • [excited]

  • [shocked]

  • [slow]

  • [sarcastic]

  • [sigh]

Do not write long acting directions inside square brackets.

Bad:

  • [held 3 sec, steady pitch]

  • [soft relieved exhale with deep vocal tone underneath]

  • [hushed conspiratorial tone]

  • [very deep chest-centered smoky throaty male shamanic voice]

  • [say this like a sad mother speaking slowly to her child]

These are too detailed for inline tags. Enbee V2 may ignore them, partially follow them, or read them aloud as normal text.

Better:

Style Instructions:

Speak in a deep, calm, meditative voice with slow pacing and warm authority.

Text:

Please. [sigh] Thank you.

Edge cases:

  • If a text block has both style instructions and inline tags, the narrator applies the overall emotion from the style instructions to the whole block and shifts to each tag’s emotion in the text around that tag.

NOTE:

  • Inline tags work best when they are placed after the first word of the sentence or block.

  • Avoid placing the inline tag as the first word. When the inline tag is placed right at the beginning of the block, the narrator may read it as part of the text instead of treating it as a performance direction.

  • For best results, use English tags and English style instructions even when the transcript is in another language.

  • Inline tags should be short. If the tag contains a full sentence, commas, timing details, vocal technique, or a long emotional description, it may be read aloud or handled unpredictably.

    Good:

    “There, [whisper] there, everything is alright now.”

    Risky:

    “There, [hushed conspiratorial tone] there, everything is alright now.”

    Better:

    Style Instructions:

    Speak in a hushed, reassuring tone.

    Text:

    “There, [whisper] there, everything is alright now.”

    AI voices are context-aware. This means the voice does not only follow the style instruction or inline tag. It also reacts to the sentence itself. Use inline tags only where the emotional shift really matters.
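
The inline tag rules in this section are simple enough to check automatically before you generate audio. The sketch below is a hypothetical Python helper, not part of Narration Box or Enbee V2. The function name `lint_inline_tags`, the two-word limit, and the choice to treat digits as timing details are all assumptions drawn from the guidance above.

```python
import re

# Matches anything written inside square brackets, e.g. "[whisper]".
TAG_PATTERN = re.compile(r"\[([^\[\]]*)\]")

# Assumed limit, based on the guide's "usually one word, occasionally two".
MAX_TAG_WORDS = 2


def lint_inline_tags(text: str) -> list[str]:
    """Return human-readable warnings for risky inline tags in a text block."""
    warnings = []
    for match in TAG_PATTERN.finditer(text):
        tag = match.group(1).strip()
        # Only whitespace or an opening quote before the tag means it opens the block.
        if text[: match.start()].strip(' \t\n"\u201c') == "":
            warnings.append(f"[{tag}] opens the block; move it after the first word")
        if len(tag.split()) > MAX_TAG_WORDS:
            warnings.append(f"[{tag}] is too long; move it to Style Instructions")
        if "," in tag or any(ch.isdigit() for ch in tag):
            warnings.append(f"[{tag}] has commas or timing details; use a short cue")
    return warnings


print(lint_inline_tags("There, [whisper] there, everything is alright now."))
print(lint_inline_tags("There, [hushed conspiratorial tone] there."))
print(lint_inline_tags("[held 3 sec, steady pitch] Thank you."))
```

The `[whisper]` line produces no warnings, the `[hushed conspiratorial tone]` line produces one, and the `[held 3 sec, steady pitch]` line produces three (position, length, and timing details). A clean result does not guarantee the voice will follow the tag, since delivery still depends on the voice and the sentence.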

2. What style instructions can do

Style instructions can help you shape:

  • Emotion: happy, sad, angry, anxious, romantic, fearful, excited, calm

  • Intensity: soft, explosive, restrained, dramatic, understated

  • Texture: raspy, breathless, trembling, clipped, warm, cold, growling

  • Delivery: persuasive, storytelling, documentary-style, conversational, intimate

  • Character mood: suspicious, commanding, fragile, sarcastic, heroic, sinister

  • Scene atmosphere: eerie, suspenseful, meditative, cinematic, haunting

  • Audience fit: audiobook, podcast, YouTube narration, e-learning, product demo, ad, meditation, fiction, character dialogue

The same text can feel completely different depending on the instruction.

Example:

Text: “I didn’t expect you to come back.”

Possible style instructions:

  • Speak in a soft tone

  • Speak like you are shocked

  • Speak like you are angry

  • Speak in a whispering way

and so on.

Each version creates a different character, scene, and meaning.

3. The basic rule: you can go simple or super specific

You can write long prompts that describe an emotion in detail, or ask for a localized accent. For example:

  • Speak in a Parisian French accent

  • Speak in a Mexican Spanish accent

  • Speak in a blunt Yorkshire accent

You can also be highly specific when the performance needs a particular identity, region, accent, format, character type, or scene style.

Useful prompt patterns:

  • Speak in a calm, serious, documentary-style voice

  • Narrate like a warm audiobook storyteller

  • Use a confident startup founder tone

  • Speak with a soft regional British accent

  • Sound like a relaxed coastal American speaker

  • Read this like a slow-burn mystery narrator

  • Speak like a polished product demo narrator

  • Use a gentle, patient teacher voice

  • Narrate with a formal newsreader tone

  • Speak with a casual conversational accent

More specific instructions are useful when the scene needs nuance:

  • Speaking like a detective revealing the final clue

  • Trying to stay calm while clearly scared

  • Narrating a dark horror scene in a low, tense voice

  • Sounding friendly but slightly suspicious

  • Explaining something complex with patient confidence

4. Emotion prompt library

This section gives users copyable style instructions grouped by emotional family. Use these as starting points, not fixed rules. The same prompt can behave slightly differently depending on the voice, the sentence, the language, and the emotional context of the paragraph.

For best results, test 2–4 nearby prompts before choosing the final one. You can also write your own prompts and test them with different styles and emotions.

Happy / Positive

Use these when the scene needs joy, charm, celebration, relief, playfulness, optimism, or light emotional lift.

Style instructions:

  • Speak in a naturally happy and warm voice

  • Speak with a soft smile in your voice

  • Speak with gentle relief and quiet happiness

  • Speak in a softly happy, tender voice

  • Speak with light, easy joy

  • Speak with controlled excitement and steady energy

  • Speak with proud, heartfelt happiness

  • Speak with a subtle smirk and playful confidence

  • Speak as if gently chuckling while talking

  • Speak with amused warmth and light humor

  • Speak in a playful, lively voice

  • Speak with a teasing, affectionate tone

  • Speak with visible relief and softened emotion

  • Speak with natural surprise and curiosity

  • Speak in an encouraging, celebratory voice

  • Speak with calm optimism and positive energy

  • Speak with a mischievous, knowing tone

  • Speak with a triumphant, victorious voice

  • Speak in a sentimental, reflective voice

  • Speak with bright enthusiasm and energetic warmth

Voice textures:

  • Speak with giddy, excited happiness

  • Speak in a bubbly, cheerful voice

  • Speak with a sparkly, bright sense of joy

  • Speak in an airy, light, effortless voice

  • Speak while laughing naturally

  • Speak in a shy, gentle voice

  • Speak shyly while laughing softly

  • Speak with shy curiosity, as if asking a question

  • Speak with shy doubt and uncertainty

  • Speak shyly with a slight stutter

  • Speak as if coughing while laughing hard

  • Speak with a fake laugh while whispering

  • Speak while laughing with a low grunt

  • Speak with a romantic smile in your voice

  • Speak with natural hesitation and uncertainty

Be careful with:

  • Forced contradiction

  • Forced laughter where the text is not funny

Sad / Emotional

Use these when the scene needs grief, vulnerability, disappointment, regret, loneliness, memory, emotional heaviness, or quiet pain.

Style instructions:

  • Speak in a sad, heavy voice

  • Speak with heartbroken emotion and quiet pain

  • Speak as if crying while trying to continue

  • Speak with a teary, fragile voice

  • Speak as if choked up and holding back tears

  • Speak with deep grief and emotional weight

  • Speak in a lonely, distant voice

  • Speak with hurt and emotional tenderness

  • Speak with quiet disappointment

  • Speak with regret and softened sadness

  • Speak with guilt and emotional hesitation

  • Speak in a defeated, exhausted voice

  • Speak with hopelessness and fading energy

  • Speak in a melancholic, reflective voice

  • Speak with nostalgic sadness and warmth

  • Speak in a vulnerable, exposed voice

  • Speak with mourning and deep sorrow

  • Speak with wistful longing and quiet sadness

Voice textures:

  • Speak in a heavy voice

  • Speak in a fragile voice

  • Speak with grunts

  • Speak in a hollow voice

  • Speak with a trembling voice

  • Speak in a breathy voice

  • Speak in a low voice

  • Speak in a muted voice

  • Speak as if shivering with anxiety

  • Speak in a strained voice

  • Speak in a thin voice

  • Speak in a broken voice

  • Speak in a weary voice

  • Speak in a distant voice

  • Speak slowly and sadly

  • Speak with raw hatred

  • Speak in an empty voice

  • Speak in a subdued voice

  • Speak in a shaky voice

  • Speak in a paranoid voice

  • Speak with breathiness

  • Speak in a hoarse voice

  • Speak in a dismissive tone

Anger / Conflict

Use these when the scene needs irritation, rage, bitterness, confrontation, disgust, impatience, threat, or controlled intensity.

Style instructions:

  • Speak in an angry voice

  • Speak in an irritated voice

  • Speak in a frustrated voice

  • Speak in a furious voice

  • Speak with rage

  • Speak in a cold voice

  • Speak in a bitter voice

  • Speak in a resentful voice

  • Speak in a sarcastic voice

  • Speak as if snapping

  • Speak as if yelling

  • Speak in a scolding voice

  • Speak in a defensive voice

  • Speak in a threatening voice

  • Speak in a disgusted voice

  • Speak in an impatient voice

  • Speak as if seething

  • Speak in a commanding voice

Voice textures:

  • Speak in a sharp voice

  • Speak in a clipped voice

  • Speak in a tense voice

  • Speak in a harsh voice

  • Speak in an explosive voice

  • Speak through gritted teeth

  • Speak in a cutting voice

  • Speak in a rough voice

  • Speak as if boiling with anger

  • Speak in a staccato rhythm

  • Speak in a metallic and piercing voice

Fear / Anxiety

Use these when the scene needs panic, nervousness, dread, unease, hesitation, shock, paranoia, or desperation.

Style instructions:

  • Speak in an anxious voice

  • Speak in a nervous voice

  • Speak in a panicked voice

  • Speak in a terrified voice

  • Speak in a shocked voice

  • Speak hesitantly

  • Speak in an uneasy voice

  • Speak in a paranoid voice

  • Speak breathlessly

  • Speak with a trembling voice

  • Speak with a shivering voice

  • Speak in a pleading voice

  • Speak in a fearful whisper

  • Speak in a desperate voice

  • Speak in a startled voice

Voice textures:

  • Speak in a shaky voice

  • Speak in a tight voice

  • Speak in a breathy voice

  • Speak in a rushed voice

  • Speak in an uneven voice

  • Speak in a strained voice

  • Speak in an underconfident voice

  • Speak in a whispery voice

  • Speak with a jittery voice

  • Speak in a taut voice

  • Speak like you have a lump in the throat

Love / Affection / Intimacy

Use these when the scene needs tenderness, attraction, warmth, affection, longing, comfort, softness, or emotional closeness.

Style instructions:

  • Speak in a loving voice

  • Speak in a tender voice

  • Speak in a romantic voice

  • Speak in a flirty voice

  • Speak in an adoring voice

  • Speak in a caring voice

  • Speak in a fond voice

  • Speak in an intimate voice

  • Speak in a gentle voice

  • Speak in a comforting voice

  • Speak in a reassuring voice

  • Speak with a soft smile

  • Speak in a shy voice

  • Speak in a bashful voice

  • Speak with yearning

  • Speak in a vulnerable voice

  • Speak in a breathy voice

  • Speak in a sexy voice

  • Speak in a lusty voice

  • Speak in a delicate voice

Voice textures:

  • Speak in a melting voice

  • Speak in a hushed voice

  • Speak with vocal fry

  • Speak in a passionate voice

  • Speak in a playful and flirtatious voice

  • Speak with a smirk

  • Speak in an authoritative voice

  • Speak in a sultry voice

  • Speak in a smoky voice

  • Speak in an ASMR whisper

  • Speak in a lower pitch

Confidence / Authority

Use these when the voice needs to sound clear, firm, persuasive, polished, grounded, professional, heroic, or leader-like.

Style instructions:

  • Speak in a confident voice

  • Speak in a bold voice

  • Speak in a commanding voice

  • Speak in a firm voice

  • Speak in a serious voice

  • Speak in a determined voice

  • Speak in a heroic voice

  • Speak with calm authority

  • Speak in a persuasive voice

  • Speak in a professional voice

  • Speak like a leader

  • Speak in a grounded voice

  • Speak in a strong voice with vocal fry

  • Speak in a polished voice

  • Speak in a serious and unhappy voice

  • Speak with concern

Voice textures:

  • Speak while laughing with confidence

  • Speak while laughing with a grunt

  • Speak with a smirk

  • Speak with intimidation

  • Speak in a disgusted voice

Mystery / Dark / Horror

Use these when the scene needs dread, darkness, mystery, danger, suspense, evil presence, or cinematic horror.

Style instructions:

  • Speak in a dark voice

  • Speak in an ominous voice

  • Speak in a sinister voice

  • Speak in a suspenseful voice

  • Speak in a haunting voice

  • Speak in an eerie voice

  • Speak in a whispering voice

  • Speak in a dangerous voice

  • Speak in a menacing voice

  • Speak in a mysterious voice

  • Speak in a cold voice

  • Speak in a brooding voice

  • Speak in a grim voice

  • Speak in a foreboding voice

  • Speak with a growling voice

  • Speak in a low and tense voice

Voice textures:

  • Speak with fearful stammering

  • Speak with an evil laugh

  • Speak with a sinister laugh

  • Speak in a gravelly voice

  • Speak while whispering and laughing

  • Speak in a shadowy voice

  • Speak in a screeching voice

  • Speak in a raspy voice

  • Speak in a heavy voice

Calm / Meditation / Soft Narration

Use these when the voice needs to feel peaceful, centered, slow, gentle, reassuring, or emotionally steady.

Style instructions:

  • Speak in a calm voice

  • Speak in a meditative voice

  • Speak in a soothing voice

  • Speak in a peaceful voice

  • Speak in a grounded voice

  • Speak slowly and gently

  • Speak in a soft and steady voice

  • Speak in a warm and relaxed voice

  • Speak in a reassuring voice

  • Speak with quiet confidence

5. What not to do

Do not use square brackets for full acting directions

Inline tags are not the same as Style Instructions.

Do not put detailed vocal direction, long descriptions, timing instructions, or full performance notes inside square brackets.

Bad:

  • [held 4 sec, steady pitch, fading breath] Thank you.

  • [soft relieved exhale with deep vocal tone underneath] Thank you.

  • [very deep chest-centered smoky throaty male shamanic voice with dark warmth] Please.

  • [commanding but warm with elongated vowels and ancient ceremonial stillness] Listen.

These should go in Style Instructions, not inside the script.

Better:

Style Instructions:

Speak in a deep, smoky, meditative voice with calm authority and slow pacing.

Text:

Please. [slow] (2s pause) Thank you. (3s pause) [sigh] Thank you.

Rule of thumb:

  • If it describes the whole performance, put it in Style Instructions.

  • If it describes a short moment, use a short inline tag.

  • If it is a pause, use the pause format, such as:

    • (1s pause)

    • (2s pause)

    • (4s pause)
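
Pause markers in this format are regular enough to find with a pattern match, which can help you estimate how much silence a script adds. This is a minimal sketch, assuming pauses are always written with a whole number of seconds exactly like `(2s pause)`. The helper names are hypothetical and nothing here calls Narration Box.

```python
import re

# Assumes the exact format shown in this guide: "(2s pause)", whole seconds only.
PAUSE_PATTERN = re.compile(r"\((\d+)s pause\)")


def total_pause_seconds(text: str) -> int:
    """Sum the seconds of every "(Ns pause)" marker found in the text."""
    return sum(int(seconds) for seconds in PAUSE_PATTERN.findall(text))


def strip_pauses(text: str) -> str:
    """Return the text without pause markers, with whitespace collapsed."""
    without_pauses = PAUSE_PATTERN.sub("", text)
    return " ".join(without_pauses.split())


script = "Please. (2s pause) Thank you. (3s pause) [sigh] Thank you."
print(total_pause_seconds(script))
print(strip_pauses(script))
```

For the sample script this reports 5 seconds of pauses and leaves `Please. Thank you. [sigh] Thank you.` as the spoken text, which is handy for a word count.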

Do not over-prompt every sentence

Bad:

I [happy] opened the door. I [excited] saw the box. I [shocked] picked it up. I [curious] looked inside.

The narrator automatically picks up the emotions in the text and speaks with the appropriate tone, while still following the style instructions and any inline tags. You do not need to tag every sentence.
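
One rough way to spot over-prompting is to measure how many sentences carry a tag. The sketch below is an illustration only: the sentence splitter is naive, the function name `tagged_sentence_ratio` is hypothetical, and any cut-off you apply to the ratio is your own judgment, since this guide does not define one.

```python
import re

# Any non-empty text inside square brackets counts as an inline tag.
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")
# Naive splitter: break after ".", "!" or "?" when whitespace follows.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def tagged_sentence_ratio(text: str) -> float:
    """Return the fraction of sentences that contain at least one inline tag."""
    sentences = [s for s in SENTENCE_SPLIT.split(text.strip()) if s]
    if not sentences:
        return 0.0
    tagged = sum(1 for s in sentences if TAG_PATTERN.search(s))
    return tagged / len(sentences)


over_prompted = (
    "I [happy] opened the door. I [excited] saw the box. "
    "I [shocked] picked it up. I [curious] looked inside."
)
lighter = "I opened the door. I saw the box. I [shocked] picked it up. I looked inside."
print(tagged_sentence_ratio(over_prompted))
print(tagged_sentence_ratio(lighter))
```

The over-prompted example scores 1.0 because every sentence is tagged, while the lighter version scores 0.25. A high ratio is only a hint to review the script, not a rule.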

Do not stack too many emotions in one instruction

Avoid:

happy, sad, angry, romantic, nervous, calm, excited, sarcastic
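
A simple word check can flag an instruction that stacks several emotions. This is a sketch under stated assumptions: the word list is copied from the Emotion bullet in section 2, the limit of three is an arbitrary choice that this guide does not specify, and the function names are hypothetical.

```python
import re

# Emotion words taken from the "Emotion" bullet in section 2 of this guide.
EMOTION_WORDS = {
    "happy", "sad", "angry", "anxious", "romantic", "fearful", "excited", "calm",
}
# Assumed threshold; the guide does not state a number.
MAX_EMOTIONS = 3


def stacked_emotions(instruction: str) -> list[str]:
    """Return the known emotion words in an instruction, in order of first use."""
    found = []
    for word in re.findall(r"[a-z]+", instruction.lower()):
        if word in EMOTION_WORDS and word not in found:
            found.append(word)
    return found


def is_overstacked(instruction: str) -> bool:
    """True when the instruction names more emotions than the assumed limit."""
    return len(stacked_emotions(instruction)) > MAX_EMOTIONS


print(is_overstacked("happy, sad, angry, romantic, nervous, calm, excited, sarcastic"))
print(is_overstacked("Speak in a deep, calm, meditative voice with slow pacing."))
```

The stacked list above is flagged because it names six of the listed emotions, while the single-mood instruction passes. Extend `EMOTION_WORDS` with your own vocabulary if you rely on other emotion names.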

Do not expect impossible physical actions

The voice can suggest laughter, fear, breathiness, tension, whispering, softness, or intensity.

It cannot always produce every long or very extreme physical sound, though in some cases it can. You are encouraged to test the limits of the voices.

For example, prompts like these may be inconsistent but can also produce some really creative results:

  • Crying loudly for 10 seconds

  • Screaming continuously

  • Coughing repeatedly

  • Singing a full melody

  • Perfectly imitating a celebrity

  • Producing exact sound effects

Use style instructions for performance direction, not full sound design.

Do not use vague creative instructions when you need precision

Vague:

  • Make it better

  • More human

  • Cinematic

  • Better

  • Tense voice

  • Soft voice

  • Energetic

  • Serious

6. When style instructions may not work perfectly

Results may vary depending on:

  • The selected voice

  • The length of the paragraph

  • The emotional clarity of the sentence

  • Whether the instruction matches the text

  • Whether too many unrelated tags are used together

  • Whether the instruction asks for something too physical or unrealistic

Good voice direction works best when the writing, prompt, and voice choice are aligned.

7. Accents and regional voice direction

Style instructions can also be used to guide accent, region, and speaking style.

This is useful when your content needs a voice that feels more local, more character-specific, or better matched to the audience. You can ask for broad accents, regional accents, or a softer hint of an accent.

Examples of accent prompt patterns:

  • Speak with a British accent

  • Speak with a soft British accent

  • Speak with a Yorkshire accent

  • Speak with a Southern English accent

  • Speak with a London accent

  • Speak with an Irish accent

  • Speak with a Scottish accent

  • Speak with a Welsh accent

  • Speak with an American accent

  • Speak with a Southern American accent

  • Speak with a New York accent

  • Speak with a Californian accent

  • Speak with a neutral US accent

  • Speak with a Canadian accent

  • Speak with an Australian accent

  • Speak with a New Zealand accent

  • Speak with an Indian English accent

  • Speak with a neutral Indian English accent

  • Speak with a South African accent

  • Speak with a French accent

  • Speak with a Spanish accent

  • Speak with an Italian accent

  • Speak with a German accent

and more...

Accent prompting is experimental. Some voices will respond more strongly than others. Some voices may produce only a light regional flavor instead of a perfect accent. That is normal.

For best results:

  1. Test the accent on a short paragraph first.

  2. Try both broad and specific versions.

  3. Combine the accent with the emotional style.

  4. Avoid stacking too many accent and emotion instructions together.

  5. Use English accent instructions even when the transcript is in another language.
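
Steps 2 and 3 above can be prepared as a small batch of instructions to paste into the Style Instructions box. The sketch below only builds text. The function names are hypothetical, the sentence template is one possible phrasing rather than a required format, and the article helper uses a simple first-letter rule that will be wrong for some words.

```python
def with_article(phrase: str) -> str:
    """Prefix a phrase with "a" or "an" using a simple first-letter vowel check."""
    article = "an" if phrase[:1].lower() in "aeiou" else "a"
    return f"{article} {phrase}"


def accent_variants(broad: str, specific: str, emotion: str) -> list[str]:
    """Build a broad and a specific accent instruction, each with one emotional style."""
    return [
        f"Speak with {with_article(accent + ' accent')} in {with_article(emotion + ' voice')}"
        for accent in (broad, specific)
    ]


for instruction in accent_variants("British", "Yorkshire", "calm"):
    print(instruction)
```

This produces "Speak with a British accent in a calm voice" and "Speak with a Yorkshire accent in a calm voice". Keeping to one accent and one emotion per instruction follows step 4.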

8. Best practices

  • Use English tags, even for non-English transcripts.

  • Put inline tags after the first word, not at the start of the block.

  • Use inline tags for specific emotional moments.

  • Match the emotion to the actual writing.

  • Test short samples before generating long audio.

  • Try multiple versions of the same prompt.

  • Pick voices that naturally fit the use case.
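
The advice to test short samples and try multiple versions can be organized as a small test plan before any long audio is generated. This is a sketch only: it builds a list of combinations to try by hand, the function names are hypothetical, the voice names are placeholders rather than real Enbee V2 voices, and the sentence splitter is naive.

```python
import re
from itertools import product


def short_sample(text: str, max_sentences: int = 2) -> str:
    """Return the first few sentences of a script, using a naive splitter."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def build_test_plan(
    text: str, voices: list[str], instructions: list[str]
) -> list[dict[str, str]]:
    """Pair every voice with every candidate instruction on one short sample."""
    sample = short_sample(text)
    return [
        {"voice": voice, "style_instructions": instruction, "text": sample}
        for voice, instruction in product(voices, instructions)
    ]


plan = build_test_plan(
    "I didn't expect you to come back. The door was still open. Nobody moved.",
    voices=["Voice A", "Voice B"],  # placeholder names, not real Enbee V2 voices
    instructions=["Speak in a soft tone", "Speak like you are shocked"],
)
for row in plan:
    print(row)
```

Two voices and two instructions give four short samples to listen to. Once one combination sounds right, reuse it on the full script.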
