Style Instructions Guide: Directing AI Voices With Emotion, Texture, and Performance
1. What this guide is for
Use this guide to direct Narration Box's Enbee V2 voices with style instructions.
This is a creative guide for making AI narration sound more human, intentional, cinematic, expressive, and context-aware.
Style instructions help you tell the voice how to perform a paragraph or block of text. They can guide emotion, energy, pacing, attitude, vocal texture, delivery style, intensity, and character presence.
The best results come from experimenting. Try different emotions and expressions, see what works, and test the limits.
This guide describes the possibilities we have seen and tested so far. Use it to see what currently works and to spark your own experiments.
Difference between Style Instructions and Inline Tags
Style instructions are a prompt you give to a whole text block, and they control the overall delivery of that paragraph or block. You enter them in the dedicated box at the top of each text block.
Style instructions guide the speaking style of the whole text block and work only with Enbee V2 voices.
Inline tags are short cues, usually one word and occasionally two, written inside square brackets. They guide the speaking style of a particular sentence or word, and you use them inside the text when you want a specific moment to shift emotion, pacing, or texture. For example:
“There, [whisper] there, everything is alright now.”
A tag usually shapes the word or sentence that follows it, or the line it sits in. The narrator decides how to apply the emotion, where it starts, and how far it extends.
Important: inline tags are short expression cues, not full acting instructions.
Use the Style Instructions box when you want to describe the overall voice, tone, accent, mood, pacing, or performance style of a full block.
Use inline tags only when you want a short emotional or delivery shift inside the text.
Good inline tags are usually one word or a very short phrase.
Examples:
[whisper]
[laughs]
[sad]
[angry]
[excited]
[shocked]
[slow]
[sarcastic]
[sigh]
Do not write long acting directions inside square brackets.
Bad:
[held 3 sec, steady pitch]
[soft relieved exhale with deep vocal tone underneath]
[hushed conspiratorial tone]
[very deep chest-centered smoky throaty male shamanic voice]
[say this like a sad mother speaking slowly to her child]
These are too detailed for inline tags. Enbee V2 may ignore them, partially follow them, or read them aloud as normal text.
Better:
Style Instructions:
Speak in a deep, calm, meditative voice with slow pacing and warm authority.
Text:
Please. [sigh] Thank you.
Edge cases:
If a text block has both style instructions and inline tags, the narrator speaks the block in the overall emotion set by the style instructions and applies each tag's particular emotion to the text around that tag.
NOTE:
Inline tags work best when they are placed after the first word of the sentence or block.
Avoid placing the inline tag as the first word. When the inline tag is placed right at the beginning of the block, the narrator may read it as part of the text instead of treating it as a performance direction.
For best results, use English tags and English style instructions even when the transcript is in another language.
Inline tags should be short. If the tag contains a full sentence, commas, timing details, vocal technique, or a long emotional description, it may be read aloud or handled unpredictably.
Good:
“There, [whisper] there, everything is alright now.”
Risky:
“There, [hushed conspiratorial tone] there, everything is alright now.”
Better:
Style Instructions:
Speak in a hushed, reassuring tone.
Text:
“There, [whisper] there, everything is alright now.”
AI voices are context-aware. This means the voice does not only follow the style instruction or inline tag. It also reacts to the sentence itself. Use inline tags only where the emotional shift really matters.
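The tag rules in this section are mechanical enough to check before you generate audio. The Python sketch below is a hypothetical pre-flight helper, not part of Narration Box: the function name check_inline_tags, the two-word limit, and the comma/digit check are assumptions that encode this guide's advice, not Enbee V2's actual parsing behavior.

```python
import re

# Matches anything written inside square brackets, e.g. "[whisper]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

# Assumption: two words is the upper bound this guide recommends for a tag.
MAX_TAG_WORDS = 2


def check_inline_tags(block: str) -> list[str]:
    """Return warnings for inline tags that break this guide's advice.

    Hypothetical helper: it only encodes the guide's rules of thumb,
    not how Enbee V2 actually reads tags.
    """
    warnings = []
    # A tag as the very first thing in the block may be read aloud.
    if block.lstrip().startswith("["):
        warnings.append("tag at start of block: move it after the first word")
    for match in TAG_PATTERN.finditer(block):
        tag = match.group(1).strip()
        if len(tag.split()) > MAX_TAG_WORDS:
            warnings.append(f"[{tag}] is too long: move it to Style Instructions")
        elif "," in tag or any(ch.isdigit() for ch in tag):
            warnings.append(f"[{tag}] has commas or timing details")
    return warnings


print(check_inline_tags("There, [whisper] there, everything is alright now."))
print(check_inline_tags("There, [hushed conspiratorial tone] there, everything is alright now."))
```

Run on the risky example above, the helper returns one warning suggesting you move the description into Style Instructions; the [whisper] version returns no warnings.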
2. What style instructions can do
Style instructions can help you shape:
Emotion: happy, sad, angry, anxious, romantic, fearful, excited, calm
Intensity: soft, explosive, restrained, dramatic, understated
Texture: raspy, breathless, trembling, clipped, warm, cold, growling
Delivery: persuasive, storytelling, documentary-style, conversational, intimate
Character mood: suspicious, commanding, fragile, sarcastic, heroic, sinister
Scene atmosphere: eerie, suspenseful, meditative, cinematic, haunting
Audience fit: audiobook, podcast, YouTube narration, e-learning, product demo, ad, meditation, fiction, character dialogue
The same text can feel completely different depending on the instruction.
Example:
Text: “I didn’t expect you to come back.”
Possible style instructions:
Speak in a soft tone
Speak like you are shocked
Speak like you are angry
Speak in a whispering way
and more...
Each version creates a different character, scene, and meaning.
3. The basic rule: you can go simple or super specific
You can write long prompts that describe emotions in detail or request a specific localized accent. For example:
Speak in a Parisian French accent
Speak in a Mexican Spanish accent
Speak bluntly in a Yorkshire accent
You can also be highly specific when the performance needs a particular identity, region, accent, format, character type, or scene style.
Useful prompt patterns:
Speak in a calm, serious, documentary-style voice
Narrate like a warm audiobook storyteller
Use a confident startup founder tone
Speak with a soft regional British accent
Sound like a relaxed coastal American speaker
Read this like a slow-burn mystery narrator
Speak like a polished product demo narrator
Use a gentle, patient teacher voice
Narrate with a formal newsreader tone
Speak with a casual conversational accent
More specific instructions are useful when the scene needs nuance:
Speaking like a detective revealing the final clue
Trying to stay calm while clearly scared
Narrating a dark horror scene in a low, tense voice
Sounding friendly but slightly suspicious
Explaining something complex with patient confidence
4. Emotion prompt library
This section gives users copyable style instructions grouped by emotional family. Use these as starting points, not fixed rules. The same prompt can behave slightly differently depending on the voice, the sentence, the language, and the emotional context of the paragraph.
For best results, test 2–4 nearby prompts before choosing the final one. You can also write your own prompts and test them with different styles and emotions.
Happy / Positive
Use these when the scene needs joy, charm, celebration, relief, playfulness, optimism, or light emotional lift.
Style instructions:
Speak in a naturally happy and warm voice
Speak with a soft smile in your voice
Speak with gentle relief and quiet happiness
Speak in a softly happy, tender voice
Speak with light, easy joy
Speak with controlled excitement and steady energy
Speak with proud, heartfelt happiness
Speak with a subtle smirk and playful confidence
Speak as if gently chuckling while talking
Speak with amused warmth and light humor
Speak in a playful, lively voice
Speak with a teasing, affectionate tone
Speak with visible relief and softened emotion
Speak with natural surprise and curiosity
Speak in an encouraging, celebratory voice
Speak with calm optimism and positive energy
Speak with a mischievous, knowing tone
Speak with a triumphant, victorious voice
Speak in a sentimental, reflective voice
Speak with bright enthusiasm and energetic warmth
Voice textures:
Speak with giddy, excited happiness
Speak in a bubbly, cheerful voice
Speak with a sparkly, bright sense of joy
Speak in an airy, light, effortless voice
Speak while laughing naturally
Speak in a shy, gentle voice
Speak shyly while laughing softly
Speak with shy curiosity, as if asking a question
Speak with shy doubt and uncertainty
Speak shyly with a slight stutter
Speak as if coughing while laughing hard
Speak with a fake laugh while whispering
Speak while laughing with a low grunt
Speak with a romantic smile in your voice
Speak with natural hesitation and uncertainty
Be careful with:
Emotions that contradict what the text is saying
Forced laughter where the text is not funny
Sad / Emotional
Use these when the scene needs grief, vulnerability, disappointment, regret, loneliness, memory, emotional heaviness, or quiet pain.
Style instructions:
Speak in a sad, heavy voice
Speak with heartbroken emotion and quiet pain
Speak as if crying while trying to continue
Speak with a teary, fragile voice
Speak as if choked up and holding back tears
Speak with deep grief and emotional weight
Speak in a lonely, distant voice
Speak with hurt and emotional tenderness
Speak with quiet disappointment
Speak with regret and softened sadness
Speak with guilt and emotional hesitation
Speak in a defeated, exhausted voice
Speak with hopelessness and fading energy
Speak in a melancholic, reflective voice
Speak with nostalgic sadness and warmth
Speak in a vulnerable, exposed voice
Speak with mourning and deep sorrow
Speak with wistful longing and quiet sadness
Voice textures:
Speak in a heavy voice
Speak in a fragile voice
Speak with grunts
Speak in a hollow voice
Speak with a trembling voice
Speak in a breathy voice
Speak in a low voice
Speak in a muted voice
Speak as if shivering with anxiety
Speak in a strained voice
Speak in a thin voice
Speak in a broken voice
Speak in a weary voice
Speak in a distant voice
Speak slowly and sadly
Speak with raw hatred
Speak in an empty voice
Speak in a subdued voice
Speak in a shaky voice
Speak in a paranoid voice
Speak with breathiness
Speak in a hoarse voice
Speak in a dismissive tone
Anger / Conflict
Use these when the scene needs irritation, rage, bitterness, confrontation, disgust, impatience, threat, or controlled intensity.
Style instructions:
Speak in an angry voice
Speak in an irritated voice
Speak in a frustrated voice
Speak in a furious voice
Speak with rage
Speak in a cold voice
Speak in a bitter voice
Speak in a resentful voice
Speak in a sarcastic voice
Speak as if snapping
Speak as if yelling
Speak in a scolding voice
Speak in a defensive voice
Speak in a threatening voice
Speak in a disgusted voice
Speak in an impatient voice
Speak as if seething
Speak in a commanding voice
Voice textures:
Speak in a sharp voice
Speak in a clipped voice
Speak in a tense voice
Speak in a harsh voice
Speak in an explosive voice
Speak through gritted teeth
Speak in a cutting voice
Speak in a rough voice
Speak as if boiling with anger
Speak in a staccato rhythm
Speak in a metallic and piercing voice
Fear / Anxiety
Use these when the scene needs panic, nervousness, dread, unease, hesitation, shock, paranoia, or desperation.
Style instructions:
Speak in an anxious voice
Speak in a nervous voice
Speak in a panicked voice
Speak in a terrified voice
Speak in a shocked voice
Speak hesitantly
Speak in an uneasy voice
Speak in a paranoid voice
Speak breathlessly
Speak with a trembling voice
Speak with a shivering voice
Speak in a pleading voice
Speak in a fearful whisper
Speak in a desperate voice
Speak in a startled voice
Voice textures:
Speak in a shaky voice
Speak in a tight voice
Speak in a breathy voice
Speak in a rushed voice
Speak in an uneven voice
Speak in a strained voice
Speak in an underconfident voice
Speak in a whispery voice
Speak with a jittery voice
Speak in a taut voice
Speak like you have a lump in the throat
Love / Affection / Intimacy
Use these when the scene needs tenderness, attraction, warmth, affection, longing, comfort, softness, or emotional closeness.
Style instructions:
Speak in a loving voice
Speak in a tender voice
Speak in a romantic voice
Speak in a flirty voice
Speak in an adoring voice
Speak in a caring voice
Speak in a fond voice
Speak in an intimate voice
Speak in a gentle voice
Speak in a comforting voice
Speak in a reassuring voice
Speak with a soft smile
Speak in a shy voice
Speak in a bashful voice
Speak with yearning
Speak in a vulnerable voice
Speak in a breathy voice
Speak in a sexy voice
Speak in a lusty voice
Speak in a delicate voice
Voice textures:
Speak in a melting voice
Speak in a hushed voice
Speak with vocal fry
Speak in a passionate voice
Speak in a playful and flirtatious voice
Speak with a smirk
Speak in an authoritative voice
Speak in a sultry voice
Speak in a smoky voice
Speak in an ASMR whisper
Speak in a lower pitch
Confidence / Authority
Use these when the voice needs to sound clear, firm, persuasive, polished, grounded, professional, heroic, or leader-like.
Style instructions:
Speak in a confident voice
Speak in a bold voice
Speak in a commanding voice
Speak in a firm voice
Speak in a serious voice
Speak in a determined voice
Speak in a heroic voice
Speak with calm authority
Speak in a persuasive voice
Speak in a professional voice
Speak like a leader
Speak in a grounded voice
Speak in a strong voice with vocal fry
Speak in a polished voice
Speak in a serious and unhappy voice
Speak with concern
Voice textures:
Speak while laughing with confidence
Speak while laughing with a grunt
Speak with a smirk
Speak with intimidation
Speak in a disgusted voice
Mystery / Dark / Horror
Use these when the scene needs dread, darkness, mystery, danger, suspense, evil presence, or cinematic horror.
Style instructions:
Speak in a dark voice
Speak in an ominous voice
Speak in a sinister voice
Speak in a suspenseful voice
Speak in a haunting voice
Speak in an eerie voice
Speak in a whispering voice
Speak in a dangerous voice
Speak in a menacing voice
Speak in a mysterious voice
Speak in a cold voice
Speak in a brooding voice
Speak in a grim voice
Speak in a foreboding voice
Speak with a growling voice
Speak in a low and tense voice
Voice textures:
Speak with fearful stammering
Speak with an evil laugh
Speak with a sinister laugh
Speak in a gravelly voice
Speak while whispering and laughing
Speak in a shadowy voice
Speak in a screeching voice
Speak in a raspy voice
Speak in a heavy voice
Calm / Meditation / Soft Narration
Use these when the voice needs to feel peaceful, centered, slow, gentle, reassuring, or emotionally steady.
Style instructions:
Speak in a calm voice
Speak in a meditative voice
Speak in a soothing voice
Speak in a peaceful voice
Speak in a grounded voice
Speak slowly and gently
Speak in a soft and steady voice
Speak in a warm and relaxed voice
Speak in a reassuring voice
Speak with quiet confidence
5. What not to do
Do not use square brackets for full acting directions
Inline tags are not the same as Style Instructions.
Do not put detailed vocal direction, long descriptions, timing instructions, or full performance notes inside square brackets.
Bad:
[held 4 sec, steady pitch, fading breath] Thank you.
[soft relieved exhale with deep vocal tone underneath] Thank you.
[very deep chest-centered smoky throaty male shamanic voice with dark warmth] Please.
[commanding but warm with elongated vowels and ancient ceremonial stillness] Listen.
These should go in Style Instructions, not inside the script.
Better:
Style Instructions:
Speak in a deep, smoky, meditative voice with calm authority and slow pacing.
Text:
Please. (2s pause) Thank you. (3s pause) [sigh] Thank you.
Rule of thumb:
If it describes the whole performance, put it in Style Instructions.
If it describes a short moment, use a short inline tag.
If it is a pause, use the pause format, such as:
(1s pause)
(2s pause)
(4s pause)
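If you script many pauses, a small helper can confirm that each one uses the (Ns pause) format and show how much silence a block adds. This is a hypothetical Python sketch, not a Narration Box feature: split_on_pauses is an illustrative name, and it assumes whole-second pauses written exactly as in the examples above. Anything that does not match stays in the spoken text, which is a quick way to spot a mistyped pause.

```python
import re

# Assumption: pauses are whole seconds written as "(1s pause)", "(2s pause)", ...
PAUSE_PATTERN = re.compile(r"\((\d+)s pause\)")


def split_on_pauses(script: str) -> list[tuple[str, int]]:
    """Split a script into (spoken text, seconds of pause after it) pairs."""
    segments = []
    last_end = 0
    for match in PAUSE_PATTERN.finditer(script):
        spoken = script[last_end:match.start()].strip()
        segments.append((spoken, int(match.group(1))))
        last_end = match.end()
    # Text after the final pause (if any) has no pause following it.
    tail = script[last_end:].strip()
    if tail:
        segments.append((tail, 0))
    return segments


segments = split_on_pauses("Please. (2s pause) Thank you. (3s pause) [sigh] Thank you.")
print(segments)
print(sum(seconds for _, seconds in segments))
```

For that script it returns three segments with 2, 3, and 0 seconds of trailing pause, so the block adds 5 seconds of silence. A malformed marker such as (2 sec pause) is left inside the spoken text, where it would likely be read aloud.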
Do not over-prompt every sentence
Bad:
I [happy] opened the door. I [excited] saw the box. I [shocked] picked it up. I [curious] looked inside.
The narrator already picks up the emotions the text calls for and speaks them in the required tone, guided by your style instructions and any inline tags. Tag only the moments where a shift really matters.
Do not stack too many emotions in one instruction
Avoid:
happy, sad, angry, romantic, nervous, calm, excited, sarcastic
Do not expect impossible physical actions
The voice can suggest laughter, fear, breathiness, tension, whispering, softness, or intensity.
It cannot always produce long or very extreme physical sounds, although in some cases it can, so you are encouraged to test the limits of the voices.
For example, prompts like these may be inconsistent but can also produce some really creative results:
Crying loudly for 10 seconds
Screaming continuously
Coughing repeatedly
Singing a full melody
Perfectly imitating a celebrity
Producing exact sound effects
Use style instructions for performance direction, not full sound design.
Do not use vague creative instructions when you need precision
Vague:
Make it better
More human
Cinematic
Better:
Tense voice
Soft voice
Energetic
Serious
6. When style instructions may not work perfectly
Results may vary depending on:
The selected voice
The length of the paragraph
The emotional clarity of the sentence
Whether the instruction matches the text
Whether too many unrelated tags are used together
Whether the instruction asks for something too physical or unrealistic
Good voice direction works best when the writing, prompt, and voice choice are aligned.
7. Accents and regional voice direction
Style instructions can also be used to guide accent, region, and speaking style.
This is useful when your content needs a voice that feels more local, more character-specific, or better matched to the audience. You can ask for broad accents, regional accents, or a softer hint of an accent.
Examples of accent prompt patterns:
Speak with a British accent
Speak with a soft British accent
Speak with a Yorkshire accent
Speak with a Southern English accent
Speak with a London accent
Speak with an Irish accent
Speak with a Scottish accent
Speak with a Welsh accent
Speak with an American accent
Speak with a Southern American accent
Speak with a New York accent
Speak with a Californian accent
Speak with a neutral US accent
Speak with a Canadian accent
Speak with an Australian accent
Speak with a New Zealand accent
Speak with an Indian English accent
Speak with a neutral Indian English accent
Speak with a South African accent
Speak with a French accent
Speak with a Spanish accent
Speak with an Italian accent
Speak with a German accent
and more...
Accent prompting is experimental. Some voices will respond more strongly than others. Some voices may produce only a light regional flavor instead of a perfect accent. That is normal.
For best results:
Test the accent on a short paragraph first.
Try both broad and specific versions.
Combine the accent with the emotional style.
Avoid stacking too many accent and emotion instructions together.
Use English accent instructions even when the transcript is in another language.
8. Best practices
Use English tags, even for non-English transcripts.
Put inline tags after the first word, not at the start of the block.
Use inline tags for specific emotional moments.
Match the emotion to the actual writing.
Test short samples before generating long audio.
Try multiple versions of the same prompt.
Pick voices that naturally fit the use case.

