---
name: starpop-seedance-prompts
description: Write Seedance 2.0 video prompts for the Starpop app. Covers 13 UGC video categories with per-category guidance — handheld iPhone selfie, scripted 1-shot UGC with music, before/after, unboxing, product review, POV, street interview, cinematic short film, greenscreen with study, ASMR, reaction, podcast, and Pixar animated. Use this skill whenever the user drops a product URL, asks to make UGC ads, write Seedance prompts, or mentions any of these categories. Starts with research mode by default (fetches the product page, extracts brand info, validates findings with the user), then asks which category combo they want, then writes prompts. Uses Starpop's @image[N] / @video[N] / @audio[N] reference syntax and the Add Actor workflow for human faces.
---

# Starpop × Seedance 2.0 — UGC Prompt Skill

Turn a product brief (or a URL) into production-ready Seedance 2.0 prompts for the Starpop app. Covers 13 UGC video categories with specific guidance for each — handheld iPhone selfie, scripted 1-shot UGC with music, before/after, unboxing, product review, POV, street interview, cinematic short film, greenscreen with study, ASMR, reaction, podcast, and Pixar animated.

This skill writes prompts only. The `starpop-workflow-agent` handles driving the Starpop UI (browser navigation, uploads, Generate clicks, share links). If the user asks to also run the ads, hand off to the workflow agent after prompt approval.

---

## When this skill fires

Trigger whenever the user:
- Drops a product URL and wants ads made
- Says "make me UGC ads for [product]"
- Asks to write a Seedance prompt
- Mentions any of the 13 video category names

If the user pastes only a URL with no explicit request, ask once: *"Want me to research this product and write some UGC prompts for it?"*

---

## Step 1 — Gather the brief

### Path A (default) — Research mode

When a URL is provided, always take this path.

1. **Fetch the URL.** Use `WebFetch` or a browser tool. Visit the specific product page, not just the homepage — cleaner extraction.
2. **Extract:** brand name, category, product(s), selling points, tone of voice from copy, target audience clues, price tier, visible discount codes.
3. **Make reasonable assumptions** for anything the site doesn't surface — default ad goal = Conversion, default platforms = TikTok + Reels, default tone = whatever the site conveys.
4. **Validate findings with the user FIRST** — do not jump to prompt writing. Post what you extracted vs what you assumed, clearly flagged with `[assumed]` next to anything you made up. Ask the user to confirm or correct each field. This is the single most important checkpoint for getting good prompts — bad brief = bad script.

Example validation output:

```
Here's what I got from your site:

Brand:             Zesty Paws
Category:          Pet / Wellness supplements
Hero product:      Calming Bites (selected from the page I visited)
Description:       Soft chew dog supplement for stress, anxiety, and hyperactivity
Selling points:    Vet-formulated, Suntheanine + chamomile, soft-chew format
Tone:              Warm, pet-parent-to-pet-parent, casual
Audience:          Owners of anxious dogs  [assumed from product positioning]
Price tier:        $$  [assumed]
Ad goal:           Conversion  [assumed]
Platforms:         TikTok + Instagram Reels  [assumed]
Discount code:     None found

Anything I got wrong or want to add? Confirm and I'll move to format selection.
```

Wait for the user's confirmation (or corrections) before moving to Step 2.

### Path B — Manual mode

When the user has no URL or prefers to answer directly, go question-by-question. Use `AskUserQuestion` for multiple-choice. One at a time.

- **Q1** Product name
- **Q2** Category (Supplement / Skincare / Fashion / Tech / Food / Fitness / Home / Pet / Other)
- **Q3** One-sentence description
- **Q4** Top 3–5 selling points
- **Q5** Price tier ($ / $$ / $$$)
- **Q6** Differentiator — what makes it different?
- **Q7** Target audience (demographic + interests)
- **Q8** Ad goal (Conversion default / Awareness / Consideration / Retargeting)
- **Q9** Platforms (TikTok / Reels / Shorts / Meta Feed / YouTube)
- **Q10** Tone (Casual default / Professional / Luxury / Educational / Edgy)
- **Q11** Discount code (optional)
- **Q12** Custom guardrails (phrases to use or avoid)

### Brand-less / testing fallback

If the user has no product at all and just wants to test the skill, ask them to describe the fictional or placeholder product in 2–3 sentences, then proceed.

---

## Step 2 — Pick categories + count + specs

After the brief is validated, ask the user which categories to write. Propose **3 recommended combos** up front, then list all 13 as individual options.

### Recommended combos

- **UGC Classic** — Handheld iPhone selfie + Scripted 1-shot UGC with music + Reaction video. The three workhorse formats for social-first ads.
- **Brand Story** — Before/After + Cinematic commercial + Product review. Transformation, prestige, credibility.
- **Scroll-stop** — Greenscreen with study + Reaction + Pixar animated. Pattern-interrupt formats that stop the feed.

### Full list (user can mix and match)

1. Handheld iPhone selfie
2. Scripted 1-shot UGC with music
3. Before / After
4. Unboxing / first-impression
5. Product review
6. POV / day-in-the-life
7. Street interview / vox pop
8. Cinematic commercial (short film)
9. Greenscreen with study
10. ASMR / sensorial product demo
11. Reaction video
12. Podcast video
13. Pixar animated

For each category picked, confirm aspect ratio, duration, and resolution — use the per-category defaults below unless the user overrides.

---

## Step 3 — Pre-prompt reasoning (run BEFORE writing)

You're the reasoning engine. The video model can render anything you describe — but it leaves room for interpretation on the parts you don't lock down. Your job is to **lock down what must be accurate** by passing the right combination of images and text, and to ask the user when the gap is bigger than that combination can close.

**The product is the thing that must be accurate.** Other elements in the scene can have interpretation room. The product cannot.

### A. Visual context — combine images + text for complete product understanding

Goal: zero room for interpretation about the product itself. Some products are simple — a blank piece of paper plus "white A4 printer paper" is enough. Others need multiple reference images because key details can't be captured in one frame: back of a bottle, the gummy or chew texture, sleeve or hood detail on a clothing piece, an open product + applicator, a hidden mechanism.

Reason about the actual product and what the camera will see in the planned shot. Which combination of images + text gives the model complete context? If one image plus a short description doesn't cover it, request additional images BEFORE writing. Don't impose category-based rules — reason from the actual product.

> Example: *"For the application shot I'll need a second image of the open balm so the model gets the applicator shape right — can you upload one?"*

### B. Interaction mechanics — describe motion to remove guesswork

The video model can reason about how products work, but inconsistently. It's like asking it to guess what's in a sealed box: infinite tries gets there eventually, but our job is to be right the first time.

If the actor interacts with the product, describe the exact mechanical motion in the prompt: lip balm twist + glide, pump press + catch, pull-tab activation, tube squeeze direction, hidden-feature reveal. If the interaction is mechanically complex enough that text alone won't lock the motion in, request a video reference (`@video[1]`) for motion ground truth.
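
As a sketch, a locked-down interaction for a hypothetical lip balm might read (the `@video[1]` line applies only if the user actually provides a motion reference):

```
She twists the base of the balm two full turns so the stick rises,
glides it across her lower lip in one slow left-to-right pass,
then caps it with a soft click.
reference @video[1] for the twist-and-glide motion
```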

### C. Camera logistics — pick ONE setup and describe it

"iPhone selfie style" alone is ambiguous. Spell out the camera setup concretely:

- Selfie with arm extended → actor films herself with the front camera
- Mirror selfie → actor films her reflection with the rear camera, talks to the mirror
- Phone propped on a surface → camera fixed, both hands free
- Friend filming off-camera → handheld at chest height
- Tripod / locked-off → camera fixed

One setup, described concretely. Same principle: zero guesswork.

### D. Hand math — foresee interaction issues, don't impose rules

Humans have two hands. If the actor is holding the phone, ONE is free. This isn't a rule ("always start the can open"); it's a check to spot interaction issues that arise from uncertainty.

Example: showing a canned drink being opened on screen is great — perfect for ASMR or specific creative beats. The problem only emerges if the actor needs to open AND drink within 2 seconds while holding a phone, because the model then has to render a one-handed can-opening, which is hard to get right. Solutions vary by case: start the can already open, switch the camera setup so both hands are free (mirror, tripod, friend filming), pass a video reference of one-handed opening, or put the product on a surface to free up both hands.

The agent's job is to **spot the potential issue** and either give the model enough context OR ask the user.

### E. Ask the user when ambiguity would affect the output

Baseline behavior, not a fallback. If a single clarifying question would meaningfully resolve uncertainty in any of the above (product accuracy, interaction mechanics, camera setup, hand math) — ask. Don't guess. The agent has the doorway to human input — use it.

Good asks are specific and easy to answer:

- *"This ad has the actor applying the balm — got an image of it open or mid-application?"*
- *"For the bathroom shot, is she filming her reflection in the mirror or talking to the front camera?"*
- *"The can-open + sip is two motions in 6 seconds — want to start with it already open?"*

One question, get the answer, continue.

---

## Step 4 — Write the prompts

Apply **all** General Rules below to every prompt — they're universal. Then lean into the category-specific advice for each variant.

---

# GENERAL RULES

## 1. Open with a one-line format descriptor

Every prompt MUST start with a single sentence declaring the video type + subject + context. The model anchors on the first line — tell it what kind of video up front.

Good openings:
- *"Authentic handheld iPhone selfie UGC video promoting a dog supplement."*
- *"Cinematic ultra-realistic short film for a skincare brand, narrative-led."*
- *"Pixar-style 3D-animated product video for a protein powder, exaggerated emotional dog characters."*

Bad: no descriptor at all, or vague ("a nice product video").

## 2. Always include "No captions."

On its own line, near the top of the prompt. Every prompt. Every category. Seedance will sometimes generate overlaid captions by default if not told otherwise — and they look awful.

## 3. Starpop reference syntax

Assets are referenced with `@image[N]`, `@video[N]`, `@audio[N]` — each type has its own 1-indexed counter.

| Asset | Syntax | Limits |
|---|---|---|
| Reference images (product, scene, mood, study, chart) | `@image[1]`, `@image[2]`, … | Up to 9 |
| Reference videos | `@video[1]`, `@video[2]`, `@video[3]` | Up to 3 total, 15s combined |
| Reference audios | `@audio[1]`, `@audio[2]`, `@audio[3]` | Up to 3 total, 15s combined |

First reference image = `@image[1]`. First audio = `@audio[1]`. They don't share counters.

**At the top of every prompt, note what each reference is**, e.g.:
```
reference @image[1] for the product
reference @image[2] for the lifestyle setting
reference @audio[1] for the voice tone
```

## 4. Actors are OPTIONAL — don't assume one is provided, and max ONE actor per prompt

An "actor" in Starpop = a specific human face image the user uploads via the **Add Actor** button. Add Actor is a convenience upload path with backend face-processing for Seedance compliance. Once uploaded, the image lives in the same `@image[N]` array as any other reference image — it gets its own index based on upload order.

**Default assumption: no actor is provided.** Seedance generates humans from the prompt's description just fine for most UGC. Describe the subject specifically (age, features, clothing, expression, setting) and let the model render them. Do NOT require an actor image in every prompt.

**Hard limit: maximum ONE actor per prompt.** Even for formats with multiple humans (podcast, street interview, reaction duets), only ONE human can have a locked face via Add Actor. The second human is described in the prompt and generated by Seedance. Pick the most important human for the actor slot (usually the one holding / using the product).

**Only suggest adding an actor when it's genuinely useful:**
- User wants face consistency across multiple generations (a recurring brand creator)
- User has a specific real influencer / model they want featured
- The brand has an established actor identity

If the orchestrator (`starpop-workflow-agent`) confirms the user provided an actor image, reference it by its `@image[N]` index in the prompt (e.g., `@image[2]` if the product is `@image[1]`). Otherwise, describe the subject in natural language and don't reference an actor image at all.

**If this skill is used standalone (no orchestrator), ask once during the brief:** *"Do you have a specific actor image you want featured, or should Seedance generate the subject from the description?"* Default to "generate from description" unless the user provides one.

## 5. Audio Direction

Seedance 2.0 generates audio natively. Every prompt MUST include explicit audio direction: voice + room tone + speech pattern for the scene.

**Voice — match to demographic.** Examples:
- *"Warm female voice, mid-20s, casual, talking to a friend"*
- *"Deep male voice, 40s, genuine dad energy, not a narrator"*
- *"Light female voice, teens, curious, slight giggle"*

**Room tone — must match the setting:**
- Bathroom → slight reverb from tiled walls
- Bedroom → soft close acoustics, carpeted, minimal echo
- Kitchen → open space feel, subtle ambient sounds (fridge hum, distant street)
- Car → muffled close acoustics
- Outdoors → natural ambience, slight wind or traffic
- Living room → warm room tone, furnished space
- Studio / podcast → close-mic'd, subtle studio dampening

**Speech pattern:** natural delivery with pauses, filler words, contractions. NOT scripted. The model handles this better when you tell it explicitly: *"delivery is conversational, not rehearsed, with small natural pauses."*
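
Put together, an audio direction block might read like this (a sketch — voice, setting, and wording are placeholders):

```
Audio: warm female voice, mid-20s, casual, talking to a friend.
Bathroom room tone, slight reverb from tiled walls.
Delivery is conversational, not rehearsed, with small natural pauses.
```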

## 6. Dialogue Rules

Dialogue must sound REAL, not scripted.

- Use contractions: *"I've been"*, *"it's literally"*, *"you're gonna"*, *"I don't"*
- Include filler words: *"like"*, *"honestly"*, *"so basically"*, *"okay"*
- Casual grammar — fragments and run-ons are fine
- Sound genuinely excited or skeptical, not rehearsed

**Good:** *"Okay so I've been using this for like two weeks and honestly? It actually works."*
**Bad:** *"This revolutionary product has transformed my routine completely."*

If the user supplies a specific script, use it verbatim. Don't "improve" it — user's voice is more authentic than any rewrite.

## 7. Avoid complex vocabulary — Seedance struggles with pronunciation

Seedance 2.0 is a ByteDance (Chinese) model. It struggles with words that are long, Latin-derived, or rare in spoken English. Audio will glitch, slur, or completely mispronounce them.

**Common offenders — always rephrase:**

| Avoid | Say instead |
|---|---|
| Ashwagandha | stress-support blend, adaptogen, calming herb |
| L-theanine | amino acid, calming compound |
| Bacopa monnieri | memory-support herb |
| Phosphatidylserine | brain-support nutrient |
| Acetyl-L-carnitine | energy nutrient |
| Glucosamine | joint-support nutrient |
| Probiotics (technical strain names) | good gut bacteria |
| Complex Latin or scientific product names | a plain-English equivalent |

**Rule of thumb:** if you'd stumble reading it aloud, Seedance will too. Rewrite for natural spoken English. Let the ingredient list live on the packaging, not in the audio track.

## 8. Visual-first storytelling — describe actions, not emotions

Seedance does NOT reliably render internal emotional states. If you write *"the dog is trembling in fear"*, the model underdoes it — subtle cues get lost every time. Every emotion or behavior must be translated into an **exaggerated, physically explicit action** the model can actually animate.

**Rule:** if you can't see it from 6 feet away on a phone screen, the model won't render it. Describe what an outside observer would literally see, not what the character feels.

| ❌ Don't write (internal / subtle) | ✅ Write instead (external / exaggerated) |
|---|---|
| *Trembling in fear* | *Jumping up at every thunderclap, darting under the table, ears flat against skull* |
| *Anxious* | *Pacing in tight circles, looking at the door every few seconds, whining* |
| *Calm* | *Eyes half-closed, slow deep breath, body fully stretched out on the rug* |
| *Happy* | *Tail thumping hard against the floor, front paws tapping, broad open-mouth pant* |
| *Sad / defeated* | *Head lowered below shoulders, tail tucked, slow heavy slump onto the rug* |
| *Surprised* | *Eyes wide, hand flying to mouth, takes a full step back* |
| *Confident* | *Shoulders back, direct eye contact, slow measured walk* |
| *Disappointed* | *Shoulders drop visibly, slow exhale, looks down at the floor* |
| *Excited* | *Laughs mid-sentence, slaps the table, leans way forward into the shot* |
| *Focused* | *Leans in, narrows eyes, both hands flat on the table* |
| *Relieved* | *Big exhale with shoulders dropping, small smile spreads slowly, unclenches hands* |

**Corollary — exaggeration beats subtlety.** The model consistently underdoes actions. If you want a soft reaction, describe a medium one. If you want a medium reaction, describe a big one. Dial it up one notch past what you actually want on-screen.

**Pair this with the separation rule:** if the action involves the camera reacting to the subject, write them as two distinct instructions.

## 9. Positive-only prompting — no negatives

Seedance 2.0 does not reliably honor negative instructions ("avoid jitter", "no warping", "no deformation"). State what you WANT, not what you don't want.

- ❌ *"Avoid jitter."* / *"No warping on the product."*
- ✅ *"Stable picture. Natural smooth movements."* / *"Sharp logo, clean surface, correct proportions."*

When a prompt pattern uses negatives, rewrite to positives before using.

## 10. Script Pacing & Timeline Rule (scripted dialogue only)

Dialogue STARTS at 00:01 and ENDS 2 seconds before the video ends. Silent open, dialogue window, silent close.

| Duration | Dialogue window | Max words |
|---|---|---|
| 5s | 00:01–00:03 | ~5 |
| 7s | 00:01–00:05 | ~10 |
| 10s | 00:01–00:08 | ~18 |
| 12s | 00:01–00:10 | ~23 |
| 15s | 00:01–00:13 | ~30 |

Longer dialogue than max → rushed delivery, muddled lip-sync. Cut it.

Multi-scene scripted formats (Scripted 1-shot UGC, multi-scene selfie) don't use this rule per-scene — they cut between scenes so each segment carries one short line.
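
Applied to a 10-second single-scene clip, the timeline might look like this (the dialogue line is a placeholder and stays under the ~18-word cap):

```
00:00–00:01  silent open: she lifts the jar into frame
00:01–00:08  dialogue: "Okay so I've been using this for like two weeks
             and honestly? It actually works."
00:08–00:10  silent close: she smiles, sets the jar down
```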

## 11. Separation Rule — subject motion ≠ camera motion

Describe them as two separate instructions. Mixing them blends both into confused output.

- ❌ *"Spinning camera around a dancing person"*
- ✅ *"The dancer spins slowly. Camera holds fixed framing."*

## 12. Dangerous Keywords — never use alone

| ❌ Don't write | ✅ Write instead |
|---|---|
| `fast` alone | Only ONE element fast — e.g., *"Fast subject movement, slow camera"* |
| `cinematic` alone | *"35mm film tone, warm shadows"* or specific director reference |
| `epic` / `beautiful` / `amazing` | Describe the actual visual — lighting, composition, colors |
| `lots of movement` | One specific motion + speed modifier |
| `multiple angles` | One tracking shot; use timestamped beats for angle changes |
| `realistic` alone | *"Photorealistic, natural skin texture, practical lighting"* |
| `authentic UGC energy` alone | Specific camera + lighting + setting description |
| `high quality` alone | *"4K, sharp clarity, rich details"* |

## 13. Quality Suffix — final line of every prompt

> 4K, Ultra HD, rich details, sharp clarity, cinematic texture, natural colors, soft lighting, stable picture.

Positive only. Append as the last line of every prompt.

## 14. Per-category defaults

When the user doesn't specify aspect / duration / resolution, use these defaults. Always confirm with the user before generating.

| # | Category | Aspect | Duration | Resolution |
|---|---|---|---|---|
| 1 | Handheld iPhone selfie | 9:16 | 8–15s | 720p |
| 2 | Scripted 1-shot UGC with music | 9:16 | 10–15s | 720p |
| 3 | Before / After | 9:16 | 8–15s | 720p |
| 4 | Unboxing | 9:16 | 10–15s | 720p |
| 5 | Product review | 9:16 or 1:1 | 10–15s | 720p |
| 6 | POV / day-in-the-life | 9:16 | 10–15s | 720p |
| 7 | Street interview | 9:16 | 8–12s | 720p |
| 8 | Cinematic commercial | 16:9 or 9:16 | 10–15s | 720p |
| 9 | Greenscreen with study | 9:16 | 8–12s | 720p |
| 10 | ASMR | 9:16 | 6–10s | 720p |
| 11 | Reaction | 9:16 | 6–10s | 720p |
| 12 | Podcast video | 16:9 | 10–15s | 720p |
| 13 | Pixar animated | 9:16 | 10–15s | 720p |

Language default: **English only** unless the user specifies otherwise.

---

# VIDEO CATEGORIES

Each category below gives you: the 1-sentence visual anchor, the fundamental that makes the format work, and specific direction to bake into your prompt.

## 1. Handheld iPhone selfie (talking head UGC)

**Visual:** Subject holding their phone selfie-style, talking directly to camera in an everyday setting like a kitchen, bathroom, or bedroom.

**Fundamental:** Authenticity. Every detail should feel unstaged — clothing, lighting, environment, speech.

**Direction:** Describe the camera specifically — *"shot on smartphone front camera, slight handheld movement, close-up mirror perspective"* or *"selfie-arm's-length framing, medium-close"*. Don't write "iPhone selfie-style" alone — too vague. Anchor the lighting to a real-world source: morning window light, late-afternoon kitchen glow, practical desk lamp only, bathroom overhead. Describe the subject with specific details: age, expression, outfit, setting. Openers should never sound like an ad — lean on insider-knowledge (*"Nobody tells you this about…"*), pain-point (*"If you're X and dealing with Y…"*), or calm observation (*"No one really talks about…"*). Let the delivery breathe with contractions and filler words. Avoid polish — a slight handheld wobble reads more real than a perfectly stable shot.
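
Putting the general rules together, a full prompt for this category might look like this — a sketch for a 12-second 9:16 clip, reusing the Calming Bites brief from Step 1 as a stand-in (subject, setting, and dialogue are placeholders; the 23-word line fits the 12s pacing window):

```
Authentic handheld iPhone selfie UGC video promoting a calming dog supplement.
No captions.

reference @image[1] for the product

Selfie-arm's-length framing, medium-close, shot on smartphone front camera
with slight handheld movement. Woman in her late 20s, oversized hoodie,
no makeup, sitting on her kitchen floor next to a golden retriever,
late-afternoon window light.

She says: "No one really talks about this, but my dog used to pace in
circles every single night. These chews? Honestly a game changer."
Delivery is conversational, not rehearsed, with small natural pauses.
Warm female voice, mid-20s, casual. Kitchen room tone, open space feel,
subtle fridge hum.

4K, Ultra HD, rich details, sharp clarity, cinematic texture, natural
colors, soft lighting, stable picture.
```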

## 2. Scripted 1-shot UGC with music

**Visual:** Multi-scene high-energy promo with a subject on camera, quick cuts between product and lifestyle moments, upbeat music driving the pace.

**Fundamental:** Energy. Every beat has to carry forward momentum — no dead air, no downbeat tones.

**Direction:** Structure is 4 beats. Each beat = a short scene direction (visible action) + one spoken line with real emotional energy. Mark the energy with parenthetical tags — *(laughing)*, *(excited)*, *(amazed)*. Emojis belong IN the dialogue, not in scene directions — they cue vocal lift. Product shows up in scene 1 or 2 (opening moment), a lifestyle or result flash happens in scene 3 (proof), and the final scene combines product + emotional payoff (smile, hug, dog licking face). Openers that work: *"Okay so..."*, *"Okay listen —"*, *"Not gonna lie…"*, *"Everyone keeps asking me…"*. Tone stays upbeat across all 4 beats — if any beat feels somber, rewrite it.
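
A 4-beat skeleton for a hypothetical dog-chew product might look like this (scene directions and lines are placeholders; energy tags and emoji placement follow the rules above):

```
Beat 1: She holds the jar up to the camera. (excited) "Okay so everyone
        keeps asking me about these."
Beat 2: Quick cut: she tosses a chew, the dog catches it mid-air.
        (laughing) "He literally catches them now 😂"
Beat 3: Lifestyle flash: dog stretched out calm on the rug, tail thumping
        hard. "Two weeks in and look at him."
Beat 4: She hugs the dog, jar visible in frame. (amazed) "Best thing
        I've bought all year."
```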

## 3. Before / After

**Visual:** Visual contrast of a subject, space, body, or pet in a worse state, then a cut to the improved state after the product was used.

**Fundamental:** The gap between the two states has to be visible on screen, not just claimed in dialogue.

**Direction:** Plan the "before" with specific problem signals — tired eyes, cluttered surface, itchy dog, dull skin, stressed posture, bloated belly. Plan the "after" with their direct opposites — bright eyes, clean calm surface, relaxed dog, glowing skin, confident posture. The product appears in a bridge moment between the two — use it (apply, pour, give) rather than just show it on a shelf. Give the "before" a bit longer than feels comfortable — the viewer needs to feel the problem for the transformation to land. Lighting can shift between states to amplify contrast (cooler/dimmer in the before, warmer/brighter in the after). Keep dialogue minimal; visual payoff is the whole point.
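
A before/after beat outline might be sketched like this (a 12s example with placeholder problem signals and timings):

```
BEFORE (0–6s): dim cool light, dog pacing in tight circles by the door,
whining; owner on the couch rubbing her eyes.
BRIDGE (6–8s): close-up of her hand giving the dog a soft chew.
AFTER (8–12s): warm bright light, dog stretched out on the rug, eyes
half-closed, slow deep breath; owner smiles.
One line of dialogue max, tucked into the after: "Finally."
```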

## 4. Unboxing / first-impression

**Visual:** Close-up of hands opening a package on a clean surface, slow product reveal, subject reacting with genuine surprise.

**Fundamental:** Tactile specificity. Hands, textures, and small sounds carry the format.

**Direction:** Describe the hands explicitly — *"manicured fingernails carefully lifting the lid"*, *"fingertips running along the embossed logo"*, *"pulling out the inner sleeve"*. The subject's face should be visible at the reveal moment for emotional payoff. Don't over-direct the reaction — *"genuine surprise, small smile"* reads more real than *"shocked and amazed"*. Sound design is huge here: tear of tape, rustle of tissue paper, soft thunk of the product hitting the surface. Keep the space clean — a staged but lived-in kitchen counter or bedroom desk. Clutter dilutes focus. The product gets the hero moment once; before that, it's about anticipation.
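
A core-scene fragment for this format might read (hands, sounds, and setting are placeholders):

```
Close-up: manicured fingernails slice the tape, lift the lid, peel back
tissue paper. Camera tilts up to her face at the reveal: genuine
surprise, small smile. Sound: tear of tape, rustle of tissue paper, soft
thunk of the jar on the wooden desk. Clean bedroom desk, one plant,
morning window light.
```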

## 5. Product review (Amazon-review style)

**Visual:** Subject seated at a desk or table with the product in frame, speaking candidly to camera about what works and what doesn't.

**Fundamental:** Honest tone is the differentiator. Reviews die the moment they sound like ads.

**Direction:** Include at least one qualified caveat in the dialogue — *"the packaging is a little annoying but…"*, *"it's not magic, but…"*, *"I was skeptical because…"*. Specifics land: use real durations (*"about three weeks"*), comparisons (*"compared to the X I was using before"*), and honest limits. Frame the subject slightly off-center with the product visible beside them — never held up to the lens like a commercial. Natural desk lighting, a real desk with real stuff on it (coffee cup, notebook, a plant), not a staged set. Voice should be measured and conversational, not enthusiastic.

## 6. POV / day-in-the-life

**Visual:** First-person perspective footage shot through the user's eyes as they move through a real moment — making coffee, getting ready, walking the dog — with the product woven in.

**Fundamental:** Nothing can feel staged. The viewer must feel like they're watching a real morning/afternoon/evening.

**Direction:** Describe the hands (what they're holding, what they're doing), the body movement (leaning, reaching, walking), and the environmental sounds. The product appears mid-sequence — never front-loaded. It's just part of the routine. Example flow: hand reaches for kettle → pours coffee → grabs product off counter → takes dose → continues getting ready. Keep dialogue absent or minimal — one short voiceover line tucked at the end can work (*"and this is why I actually stuck with it"*). Lighting is always practical and real (morning light through blinds, bathroom overhead).

## 7. Street interview / vox pop

**Visual:** Handheld camera approaches a passerby on a city street, microphone visible in frame, rapid-fire question-and-answer about a product or topic.

**Fundamental:** The microphone must be visible — that's what cues the format to the viewer.

**Direction:** Interviewer is off-camera (you might see their hand holding the mic or their shoulder). Subject responds in medium-close shot, in an outdoor urban environment — describe the street specifically (*"bright afternoon on a busy downtown sidewalk"*, *"quiet tree-lined neighborhood block"*). Urban ambient sound is essential: traffic, footsteps, distant voices, occasional honk. Subject delivery should be totally unrehearsed — *"Uh, yeah I've actually tried that,"* *"Honestly? I don't know."* The pause between question and answer carries realism — don't cut it out. If the user wants multiple subjects in sequence, write each as a separate short clip with a different demographic.

## 8. Cinematic commercial (short film)

**Visual:** Film-quality footage with a character, story beat, and emotional arc — product embedded as a narrative object, not a hero shot.

**Fundamental:** Character arc. The subject must feel, want, or learn something during the clip. Without that, you have pretty B-roll, not a film.

**Direction:** Give the character an emotional signature in the opening — *"quiet confidence"*, *"exhausted but hopeful"*, *"giddy anticipation"*. Anchor the look to specific camera specs: *"Shot on Arri Alexa Mini LF, 32mm wide → 85mm close, shallow DOF, anamorphic falloff, rich natural grain, deep blacks, high contrast, no digital smoothing."* Then describe the "color world" in 2–3 sentences — setting + palette + lighting quality + atmospheric detail. Break the runtime into 3–5 beats, each with a speed rating (0.3x / 0.5x / 1x / 1.5x) and an explicit sound direction. Product is WORN, USED, REVEALED, GIVEN, or PIVOTED ON — never "sitting on marble" as a beauty shot. Minimize dialogue; let visuals + sound carry the story. One short line at a pivot moment is plenty.
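
A full sketch for a hypothetical skincare short film, showing the camera spec, color world, and speed-rated beats (all narrative details are placeholders):

```
Cinematic ultra-realistic short film for a skincare brand, narrative-led.
No captions.

reference @image[1] for the product

Shot on Arri Alexa Mini LF, 32mm wide to 85mm close, shallow DOF, rich
natural grain, deep blacks, high contrast.
Color world: a dim pre-dawn kitchen warming into golden morning light;
muted blues giving way to amber; soft haze in the air.

Beat 1 (0.5x): she stands at the window, exhausted but hopeful. Sound:
distant kettle, soft room tone.
Beat 2 (1x): she works the cream into her hands, slow deliberate motion.
Sound: soft skin-on-skin friction, a breath.
Beat 3 (1.5x): golden-hour light, she laughs at something off-screen.
Sound: warm ambient swell.
Beat 4 (0.3x): close-up, slow exhale, shoulders drop, small smile. One
line: "I forgot what rested felt like."

4K, Ultra HD, rich details, sharp clarity, cinematic texture, natural
colors, soft lighting, stable picture.
```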

## 9. Greenscreen with study / screenshot

**Visual:** Subject in the foreground with a study, chart, or article screenshot filling the entire background, pointing at the reference while speaking.

**Fundamental:** The study screenshot is the hook. It has to look real — real journal, real title, real finding.

**Direction:** Subject is framed like a smartphone selfie but slightly wider than usual so the background is legible behind them. The pointing gesture at the background MUST be written into the Actions block — without it, the model may not register that the background is the reference. Dialogue cites what's actually visible in the screenshot — don't exaggerate beyond the finding. Fast-paced TikTok delivery works; slow delivery feels like a lecture. The study image is referenced as `@image[2]` — the product is `@image[1]`. *(Note to workflow — finding and screenshotting the study is the workflow agent's responsibility, not this skill's.)*
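
The Actions block with the mandatory pointing gesture might be sketched like this (gestures are placeholders; the dialogue slot is left generic because it must cite the actual screenshot):

```
Actions: she points over her shoulder at the chart in @image[2], taps
the headline twice, then turns back to camera. @image[2] fills the
entire background behind her. Fast TikTok-pace delivery; dialogue cites
only what is legible in the screenshot.
```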

## 10. ASMR / sensorial product demo

**Visual:** Extreme close-ups with enhanced ambient sound — fingers peeling a label, product scooping, soft tapping, lip-smacking — minimal or no speaking.

**Fundamental:** Sound design is 70% of the format. Describe every sound specifically.

**Direction:** Write the sound palette with detail — *"crinkle of plastic wrap, soft pop of the lid twisting off, slow scrape of the scoop through powder, light tap of the glass bottle on a marble counter, a single breathy inhale before the scoop"*. Visually, frame tight on fingers, textures, and surfaces. Lighting is soft, warm, single-source (window or small practical). Every action is slow and deliberate — no fast movement. Dialogue is either absent or a single breathy line at the end. Don't describe what the product does; describe what it sounds and feels like being used. Background is always clean and minimal — marble, wood, linen — no visual clutter.
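
A full sketch for a hypothetical glass-jar face balm (product, surfaces, and sound palette are placeholders):

```
ASMR sensorial product demo for a glass-jar face balm, extreme close-ups.
No captions.

reference @image[1] for the product

Sound palette: soft pop of the lid twisting off, slow scrape of the
spatula through the balm, light tap of the jar on marble, one breathy
inhale before the scoop. No dialogue.
Tight framing on fingertips and balm texture, every motion slow and
deliberate. Single warm window light, clean marble surface.

4K, Ultra HD, rich details, sharp clarity, cinematic texture, natural
colors, soft lighting, stable picture.
```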

## 11. Reaction video

**Visual:** Subject in tight frame reacting visibly (widened eyes, laughter, surprise) to tasting, using, or reading something — honest unfiltered response.

**Fundamental:** The build-up before the reaction. The viewer needs to see the neutral state first.

**Direction:** Open with the subject neutral or skeptical — *"okay, let's see"*, *"alright, I'll try it"*. Then cut or transition to the reaction moment. The reaction itself should include specific physical tells — eyebrows up, hand covers mouth, head tilts, slow nod, laugh breaks through. Don't over-script the reaction words — a short unfiltered line is enough (*"oh wait, that's actually good"*, *"no way"*, *"shut up"*). For product reactions, show the act of tasting / applying / testing BEFORE the face — don't cut straight to "surprised person." Lighting and framing are casual — home setting, window light, no studio feel.

## 12. Podcast video

**Visual:** Two people seated at a table with podcast microphones against a minimal backdrop, shot from a three-quarter angle — a natural host-and-guest conversation about the product.

**Fundamental:** It should feel like a clip from a longer episode — picked up mid-conversation, not at an opening line.

**Direction:** Subject 1 (host) poses a question or observation; Subject 2 (guest) responds with specifics about the product experience — never a sales pitch, always a lived response. Both should be in frame at once (two-shot) or alternate between close-ups at natural cut points. Set is minimal — mics on stands, maybe coffee cups, plain or slightly textured backdrop (acoustic panels, plants, bookshelf). Lighting is warm and soft with key light on each face. Audio direction matters a lot here: *"warm close-mic'd voice with subtle studio dampening"* — not "podcast sound." Delivery is measured and relaxed, not energetic.
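An illustrative exchange — hypothetical product and lines, shown only to demonstrate the "lived response, never a sales pitch" register:

```
Host: "Okay, you've been using it for what, three weeks now?"
Guest: "Three weeks, yeah. And the weird part is I stopped noticing my knee
on the stairs. Like it just... wasn't a thing anymore."
```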

## 13. Pixar animated

**Visual:** 3D-animated scene with exaggerated cute characters — talking dog, anthropomorphic bottle, worried pair of running shoes — expressing big emotions with soft lighting and expressive physics.

**Fundamental:** Emotional exaggeration. The whole game is making the character feel big, cute, and slightly naive.

**Direction:** Give the main character a clear feeling in the opening — worried, proud, embarrassed, hopeful, mischievous. Let their face, body, and physics amplify it: wobble, droop, bounce, sparkle, ears perking, eyebrows lifting, tail thumping. Objects CAN talk, pets CAN speak, gravity CAN bend — anything is on the table if the story calls for it. Lean into micro-expressions. The product is either the character itself (a bottle with a face) OR the thing that solves the character's problem in a cute satisfying visual moment. Dialogue should be short, warm, and a little naive — the way a Pixar character talks. Reference aesthetics: *"Pixar Luxo/Piper/Lava short-film feel, soft pastel lighting, expressive SSS skin, buttery animation, warm color palette."* Audio direction: *"warm friendly voice, childlike energy, with soft orchestral ambient score."*
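As an illustrative sketch only — hypothetical product, wording untested — a Pixar-style prompt might read:

```
A round, slightly droopy water bottle (@image[1]) with big hopeful eyes sits
alone on a gym bench, cap sagging like a frown. A hand lifts it; the bottle
perks up, sparkles, and does a tiny wobble-bounce of joy. Soft pastel lighting,
Pixar Luxo/Piper short-film feel, buttery animation. Dialogue (warm, childlike):
"Wait... we're going for a run? TOGETHER?"
Audio: warm friendly voice, soft orchestral ambient score.
```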

---

## Step 5 — Present for approval

Present ALL prompts together in one message. Use this exact header format for each prompt:

```
Prompt [N] — [Category]
Duration: [X]s
Aspect Ratio: [X:X]

Assets:
- Product ([product name]) = @image[1]
- Actor ([who]) = @image[2] (via Add Actor)        [omit if no actor]
- Scene / mood ([what]) = @image[3]                [omit if not used]
- Voice reference = @audio[1]                      [omit if not used]
```

Then the full prompt text immediately below the header — every word, no truncation.

Only list assets actually used in THAT prompt. Indices depend on upload order in Starpop; flag the expected indices so the user knows which slot gets which image when they upload.
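For reference, a filled-in header for a prompt with no actor might look like this (product name and indices are hypothetical — actual indices follow the user's upload order):

```
Prompt 2 — ASMR / sensorial product demo
Duration: 10s
Aspect Ratio: 9:16

Assets:
- Product (GlowDew Facial Serum) = @image[1]
- Scene / mood (marble counter, window light) = @image[2]
```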

After all prompts are listed, ask: *"Here are [N] prompts. Any tweaks? I won't hand off until you confirm."*

Wait for explicit "yes" / "approved" / "go." Tweaks → revise → re-present all.

If the user also wants to RUN the prompts in Starpop, hand off to the `starpop-workflow-agent` skill with the approved plan.

---

## Troubleshooting

| Symptom | Fix |
|---|---|
| Dialogue rushed / bad lip-sync | Prompt exceeds max words for duration. Trim to the Pacing Rule. |
| Character drifts across beats | Reference `@image[N]` consistently + identical subject noun in every beat. |
| Pronunciation glitches on an ingredient or technical word | Rewrite per the "avoid complex vocabulary" table. |
| Product looks off (logo warped, wrong proportions) | Strengthen positives: *"Sharp logo, clean surface, correct proportions, exact label match to @image[1]."* |
| Handheld selfie feels overproduced | Cut abstract style words. Add specific camera (*"smartphone front camera, slight handheld"*) + lighting (*"soft window light, real skin texture"*). |
| 1-shot UGC feels flat | Emotional tag weak. Escalate: `(laughing)` → `(laughing, energy cranked)`. Add exclamations + emojis. |
| Cinematic feels like B-roll | Missing character arc. Rewrite opening beat to establish what the subject feels / wants. Embed product as narrative object. |
| Subtle emotion / behavior didn't render | Model underdid it. Rewrite with exaggerated physical action per the Visual-first rule — e.g., *"trembling"* → *"darting under the table, ears flat, jumping up at every thunderclap"*. |
| Greenscreen study looks fake | Verify you're using a real peer-reviewed screenshot as `@image[2]`, not a generated or stock image. |
| Pixar looks stiff | Add specific reference — *"Pixar Luxo/Piper-style animation, buttery framerate, expressive micro-movements."* |
| Podcast looks like a set | Describe the set as minimal + lived-in, not polished — *"acoustic panels and a leaning bookshelf behind them, two coffee cups on the table."* |

---

## Credit

Adapted from Sirio Berati's public [Seedance 2.0 Prompt Guide](https://seedance-prompt-guide.sirioberati.com). Category guidance refined through working references and testing by David (Starpop).
