---
name: starpop-seedance-prompts
description: Write Seedance 2.0 video prompts for the Starpop app. Covers 13 UGC video categories with per-category guidance — handheld iPhone selfie, scripted 1-shot UGC with music, before/after, unboxing, product review, POV, street interview, cinematic short film, greenscreen with study, ASMR, reaction, podcast, and Pixar-animated. Use this skill whenever the user drops a product URL, asks to make UGC ads, write Seedance prompts, or mentions any of these categories. Starts with research mode by default (fetches the product page, extracts brand info, validates findings with the user), then asks which category combo they want, then writes prompts. Uses Starpop's @image[N] / @video[N] / @audio[N] reference syntax and the Add Actor workflow for human faces.
---

# Starpop × Seedance 2.0 — UGC Prompt Skill

Turn a product brief (or a URL) into production-ready Seedance 2.0 prompts for the Starpop app. Covers 14 UGC video categories with specific guidance for each: Show & Tell, scripted 1-shot UGC with music, before/after, unboxing, showcase & how-to, POV, street interview, cinematic commercial, greenscreen with study, ASMR, gift reaction, podcast, Pixar animated, and Duo Dynamic UGC.

This skill writes prompts only. The `starpop-workflow-agent` handles driving the Starpop UI (browser navigation, uploads, Generate clicks, share links). If the user asks to also run the ads, hand off to the workflow agent after prompt approval.

---

## When this skill fires

Trigger whenever the user:
- Drops a product URL and wants ads made
- Says "make me UGC ads for [product]"
- Asks to write a Seedance prompt
- Mentions any of the 14 video category names

If the user pastes only a URL with no explicit request, ask once: *"Want me to research this product and write some UGC prompts for it?"*

---

## Step 1 — Gather the brief

### Path A (default) — Research mode

When a URL is provided, always take this path.

1. **Fetch the URL.** Use `WebFetch` or a browser tool. Visit the specific product page, not just the homepage — cleaner extraction.
2. **Extract:** brand name, category, product(s), selling points, tone of voice from copy, target audience clues, price tier, visible discount codes.
3. **Make reasonable assumptions** for anything the site doesn't surface — default ad goal = Conversion, default platforms = TikTok + Reels, default tone = whatever the site conveys.
4. **Validate findings with the user FIRST** — do not jump to prompt writing. Post what you extracted vs what you assumed, clearly flagged with `[assumed]` next to anything you made up. Ask the user to confirm or correct each field. This is the single most important checkpoint for getting good prompts — bad brief = bad script.

Example validation output:

```
Here's what I got from your site:

Brand:             Zesty Paws
Category:          Pet / Wellness supplements
Hero product:      Calming Bites (selected from the page I visited)
Description:       Soft chew dog supplement for stress, anxiety, and hyperactivity
Selling points:    Vet-formulated, Suntheanine + chamomile, soft-chew format
Tone:              Warm, pet-parent-to-pet-parent, casual
Audience:          Owners of anxious dogs  [assumed from product positioning]
Price tier:        $$  [assumed]
Ad goal:           Conversion  [assumed]
Platforms:         TikTok + Instagram Reels  [assumed]
Discount code:     None found

Anything I got wrong or want to add? Confirm and I'll move to format selection.
```

Wait for the user's confirmation (or corrections) before moving to Step 2.

### Path B — Manual mode

When the user has no URL or prefers to answer directly, go question-by-question. Use `AskUserQuestion` for multiple-choice. One at a time.

- **Q1** Product name
- **Q2** Category (Supplement / Skincare / Fashion / Tech / Food / Fitness / Home / Pet / Other)
- **Q3** One-sentence description
- **Q4** Top 3–5 selling points
- **Q5** Price tier ($ / $$ / $$$)
- **Q6** Differentiator — what makes it different?
- **Q7** Target audience (demographic + interests)
- **Q8** Ad goal (Conversion default / Awareness / Consideration / Retargeting)
- **Q9** Platforms (TikTok / Reels / Shorts / Meta Feed / YouTube)
- **Q10** Tone (Casual default / Professional / Luxury / Educational / Edgy)
- **Q11** Discount code (optional)
- **Q12** Custom guardrails (phrases to use or avoid)

### Brand-less / testing fallback

If the user has no product at all and just wants to test the skill, ask them to describe the fictional or placeholder product in 2–3 sentences, then proceed.

---

## Step 2 — Pick categories + count + specs

After the brief is validated, ask the user which categories to write. Propose **3 recommended combos** up front, then list all 14 as individual options.

### Recommended combos

- **UGC Classic** — Show & Tell + Scripted 1-shot UGC with music + Duo Dynamic UGC. The three workhorse formats for social-first ads.
- **Brand Story** — Before/After + Cinematic commercial + Showcase & How-To. Transformation, polish, mastery.
- **Scroll-stop** — Greenscreen with study + Pixar animated + Gift Reaction. Pattern-interrupt formats that stop the feed.

### Full list (user can mix and match)

1. Show & Tell
2. Scripted 1-shot UGC with music
3. Before / After
4. Unboxing / first-impression
5. Showcase & How-To
6. POV / day-in-the-life
7. Street interview / vox pop
8. Cinematic commercial (short film)
9. Greenscreen with study
10. ASMR / sensorial product demo
11. Gift Reaction
12. Podcast video
13. Pixar animated
14. Duo Dynamic UGC

For each category picked, confirm aspect ratio, duration, and resolution — use the per-category defaults below unless the user overrides.

---

## Step 3 — Pre-prompt reasoning (run BEFORE writing)

You're the reasoning engine. The video model can render anything you describe — but it leaves room for interpretation on the parts you don't lock down. Your job is to **lock down what must be accurate** by passing the right combination of images and text, and to ask the user when the gap is bigger than that combination can close.

**The product is the thing that must be accurate.** Other elements in the scene can have interpretation room. The product cannot.

### A. Visual context — combine images + text for complete product understanding

Goal: zero room for interpretation about the product itself. Some products are simple — a blank piece of paper plus "white A4 printer paper" is enough. Others need multiple reference images because key details can't be captured in one frame: back of a bottle, the gummy or chew texture, sleeve or hood detail on a clothing piece, an open product + applicator, a hidden mechanism.

Reason about the actual product and what the camera will see in the planned shot. Which combination of images + text gives the model complete context? If one image plus a short description doesn't cover it, request additional images BEFORE writing. Don't impose category-based rules — reason from the actual product.

> Example: *"For the application shot I'll need a second image of the open balm so the model gets the applicator shape right — can you upload one?"*

### B. Interaction mechanics — describe motion to remove guesswork

The video model can reason about how products work, but inconsistently. It's like asking it to guess what's in a sealed box: given infinite tries it gets there eventually, but our job is to be right the first time.

If the actor interacts with the product, describe the exact mechanical motion in the prompt: lipbalm twist + glide, pump press + catch, pull-tab activation, tube squeeze direction, hidden-feature reveal. If the interaction is mechanically complex enough that text alone won't lock the motion in, request a video reference (`@video[1]`) for motion ground truth.

### C. Camera logistics — pick ONE setup and describe it

"iPhone selfie style" alone is ambiguous. Spell out the camera setup concretely:

- Selfie with arm extended → actor films herself with the front camera
- Mirror selfie → actor films her reflection with the rear camera, talks to the mirror
- Phone propped on a surface → camera fixed, both hands free
- Friend filming off-camera → handheld at chest height
- Tripod / locked-off → camera fixed

One setup, described concretely. Same principle: zero guesswork.

### D. Hand math — foresee interaction issues, don't impose rules

Humans have two hands. If the actor is holding the phone, ONE is free. This isn't a rule ("always start the can open"); it's a check to spot interaction issues that arise from uncertainty.

Example: a canned drink opening on screen is great — perfect for ASMR or specific creative beats. The problem only emerges if the actor needs to open AND drink in 2 seconds while holding a phone, because the model has to render 1-handed can-opening, which is hard to get right. Solutions vary by case: start the can already open, switch the camera setup so both hands are free (mirror, tripod, friend filming), pass a video reference of 1-handed opening, or put the product on a surface and free up both hands.

The agent's job is to **spot the potential issue** and either give the model enough context OR ask the user.

### E. Ask the user when ambiguity would affect the output

Baseline behavior, not a fallback. If a single clarifying question would meaningfully resolve uncertainty in any of the above (product accuracy, interaction mechanics, camera setup, hand math) — ask. Don't guess. The agent has the doorway to human input — use it.

Good asks are specific and easy to answer:

- *"This ad has the actor applying the balm — got an image of it open or mid-application?"*
- *"For the bathroom shot, is she filming her reflection or her front camera?"*
- *"The can-open + sip is two motions in 6 seconds — want to start with it already open?"*

One question, get the answer, continue.

---

## Step 4 — Write the prompts

Apply **all** General Rules below, then lean into the category-specific advice for each variant. Every prompt must follow every rule in the General Rules section — they're universal.

---

# GENERAL RULES

## 1. Open with a one-line format descriptor

Every prompt MUST start with a single sentence declaring the video type + subject + context. The model anchors on the first line — tell it what kind of video up front.

Good openings:
- *"Authentic handheld iPhone selfie UGC video promoting a dog supplement."*
- *"Cinematic ultra-realistic short film for a skincare brand, narrative-led."*
- *"Pixar-style 3D-animated product video for a protein powder, exaggerated emotional dog characters."*

Bad: no descriptor at all, or vague ("a nice product video").

## 2. Always include "No captions."

On its own line, near the top of the prompt. Every prompt. Every category. Seedance will sometimes generate overlaid captions by default if not told otherwise — and they look awful.

## 3. Starpop reference syntax

Assets are referenced with `@image[N]`, `@video[N]`, `@audio[N]` — each type has its own 1-indexed counter.

| Asset | Syntax | Limits |
|---|---|---|
| Reference images (product, scene, mood, study, chart) | `@image[1]`, `@image[2]`, … | Up to 9 |
| Reference videos | `@video[1]`, `@video[2]`, `@video[3]` | Up to 3 total, 15s combined |
| Reference audios | `@audio[1]`, `@audio[2]`, `@audio[3]` | Up to 3 total, 15s combined |

First reference image = `@image[1]`. First audio = `@audio[1]`. They don't share counters.

**At the top of every prompt, note what each reference is**, e.g.:
```
reference @image[1] for the product
reference @image[2] for the lifestyle setting
reference @audio[1] for the voice tone
```
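The counter rules are easy to break when a prompt gets revised and a reference is dropped or renumbered. A minimal lint sketch (a hypothetical helper, not part of Starpop) that checks a draft prompt's tokens against the limits in the table:

```python
import re

# Limits from the table above: up to 9 images, 3 videos, 3 audios.
LIMITS = {"image": 9, "video": 3, "audio": 3}

def check_references(prompt: str) -> list[str]:
    """Return problems found with @image/@video/@audio tokens in a draft prompt."""
    problems = []
    for kind, limit in LIMITS.items():
        indices = sorted({int(n) for n in re.findall(rf"@{kind}\[(\d+)\]", prompt)})
        if indices and indices[-1] > limit:
            problems.append(f"@{kind}[{indices[-1]}] exceeds the limit of {limit}")
        # Each type keeps its own 1-indexed counter, so the indices in use
        # should run 1, 2, 3, ... with no gaps.
        if indices != list(range(1, len(indices) + 1)):
            problems.append(f"@{kind} indices {indices} are not contiguous from 1")
    return problems
```

The 15s combined-duration caps on videos and audios can't be checked from the prompt text alone; those need the actual asset lengths.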

## 4. Actors are OPTIONAL — don't assume one is provided, and max ONE actor per prompt

An "actor" in Starpop = a specific human face image the user uploads via the **Add Actor** button. Add Actor is a convenience upload path with backend face-processing for Seedance compliance. Once uploaded, the image lives in the same `@image[N]` array as any other reference image — it gets its own index based on upload order.

**Default assumption: no actor is provided.** Seedance generates humans from the prompt's description just fine for most UGC. Describe the subject specifically (age, features, clothing, expression, setting) and let the model render them. Do NOT require an actor image in every prompt.

**Hard limit: maximum ONE actor per prompt.** Even for formats with multiple humans (podcast, street interview, reaction duets), only ONE human can have a locked face via Add Actor. The second human is described in the prompt and generated by Seedance. Pick the most important human for the actor slot (usually the one holding / using the product).

**Only suggest adding an actor when it's genuinely useful:**
- User wants face consistency across multiple generations (a recurring brand creator)
- User has a specific real influencer / model they want featured
- The brand has an established actor identity

If the orchestrator (`starpop-workflow-agent`) confirms the user provided an actor image, reference it by its `@image[N]` index in the prompt (e.g., `@image[2]` if the product is `@image[1]`). Otherwise, describe the subject in natural language and don't reference an actor image at all.

**If this skill is used standalone (no orchestrator), ask once during the brief:** *"Do you have a specific actor image you want featured, or should Seedance generate the subject from the description?"* Default to "generate from description" unless the user provides one.

## 5. Audio Direction

Seedance 2.0 generates audio natively. Every prompt MUST include explicit audio direction. Default voice + room tone + speech pattern for the scene.

**Voice — match to demographic.** Examples:
- *"Warm female voice, mid-20s, casual, talking to a friend"*
- *"Deep male voice, 40s, genuine dad energy, not a narrator"*
- *"Light female voice, teens, curious, slight giggle"*

**Room tone — must match the setting:**
- Bathroom → slight reverb from tiled walls
- Bedroom → soft close acoustics, carpeted, minimal echo
- Kitchen → open space feel, subtle ambient sounds (fridge hum, distant street)
- Car → muffled close acoustics
- Outdoors → natural ambience, slight wind or traffic
- Living room → warm room tone, furnished space
- Studio / podcast → close-mic'd, subtle studio dampening

**Speech pattern:** natural delivery with pauses, filler words, contractions. NOT scripted. The model handles this better when you tell it explicitly: *"delivery is conversational, not rehearsed, with small natural pauses."*

## 6. Dialogue Rules

Dialogue must sound REAL, not scripted.

- Use contractions: *"I've been"*, *"it's literally"*, *"you're gonna"*, *"I don't"*
- Include filler words: *"like"*, *"honestly"*, *"so basically"*, *"okay"*
- Casual grammar — fragments and run-ons are fine
- Sound genuinely excited or skeptical, not rehearsed

**Good:** *"Okay so I've been using this for like two weeks and honestly? It actually works."*
**Bad:** *"This revolutionary product has transformed my routine completely."*

If the user supplies a specific script, use it verbatim. Don't "improve" it — user's voice is more authentic than any rewrite.

## 7. Avoid complex vocabulary — Seedance struggles with pronunciation

Seedance 2.0 is a ByteDance (Chinese) model. It struggles with words that are long, Latin-derived, or rare in spoken English. Audio will glitch, slur, or completely mispronounce them.

**Common offenders — always rephrase:**

| Avoid | Say instead |
|---|---|
| Ashwagandha | stress-support blend, adaptogen, calming herb |
| L-theanine | amino acid, calming compound |
| Bacopa monnieri | memory-support herb |
| Phosphatidylserine | brain-support nutrient |
| Acetyl-L-carnitine | energy nutrient |
| Glucosamine | joint-support nutrient |
| Probiotics (technical strain names) | good gut bacteria |
| Complex Latin or scientific product names | a plain-English equivalent |

**Rule of thumb:** if you'd stumble reading it aloud, Seedance will too. Rewrite for natural spoken English. Let the ingredient list live on the packaging, not in the audio track.
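Applied mechanically, the table is just a substitution pass over dialogue lines. A sketch (hypothetical helper; each mapping picks one plain-English option from the "Say instead" column, and in practice you'd choose the variant that fits the script's voice):

```python
import re

# Hard-to-pronounce terms mapped to one plain-English option each,
# drawn from the "Say instead" column above.
SPOKEN_SAFE = {
    "ashwagandha": "calming herb",
    "l-theanine": "calming compound",
    "bacopa monnieri": "memory-support herb",
    "phosphatidylserine": "brain-support nutrient",
    "acetyl-l-carnitine": "energy nutrient",
    "glucosamine": "joint-support nutrient",
}

def rephrase_for_audio(line: str) -> str:
    """Swap ingredient jargon for spoken English in dialogue lines only;
    the packaging shown on screen keeps the real ingredient names."""
    for term, plain in SPOKEN_SAFE.items():
        line = re.sub(re.escape(term), plain, line, flags=re.IGNORECASE)
    return line
```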

## 8. Visual-first storytelling — describe actions, not emotions

Seedance does NOT reliably render internal emotional states. If you write *"the dog is trembling in fear"*, the model underdoes it — subtle cues get lost every time. Every emotion or behavior must be translated into an **exaggerated, physically explicit action** the model can actually animate.

**Rule:** if you can't see it from 6 feet away on a phone screen, the model won't render it. Describe what an outside observer would literally see, not what the character feels.

| ❌ Don't write (internal / subtle) | ✅ Write instead (external / exaggerated) |
|---|---|
| *Trembling in fear* | *Jumping up at every thunderclap, darting under the table, ears flat against skull* |
| *Anxious* | *Pacing in tight circles, looking at the door every few seconds, whining* |
| *Calm* | *Eyes half-closed, slow deep breath, body fully stretched out on the rug* |
| *Happy* | *Tail thumping hard against the floor, front paws tapping, broad open-mouth pant* |
| *Sad / defeated* | *Head lowered below shoulders, tail tucked, slow heavy slump onto the rug* |
| *Surprised* | *Eyes wide, hand flying to mouth, takes a full step back* |
| *Confident* | *Shoulders back, direct eye contact, slow measured walk* |
| *Disappointed* | *Shoulders drop visibly, slow exhale, looks down at the floor* |
| *Excited* | *Laughs mid-sentence, slaps the table, leans way forward into the shot* |
| *Focused* | *Leans in, narrows eyes, both hands flat on the table* |
| *Relieved* | *Big exhale with shoulders dropping, small smile spreads slowly, unclenches hands* |

**Corollary — exaggeration beats subtlety.** The model consistently underdoes actions. If you want a soft reaction, describe a medium one. If you want a medium reaction, describe a big one. Dial it up one notch past what you actually want on-screen.

**Pair this with the separation rule:** if the action involves the camera reacting to the subject, write them as two distinct instructions.

## 9. Positive-only prompting — no negatives

Seedance 2.0 does not reliably honor negative instructions ("avoid jitter", "no warping", "no deformation"). State what you WANT, not what you don't want.

- ❌ *"Avoid jitter."* / *"No warping on the product."*
- ✅ *"Stable picture. Natural smooth movements."* / *"Sharp logo, clean surface, correct proportions."*

When a prompt pattern uses negatives, rewrite to positives before using.

## 10. Script Pacing & Timeline Rule (scripted dialogue only)

Dialogue STARTS at 00:01 and ENDS 2 seconds before the video ends. Silent open, dialogue window, silent close.

| Duration | Dialogue window | Max words |
|---|---|---|
| 5s | 00:01–00:03 | ~5 |
| 7s | 00:01–00:05 | ~10 |
| 10s | 00:01–00:08 | ~18 |
| 12s | 00:01–00:10 | ~23 |
| 15s | 00:01–00:13 | ~30 |

Longer dialogue than max → rushed delivery, muddled lip-sync. Cut it.

Multi-scene scripted formats (Scripted 1-shot UGC, multi-scene selfie) don't use this rule per-scene — they cut between scenes so each segment carries one short line.
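The table reduces to a rule of thumb: the dialogue window runs from 00:01 to two seconds before the end, and the word budget is roughly 2.5 words per second of window. A quick sketch (hypothetical helper) that reproduces every row:

```python
def dialogue_budget(duration_s: int) -> tuple[str, int]:
    """Dialogue window and max word count for a single-scene scripted clip."""
    window_end = duration_s - 2              # dialogue ends 2s before the video does
    window_len = window_end - 1              # dialogue starts at 00:01
    max_words = int(2.5 * window_len + 0.5)  # ~2.5 words/sec, rounded half-up
    return f"00:01-00:{window_end:02d}", max_words
```

For example, `dialogue_budget(10)` gives `("00:01-00:08", 18)`, matching the 10s row.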

## 11. Separation Rule — subject motion ≠ camera motion

Describe them as two separate instructions. Mixing them blends both into confused output.

- ❌ *"Spinning camera around a dancing person"*
- ✅ *"The dancer spins slowly. Camera holds fixed framing."*

## 12. Dangerous Keywords — never use alone

| ❌ Don't write | ✅ Write instead |
|---|---|
| `fast` alone | Only ONE element fast — e.g., *"Fast subject movement, slow camera"* |
| `cinematic` alone | *"35mm film tone, warm shadows"* or specific director reference |
| `epic` / `beautiful` / `amazing` | Describe the actual visual — lighting, composition, colors |
| `lots of movement` | One specific motion + speed modifier |
| `multiple angles` | One tracking shot; use timestamped beats for angle changes |
| `realistic` alone | *"Photorealistic, natural skin texture, practical lighting"* |
| `authentic UGC energy` alone | Specific camera + lighting + setting description |
| `high quality` alone | *"4K, sharp clarity, rich details"* |

## 13. Quality Suffix — final line of every prompt

> 4K, Ultra HD, rich details, sharp clarity, cinematic texture, natural colors, soft lighting, stable picture.

Positive only. Append as the last line of every prompt.
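Rules 1, 2, 3, and 13 together fix a prompt's skeleton: descriptor first, "No captions." near the top, reference notes up front, quality suffix last. A minimal sketch of that ordering (a hypothetical helper; the body text itself is still written by hand per the category guidance):

```python
QUALITY_SUFFIX = (
    "4K, Ultra HD, rich details, sharp clarity, cinematic texture, "
    "natural colors, soft lighting, stable picture."
)

def assemble_prompt(descriptor: str, references: dict[str, str], body: str) -> str:
    """Order the fixed parts of a prompt: rule 1 (descriptor first),
    rule 2 (No captions.), rule 3 (reference notes), rule 13 (suffix last)."""
    ref_lines = [f"reference {token} for the {role}"
                 for token, role in references.items()]
    return "\n".join([descriptor, "No captions.", *ref_lines, "", body, "", QUALITY_SUFFIX])
```

Anything outside this skeleton (scene beats, audio direction, dialogue) lives in `body` and follows the rest of the General Rules.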

## 14. Per-category defaults

When the user doesn't specify aspect / duration / resolution, use these defaults. Always confirm with the user before generating.

| # | Category | Aspect | Duration | Resolution |
|---|---|---|---|---|
| 1 | Show & Tell | 9:16 | 8–15s | 720p |
| 2 | Scripted 1-shot UGC with music | 9:16 | 10–15s | 720p |
| 3 | Before / After | 9:16 | 8–15s | 720p |
| 4 | Unboxing | 9:16 | 10–15s | 720p |
| 5 | Showcase & How-To | 9:16 or 1:1 | 10–15s | 720p |
| 6 | POV / day-in-the-life | 9:16 | 10–15s | 720p |
| 7 | Street interview | 9:16 | 8–12s | 720p |
| 8 | Cinematic commercial | 16:9 or 9:16 | 10–15s | 720p |
| 9 | Greenscreen with study | 9:16 | 8–12s | 720p |
| 10 | ASMR | 9:16 | 6–10s | 720p |
| 11 | Gift Reaction | 9:16 | 8–12s | 720p |
| 12 | Podcast video | 16:9 | 10–15s | 720p |
| 13 | Pixar animated | 9:16 | 10–15s | 720p |
| 14 | Duo Dynamic UGC | 9:16 | 10–15s | 720p |

Language default: **English only** unless the user specifies otherwise.

---

# VIDEO CATEGORIES

Each category below gives you: the 1-sentence visual anchor, the fundamental that makes the format work, and specific direction to bake into your prompt.

## 1. Show & Tell

**Visual:** Subject holding their phone selfie-style with the product in their other hand, talking directly to camera while turning the product, pointing at features, and sharing their honest take. Everyday setting — kitchen, bathroom, bedroom, desk.

**Fundamental:** The product is the visual anchor, honest opinion is the verbal anchor. Subject holds and demonstrates while giving their candid take — not teaching, not hyping.

**Direction:** Describe the camera specifically — *"shot on smartphone front camera, slight handheld movement, medium-close framing with the product held in-frame near the chest"*. Don't write "iPhone selfie-style" alone. The product stays in-hand most of the shot, label-out at key beats, turned to show different angles. Anchor lighting to a real-world source: morning window light, late-afternoon kitchen glow, practical desk lamp, bathroom overhead. Subject description must be specific: age, expression, outfit, setting. Voice is candid and conversational — include at least one honest caveat or qualified observation (*"the packaging is a little annoying but..."*, *"I was skeptical at first because..."*, *"it's not magic, but..."*). Specifics land: real durations (*"about three weeks"*), comparisons (*"compared to the X I had before"*). Openers should feel mid-thought, not opening line — *"Okay so I've been using this for a couple weeks..."*, *"Nobody really tells you this about..."*, *"If you're considering [product]..."*. Avoid scripted ad-speak entirely. A slight handheld wobble reads more real than perfectly stable framing.

## 2. Scripted 1-shot UGC with music

**Visual:** Multi-scene high-energy promo with a subject on camera, quick cuts between product and lifestyle moments, upbeat music driving the pace.

**Fundamental:** Energy. Every beat has to carry forward momentum — no dead air, no downbeat tones.

**Direction:** Structure is 4 beats. Each beat = a short scene direction (visible action) + one spoken line with real emotional energy. Mark the energy with parenthetical tags — *(laughing)*, *(excited)*, *(amazed)*. Emojis belong IN the dialogue, not in scene directions — they cue vocal lift. Product shows up in scene 1 or 2 (opening moment), a lifestyle or result flash happens in scene 3 (proof), and the final scene combines product + emotional payoff (smile, hug, dog licking face). Openers that work: *"Okay so..."*, *"Okay listen —"*, *"Not gonna lie…"*, *"Everyone keeps asking me…"*. Tone stays upbeat across all 4 beats — if any beat feels somber, rewrite it.

## 3. Before / After

**Visual:** Visual contrast of a subject, space, body, or pet in a worse state, then a cut to the improved state after the product was used.

**Fundamental:** The gap between the two states has to be visible on screen, not just claimed in dialogue.

**Direction:** Plan the "before" with specific problem signals — tired eyes, cluttered surface, itchy dog, dull skin, stressed posture, bloated belly. Plan the "after" with their direct opposites — bright eyes, clean calm surface, relaxed dog, glowing skin, confident posture. The product appears in a bridge moment between the two — use it (apply, pour, give) rather than just show it on a shelf. Give the "before" a bit longer than feels comfortable — the viewer needs to feel the problem for the transformation to land. Lighting can shift between states to amplify contrast (cooler/dimmer in the before, warmer/brighter in the after). Keep dialogue minimal; visual payoff is the whole point.

## 4. Unboxing / first-impression

**Visual:** Close-up of hands opening a package on a clean surface, slow product reveal, subject reacting with genuine surprise.

**Fundamental:** Tactile specificity. Hands, textures, and small sounds carry the format.

**Direction:** Describe the hands explicitly — *"manicured fingernails carefully lifting the lid"*, *"fingertips running along the embossed logo"*, *"pulling out the inner sleeve"*. The subject's face should be visible at the reveal moment for emotional payoff. Don't over-direct the reaction — *"genuine surprise, small smile"* reads more real than *"shocked and amazed"*. Sound design is huge here: tear of tape, rustle of tissue paper, soft thunk of the product hitting the surface. Keep the space clean — a staged but lived-in kitchen counter or bedroom desk. Clutter dilutes focus. The product gets the hero moment once; before that, it's about anticipation.

## 5. Showcase & How-To

**Visual:** Step-by-step demonstration of how the product is used — application, technique, setup, or workflow. Camera fixed (propped phone, tripod, or fixed mirror angle) so both hands are free for the demo.

**Fundamental:** Teaching, not opining. The viewer should walk away knowing exactly how to use it correctly.

**Direction:** Lock the camera setup BEFORE writing — propped phone on a counter, tripod, or fixed mirror angle. Both hands need to be free for the demo, so handheld selfie won't work for this format. Describe the sequence of mechanical steps in the prompt: Step 1 opens the product, Step 2 dispenses or extracts, Step 3 applies or uses, Step 4 result or finishing beat. If the product has a non-obvious mechanic (twist-base, pump style, pull-tab, hidden feature), call it out explicitly so the model has motion ground truth and doesn't have to guess. Voice is calm and instructional — *"first you'll want to..."*, *"the trick is to..."*, *"this is the part most people get wrong..."*. Skip emotional energy; clarity beats hype. Lighting is even and bright (top-down or front-fill) so the technique reads clearly. The product should be visible in frame at every step, never out of view.

## 6. POV / day-in-the-life

**Visual:** First-person perspective footage shot through the user's eyes as they move through a real moment — making coffee, getting ready, walking the dog — with the product woven in.

**Fundamental:** Nothing can feel staged. The viewer must feel like they're watching a real morning/afternoon/evening.

**Direction:** Describe the hands (what they're holding, what they're doing), the body movement (leaning, reaching, walking), and the environmental sounds. The product appears mid-sequence — never front-loaded. It's just part of the routine. Example flow: hand reaches for kettle → pours coffee → grabs product off counter → takes dose → continues getting ready. Keep dialogue absent or minimal — one short voiceover line tucked at the end can work (*"and this is why I actually stuck with it"*). Lighting is always practical and real (morning light through blinds, bathroom overhead).

## 7. Street interview / vox pop

**Visual:** Handheld camera approaches a passerby on a city street, microphone visible in frame, rapid-fire question-and-answer about a product or topic.

**Fundamental:** The microphone must be visible — that's what cues the format to the viewer.

**Direction:** Interviewer is off-camera (you might see their hand holding the mic or their shoulder). Subject responds in medium-close shot, in an outdoor urban environment — describe the street specifically (*"bright afternoon on a busy downtown sidewalk"*, *"quiet tree-lined neighborhood block"*). Urban ambient sound is essential: traffic, footsteps, distant voices, occasional honk. Subject delivery should be totally unrehearsed — *"Uh, yeah I've actually tried that,"* *"Honestly? I don't know."* The pause between question and answer carries realism — don't cut it out. If the user wants multiple subjects in sequence, write each as a separate short clip with a different demographic.

## 8. Cinematic commercial (short film)

**Visual:** Film-quality footage with a character, story beat, and emotional arc — product embedded as a narrative object, not a hero shot.

**Fundamental:** Character arc. The subject must feel, want, or learn something during the clip. Without that, you have pretty B-roll, not a film.

**Direction:** Give the character an emotional signature in the opening — *"quiet confidence"*, *"exhausted but hopeful"*, *"giddy anticipation"*. Anchor the look to specific camera specs: *"Shot on Arri Alexa Mini LF, 32mm wide → 85mm close, shallow DOF, anamorphic falloff, rich natural grain, deep blacks, high contrast, no digital smoothing."* Then describe the "color world" in 2–3 sentences — setting + palette + lighting quality + atmospheric detail. Break the runtime into 3–5 beats, each with a speed rating (0.3x / 0.5x / 1x / 1.5x) and an explicit sound direction. Product is WORN, USED, REVEALED, GIVEN, or PIVOTED ON — never "sitting on marble" as a beauty shot. Minimize dialogue; let visuals + sound carry the story. One short line at a pivot moment is plenty.

## 9. Greenscreen with study / screenshot

**Visual:** Subject in the foreground with a study, chart, or article screenshot filling the entire background, pointing at the reference while speaking.

**Fundamental:** The study screenshot is the hook. It has to look real — real journal, real title, real finding.

**Direction:** Subject is framed like a smartphone selfie but slightly wider than usual so the background is legible behind them. The pointing gesture at the background MUST be written into the Actions block — without it, the model may not register that the background is the reference. Dialogue cites what's actually visible in the screenshot — don't exaggerate beyond the finding. Fast-paced TikTok delivery works; slow delivery feels like a lecture. The study image is referenced as `@image[2]` — the product is `@image[1]`. *(Note to workflow — finding and screenshotting the study is the workflow agent's responsibility, not this skill's.)*

## 10. ASMR / sensorial product demo

**Visual:** Extreme close-ups with enhanced ambient sound — fingers peeling a label, product scooping, soft tapping, lip-smacking — minimal or no speaking.

**Fundamental:** Sound design is 70% of the format. Describe every sound specifically.

**Direction:** Write the sound palette with detail — *"crinkle of plastic wrap, soft pop of the lid twisting off, slow scrape of the scoop through powder, light tap of the glass bottle on a marble counter, a single breathy inhale before the scoop"*. Visually, frame tight on fingers, textures, and surfaces. Lighting is soft, warm, single-source (window or small practical). Every action is slow and deliberate — no fast movement. Dialogue is either absent or a single breathy line at the end. Don't describe what the product does; describe what it sounds and feels like being used. Background is always clean and minimal — marble, wood, linen — no visual clutter.

## 11. Gift reaction

**Visual:** Handheld iPhone films the gift-giving moment — recipient unboxing or unwrapping the product (often still in original retail packaging or branded shipping box), camera catching their real-time reaction as they discover what it is.

**Fundamental:** The reaction has to feel genuinely surprised. If the recipient looks like they're performing for the camera, the format dies.

**Direction:** Open with a neutral beat — recipient receives the wrapped box, gift bag, or sealed shipping carton, mildly curious, not yet knowing what's inside. Then the reveal: they open it and discover the product. The reaction should include specific physical tells — eyes widen, hand covers mouth, look back up at the giver, *"oh my god"* / *"shut up"* / *"no way"* / *"is this what I think it is"*. Original retail packaging is a feature, not a bug — keep the box visible and let the brand visuals do their work. Camera is the gift-giver's POV, handheld, slightly shaky, framing the recipient's face and hands. Lighting is whatever's natural in the moment (living room sofa, kitchen table, front door entryway). Optional second beat: recipient takes the product out, holds it up, *"I've wanted this for so long"*. Keep dialogue short and unscripted — over-acted lines kill the format.

## 12. Podcast video

**Visual:** Two people seated at a table with podcast microphones and a minimal backdrop, shot from a three-quarter angle, natural conversation between host and guest about the product.

**Fundamental:** It should feel like a clipped slice from a longer episode — mid-conversation, not opening line.

**Direction:** Subject 1 (host) poses a question or observation; Subject 2 (guest) responds with specifics about the product experience — never a sales pitch, always a lived response. Both should be in frame at once (two-shot) or alternate between close-ups at natural cut points. Set is minimal — mics on stands, maybe coffee cups, plain or slightly textured backdrop (acoustic panels, plants, bookshelf). Lighting is warm and soft with key light on each face. Audio direction matters a lot here: *"warm close-mic'd voice with subtle studio dampening"* — not "podcast sound." Delivery is measured and relaxed, not energetic.
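A hedged sketch of that mid-conversation feel (lines are illustrative, not a template):

```
Host: "Okay, but did you actually keep using it after the first week?"
Guest: "Honestly, yeah. I expected a gimmick, and then by the third morning I was reaching for it before my coffee."
```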

## 13. Pixar animated

**Visual:** 3D-animated scene with exaggerated cute characters — talking dog, anthropomorphic bottle, worried pair of running shoes — expressing big emotions with soft lighting and expressive physics.

**Fundamental:** Emotional exaggeration. The whole game is making the character feel big, cute, and slightly naive.

**Direction:** Give the main character a clear feeling in the opening — worried, proud, embarrassed, hopeful, mischievous. Let their face, body, and physics amplify it: wobble, droop, bounce, sparkle, ears perking, eyebrows lifting, tail thumping. Objects CAN talk, pets CAN speak, gravity CAN bend — anything is on the table if the story calls for it. Lean into micro-expressions. The product is either the character itself (a bottle with a face) OR the thing that solves the character's problem in a cute satisfying visual moment. Dialogue should be short, warm, and a little naive — the way a Pixar character talks. Reference aesthetics: *"Pixar Luxo/Piper/Lava short-film feel, soft pastel lighting, expressive SSS skin, buttery animation, warm color palette."* Audio direction: *"warm friendly voice, childlike energy, with soft orchestral ambient score."*

## 14. Duo dynamic UGC

**Visual:** Two people in frame where a specific social dynamic between them drives the scenario. The product enters through that dynamic — as a challenge, a gift, a complaint, a hype-up, a teaching moment — not as a side prop.

**Fundamental:** The dynamic is the engine. If the same script could be performed by one person alone, this format is wrong for it.

**Direction:** Pick ONE dynamic archetype upfront and bake it into the format descriptor at the top of the prompt (e.g., *"Playful couple challenge UGC promoting a {category}, boyfriend tries girlfriend's [product]"*). The dynamic dictates camera, energy, and dialogue rhythm. Common archetypes to choose from:

- **Playful couple challenge** — *"I bet you can't…"*, "his vs hers" comparison, one forces the other to try
- **Mock-frustrated couple** — one complains about the other's habit (*"my boyfriend won't stop using my…"*)
- **Romantic surprise / gift** — partner reveals the product to the other
- **Mom + kid** — mom commentary on kid using the product, or kid being charmingly into it
- **Dad + kid** — kid teaching dad, dad charmingly confused
- **Grandparent + grandchild** — generational moment (granddaughter showing grandma a new product, grandson trying something new with grandpa). The warmth plus cross-generational humor lands hard — high viral potential.
- **Sibling rivalry** — older vs younger, one one-upping the other
- **Best-friend hype** — *"OK you HAVE to see this"*
- **Friend gossip / confiding** — *"OK don't tell anyone but…"*
- **Roommate disagreement** — opposite reactions, one convincing the other
- **Coworkers at the office** — desk-side moment, one shows the other at their desk or in the break room
- **Pet as secondary character** — the pet IS the co-star (dog refuses or freaks out over the product, cat judges from the windowsill). One of the strongest viral angles.

Camera setups that work: a handheld 2-shot filmed by a third hand off-camera, phone propped on a surface to capture both, or alternating handheld selfie passes between the two. Pick ONE and describe it concretely in the prompt. Dialogue feels mid-scenario — not opening line. Energy matches the dynamic: playful is upbeat, mock-frustrated is dry, gift is warm-shock, friend hype is high-pitch, grandparent is warm and amused, pet-driven is wordless except for the human's reactions.

**Out of scope:** 3+ people on camera. Friend groups, full families, office teams — those are a future format type. For now, cap at 2 in frame.

---

## Step 5 — Present for approval

Present ALL prompts together in one message. Use this exact header format for each prompt:

```
Prompt [N] — [Category]
Duration: [X]s
Aspect Ratio: [X:X]

Assets:
- Product ([product name]) = @image[1]
- Actor ([who]) = @image[2] (via Add Actor)        [omit if no actor]
- Scene / mood ([what]) = @image[3]                [omit if not used]
- Voice reference = @audio[1]                      [omit if not used]
```

Then the full prompt text immediately below the header — every word, no truncation.

Only list assets actually used in THAT prompt. Indices depend on upload order in Starpop; flag the expected indices so the user knows which slot gets which image when they upload.
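For example, a filled-in header might look like this (product and asset names are placeholders; actual indices follow the user's upload order):

```
Prompt 2 — ASMR / sensorial product demo
Duration: 10s
Aspect Ratio: 9:16

Assets:
- Product ([serum name]) = @image[1]
- Actor (hands only) = @image[2] (via Add Actor)
```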

After all prompts are listed, ask: *"Here are [N] prompts. Any tweaks? I won't hand off until you confirm."*

Wait for explicit "yes" / "approved" / "go." Tweaks → revise → re-present all.

If the user also wants to RUN the prompts in Starpop, hand off to the `starpop-workflow-agent` skill with the approved plan.

---

## Troubleshooting

| Symptom | Fix |
|---|---|
| Dialogue rushed / bad lip-sync | Prompt exceeds max words for duration. Trim to the Pacing Rule. |
| Character drifts across beats | Reference `@image[N]` consistently + identical subject noun in every beat. |
| Pronunciation glitches on an ingredient or technical word | Rewrite per the "avoid complex vocabulary" table. |
| Product looks off (logo warped, wrong proportions) | Strengthen positives: *"Sharp logo, clean surface, correct proportions, exact label match to @image[1]."* |
| Handheld selfie feels overproduced | Cut abstract style words. Add specific camera (*"smartphone front camera, slight handheld"*) + lighting (*"soft window light, real skin texture"*). |
| 1-shot UGC feels flat | Emotional tag weak. Escalate: `(laughing)` → `(laughing, energy cranked)`. Add exclamations + emojis. |
| Cinematic feels like B-roll | Missing character arc. Rewrite opening beat to establish what the subject feels / wants. Embed product as narrative object. |
| Subtle emotion / behavior didn't render | Model underdid it. Rewrite with exaggerated physical action per the Visual-first rule — e.g., *"trembling"* → *"darting under the table, ears flat, jumping up at every thunderclap"*. |
| Greenscreen study looks fake | Verify you're using a real peer-reviewed screenshot as `@image[2]`, not a generated or stock image. |
| Pixar looks stiff | Add specific reference — *"Pixar Luxo/Piper-style animation, buttery framerate, expressive micro-movements."* |
| Podcast looks like a set | Describe the set as minimal + lived-in, not polished — *"acoustic panels and a leaning bookshelf behind them, two coffee cups on the table."* |

---

## Credit

Adapted from Sirio Berati's public [Seedance 2.0 Prompt Guide](https://seedance-prompt-guide.sirioberati.com). Category guidance refined through David's (Starpop) working references and testing.
