Airium
Course · 5 lessons · ~5 minutes

Video with audio in Grok Imagine 1.5

The #1 model in the global Image-to-Video ranking: up to 30 seconds, native music, effects and lip-sync, up to 7 images in one generation.

⏱ up to 30 sec · 🔊 native audio · 🖼 up to 7 images · ⚡ ≈3 tokens/sec
Lesson 1 · 30 seconds

Choosing a version

In the studio, click the model name at the top, then select Grok on the left; its versions are listed on the right:

Airium Studio · model selection · Grok
Grok Imagine 1.5 · NEW
6–30 sec · native audio · up to 7 images · fun / normal / spicy
≈ 18 G-tokens / 6 sec
Grok Imagine (Legacy)
previous generation · 6–15 sec
1. Choose 1.5; Legacy is kept for compatibility.
Lesson 2 · 1 minute

Mode and parameters

All Grok 1.5 parameters:

parameters panel · Grok Imagine 1.5
Mode
From text · From image
Duration
6 sec · 10 · 15 · 20 · 30
Format · Quality
16:9 · 9:16 · 1:1 · 2:3 · 480p · 720p
Style
fun · normal · spicy
1. I2V (image-to-video) animates a photo or frame; the orientation is taken from the image.
2. Duration: 6–30 seconds, whole seconds only.
3. fun: expressive animation · normal: realism · spicy: 18+ (not available with your own images).
🔊 No sound setup needed — Grok always generates native audio: music, effects, and lip-sync lines.
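
The limits above can be written down as a small pre-flight check. This is only a sketch: `GrokSettings`, `validate`, and the field names are hypothetical helpers for illustration, not part of Airium Studio or any Grok API. The option values are copied from this course, and the format set includes 3:2 because the cheat sheet lists it.

```python
from dataclasses import dataclass

# Option values copied from this lesson and the cheat sheet.
# Every name below is hypothetical; this is not an Airium API.
FORMATS = {"16:9", "9:16", "1:1", "2:3", "3:2"}
QUALITIES = {"480p", "720p"}
STYLES = {"fun", "normal", "spicy"}
MAX_IMAGES = 7


@dataclass
class GrokSettings:
    mode: str               # "text" or "image"
    duration: int           # whole seconds, 6 to 30
    aspect: str = "16:9"    # only checked in text mode
    quality: str = "720p"
    style: str = "normal"


def validate(settings: GrokSettings, image_count: int = 0) -> list:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if settings.mode not in ("text", "image"):
        problems.append("mode must be 'text' or 'image'")
    d = settings.duration
    if isinstance(d, bool) or not isinstance(d, int) or not 6 <= d <= 30:
        problems.append("duration must be a whole number of seconds from 6 to 30")
    # In image mode the orientation comes from the image, so aspect is skipped.
    if settings.mode == "text" and settings.aspect not in FORMATS:
        problems.append("unknown format: " + str(settings.aspect))
    if settings.quality not in QUALITIES:
        problems.append("quality must be 480p or 720p")
    if settings.style not in STYLES:
        problems.append("style must be fun, normal, or spicy")
    if image_count > MAX_IMAGES:
        problems.append("at most 7 images per generation")
    if settings.style == "spicy" and image_count > 0:
        problems.append("spicy is not available with your own images")
    return problems
```

For example, `validate(GrokSettings(mode="text", duration=5))` reports a single problem, because 5 seconds is below the 6-second minimum.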
Lesson 3 · 1 minute

Start frame and references

Up to 7 images: the first is the starting frame, the rest are references.

"Frames" panel · Grok 1.5
Start frame
Reference frames · up to 6
Ref 1 · character
Ref 2 · style
💡 If you select multiple images at once, the first automatically becomes the start frame and the rest become references.
⚠️ In spicy style, external images are unavailable — use fun or normal.
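
The ordering rule can be sketched in a few lines. `split_frames` is a hypothetical helper, not a Studio function; it only mirrors what this lesson states: at most 7 images, the first is the start frame, the rest are references, and spicy does not accept external images.

```python
MAX_IMAGES = 7  # limit stated in this lesson


def split_frames(paths, style="normal"):
    """Split an ordered list of image paths into (start_frame, references).

    Hypothetical helper for illustration only.
    """
    if not paths:
        raise ValueError("at least one image is needed")
    if len(paths) > MAX_IMAGES:
        raise ValueError("at most 7 images per generation")
    if style == "spicy":
        raise ValueError("external images are unavailable in spicy style")
    # The first image is the start frame; up to 6 references follow.
    return paths[0], list(paths[1:])
```

Calling `split_frames(["hero.png", "character.png", "style.png"])` returns `"hero.png"` as the start frame and the other two paths as references, so the order in which you pick the files matters.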
Lesson 4 · 1 minute

Writing a prompt

Grok's formula: camera + action + lighting + pace.

✕ Weak
«Beautiful video of a girl in the city»
✓ Strong
«The camera slowly circles around a girl on a neon street of a night city, rain, reflections in puddles, soft backlight, smooth movement without hard cuts»
✅ Animating a photo — add "preserve the exact color and style".
✅ Write lines in quotes — Grok will handle lip-sync automatically.
✅ For long videos (15–30 sec), describe in phases: "first…, then…, in the finale…".
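
The formula and the three tips can be assembled mechanically. The sketch below is one possible way to do it; `build_prompt` and `build_phased_prompt` are hypothetical helpers, and the exact wording Grok responds to best is an assumption based only on the examples in this lesson.

```python
def build_prompt(camera, action, lighting, pace, *, line=None, animating_photo=False):
    """Join the four formula parts (camera + action + lighting + pace).

    Hypothetical helper: the phrasing it produces is illustrative only.
    """
    parts = [camera, action, lighting, pace]
    if line:
        # Spoken lines go in quotes so that lip-sync is applied.
        parts.append('says: "' + line + '"')
    if animating_photo:
        parts.append("preserve the exact color and style")
    return ", ".join(p.strip() for p in parts if p and p.strip())


def build_phased_prompt(first, then, finale):
    """Describe a long (15-30 sec) video in three phases."""
    return "first " + first + ", then " + then + ", in the finale " + finale
```

For instance, `build_prompt("The camera slowly circles around a girl", "she walks down a neon street in the rain", "soft backlight", "smooth movement without hard cuts")` yields a single comma-separated prompt in the same shape as the strong example above.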
Lesson 5 · 30 seconds

Generation and result

Hit "Generate" — the task enters the background queue (usually 30–90 seconds). The finished video will appear in your feed and in "My Videos"; tokens for failed generations are refunded automatically.

Prompts behind real results from Airium Studio:

I2V · 6 s · camera orbit: «The camera orbits around a young man, he lifts his head from the microscope, and the camera dives into the eyepiece, transition…»
I2V · 6 s · close-up: «Hands reach toward each other, gently hold together, the camera slowly zooms in»
I2V · 6 s · style preservation: «The camera moves in a circle, the patient plays the violin, preserve the color and style exactly…»
FAQ

How to generate video in Grok Imagine 1.5 online?
Open Airium Studio, select the Grok Imagine 1.5 model in the catalog, set parameters, write your prompt, and click "Generate". Registration takes a minute — new users receive free tokens.

How much does video generation cost in Grok Imagine 1.5?
In Airium Studio, generating with Grok Imagine 1.5 costs approximately 18 G-tokens / 6 sec. You only pay for successful generations — tokens for failed ones are returned automatically.

What is the maximum video duration in Grok Imagine 1.5?
The maximum is 30 seconds; available durations range from 6 to 30 s. Parameters are set directly in the studio before generation.

Can I use Grok Imagine 1.5 without API keys or VPN?
Yes — Airium Studio works in the browser with no API keys, foreign bank cards, or VPN: simply select Grok Imagine 1.5 in the catalog and generate online.

Finale

Cheat sheet

Parameter | Values | Tip
Duration | 6–30 s | social media: 6–10 s
Format | 16:9 · 9:16 · 1:1 · 2:3 · 3:2 | in I2V: taken from the image
Quality | 480p · 720p | 720p for publishing
Audio | native, always | music + effects + lip-sync
Images | up to 7 | spicy is not available with them
Price | 3 ⚡/sec | 6 s ≈ 18 · 30 s ≈ 90
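
The price row is simple arithmetic, sketched below. `estimate_tokens` is a hypothetical helper, and it assumes the rate is a flat 3 tokens per second regardless of quality or mode; the course only gives approximate figures, so treat the result as an estimate.

```python
TOKENS_PER_SECOND = 3  # approximate rate quoted in this course


def estimate_tokens(seconds):
    """Estimate the token cost of one generation.

    Hypothetical helper; assumes a flat per-second rate.
    """
    if not 6 <= seconds <= 30:
        raise ValueError("duration must be between 6 and 30 seconds")
    return seconds * TOKENS_PER_SECOND
```

With this assumption, `estimate_tokens(6)` gives 18 and `estimate_tokens(30)` gives 90, matching the figures in the table.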

🎥 Camera movement

«orbits in a circle», «slow push-in», «fly-through» — the primary language of Grok.

🗣 Dialogues

Lines in quotes → automatic lip-sync.

🖼 Photo animation

«preserve the color and style exactly» + smooth action.

⏩ Long videos

15–30 s — describe in phases.
