ComfyUI-QwenVL: the most powerful plugin for reverse-engineering prompts from images and videos (with z-image image-to-image integration)
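ComfyUI-QwenVL serves three main purposes: reverse-engineering prompts from image or video content, enhancing and polishing text prompts, and generating stories from images. This article covers only the first. We first test image reverse prompting in 5 modes and video reverse prompting in 3 modes, then present the image reverse-prompting workflow, the video reverse-prompting workflow, and an image-to-image workflow that combines z-image with image reverse prompting.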
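Plugin features
• Standard and advanced inference nodes: a simple QwenVL node for quick use, and an advanced QwenVL node that additionally exposes model inference parameters (for example, temperature / top_p) for fine-grained control over inference
• Prompt enhancer: a text-only prompt enhancer
• Preset and custom prompts: a rich yet concise list of preset prompts to choose from, plus the option to write your own prompt for full control
• Multiple models: easy switching between the official Qwen VL models (see the "Model download" section below for the supported list)
• Automatic model download: models are downloaded automatically on first use
• Image and video inference: supports single images as well as video frame sequences
• Multiple inference optimizations: supports SageAttention / FlashAttention-2 / SDPA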
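Results
The tests below use Qwen3-VL-8B-Instruct-FP8 on a 24G 4090 GPU. After an image is reverse prompted, the generated prompt is fed to z-image for text-to-image generation; after a video is reverse prompted, the generated prompt is fed to wan2.2 for text-to-video generation.
Image reverse prompting
Performance: reverse prompting one 649*1000 image uses about 13.3G of VRAM and takes about 15s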
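As a rough sanity check on the memory figure above, the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 8e9 parameters for the 8B model and a uniform 1 byte per parameter for FP8. Both are simplifying assumptions (a real FP8 checkpoint may keep some layers in higher precision), and the function name is purely illustrative.

```python
# Rough estimate of the memory taken by model weights alone.
# Assumptions: nominal parameter count, the same precision for every weight.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate weight size in GiB for the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in ("fp16", "fp8"):
        size = weight_memory_gib(8e9, precision)
        print(f"{precision}: about {size:.2f} GiB of weights")
```

Under these assumptions the FP8 weights come to roughly 7.5 GiB, so the rest of the observed 13.3G presumably goes to activations, the KV cache, image features, and framework overhead.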
Test image

Detailed Description: detailed description mode (default)
A muscular male figure with spiky green hair stands in a dynamic martial stance atop a glowing white yin-yang symbol embedded in the ground, gripping a long silver sword vertically between his hands at chest level. His intense gaze is fixed forward with narrowed eyes conveying fierce concentration or aggression, while his mouth slightly opens as if shouting or exerting power. He wears layered traditional attire consisting of loose-fitting beige pants secured by brown straps, bare feet clad in simple sandals, and an elaborate purple-and-white robe that billows dramatically around him like wings due to motion or energy emanating from within. The backdrop features swirling blue light patterns resembling dragon motifs above and illuminated trigrams arranged symmetrically on either side of the central platform below. A strong vertical beam of cool-toned cyan-blue light descends from directly overhead, casting sharp highlights across the character’s body and creating deep contrasts against the dark surroundings where fainter elemental swirls are subtly rendered. The perspective appears eye-level but angled upward toward the warrior's midsection, emphasizing both their commanding posture and the supernatural aura surrounding them through centered framing focused squarely on the protagonist amidst this mystical setting.

Simple Description: simple description mode
A green-haired warrior in a purple robe stands poised on a glowing yin-yang symbol, holding a sword aloft against a backdrop of blue energy and dragon motifs.

Ultra Detailed Description: ultra-detailed description mode
A dynamic anime-style warrior with vibrant green hair swept back by motion stands poised atop a glowing white yin-yang symbol embedded within an illuminated blue floor patterned with trigrams; his muscular torso is bare beneath layered robes—white inner garments cinched at the waist, overlaid by voluminous purple outer sleeves that billow dramatically outward like wings, adorned with crisp black trigram symbols against their rich satin-like texture. His right foot rests firmly on the central circle while left leg extends backward for balance, clad in beige pants tucked into brown sandals secured with woven straps over pale ankles. He grips a gleaming silver sword vertically between both hands just below chest height, its blade reflecting ambient light as it pierces through midair tension. Behind him, translucent blue dragon silhouettes coil upward across a gradient sky transitioning from deep indigo to electric cyan near the top edge where faint cloud wisps drift. The scene’s primary illumination originates from above and behind, casting sharp downward shadows under arms and feet, accentuating muscle definition and creating luminous halos around edges of clothing and weapon. A shallow depth-of-field blurs background elements subtly while keeping foreground focus razor-sharp on facial expression—a determined gaze framed by defined cheekbones and subtle shading suggesting sun-kissed tan skin tone—and intricate stitching along robe seams. Composition utilizes radial symmetry centered on the character's stance but introduces asymmetrical flow via trailing cape tendrils and diagonal energy trails emanating from his body toward the upper corners, guiding viewer eyes dynamically inward before expanding outwards. Negative space surrounds him crisply, emphasizing isolation amidst cosmic symbolism without distracting clutter or competing focal points.

Detailed Analysis: detailed analysis mode
Subject: A green-haired warrior in a purple robe holds a glowing sword while standing on a yin-yang symbol surrounded by 八卦 patterns.
People: One male figure with spiky green hair and intense expression, wearing traditional martial arts attire.
Environment: The scene features floating dragon-like spirits above the character and illuminated 八卦 symbols below him against dark background.
Lighting: Bright blue glow emanates from behind the character and ground markings, casting dramatic highlights and shadows.
Camera/Composition: Centered composition focusing on the dynamic pose of the fighter with symmetrical layout around the central yin-yang circle.
Color/Texture: Dominated by vibrant blues, deep purples, and stark white accents; textures suggest fabric movement and ethereal energy effects.

Detailed Description: detailed description mode + temperature 0.1
A muscular male figure with vibrant green spiky hair stands in a dynamic martial arts stance atop an illuminated yin-yang symbol surrounded by glowing trigrams on the ground, gripping a long white sword vertically between his hands at chest level while one foot rests firmly forward and the other slightly lifted behind him for balance. He wears layered robes—white undergarments beneath loose tan pants secured with brown straps, topped by voluminous purple outer garments that billow dramatically around him as though caught mid-motion or influenced by supernatural energy. His face shows intense concentration with furrowed brows and gritted teeth, conveying fierce determination. Behind him looms a stylized blue-white dragon silhouette against a dark backdrop streaked with vertical beams of cool-toned light emanating from above, suggesting mystical power sources overhead. The scene’s illumination originates primarily from above and within the symbols below, casting sharp highlights across his form and creating deep contrast with surrounding darkness, emphasizing the luminous glyphs and swirling mist-like effects near his feet. Shot from a low angle looking upward to emphasize his imposing presence and spiritual aura, he dominates the frame centrally framed amidst symmetrical patterns radiating outward from the circular emblem beneath his sole.

Personally, I think the last setting (Detailed Description with temperature 0.1) gives the best result.
Video reverse prompting
Performance: reverse prompting a 640*640, 64-frame video uses about 13.6G of VRAM and takes about 20s
Test video
Video Summary: video summary mode
There are no actual “key events” or a traditional narrative unfolding in these images — they appear to be still frames from an animated character, likely from a game (such as Final Fantasy VII Remake) featuring Aerith Gainsborough.
However, if we interpret them narratively:
- The scene captures a quiet, serene moment with the character looking directly at the viewer.
- Her expression shifts subtly: initially neutral, then softening into a gentle smile — suggesting warmth, introspection, or perhaps emotional connection.
- She is adorned with delicate jewelry—a flower-shaped hairpin, dangling earrings, and layered necklaces—adding elegance even within what may seem like rugged attire (brown straps over her shoulder).
- Soft bokeh background hints at cherry blossoms floating gently around her, evoking themes of beauty, transience, peace, or nostalgia.
Narrative Interpretation:
This could symbolically represent a reflective pause before action—or a poignant farewell. In context of Aerith’s story arc, it might reflect inner strength, kindness, or acceptance—even amid impending hardship. It's emotionally rich but not plot-driven; instead, it invites viewers to feel connected through visual poignancy rather than event-based storytelling.
In short: There is no progression of "events," just a tender portrait capturing emotion and atmosphere.
Cinematic Description: cinematic description mode
A close-up portrait captures an ethereal young woman with long, glossy black hair gently tousled by unseen wind, her face illuminated from above as soft sunlight filters through blurred blossoms drifting past in slow-motion grace. Her amber eyes hold quiet intensity, framed by delicate lashes, while a silver filigree hairpin glints near her temple and pearlescent drop earrings catch scattered light against her collarbone. She wears layered necklaces — one thin chain, another thicker link-style — over a dark top accented by brown leather straps that suggest utility or armor beneath. The background is dreamlike blur—out-of-focus petals float midair around her shoulders, rendered almost painterly due to shallow depth of field, creating separation between foreground detail and hazy bokeh backdrop. Light sources appear naturalistic yet diffused, casting gentle highlights on skin without harsh shadows, suggesting late afternoon sun softened by clouds or foliage overhead. Camera remains static at eye level, intimate enough to capture subtle expression changes across her lips, which curve faintly upward as if smiling inwardly. Composition balances symmetry along vertical axis, drawing focus squarely onto facial features via centered framing and minimal distraction. Mood evokes serene melancholy tinged with beauty — tranquil yet charged emotionally — enhanced by floating particles mimicking cherry blossom rain suspended in air just beyond reach.
Detailed Description: detailed description mode
A young woman with long, straight black hair adorned by a delicate silver hairpin near her temple gazes softly forward, her warm brown eyes holding gentle calm as she slightly tilts her head while wearing subtle earrings that catch the light. Her face rests in mid-frame under an ambient glow from above, suggesting late afternoon or early evening sun filtering through diffused clouds, casting even illumination across her skin without harsh shadows. She wears layered attire including a dark gray top beneath thick brown leather straps over one shoulder—possibly part of armor or utility gear—and a chain-link necklace draped around her neck. The blurred backdrop reveals floating white petals drifting downward against muted tones of blue-gray sky, indicating either springtime outdoors or perhaps cherry blossom season amid urban surroundings obscured behind bokeh effects. Camera maintains eye level close-up framing focused on her upper body and expressive features, allowing fine textures like individual strands of hair to be rendered clearly despite shallow depth-of-field blur extending into surrounding space. A faint breeze seems implied by slight movement within her locks and airborne blossoms suspended just beyond reach of focus plane.
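All three look good; personally, I like the last one best.
Plugin installation
Install the plugin https://github.com/1038lab/ComfyUI-QwenVL with ComfyUI-Manager, then restart ComfyUI.
Model download
Models are downloaded automatically the first time a workflow runs, and they can also be downloaded manually. Place downloaded models under the ComfyUI/models/LLM/Qwen-VL/ directory.
For example, after downloading the Qwen3-VL-8B-Instruct model, the final directory structure looks like this:
ComfyUI/models/LLM/Qwen-VL/
-- Qwen3-VL-8B-Instruct
   -- chat_template.json
   -- model-00001-of-00004.safetensors
   -- ...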
The model download links are as follows:
| Model | Download link |
|---|---|
| Qwen3-VL-2B-Instruct | https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct |
| Qwen3-VL-2B-Thinking | https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking |
| Qwen3-VL-2B-Instruct-FP8 | https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct-FP8 |
| Qwen3-VL-2B-Thinking-FP8 | https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking-FP8 |
| Qwen3-VL-4B-Instruct | https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct |
| Qwen3-VL-4B-Thinking | https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking |
| Qwen3-VL-4B-Instruct-FP8 | https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct-FP8 |
| Qwen3-VL-4B-Thinking-FP8 | https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking-FP8 |
| Qwen3-VL-8B-Instruct | https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct |
| Qwen3-VL-8B-Thinking | https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking |
| Qwen3-VL-8B-Instruct-FP8 | https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct-FP8 |
| Qwen3-VL-8B-Thinking-FP8 | https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking-FP8 |
| Qwen3-VL-32B-Instruct | https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct |
| Qwen3-VL-32B-Thinking | https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking |
| Qwen3-VL-32B-Instruct-FP8 | https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct-FP8 |
| Qwen3-VL-32B-Thinking-FP8 | https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking-FP8 |
| Qwen2.5-VL-3B-Instruct | https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct |
| Qwen2.5-VL-7B-Instruct | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct |
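For a manual download, one option is to script it with the `huggingface_hub` package. The following is a minimal sketch, not the plugin's own download logic: the helper names are hypothetical, and it assumes `huggingface_hub` is installed in the Python environment ComfyUI uses. It derives the repository id from one of the links above and writes the files into the ComfyUI/models/LLM/Qwen-VL/ folder described earlier.

```python
from pathlib import Path
from urllib.parse import urlparse

# Model folder stated in the article, relative to the ComfyUI install root.
MODEL_SUBDIR = Path("models") / "LLM" / "Qwen-VL"


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face model URL into a repo id such as 'Qwen/Qwen3-VL-8B-Instruct'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def target_dir(comfyui_root: str, repo_id: str) -> Path:
    """Folder the model should end up in: <root>/models/LLM/Qwen-VL/<model name>."""
    return Path(comfyui_root) / MODEL_SUBDIR / repo_id.split("/")[-1]


def download_model(comfyui_root: str, url: str) -> Path:
    """Download one model repository into the plugin's model folder."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id = repo_id_from_url(url)
    dest = target_dir(comfyui_root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(dest))
    return dest
```

Calling `download_model("ComfyUI", "https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct")` would then fetch the repository into `ComfyUI/models/LLM/Qwen-VL/Qwen3-VL-8B-Instruct`; adjust the root path to your own installation.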
Workflows

Image reverse prompting

Core nodes QwenVL and QwenVL (Advanced): the main parameters are as follows
• model_name
• quantization
• preset_prompt
• custom_prompt
• max_tokens
• keep_model_loaded
• seed
• temperature
• top_p
• num_beams
• repetition_penalty
• frame_count
• device
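The model inference parameters are explained in the article "Role-playing and jailbreaking models with ollama and Qwen" (original title: 基于 ollama 和 Qwen 让模型进行角色扮演和模型破限).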
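To give a feel for what temperature and top_p do, here is a generic sketch of temperature scaling followed by nucleus (top-p) filtering over a toy list of logits. It illustrates the general technique only and is not the plugin's or the model library's implementation; all function names are made up for this example.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index after temperature scaling and top-p filtering."""
    candidates = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With a temperature as low as 0.1, nearly all probability mass lands on the top token, which is consistent with the more stable wording seen in the temperature 0.1 test earlier in this article.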
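Video reverse prompting

Because we need to control how many frames are used for video reverse prompting, the advanced node is required.
z-image image-to-image workflow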
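One common way to reduce a clip to a fixed number of frames is to pick evenly spaced frame indices. The sketch below shows that idea for a 64-frame clip like the test video; whether the plugin's frame_count parameter selects frames exactly this way is an assumption, and the function name is hypothetical.

```python
def sample_frame_indices(total_frames: int, frame_count: int) -> list:
    """Pick frame_count evenly spaced indices from a clip of total_frames frames."""
    if total_frames <= 0 or frame_count <= 0:
        return []
    if frame_count >= total_frames:
        # Nothing to drop: use every frame.
        return list(range(total_frames))
    if frame_count == 1:
        return [0]
    # Spread the picks from the first frame to the last frame inclusive.
    step = (total_frames - 1) / (frame_count - 1)
    return [round(i * step) for i in range(frame_count)]


if __name__ == "__main__":
    print(sample_frame_indices(64, 8))
```

A smaller frame count means fewer images for the model to process, which generally trades temporal detail for lower memory use and shorter inference time.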

Problems you may run into
Transformers version issue
Running the workflow fails with the following error:
The checkpoint you are trying to load has model type `qwen3_vl` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date. You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
Solution:
1. In the ComfyUI_windows_portable directory, run python_embeded\python.exe -m pip list to check the transformers version. Mine was 4.53.3, which is too old
2. Run python_embeded\python.exe -m pip install transformers==4.57.0 --force-reinstall to install a newer version of transformers
3. Restart ComfyUI
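The version check in step 1 can also be scripted. The sketch below reads the installed transformers version and compares it with 4.57.0, the version installed in step 2. Treating 4.57.0 as the minimum is an assumption based on this fix, and the helper names are illustrative. Run it with the same interpreter ComfyUI uses (python_embeded\python.exe in the portable build).

```python
import re
from importlib import metadata

# Version installed in the fix above; treated here as the minimum (assumption).
REQUIRED = "4.57.0"


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components, e.g. '4.57.0.dev0' -> (4, 57, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unrecognized version string: {version}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '4.57' to (4, 57, 0)
    return tuple(parts)


def is_too_old(installed: str, required: str = REQUIRED) -> bool:
    """True when the installed version is lower than the required one."""
    return version_tuple(installed) < version_tuple(required)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed in this environment")
    elif is_too_old(found):
        print(f"transformers {found} is older than {REQUIRED}; upgrade it")
    else:
        print(f"transformers {found} looks recent enough")
```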
