How to Run FLUX.1 Locally
FLUX.1 by Black Forest Labs changed the AI image generation landscape when it launched. It produces images that rival Midjourney and DALL-E 3, but with one key difference: you can run it entirely on your own hardware. No cloud API, no per-image charges, no content filters. This guide walks you through the complete setup from scratch.
What Is FLUX.1?
FLUX.1 is an open-weight text-to-image model built by Black Forest Labs, a company founded by the original creators of Stable Diffusion. It uses a flow-matching transformer architecture (hence the name) rather than the traditional diffusion approach. The result is faster convergence, better text rendering inside images, and stronger prompt adherence than SDXL.
There are two variants that matter for local use:
- FLUX.1 schnell -- The fast variant. Generates a 1024x1024 image in about 4 steps. Ideal for iteration and experimentation. FP8 quantized version fits in roughly 12 GB VRAM.
- FLUX.1 dev -- The quality variant. Needs 20-30 steps for best results. Produces more detailed, coherent images. Same VRAM footprint as schnell when quantized to FP8.
Both variants are available as FP8 quantized UNETs, which cut the VRAM requirement roughly in half compared to the full BF16 weights.
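The savings are simple arithmetic: FP8 stores one byte per weight where BF16 stores two. A rough estimate for the UNET weights alone (activations, VAE, and text encoders add several more GB on top):

```python
# Rough VRAM estimate for the FLUX UNET weights at different precisions.
PARAMS = 12e9  # FLUX.1 has roughly 12 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

bf16 = weight_gb(2.0)  # BF16: 2 bytes per weight
fp8 = weight_gb(1.0)   # FP8: 1 byte per weight
print(f"BF16: {bf16:.1f} GB, FP8: {fp8:.1f} GB")
```

This back-of-the-envelope figure matches the ~12 GB FP8 files you will actually download.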
Hardware Requirements
FLUX is a 12-billion parameter model. Even with FP8 quantization, it needs a capable GPU. Here is what you need:
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 10 GB (RTX 3080) | 12+ GB (RTX 4070 Ti / RTX 3090) |
| System RAM | 16 GB | 32 GB |
| Storage | 20 GB free | 40+ GB free |
| GPU Brand | NVIDIA (CUDA) | NVIDIA (CUDA) |
AMD GPUs can work through ROCm, but NVIDIA with CUDA remains the most stable path. If you have 8 GB VRAM, FLUX will struggle -- consider SDXL models like Juggernaut XL instead, which produce excellent results at 6 GB.
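The thresholds in the table reduce to a simple decision rule. A minimal sketch (the function name and messages are illustrative, not part of any tool):

```python
def recommend_model(vram_gb: float) -> str:
    """Map available VRAM to a workable model choice,
    following the thresholds in the table above."""
    if vram_gb >= 12:
        return "FLUX.1 dev or schnell (FP8)"
    if vram_gb >= 10:
        return "FLUX.1 schnell (FP8), expect some offloading"
    return "SDXL (e.g. Juggernaut XL) -- FLUX will struggle"

# On an NVIDIA card you can read actual free VRAM with PyTorch:
#   import torch; free, total = torch.cuda.mem_get_info()
print(recommend_model(24))  # e.g. an RTX 3090
```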
Step 1: Install Locally Uncensored
The fastest path to running FLUX locally is through Locally Uncensored, which handles the entire backend stack for you. It bundles Ollama for chat and ComfyUI for image and video generation into a single interface.
```
git clone https://github.com/PurpleDoubleD/locally-uncensored.git
cd locally-uncensored
npm install
npm run dev
```
On first launch, the app will detect whether ComfyUI is installed. If it is not found, the Model Manager will guide you through setup. ComfyUI runs as the backend engine that actually loads and executes FLUX -- Locally Uncensored provides the frontend and automates the workflow construction.
Step 2: Install ComfyUI
If you do not already have ComfyUI installed, download the latest portable release from the ComfyUI GitHub repository. Extract it anywhere on your drive. Locally Uncensored will auto-detect it by scanning common paths, or you can point to it manually in Settings.
ComfyUI needs Python 3.10+ and PyTorch with CUDA support. The portable release bundles everything, so in most cases you do not need to install anything separately.
Step 3: Download FLUX Models
FLUX requires four separate model files, unlike SDXL which uses a single checkpoint. This is because FLUX splits the architecture into separate components:
- UNET -- The core image-generation model (FLUX is technically a diffusion transformer, but ComfyUI stores it under the UNET/diffusion_models name; the largest file, roughly 12 GB for FP8)
- VAE -- The image encoder/decoder (ae.safetensors)
- CLIP_L -- The first text encoder
- T5-XXL -- The second text encoder (handles complex prompts)
In Locally Uncensored, open the Model Manager tab. You will find pre-configured FLUX bundles under the Image section. Click Install All on either the FLUX.1 schnell FP8 or FLUX.1 dev FP8 bundle. The app downloads all four files to the correct ComfyUI directories automatically.
If you prefer manual download, place the files in these ComfyUI subdirectories:
- ComfyUI/models/diffusion_models/ -- UNET file
- ComfyUI/models/vae/ -- VAE file
- ComfyUI/models/text_encoders/ -- CLIP_L and T5-XXL files
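A missing or misplaced file is the most common setup mistake, so it is worth verifying the layout before generating. A small sketch that checks each folder contains at least one .safetensors file (the exact UNET and encoder filenames vary by download, so it checks the folders rather than specific names; adjust the ComfyUI path to your install):

```python
from pathlib import Path

# Adjust to wherever you extracted ComfyUI.
COMFY = Path("ComfyUI")

REQUIRED_DIRS = {
    "UNET": COMFY / "models" / "diffusion_models",
    "VAE": COMFY / "models" / "vae",
    "Text encoders": COMFY / "models" / "text_encoders",
}

def missing_components(dirs: dict[str, Path]) -> list[str]:
    """Return the component names whose directory is absent or empty."""
    return [
        name for name, d in dirs.items()
        if not d.is_dir() or not any(d.glob("*.safetensors"))
    ]

print(missing_components(REQUIRED_DIRS))  # [] means everything is in place
```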
Step 4: Generate Your First Image
Once the models are downloaded, switch to the Create tab in Locally Uncensored. The app auto-detects all installed models. Select your FLUX model from the model dropdown in the right-side parameter panel.
Type a prompt and click Generate. The app automatically:
- Classifies the model as FLUX type
- Finds the matching VAE and text encoders
- Builds the correct ComfyUI workflow (no node editing required)
- Submits the workflow and polls for results
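Under the hood, a ComfyUI workflow is just a JSON graph of nodes, and the steps above amount to assembling one. A minimal sketch of what gets built for FLUX schnell (the node class names are real ComfyUI nodes; the model filenames are placeholders for whatever you downloaded, and a production workflow may differ in detail):

```python
import json

def flux_workflow(prompt: str, steps: int = 4, seed: int = 0) -> dict:
    """Build a minimal FLUX workflow in ComfyUI's API format.
    References like ["2", 0] mean "output 0 of node 2"."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-schnell-fp8.safetensors",  # placeholder
                         "weight_dtype": "fp8_e4m3fn"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",    # placeholder
                         "clip_name2": "t5xxl_fp8.safetensors",  # placeholder
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["4", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": steps, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "sgm_uniform",
                         "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["3", 0]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "flux"}},
    }

# Submission is one POST to the ComfyUI server, then polling for results:
#   requests.post("http://127.0.0.1:8188/prompt", json={"prompt": wf})
#   requests.get(f"http://127.0.0.1:8188/history/{prompt_id}")
wf = flux_workflow("a red fox in the snow")
print(json.dumps(wf, indent=2)[:200])
```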
For FLUX schnell, use 4 steps. For FLUX dev, use 20 steps as a starting point. The default resolution is 1024x1024, which is the native training resolution for FLUX.
FLUX Schnell vs FLUX Dev: Which Should You Use?
| Feature | FLUX.1 schnell | FLUX.1 dev |
|---|---|---|
| Speed (1024x1024) | ~5 seconds | ~30 seconds |
| Recommended Steps | 4 | 20-30 |
| Image Quality | Good | Excellent |
| Text in Images | Decent | Very good |
| Prompt Adherence | Good | Excellent |
| License | Apache 2.0 | Non-commercial |
| VRAM (FP8) | ~12 GB | ~12 GB |
Start with schnell for quick iteration. Switch to dev when you want the best possible output for a prompt you have refined. The VRAM usage is identical since they share the same architecture.
Parameter Tuning Tips
Resolution
FLUX was trained on 1024x1024. You can generate at other aspect ratios (e.g., 1024x768 for landscape, 768x1024 for portrait) without significant quality loss. Going above 1024 on either dimension increases VRAM usage and can cause artifacts unless you use tiling or a high-res fix workflow.
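A practical way to pick alternate aspect ratios is to hold the pixel area near the native ~1 megapixel and snap both dimensions to a safe multiple. A sketch (the multiple of 64 is a conservative convention, not a hard FLUX requirement):

```python
def flux_resolution(aspect: float, area: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Pick (width, height) near FLUX's native ~1-megapixel training
    area for a given aspect ratio, snapped to a multiple of 64."""
    height = (area / aspect) ** 0.5
    width = height * aspect

    def snap(v: float) -> int:
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)

print(flux_resolution(1.0))     # (1024, 1024) -- the native square
print(flux_resolution(16 / 9))  # (1344, 768) -- widescreen, still ~1 MP
```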
CFG Scale
FLUX schnell works best with a CFG of 1.0 -- it was designed for low-step, low-guidance generation. FLUX dev is guidance-distilled: keep true CFG at 1.0 and steer it with the distilled guidance value instead (the FluxGuidance node in ComfyUI), where 3.0 to 5.0 works well. Higher guidance increases prompt adherence but can introduce saturation artifacts.
Sampler and Scheduler
The default sampler (euler) and scheduler (normal) work well for both variants. FLUX schnell specifically benefits from the sgm_uniform scheduler. For dev, the simple scheduler at 20 steps is a reliable baseline.
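The tuning tips above condense into a small reference table. A sketch, with the dev guidance value being a common community default rather than an official figure:

```python
# Baseline settings per variant, collected from the tips above.
# "guidance" is the FLUX distilled-guidance value (FluxGuidance node);
# true CFG stays at 1.0 for both variants.
BASELINES = {
    "flux-schnell": {"steps": 4, "cfg": 1.0,
                     "sampler": "euler", "scheduler": "sgm_uniform"},
    "flux-dev": {"steps": 20, "cfg": 1.0, "guidance": 3.5,
                 "sampler": "euler", "scheduler": "simple"},
}
print(BASELINES["flux-schnell"])
```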
Seed Control
Fix the seed to reproduce results. Change only one parameter at a time when comparing outputs. This is the fastest way to learn how FLUX responds to different settings.
Troubleshooting
Out of Memory Errors
If you get CUDA out of memory errors, try reducing the resolution to 768x768. Close other GPU-intensive applications. If the problem persists, you may need to switch to an SDXL model like Juggernaut XL, which runs comfortably at 6 GB VRAM.
Black or Broken Images
This usually means the VAE or text encoder is missing or mismatched. Verify that all four model files are present and that the app detected them correctly in the Model Manager. FLUX requires its own specific VAE (ae.safetensors) -- it cannot share a VAE with SDXL or SD 1.5 models.
Slow Generation
Ensure PyTorch is using CUDA and not falling back to CPU. In the ComfyUI terminal output, look for the CUDA device being listed at startup. Generation on CPU is roughly 50x slower and not practical for FLUX.
What About SDXL?
If your GPU has less than 10 GB VRAM, SDXL is the better choice. Models like Juggernaut XL V9 produce excellent photorealistic results and run as a single checkpoint file -- no separate VAE or text encoder downloads needed. Locally Uncensored includes Juggernaut XL as a one-click install bundle in the Model Manager.
FLUX produces better results at complex prompts and text rendering. SDXL is more forgiving on hardware and has a larger ecosystem of LoRA fine-tunes. Both are valid choices depending on your setup.