How to Run FLUX.1 Locally
FLUX.1 by Black Forest Labs changed the AI image generation landscape when it launched. It produces images that rival Midjourney and DALL-E 3, but with one key difference: you can run it entirely on your own hardware. No cloud API, no per-image charges, no content filters. This guide walks you through the complete setup from scratch.
What Is FLUX.1?
FLUX.1 is an open-weight text-to-image model built by Black Forest Labs, a company founded by the original creators of Stable Diffusion. It uses a flow-matching transformer architecture (hence the name) rather than the traditional diffusion approach. The result is faster convergence, better text rendering inside images, and stronger prompt adherence than SDXL.
There are two variants that matter for local use:
- FLUX.1 schnell -- The fast variant. Generates a 1024x1024 image in about 4 steps. Ideal for iteration and experimentation. FP8 quantized version fits in roughly 12 GB VRAM.
- FLUX.1 dev -- The quality variant. Needs 20-30 steps for best results. Produces more detailed, coherent images. Same VRAM footprint as schnell when quantized to FP8.
Both variants are available as FP8 quantized UNETs, which cut the VRAM requirement roughly in half compared to the full BF16 weights.
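The savings are simple arithmetic: FP8 stores one byte per weight where BF16 stores two. A rough estimate for the UNET weights alone (activations, VAE, and text encoders add several more GB on top):

```python
# Rough VRAM estimate for the FLUX UNET weights at different precisions.
PARAMS = 12e9  # FLUX.1 has roughly 12 billion parameters

def weight_gb(bytes_per_param: float) -> float:
    """Gigabytes needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

bf16 = weight_gb(2.0)  # BF16: 2 bytes per weight
fp8 = weight_gb(1.0)   # FP8: 1 byte per weight
print(f"BF16: {bf16:.1f} GB, FP8: {fp8:.1f} GB")
```

This back-of-the-envelope figure matches the ~12 GB FP8 files you will actually download.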
Hardware Requirements
FLUX is a 12-billion parameter model. Even with FP8 quantization, it needs a capable GPU. Here is what you need:
| Component | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 10 GB (RTX 3080) | 12+ GB (RTX 4070 Ti / RTX 3090) |
| System RAM | 16 GB | 32 GB |
| Storage | 20 GB free | 40+ GB free |
| GPU Brand | NVIDIA (CUDA) | NVIDIA (CUDA) |
AMD GPUs can work through ROCm, but NVIDIA with CUDA remains the most stable path. If you have 8 GB VRAM, FLUX will struggle -- consider SDXL models like Juggernaut XL instead, which produce excellent results at 6 GB.
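The thresholds in the table reduce to a simple decision rule. A minimal sketch (the function name and messages are illustrative, not part of any tool):

```python
def recommend_model(vram_gb: float) -> str:
    """Map available VRAM to a workable model choice,
    following the thresholds in the table above."""
    if vram_gb >= 12:
        return "FLUX.1 dev or schnell (FP8)"
    if vram_gb >= 10:
        return "FLUX.1 schnell (FP8), expect some offloading"
    return "SDXL (e.g. Juggernaut XL) -- FLUX will struggle"

# On an NVIDIA card you can read actual free VRAM with PyTorch:
#   import torch; free, total = torch.cuda.mem_get_info()
print(recommend_model(24))  # e.g. an RTX 3090
```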
Step 1: Install Locally Uncensored
The fastest path to running FLUX locally is through Locally Uncensored, which handles the entire backend stack for you. It bundles Ollama for chat and ComfyUI for image and video generation into a single interface.
```
git clone https://github.com/PurpleDoubleD/locally-uncensored.git
cd locally-uncensored
npm install
npm run dev
```
On first launch, the app will detect whether ComfyUI is installed. If it is not found, the Model Manager will guide you through setup. ComfyUI runs as the backend engine that actually loads and executes FLUX -- Locally Uncensored provides the frontend and automates the workflow construction.
Step 2: Install ComfyUI
If you do not already have ComfyUI installed, download the latest portable release from the ComfyUI GitHub repository. Extract it anywhere on your drive. Locally Uncensored will auto-detect it by scanning common paths, or you can point to it manually in Settings.
ComfyUI needs Python 3.10+ and PyTorch with CUDA support. The portable release bundles everything, so in most cases you do not need to install anything separately.
Step 3: Download FLUX Models
FLUX requires four separate model files, unlike SDXL which uses a single checkpoint. This is because FLUX splits the architecture into separate components:
- UNET -- The core image-generation model (FLUX is technically a diffusion transformer, but ComfyUI stores it under the UNET/diffusion_models name; the largest file, roughly 12 GB for FP8)
- VAE -- The image encoder/decoder (ae.safetensors)
- CLIP_L -- The first text encoder
- T5-XXL -- The second text encoder (handles complex prompts)
In Locally Uncensored, open the Model Manager tab. You will find pre-configured FLUX bundles under the Image section. Click Install All on either the FLUX.1 schnell FP8 or FLUX.1 dev FP8 bundle. The app downloads all four files to the correct ComfyUI directories automatically.
If you prefer manual download, place the files in these ComfyUI subdirectories:
- ComfyUI/models/diffusion_models/ -- UNET file
- ComfyUI/models/vae/ -- VAE file
- ComfyUI/models/text_encoders/ -- CLIP_L and T5-XXL files
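A missing or misplaced file is the most common setup mistake, so it is worth verifying the layout before generating. A small sketch that checks each folder contains at least one .safetensors file (the exact UNET and encoder filenames vary by download, so it checks the folders rather than specific names; adjust the ComfyUI path to your install):

```python
from pathlib import Path

# Adjust to wherever you extracted ComfyUI.
COMFY = Path("ComfyUI")

REQUIRED_DIRS = {
    "UNET": COMFY / "models" / "diffusion_models",
    "VAE": COMFY / "models" / "vae",
    "Text encoders": COMFY / "models" / "text_encoders",
}

def missing_components(dirs: dict[str, Path]) -> list[str]:
    """Return the component names whose directory is absent or empty."""
    return [
        name for name, d in dirs.items()
        if not d.is_dir() or not any(d.glob("*.safetensors"))
    ]

print(missing_components(REQUIRED_DIRS))  # [] means everything is in place
```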
Step 4: Generate Your First Image
Once the models are downloaded, switch to the Create tab in Locally Uncensored. The app auto-detects all installed models. Select your FLUX model from the model dropdown in the right-side parameter panel.
Type a prompt and click Generate. The app automatically:
- Classifies the model as FLUX type
- Finds the matching VAE and text encoders
- Builds the correct ComfyUI workflow (no node editing required)
- Submits the workflow and polls for results
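Under the hood, a ComfyUI workflow is just a JSON graph of nodes, and the steps above amount to assembling one. A minimal sketch of what gets built for FLUX schnell (the node class names are real ComfyUI nodes; the model filenames are placeholders for whatever you downloaded, and a production workflow may differ in detail):

```python
import json

def flux_workflow(prompt: str, steps: int = 4, seed: int = 0) -> dict:
    """Build a minimal FLUX workflow in ComfyUI's API format.
    References like ["2", 0] mean "output 0 of node 2"."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-schnell-fp8.safetensors",  # placeholder
                         "weight_dtype": "fp8_e4m3fn"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",    # placeholder
                         "clip_name2": "t5xxl_fp8.safetensors",  # placeholder
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["4", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": steps, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "sgm_uniform",
                         "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["3", 0]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "flux"}},
    }

# Submission is one POST to the ComfyUI server, then polling for results:
#   requests.post("http://127.0.0.1:8188/prompt", json={"prompt": wf})
#   requests.get(f"http://127.0.0.1:8188/history/{prompt_id}")
wf = flux_workflow("a red fox in the snow")
print(json.dumps(wf, indent=2)[:200])
```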
For FLUX schnell, use 4 steps. For FLUX dev, use 20 steps as a starting point. The default resolution is 1024x1024, which is the native training resolution for FLUX.
FLUX Schnell vs FLUX Dev: Which Should You Use?
| Feature | FLUX.1 schnell | FLUX.1 dev |
|---|---|---|
| Speed (1024x1024) | ~5 seconds | ~30 seconds |
| Recommended Steps | 4 | 20-30 |
| Image Quality | Good | Excellent |
| Text in Images | Decent | Very good |
| Prompt Adherence | Good | Excellent |
| License | Apache 2.0 | Non-commercial |
| VRAM (FP8) | ~12 GB | ~12 GB |
Start with schnell for quick iteration. Switch to dev when you want the best possible output for a prompt you have refined. The VRAM usage is identical since they share the same architecture.
Parameter Tuning Tips
Resolution
FLUX was trained on 1024x1024. You can generate at other aspect ratios (e.g., 1024x768 for landscape, 768x1024 for portrait) without significant quality loss. Going above 1024 on either dimension increases VRAM usage and can cause artifacts unless you use tiling or a high-res fix workflow.
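A practical way to pick alternate aspect ratios is to hold the pixel area near the native ~1 megapixel and snap both dimensions to a safe multiple. A sketch (the multiple of 64 is a conservative convention, not a hard FLUX requirement):

```python
def flux_resolution(aspect: float, area: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Pick (width, height) near FLUX's native ~1-megapixel training
    area for a given aspect ratio, snapped to a multiple of 64."""
    height = (area / aspect) ** 0.5
    width = height * aspect

    def snap(v: float) -> int:
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)

print(flux_resolution(1.0))     # (1024, 1024) -- the native square
print(flux_resolution(16 / 9))  # (1344, 768) -- widescreen, still ~1 MP
```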
CFG Scale
FLUX schnell works best with a CFG of 1.0 -- it was designed for low-step, low-guidance generation. FLUX dev is guidance-distilled: keep true CFG at 1.0 and steer it with the distilled guidance value instead (the FluxGuidance node in ComfyUI), where 3.0 to 5.0 works well. Higher guidance increases prompt adherence but can introduce saturation artifacts.
Sampler and Scheduler
The default sampler (euler) and scheduler (normal) work well for both variants. FLUX schnell specifically benefits from the sgm_uniform scheduler. For dev, the simple scheduler at 20 steps is a reliable baseline.
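The tuning tips above condense into a small reference table. A sketch, with the dev guidance value being a common community default rather than an official figure:

```python
# Baseline settings per variant, collected from the tips above.
# "guidance" is the FLUX distilled-guidance value (FluxGuidance node);
# true CFG stays at 1.0 for both variants.
BASELINES = {
    "flux-schnell": {"steps": 4, "cfg": 1.0,
                     "sampler": "euler", "scheduler": "sgm_uniform"},
    "flux-dev": {"steps": 20, "cfg": 1.0, "guidance": 3.5,
                 "sampler": "euler", "scheduler": "simple"},
}
print(BASELINES["flux-schnell"])
```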
Seed Control
Fix the seed to reproduce results. Change only one parameter at a time when comparing outputs. This is the fastest way to learn how FLUX responds to different settings.
Troubleshooting
Out of Memory Errors
If you get CUDA out of memory errors, try reducing the resolution to 768x768. Close other GPU-intensive applications. If the problem persists, you may need to switch to an SDXL model like Juggernaut XL, which runs comfortably at 6 GB VRAM.
Black or Broken Images
This usually means the VAE or text encoder is missing or mismatched. Verify that all four model files are present and that the app detected them correctly in the Model Manager. FLUX requires its own specific VAE (ae.safetensors) -- it cannot share a VAE with SDXL or SD 1.5 models.
Slow Generation
Ensure PyTorch is using CUDA and not falling back to CPU. In the ComfyUI terminal output, look for the CUDA device being listed at startup. Generation on CPU is roughly 50x slower and not practical for FLUX.
What About SDXL?
If your GPU has less than 10 GB VRAM, SDXL is the better choice. Models like Juggernaut XL V9 produce excellent photorealistic results and run as a single checkpoint file -- no separate VAE or text encoder downloads needed. Locally Uncensored includes Juggernaut XL as a one-click install bundle in the Model Manager.
FLUX produces better results at complex prompts and text rendering. SDXL is more forgiving on hardware and has a larger ecosystem of LoRA fine-tunes. Both are valid choices depending on your setup.