path: root/modules/sd_hijack_optimizations.py
Commit message (author, date, files changed, lines -/+)
* Make sub-quadratic the default for MPS (brkirch, 2023-08-13, 1 file, -2/+5)
* Use fixed size for sub-quadratic chunking on MPS (brkirch, 2023-08-13, 1 file, -1/+5)
  Even if this causes chunks to be much smaller, performance isn't significantly impacted. This will usually reduce memory usage but should also help with poor performance when free memory is low.
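  For context, a minimal sketch of query-chunked attention with a fixed chunk size. The real sub-quadratic implementation also chunks the key/value dimension with a streaming softmax; `CHUNK_SIZE` and `chunked_attention` are illustrative names, not the module's own.

  ```python
  import math
  import torch

  CHUNK_SIZE = 512  # illustrative fixed query-chunk size

  def chunked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
      # q, k, v: (batch, tokens, head_dim). Only one chunk of the attention
      # matrix is materialized at a time, which bounds peak memory.
      scale = 1.0 / math.sqrt(q.shape[-1])
      out = torch.empty_like(q)
      for start in range(0, q.shape[1], CHUNK_SIZE):
          q_chunk = q[:, start:start + CHUNK_SIZE]
          scores = (q_chunk @ k.transpose(-1, -2)) * scale  # (batch, chunk, tokens_k)
          out[:, start:start + CHUNK_SIZE] = scores.softmax(dim=-1) @ v
      return out
  ```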
* update doggettx cross attention optimization to not use an unreasonable amount of memory in some edge cases -- suggestion by MorkTheOrk (AUTOMATIC1111, 2023-08-02, 1 file, -2/+2)
* get attention optimizations to work (AUTOMATIC1111, 2023-07-13, 1 file, -7/+7)
* SDXL support (AUTOMATIC1111, 2023-07-12, 1 file, -8/+43)
* Merge pull request #11066 from aljungberg/patch-1 (AUTOMATIC1111, 2023-06-07, 1 file, -1/+1)
  Fix upcast attention dtype error.
* Fix upcast attention dtype error. (Alexander Ljungberg, 2023-06-06, 1 file, -1/+1)
  Without this fix, enabling the "Upcast cross attention layer to float32" option while also using `--opt-sdp-attention` breaks generation with an error:
  ```
  File "/ext3/automatic1111/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 612, in sdp_attnblock_forward
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=False)
  RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: c10::Half instead.
  ```
  The fix is to make sure to upcast the value tensor too.
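  To illustrate the failure mode, here is a minimal sketch (not the project's actual `sdp_attnblock_forward`) of keeping all three tensors in the same dtype before calling SDPA:

  ```python
  import torch

  def sdp_attention_upcast(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                           upcast: bool = True) -> torch.Tensor:
      if upcast:
          # Upcasting only q and k (as the buggy code did) leaves v in float16
          # and triggers the dtype-mismatch error above; v must be upcast too.
          q, k, v = q.float(), k.float(), v.float()
      return torch.nn.functional.scaled_dot_product_attention(
          q, k, v, dropout_p=0.0, is_causal=False)
  ```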
* Merge pull request #10990 from vkage/sd_hijack_optimizations_bugfix (AUTOMATIC1111, 2023-06-04, 1 file, -1/+1)
  torch.cuda.is_available() check for SdOptimizationXformers
* fix the broken line for #10990 (AUTOMATIC, 2023-06-04, 1 file, -1/+1)
* torch.cuda.is_available() check for SdOptimizationXformers (Vivek K. Vasishtha, 2023-06-03, 1 file, -1/+1)
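  As a hedged illustration of this kind of guard (the helper name below is hypothetical; in the webui the check presumably sits inside SdOptimizationXformers.is_available()): only report the xformers optimization as usable when the package is installed and a CUDA device is actually available.

  ```python
  import importlib.util
  import torch

  def xformers_usable() -> bool:
      # xformers' attention kernels require CUDA in this context, so gate on both.
      return importlib.util.find_spec("xformers") is not None and torch.cuda.is_available()
  ```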
* revert default cross attention optimization to Doggettx (AUTOMATIC, 2023-06-01, 1 file, -3/+3)
  make --disable-opt-split-attention command line option work again
* revert default cross attention optimization to Doggettx (AUTOMATIC, 2023-06-01, 1 file, -3/+3)
  make --disable-opt-split-attention command line option work again
* rename print_error to report, use it together with the package name (AUTOMATIC, 2023-05-31, 1 file, -2/+1)
* | Add & use modules.errors.print_error where currently printing exception info ↵Aarni Koskela2023-05-291-4/+2
|/ | | | by hand
* Add a couple `from __future__ import annotations`es for Py3.9 compatAarni Koskela2023-05-201-0/+1
|
* Apply suggestions from code review (AUTOMATIC1111, 2023-05-19, 1 file, -38/+28)
  Co-authored-by: Aarni Koskela <akx@iki.fi>
* fix linter issues (AUTOMATIC, 2023-05-18, 1 file, -1/+1)
* make it possible for scripts to add cross attention optimizations (AUTOMATIC, 2023-05-18, 1 file, -3/+132)
  add UI selection for cross attention optimization
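  A hedged sketch of the kind of extension-supplied optimization this commit enables. The base class and callback names used here (SdOptimization, on_list_optimizers) are assumptions about the webui API rather than something verified from this log, and the class below is purely illustrative.

  ```python
  from modules import script_callbacks, sd_hijack_optimizations


  class MyOptimization(sd_hijack_optimizations.SdOptimization):
      name = "my-optimization"   # shown in the cross attention optimization setting
      priority = 10              # higher priority is preferred when set to Automatic

      def is_available(self):
          return True

      def apply(self):
          # swap in a custom CrossAttention.forward here
          pass


  def list_optimizers(res):
      res.append(MyOptimization())


  script_callbacks.on_list_optimizers(list_optimizers)
  ```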
* Autofix Ruff W (not W605) (mostly whitespace) (Aarni Koskela, 2023-05-11, 1 file, -16/+16)
* ruff auto fixes (AUTOMATIC, 2023-05-10, 1 file, -7/+7)
* autofixes from ruff (AUTOMATIC, 2023-05-10, 1 file, -1/+0)
* Fix for Unet NaNs (brkirch, 2023-05-08, 1 file, -0/+3)
* Update sd_hijack_optimizations.py (FNSpd, 2023-03-24, 1 file, -1/+1)
* Update sd_hijack_optimizations.py (FNSpd, 2023-03-21, 1 file, -1/+1)
* sdp_attnblock_forward hijack (Pam, 2023-03-10, 1 file, -0/+24)
* argument to disable memory efficient for sdp (Pam, 2023-03-10, 1 file, -0/+4)
* scaled dot product attention (Pam, 2023-03-06, 1 file, -0/+42)
* Add UI setting for upcasting attention to float32 (brkirch, 2023-01-25, 1 file, -60/+99)
  Adds the "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows generating images using SD 2.1 models without --no-half or xFormers.
  To make upcasting the cross attention layer optimizations possible, it is necessary to indent several sections of code in sd_hijack_optimizations.py so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, my findings were that most of the cross attention layer optimizations could not function unless v is upcast as well.
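  A minimal sketch of that approach in plain PyTorch (the helper name is illustrative, and autocast support on MPS varies by torch version): disable autocast, run attention in float32 with v upcast alongside q and k, then cast the result back.

  ```python
  import torch

  def attention_fp32(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
      out_dtype = q.dtype
      # Disable autocast so the matmuls really run in float32.
      with torch.autocast(q.device.type, enabled=False):
          q, k, v = q.float(), k.float(), v.float()  # v must be upcast too
          scale = q.shape[-1] ** -0.5
          attn = (q @ k.transpose(-1, -2) * scale).softmax(dim=-1)
          out = attn @ v
      return out.to(out_dtype)
  ```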
* better support for xformers flash attention on older versions of torch (AUTOMATIC, 2023-01-23, 1 file, -24/+18)
* add --xformers-flash-attention option & impl (Takuma Mori, 2023-01-21, 1 file, -2/+24)
* extra networks UI (AUTOMATIC, 2023-01-21, 1 file, -5/+5)
  rework of hypernets: rather than via settings, hypernets are added directly to the prompt as <hypernet:name:weight>
* Added license (brkirch, 2023-01-06, 1 file, -0/+1)
* Change sub-quad chunk threshold to use percentage (brkirch, 2023-01-06, 1 file, -9/+9)
* Add Birch-san's sub-quadratic attention implementation (brkirch, 2023-01-06, 1 file, -25/+99)
* Use other MPS optimization for large q.shape[0] * q.shape[1] (brkirch, 2022-12-21, 1 file, -4/+6)
  Check if q.shape[0] * q.shape[1] is 2**18 or larger and use the lower memory usage MPS optimization if it is. This should prevent most crashes that were occurring at certain resolutions (e.g. 1024x1024, 2048x512, 512x2048).
  Also included is a change to check slice_size and prevent it from being divisible by 4096, which also results in a crash. Otherwise a crash can occur at 1024x512 or 512x1024 resolution.
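  A hedged sketch of the dispatch logic described above; the helper name and return shape are placeholders, not the module's actual code.

  ```python
  import torch

  def pick_mps_attention_path(q: torch.Tensor, slice_size: int):
      # Very large attention matrices: fall back to the lower-memory MPS path.
      use_lower_memory_path = q.shape[0] * q.shape[1] >= 2**18
      # Slice sizes divisible by 4096 were observed to crash on MPS; adjust them.
      if slice_size % 4096 == 0:
          slice_size -= 1
      return use_lower_memory_path, slice_size
  ```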
* cleanup some unneeded imports for hijack files (AUTOMATIC, 2022-12-10, 1 file, -3/+0)
* do not replace entire unet for the resolution hack (AUTOMATIC, 2022-12-10, 1 file, -28/+0)
* Patch UNet Forward to support resolutions that are not multiples of 64 (Billy Cao, 2022-11-23, 1 file, -0/+31)
  Also modified the UI to no longer step in increments of 64.
* Remove wrong self reference in CUDA support for invokeai (Cheka, 2022-10-19, 1 file, -1/+1)
* Update sd_hijack_optimizations.py (C43H66N12O12S2, 2022-10-18, 1 file, -0/+3)
* readd xformers attnblock (C43H66N12O12S2, 2022-10-18, 1 file, -0/+15)
* delete xformers attnblock (C43H66N12O12S2, 2022-10-18, 1 file, -12/+0)
* Use apply_hypernetwork function (brkirch, 2022-10-11, 1 file, -10/+4)
* Add InvokeAI and lstein to credits, add back CUDA support (brkirch, 2022-10-11, 1 file, -0/+13)
* Add check for psutil (brkirch, 2022-10-11, 1 file, -4/+15)
* Add cross-attention optimization from InvokeAI (brkirch, 2022-10-11, 1 file, -0/+79)
  - Add cross-attention optimization from InvokeAI (~30% speed improvement on MPS)
  - Add command line option for it
  - Make it default when CUDA is unavailable
* rename hypernetwork dir to hypernetworks to prevent clash with an old filename that people who use zip instead of git clone will have (AUTOMATIC, 2022-10-11, 1 file, -1/+1)
* fixes related to merge (AUTOMATIC, 2022-10-11, 1 file, -1/+2)
* replace duplicate code with a function (AUTOMATIC, 2022-10-11, 1 file, -29/+15)
* remove functorch (C43H66N12O12S2, 2022-10-10, 1 file, -2/+0)