xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

    Cuda · 8.2k stars · 814 forks

  2. lite.ai.toolkit Public

    🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

    C++ · 4.3k stars · 764 forks

  3. Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

    Python · 4.6k stars · 316 forks

  4. Awesome-DiT-Inference Public

    📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

    Python · 433 stars · 21 forks

  5. torchlm Public

    💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

    Python · 266 stars · 27 forks

  6. ffpa-attn Public

    🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

    Cuda · 227 stars · 10 forks

Repositories

Showing 10 of 49 repositories
  • ImageReward Public Forked from zai-org/ImageReward

    [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation

    Python 0 Apache-2.0 83 0 0 Updated Oct 30, 2025
  • longcat-video-fast Public

    🔥LongCat-Video 1.7x🎉 speedup: cache acceleration and 4/8-bits weight only.

    Python 4 0 0 0 Updated Oct 28, 2025
  • diffusers Public Forked from huggingface/diffusers

    🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

    Python 0 Apache-2.0 6,520 0 0 Updated Oct 28, 2025
  • LongCat-Video
    Python 0 MIT 55 0 0 Updated Oct 28, 2025
  • cache-dit Public Forked from vipshop/cache-dit

    A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗 Diffusers.

    Python 4 14 0 0 Updated Oct 28, 2025
  • ComfyUI Public Forked from comfyanonymous/ComfyUI

    The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.

    Python 0 GPL-3.0 10,670 0 0 Updated Oct 27, 2025
  • qwen-image-fast Public

    ⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs

    Python 13 Apache-2.0 0 0 0 Updated Oct 24, 2025
  • Kandinsky-5 Public Forked from ai-forever/Kandinsky-5

    Kandinsky 5.0: A family of diffusion models for Video & Image generation

    Python 0 Apache-2.0 11 0 0 Updated Oct 22, 2025
  • LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

    Cuda 8,246 GPL-3.0 814 8 0 Updated Oct 17, 2025
  • Wan2.1 Public Forked from Wan-Video/Wan2.1

    Wan: Open and Advanced Large-Scale Video Generative Models

    Python 1 Apache-2.0 2,109 0 0 Updated Oct 17, 2025