
@Kaihui-intel (Contributor) commented Oct 30, 2025

Results on opt-125m:

| scheme | format | RTN | iter>0 |
|---|---|---|---|
| W4A16 | auto_round | 0.2882 | 0.3526 |
| W2A16 | auto_round | | 0.1657 |
| W3A16 | auto_round | | 0.3247 |
| W8A16 | auto_round | | 0.3784 |
| bits, group_size 32 | auto_round | 0.3749 | 0.3679 |
| bits, group_size 32 | auto_gptq | 0.3747 | 0.3658 |
| bits, group_size 32 | auto_awq | 0.3749 | 0.3646 |

#788

Memory check: Qwen2.5-7B-Instruct-w4g32, RTN, auto_round format
Peak memory (mprof peak): 16659.441 MiB -> 9200.250 MiB (the new peak is ~55% of the original, i.e. ~45% lower)
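A quick check of the arithmetic, using only the two `mprof peak` values reported above: the new peak is about 55% of the baseline, which corresponds to roughly a 45% reduction in peak memory.

```python
# Peak memory values reported by `mprof peak` in this PR (MiB).
baseline_mib = 16659.441   # before the change
immediate_mib = 9200.250   # with immediate saving

ratio = immediate_mib / baseline_mib   # new peak as a fraction of the old peak
reduction = 1.0 - ratio                # fraction of peak memory saved
saved_mib = baseline_mib - immediate_mib

print(f"new peak: {ratio:.1%} of baseline")                 # new peak: 55.2% of baseline
print(f"reduction: {reduction:.1%} ({saved_mib:.1f} MiB)")  # reduction: 44.8% (7459.2 MiB)
```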

Kaihui-intel and others added 8 commits October 16, 2025 01:41
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21 (Contributor)

Thanks for the great work! Could you check the maximum RAM usage to see whether it has been reduced significantly, as expected?
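The numbers in this PR come from `mprof` (memory_profiler), which samples memory over the whole run. If only the peak is needed, the standard library can also report it. The sketch below is an alternative under stated assumptions, not what was used here: `run_quantization` is a hypothetical placeholder for the quantize-and-save workload, `resource` is Unix-only, and `ru_maxrss` covers only the calling process (not child processes).

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def run_quantization() -> int:
    """Hypothetical placeholder for the real quantize-and-save call."""
    buffers = [bytearray(1024 * 1024) for _ in range(64)]  # ~64 MiB stand-in
    return len(buffers)


if __name__ == "__main__":
    run_quantization()
    print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Running the same script on the main branch and on this branch gives two peaks to compare, similar in spirit to the `mprof peak` numbers above.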

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@xin3he modified the milestones: 1.0, 0.9.0 (Oct 30, 2025)
Kaihui-intel and others added 8 commits October 31, 2025 01:07
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
@wenhuach21 changed the title from "Support for immediate saving" to "[High Risk]Support for immediate saving" (Oct 31, 2025)
Kaihui-intel and others added 6 commits October 31, 2025 07:33
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
```diff
-self.is_packing_immediate = False # whether to pack the layer immediately after tuning
+# Whether to pack the layer immediately after tuning
+self.is_packing_immediate = kwargs.pop("is_packing_immediate", False)
```
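The change under review replaces the hard-coded `False` with a constructor keyword. `dict.pop(key, default)` both reads the option and removes it from `kwargs`, so anything left over can be rejected as unknown. A minimal sketch of that pattern; `QuantizerSketch` and the leftover-key check are hypothetical and not AutoRound's actual constructor:

```python
class QuantizerSketch:
    """Hypothetical stand-in for the class whose __init__ is under review."""

    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name
        # Whether to pack the layer immediately after tuning (defaults to False).
        self.is_packing_immediate = kwargs.pop("is_packing_immediate", False)
        # pop() removed the recognized key, so any remaining key is unexpected.
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")


q = QuantizerSketch("facebook/opt-125m", is_packing_immediate=True)
print(q.is_packing_immediate)  # True
```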
@wenhuach21 (Contributor) commented Oct 31, 2025

`is_packing_immediate` used to be set automatically. Have you handled this when exporting more than one format? It's better not to expose it in the API. Also, as discussed, set `save_immediate` to True.
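One way to read this suggestion is that packing stays an internal decision derived from the export settings rather than an API argument. A minimal sketch, assuming (not confirmed in this thread) that packing a layer in place is only safe when a single export format is requested; `ExportPlanSketch` is hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExportPlanSketch:
    """Hypothetical export settings; not AutoRound's actual API."""

    formats: List[str]
    save_immediate: bool = True  # per the review: set save_immediate to True
    is_packing_immediate: bool = field(init=False)

    def __post_init__(self) -> None:
        # Derived internally instead of being passed by the user.
        # Assumption: packing in place discards the unpacked weights that a
        # second export format would still need.
        self.is_packing_immediate = len(self.formats) == 1


print(ExportPlanSketch(["auto_round"]).is_packing_immediate)               # True
print(ExportPlanSketch(["auto_round", "auto_gptq"]).is_packing_immediate)  # False
```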
Another thing to verify is the time cost of `save_immediate`: have you measured the total quantization time compared to the main branch?
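For the timing question, a simple comparison is to time the whole quantize-and-save call a few times per branch and compare medians, which is less sensitive to a single noisy run. A minimal sketch; `run_main_branch` and `run_save_immediate` are hypothetical placeholders for invoking the same quantization config on each branch:

```python
import statistics
import time
from typing import Callable


def median_wall_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run fn() `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical stand-ins for "quantize and save" on each branch.
def run_main_branch() -> None:
    time.sleep(0.01)


def run_save_immediate() -> None:
    time.sleep(0.01)


if __name__ == "__main__":
    base = median_wall_time(run_main_branch)
    new = median_wall_time(run_save_immediate)
    print(f"main: {base:.2f}s, save_immediate: {new:.2f}s, overhead: {new / base - 1:+.1%}")
```

Running each branch in a fresh process keeps file-system caches and allocator state from one run from skewing the other.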


4 participants