[High Risk]Support for immediate saving #965
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
for more information, see https://pre-commit.ci
Thanks for the great work! Could you check the maximum RAM usage to see whether it has been reduced significantly, as expected?
… into kaihui/save_block
self.is_packing_immediate = False  # whether to pack the layer immediately after tuning

# Whether to pack the layer immediately after tuning
self.is_packing_immediate = kwargs.pop("is_packing_immediate", False)
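For readers skimming the diff, here is a minimal sketch of what the `kwargs.pop` line does. The `Tuner` class and the `unused_kwargs` attribute are hypothetical stand-ins, not the project's real code: the flag is read from the caller's keyword arguments, defaults to `False`, and the key is removed so it is not forwarded further.

```python
class Tuner:
    """Hypothetical stand-in for the class touched by this diff."""

    def __init__(self, **kwargs):
        # Whether to pack the layer immediately after tuning.
        # pop() returns the caller's value if given, else False,
        # and removes the key from kwargs either way.
        self.is_packing_immediate = kwargs.pop("is_packing_immediate", False)
        # Anything left over was not consumed by this constructor.
        self.unused_kwargs = kwargs


default_tuner = Tuner()
custom_tuner = Tuner(is_packing_immediate=True, other_option=1)
```

With this pattern the flag becomes part of the public keyword interface, which is why the choice between exposing it and deriving it internally matters.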
Previously, is_packing_immediate was set automatically. Have you handled the case where more than one format is exported? It would be better not to set it through the API. Also, as discussed, set save_immediate to True.
Another thing to verify is the time cost of save_immediate: have you measured the total quantization time compared to the main branch?
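One possible way to make the timing comparison, as a rough sketch: the helper names `time_call` and `relative_overhead` are made up for illustration, and the callable being timed would be whatever entry point runs the full quantization on each branch.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def relative_overhead(baseline_s, candidate_s):
    """Fractional slowdown of candidate vs. baseline (0.10 means 10% slower)."""
    return (candidate_s - baseline_s) / baseline_s


# Stand-in workload; replace with the real quantization call on each branch.
result, elapsed = time_call(sum, [1, 2, 3])
```

Running the same model and scheme on both branches and feeding the two wall-clock totals to `relative_overhead` gives a single number to report alongside the memory result.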
#788
Memory check
Qwen2.5-7B-Instruct, w4g32, RTN, auto_round
mprof peak: 16659.441 MiB -> 9200.250 MiB (new peak is ~55% of the old one)
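The percentage can be read two ways, so here is the arithmetic spelled out using only the two peak values reported above:

```python
before_mib = 16659.441  # mprof peak reported for the baseline
after_mib = 9200.250    # mprof peak reported with this change

ratio = after_mib / before_mib  # share of the old peak that remains
reduction = 1 - ratio           # share of the old peak that was saved

print(f"new peak is {ratio:.1%} of the old one, a {reduction:.1%} reduction")
```

So the peak drops to about 55% of its previous value, which corresponds to a saving of roughly 45%.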