ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 #16805
base: master
Conversation
Can you demonstrate the performance is preserved?
ggml/src/ggml-cpu/ops.cpp (Outdated)
```cpp
            break;
        default:
            // rope type not supported, silently default to NORMAL
            rotate_pairs<T>(n_dims, 1, cache, src, dst_data, 1);
```
Isn't it better to GGML_ABORT here?
I thought so too. I was unsure because I saw that test-rope.cpp tests an unsupported (maybe not yet implemented?) rope type. Shall I change both?
The GLM rope type was removed - we should remove it from the test-rope.
Ok, will do that.
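For context, a minimal sketch of the direction the thread settles on: abort on unsupported rope modes instead of silently falling back to the NORMAL rotation. Only `GGML_ABORT` (ggml's fatal-error macro from ggml.h) is real here; the wrapper function, the `mode` parameter, and the message text are placeholders, not the PR's actual code.

```cpp
#include "ggml.h" // GGML_ABORT

// Illustrative only: dispatch on the rope mode and abort on anything
// unsupported, rather than silently using the NORMAL rotation.
static void rope_dispatch_example(int mode) {
    switch (mode) {
        case 0: // NORMAL and the other supported modes handled here
            break;
        default:
            GGML_ABORT("unsupported rope mode: %d", mode);
    }
}
```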
@ggerganov It seems there's actually an improvement in performance, which I didn't expect (see my updated first comment). As a note, I just copy-pasted the default test_rope cases from test-backend-ops; I still don't know how to choose real-life-relevant dimensions for the perf tests.
This PR is a small refactoring of ggml_compute_forward_rope_f32 and _f16.
I extracted the rotate_pairs logic to remove some duplicated code; a hedged sketch of the idea follows below.
Also, I kept the current behavior for unsupported rope types: they silently default to NORMAL (consecutive values form the pairs).
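For readers without the diff at hand, here is a rough sketch of what such an extracted helper could look like. The signature and parameter semantics are guesses reconstructed from the visible call `rotate_pairs<T>(n_dims, 1, cache, src, dst_data, 1)`, not the PR's actual code, and the plain casts stand in for the FP16<->FP32 conversions the real f16 path needs.

```cpp
#include <cstdint>

// Sketch: rotate n_dims/2 value pairs in one templated helper.
// Assumed cache layout: cache[i0] = cos_theta, cache[i0 + 1] = sin_theta,
// matching how ggml's rope cache is filled. Hypothetical call shapes:
//   NORMAL (consecutive pairs): rotate_pairs<T>(n_dims, 1,        cache, src, dst, 1);
//   NEOX-style (split halves):  rotate_pairs<T>(n_dims, n_dims/2, cache, src, dst, 2);
template <typename T>
static void rotate_pairs(const int64_t n_dims, const int64_t pair_dist,
                         const float * cache, const T * src, T * dst,
                         const int64_t idx_div) {
    for (int64_t i0 = 0; i0 < n_dims; i0 += 2) {
        const float cos_theta = cache[i0 + 0];
        const float sin_theta = cache[i0 + 1];

        const int64_t i = i0 / idx_div; // index of the first element of the pair

        const float x0 = (float) src[i];             // real code converts f16 explicitly
        const float x1 = (float) src[i + pair_dist];

        dst[i]             = (T) (x0*cos_theta - x1*sin_theta);
        dst[i + pair_dist] = (T) (x0*sin_theta + x1*cos_theta);
    }
}
```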
Here's the output from compare-llama-bench.py (I added some perf tests locally to test-backend-ops):
Device description: Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
Device memory: 16384 MB (16384 MB free)