Hi there
I've recently been trying to use CMake instead of Make to build OpenBLAS, while preserving the original implementation logic as much as possible. However, I've encountered a legacy issue related to symbol renaming.
Below is a detailed explanation.
I'm only looking at x64 Linux builds that use CMake in place of Make; other scenarios are out of scope for now.
Overall, this can be divided into two parts: the first part is switching to CMake, and the second part is the error that occurred.
1. Switching to CMake
I mainly followed OpenBLAS's own CI workflow, .github/workflows/dynamic_arch.yml.
By changing only its parameters, I got a CMake version of the build working; there is nothing special to explain. The detailed changes are in this commit, and the CI results are here.
The CMake build is not fully consistent with the Make build (for example, some directory locations and library names differ). Let's set that aside for now. The only goal here is an OpenBLAS build that can be packaged into a wheel, where the wheel's Python module can load the OpenBLAS .so and look up the symbols inside it.
After the build completed, building and testing the wheel failed with the following error:
```
+ python3.9 -m scipy_openblas64
Traceback (most recent call last):
  File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 147, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/__init__.py", line 140, in <module>
    get_openblas_config()
  File "/tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/__init__.py", line 134, in get_openblas_config
    openblas_config = dll.scipy_openblas_get_config64_
  File "/opt/python/cp39-cp39/lib/python3.9/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/opt/python/cp39-cp39/lib/python3.9/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/lib64/libscipy_openblas64__64.so: undefined symbol: scipy_openblas_get_config64_
```
In short, the error is caused by the symbol renaming we need to perform, which does not yet work in the CMake build on Linux. The details follow.
2. objcopy cannot rename the symbols used for dynamic loading in .so files
We need to rename the OpenBLAS symbols by adding a scipy_ prefix to them.
To isolate the problem from the rest of OpenBLAS, I created a minimal example that demonstrates how the symbols are renamed.
```
$ tree .
.
├── add.c
├── add.h
├── main.c
└── Makefile

1 directory, 4 files
```
The example consists of the four files shown above.
add.c only contains the definition of add:
```c
int add(int a, int b)
{
    return a + b;
}
```
add.h is as follows:
```c
int add(int, int);
// The following function is not implemented in add.c; it is provided by renaming the symbol in the symbol table.
int scipy_add(int, int);
```
In main.c, we want to call scipy_add:
```c
#include <stdio.h>
#include "add.h"

int main()
{
    printf("%d\n", scipy_add(1, 2));
    return 0;
}
```
With the default build options, this program fails to link because nothing defines scipy_add. The Makefile below demonstrates two ways of renaming the symbol: after the .a is built, and after the .so is built.
```makefile
CC = gcc
CFLAGS = -fPIC -O2 -Wall
AR = ar
ARFLAGS = rcs
OBJS = add.o
STATIC = libmylib.a
SHARED = libmylib.so

.PHONY: all clean

all: main

add.o: add.c
	$(CC) $(CFLAGS) -c $< -o $@

$(STATIC): $(OBJS)
	$(AR) $(ARFLAGS) $@ $^
# It is only effective when modified here.
	objcopy -v --redefine-sym add=scipy_add $(STATIC)
# scipy_add is found
# nm $(STATIC) | grep add
# scipy_add is not found
# nm -D $(STATIC) | grep add

$(SHARED): $(STATIC)
	$(CC) -shared -o $@ -Wl,--whole-archive $< -Wl,--no-whole-archive
# Modifying the symbol here will not allow it to be used through the dynamic library.
#	objcopy -v --redefine-sym add=scipy_add $(SHARED)
# scipy_add is found
# nm $(SHARED) | grep sci
# scipy_add is not found
# nm -D $(SHARED) | grep sci

main.o: main.c
	$(CC) -c main.c -o main.o

main: main.o $(SHARED)
	$(CC) -o $@ main.o ./$(SHARED) -Wl,-rpath,.

clean:
	rm -f $(OBJS) main.o $(STATIC) $(SHARED) main
```
For the first case, simply run make to check that the renamed symbol is usable. For the second case, comment out the objcopy line in the static-library rule and uncomment the one in the shared-library rule. Then run make clean followed by make.
Here are the outputs for the first and second cases respectively:
First case:
```
$ make
gcc -c main.c -o main.o
gcc -fPIC -O2 -Wall -c add.c -o add.o
ar rcs libmylib.a add.o
objcopy -v --redefine-sym add=scipy_add libmylib.a
copy from `libmylib.a(add.o)' [elf64-x86-64] to `stLbGnlZ/add.o' [elf64-x86-64]
gcc -shared -o libmylib.so -Wl,--whole-archive libmylib.a -Wl,--no-whole-archive
gcc -o main main.o ./libmylib.so -Wl,-rpath,.
$ ./main
3
```
Second case:
```
$ make
gcc -c main.c -o main.o
gcc -fPIC -O2 -Wall -c add.c -o add.o
ar rcs libmylib.a add.o
gcc -shared -o libmylib.so -Wl,--whole-archive libmylib.a -Wl,--no-whole-archive
objcopy -v --redefine-sym add=scipy_add libmylib.so
copy from `libmylib.so' [elf64-x86-64] to `stjE2E8K' [elf64-x86-64]
gcc -o main main.o ./libmylib.so -Wl,-rpath,.
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0xf): undefined reference to `scipy_add'
collect2: error: ld returned 1 exit status
make: *** [Makefile:46: main] Error 1
```
The second case fails for the following reason:
> objcopy --redefine-syms will only redefine the debugging symbols in the .symtab and .strtab symbol tables, not the symbols in the .dynsym and .dynstr symbol tables that is used for dynamic loading.
The quote above comes from here: https://stackoverflow.com/questions/54332797/binding-failure-with-objcopy-redefine-syms
Using nm $(SHARED) | grep sci in the Makefile, we can indeed confirm that the situation is exactly as stated in the quote above:
```
nm libmylib.so | grep sci
0000000000000390 T scipy_add
```
Running nm -D instead, which reads the dynamic symbol table (.dynsym), does not list scipy_add. The dynamic symbol table still contains only the old name add. That is exactly why linking main fails in the second case.
The situation above shows that objcopy cannot properly handle symbol renaming directly on .so files.
Renaming the symbols in the static library before the shared library is linked is exactly what OpenBLAS's Make build does; see https://github.com/OpenMathLib/OpenBLAS/blob/develop/exports/Makefile#L194
The CMake build, however, renames them directly in the shared library; see https://github.com/OpenMathLib/OpenBLAS/blob/develop/CMakeLists.txt#L558
This situation is also explained here:
SYMBOLSUFFIX doesn't seems to work in CMAKE build system OpenMathLib/OpenBLAS#3998 (comment)
Originally, the CMake build created the static library first and then linked the shared library from it, and that approach worked. A later patch skipped the static library and built the shared library directly. As a result, the symbol renaming is now applied to the .so, which produces the error described above.
Why, then, does the CMake build work on Windows on ARM? I suspect it is related to this line: https://github.com/OpenMathLib/OpenBLAS/blob/develop/CMakeLists.txt#L357. Please correct me if I'm wrong.
Returning to the CI error, I added some debugging output:
```
2025-11-04T07:08:22.7312704Z ++ grep openblas_get_config objcopy.def
2025-11-04T07:08:22.7325454Z openblas_get_config scipy_openblas_get_config64_
2025-11-04T07:08:22.7329935Z ++ nm lib/libscipy_openblas64__64.so
2025-11-04T07:08:22.7332106Z ++ grep openblas_get_config
2025-11-04T07:08:22.7982530Z 000000000035d870 T scipy_openblas_get_config64_
2025-11-04T07:08:22.7986191Z ++ nm -D lib/libscipy_openblas64__64.so
2025-11-04T07:08:22.7988151Z ++ grep openblas_get_config
2025-11-04T07:08:22.8476829Z 000000000035d870 T openblas_get_config
```
In short, when OpenBLAS is built for x64 Linux with CMake, objcopy renames openblas_get_config to scipy_openblas_get_config64_ in .symtab (what nm shows) but not in .dynsym (what nm -D shows). Python therefore raises `libscipy_openblas64__64.so: undefined symbol: scipy_openblas_get_config64_` when it loads the .so and looks up the function.
3. My Main Questions
This bug has already been reported upstream as OpenMathLib/OpenBLAS#3998.
Although that issue is marked as closed, I still hit the bug today. I think we should ask the people who maintain the CMake build of OpenBLAS on Windows how the problem is handled there. We should also look for a way to support Linux in the same way, so that Linux builds can use CMake too.
Even if the symbol issue is fixed, we cannot switch to CMake directly, because the CMake and Make builds differ in other ways (for example, directory locations and library names). I have deliberately left those differences out here, but they will need to be considered before the build is provided to NumPy/SciPy.
If anything above is incorrect, please point it out and I will correct it right away. Thanks in advance to everyone who helps.
4. Other info
Co-authored by: nijincheng@iscas.ac.cn;