Refactor: Use CMake instead of Make #231

@ffgan

Hi there

I've recently been trying to use CMake instead of Make to build OpenBLAS, while preserving the original implementation logic as much as possible. However, I've encountered a legacy issue related to symbol renaming.

Below is a detailed explanation.

I'm only considering the build issue for x64 on Linux, using CMake to replace Make. Other scenarios are not being considered for now.

Overall, this can be divided into two parts: the first part is switching to CMake, and the second part is the error that occurred.

1. Switching to CMake

I mainly followed OpenBLAS's CI script, .github/workflows/dynamic_arch.yml.

By simply modifying the parameters, I was able to get a CMake version of the build. There's not much special to explain. For detailed modifications, you can check this commit, and the CI results are here.

The CMake build is not fully consistent with the Make build: for example, some folder locations and library names differ. Setting that aside for now, the only goal here is an OpenBLAS build that can be packaged into wheels, where the built wheel can load the OpenBLAS .so file and resolve the symbols inside.

After the build completed, when creating the wheel and testing it, I got the following error:

  + python3.9 -m scipy_openblas64
  Traceback (most recent call last):
    File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 188, in _run_module_as_main
      mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
    File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 147, in _get_module_details
      return _get_module_details(pkg_main_name, error)
    File "/opt/python/cp39-cp39/lib/python3.9/runpy.py", line 111, in _get_module_details
      __import__(pkg_name)
    File "/tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/__init__.py", line 140, in <module>
      get_openblas_config()
    File "/tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/__init__.py", line 134, in get_openblas_config
      openblas_config = dll.scipy_openblas_get_config64_
    File "/opt/python/cp39-cp39/lib/python3.9/ctypes/__init__.py", line 387, in __getattr__
      func = self.__getitem__(name)
    File "/opt/python/cp39-cp39/lib/python3.9/ctypes/__init__.py", line 392, in __getitem__
      func = self._FuncPtr((name_or_ordinal, self))
  AttributeError: /tmp/tmp.SHQxv7Je9g/venv/lib/python3.9/site-packages/scipy_openblas64/lib64/libscipy_openblas64__64.so: undefined symbol: scipy_openblas_get_config64_
                                                              ✕ 6.46s

This error is explained in detail below. Simply put, this issue is related to the symbol renaming we need to perform, and this problem has not yet been resolved on Linux.
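The lookup that fails in the traceback can be reproduced outside the wheel with a few lines of ctypes. This is a minimal sketch, not the code in scipy_openblas64/__init__.py; the helper name has_exported_symbol is hypothetical.

```python
import ctypes


def has_exported_symbol(library, name: str) -> bool:
    """Return True if `name` resolves as an attribute of `library`.

    For a ctypes.CDLL object, attribute access goes through the dynamic
    loader, so only symbols in the dynamic symbol table are visible.
    A missing symbol raises AttributeError, as in the traceback above.
    """
    try:
        getattr(library, name)
    except AttributeError:
        return False
    return True
```

For example, has_exported_symbol(ctypes.CDLL(path), "scipy_openblas_get_config64_") would report whether the library at path (the .so shipped in the wheel) exports the renamed symbol.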

2. objcopy can't modify symbols used for dynamic loading in .so files

Our requirement is to modify OpenBLAS symbols by adding a scipy_ prefix to them.
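As a rough illustration of the renaming requirement, the sketch below generates a rename map in the format objcopy accepts through --redefine-syms (one "old new" pair per line). The function name build_rename_map is hypothetical, and the optional suffix is an assumption based on the 64_ ending that appears in the objcopy.def line quoted in the CI log later in this issue.

```python
def build_rename_map(symbols, prefix="scipy_", suffix=""):
    """Return the text of an objcopy --redefine-syms file.

    Each line holds the original symbol name, a space, and the new name.
    Duplicate input names are collapsed and the output is sorted so the
    file is reproducible between builds.
    """
    lines = []
    for name in sorted(set(symbols)):
        lines.append(f"{name} {prefix}{name}{suffix}\n")
    return "".join(lines)


if __name__ == "__main__":
    # Print the mapping for the toy library used in the example below.
    print(build_rename_map(["add"]), end="")
```

The resulting text would be written to a file and passed as objcopy --redefine-syms=FILE, which applies every pair in one invocation instead of one --redefine-sym flag per symbol.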

To simplify the situation in OpenBLAS, I created a very simple example to illustrate the problem.

In the example below, I'll demonstrate how to modify symbols.

$ tree .
.
├── add.c
├── add.h
├── main.c
└── Makefile

1 directory, 4 files

We have four files as shown above.
add.c only contains the definition of add:

int add(int a, int b)
{
    return a + b;
}

add.h is as follows:

int add(int, int);

// The following function is not implemented in add.c; it is achieved by modifying the symbol table.
int scipy_add(int, int);

For main.c, we expect to use scipy_add:

#include <stdio.h>
#include "add.h"

int main()
{
    printf("%d\n", scipy_add(1, 2));

    return 0;
}

As you can see, with a default build this code fails to link, because nothing defines scipy_add. In the Makefile below, I demonstrate two approaches to renaming the symbol: renaming after the .a is generated, and renaming after the .so is generated:

CC = gcc
CFLAGS = -fPIC -O2 -Wall
AR = ar
ARFLAGS = rcs

OBJS = add.o
STATIC = libmylib.a
SHARED = libmylib.so

.PHONY: all clean

all: main

add.o: add.c
	$(CC) $(CFLAGS) -c $< -o $@


$(STATIC): $(OBJS)
	$(AR) $(ARFLAGS) $@ $^

#   It is only effective when modified here.
	objcopy -v --redefine-sym add=scipy_add $(STATIC)

#   scipy_add is found
# 	nm  $(STATIC)  | grep add

#   scipy_add is not found
# 	nm -D  $(STATIC) | grep add

$(SHARED): $(STATIC)
	$(CC) -shared -o $@ -Wl,--whole-archive $< -Wl,--no-whole-archive

# 	Modifying the symbol here will not allow it to be used by the dynamic library.
# 	objcopy -v --redefine-sym add=scipy_add $(SHARED)  

#   scipy_add is found
# 	nm  $(SHARED) | grep sci 

#   scipy_add is not found
# 	nm -D  $(SHARED) | grep sci

main.o: main.c
	$(CC) -c main.c -o main.o

main: main.o $(SHARED)
	$(CC) -o $@ main.o ./$(SHARED) -Wl,-rpath,.

clean:
	rm -f $(OBJS) main.o $(STATIC) $(SHARED) main

To test the first case, run make with the Makefile as shown. For the second case, comment out the objcopy line in the static library rule, uncomment the objcopy line in the shared library rule, and run make again.

Here are the outputs for the first and second cases respectively:

$ make 
gcc -c main.c -o main.o
gcc -fPIC -O2 -Wall -c add.c -o add.o
ar rcs libmylib.a add.o
objcopy -v --redefine-sym add=scipy_add libmylib.a
copy from `libmylib.a(add.o)' [elf64-x86-64] to `stLbGnlZ/add.o' [elf64-x86-64]
gcc -shared -o libmylib.so -Wl,--whole-archive libmylib.a -Wl,--no-whole-archive
gcc -o main main.o ./libmylib.so -Wl,-rpath,.
$ ./main 
3
$ make
gcc -c main.c -o main.o
gcc -fPIC -O2 -Wall -c add.c -o add.o
ar rcs libmylib.a add.o
gcc -shared -o libmylib.so -Wl,--whole-archive libmylib.a -Wl,--no-whole-archive
objcopy -v --redefine-sym add=scipy_add libmylib.so  
copy from `libmylib.so' [elf64-x86-64] to `stjE2E8K' [elf64-x86-64]
gcc -o main main.o ./libmylib.so -Wl,-rpath,.
/usr/bin/ld: main.o: in function `main':
main.c:(.text+0xf): undefined reference to `scipy_add'
collect2: error: ld returned 1 exit status
make: *** [Makefile:46: main] Error 1

Actually, for the second case:

objcopy -redefine-syms will only redefine the debugging symbols in the .symtab and .strtab symbol tables, not the symbols in the .dynsym and .dynstr symbol tables that is used for dynamic loading.

The quote above comes from here: https://stackoverflow.com/questions/54332797/binding-failure-with-objcopy-redefine-syms

Using nm $(SHARED) | grep sci in the Makefile, we can indeed confirm that the situation is exactly as stated in the quote above:

nm  libmylib.so | grep sci 
0000000000000390 T scipy_add

If we use nm -D, we won't get scipy_add, indicating that the .so's dynamic symbol table (.dynsym) doesn't contain this symbol, which is precisely why the second case fails to link.

The situation above shows that objcopy cannot properly handle symbol renaming directly on .so files.
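The nm versus nm -D comparison can be automated. The sketch below parses the text output of both commands and reports global symbols that exist only in .symtab, which is the signature of a rename that never reached .dynsym. The function names are hypothetical, and the parser assumes plain, unversioned nm output of the form "address type name".

```python
def global_defined_symbols(nm_output: str) -> set:
    """Collect names of defined global symbols from nm text output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols print three fields: address, type letter, name.
        # Undefined symbols have no address and are skipped here.
        if len(fields) != 3:
            continue
        _address, kind, name = fields
        # Uppercase type letters mark global symbols; "U" means undefined.
        if kind.isupper() and kind != "U":
            names.add(name)
    return names


def renamed_only_in_symtab(nm_output: str, nm_dynamic_output: str) -> set:
    """Return globals listed by nm but absent from nm -D."""
    return global_defined_symbols(nm_output) - global_defined_symbols(nm_dynamic_output)
```

Fed the output of nm and nm -D for the same .so, a non-empty result means the library carries names that a linker or dlsym() will not be able to resolve.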

In fact, renaming symbols in the static library, before the dynamic library is linked, is exactly what OpenBLAS's Make build does, as seen at https://github.com/OpenMathLib/OpenBLAS/blob/develop/exports/Makefile#L194

The CMake build, however, renames symbols directly in the dynamic library, as seen at https://github.com/OpenMathLib/OpenBLAS/blob/develop/CMakeLists.txt#L558

This situation is also explained here:

  1. SYMBOLSUFFIX doesn't seems to work in CMAKE build system OpenMathLib/OpenBLAS#3998 (comment)
  2. SYMBOLSUFFIX doesn't seems to work in CMAKE build system OpenMathLib/OpenBLAS#3998 (comment)

Originally, CMake's approach of building the static library first and then the dynamic library was workable, but a patch was introduced to skip building the static library and build the dynamic library directly. This led to symbol modification being performed directly on the .so, which caused this symbol modification error.
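The ordering that works in the toy Makefile above (archive, rename inside the archive, then link the shared library) can be written down as a small driver. This is only a sketch of that ordering under the same assumptions as the toy example (gcc, ar and objcopy on PATH, and a rename map file in --redefine-syms format); it is not how OpenBLAS's CMake build is structured, and the function names are hypothetical.

```python
import subprocess


def rename_then_link_commands(objects, rename_map, static_lib, shared_lib, cc="gcc"):
    """Return the argv lists for each build step, in the order they must run."""
    return [
        # 1. Bundle the object files into a static archive.
        ["ar", "rcs", static_lib, *objects],
        # 2. Rename symbols while they still live in relocatable objects.
        ["objcopy", f"--redefine-syms={rename_map}", static_lib],
        # 3. Link the shared library; the linker builds .dynsym from the
        #    already renamed symbols.
        [cc, "-shared", "-o", shared_lib,
         "-Wl,--whole-archive", static_lib, "-Wl,--no-whole-archive"],
    ]


def run(commands):
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling run(rename_then_link_commands(["add.o"], "objcopy.def", "libmylib.a", "libmylib.so")) would mirror the first case of the Makefile, with the rename map read from a file instead of a single --redefine-sym flag.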

So why does CMake work on Windows on ARM? I suspect it may be related to this line: https://github.com/OpenMathLib/OpenBLAS/blob/develop/CMakeLists.txt#L357. If I'm wrong, please correct me.

Going back to the CI error, I specifically added some debugging information:

2025-11-04T07:08:22.7312704Z ++ grep openblas_get_config objcopy.def
2025-11-04T07:08:22.7325454Z openblas_get_config scipy_openblas_get_config64_
2025-11-04T07:08:22.7329935Z ++ nm lib/libscipy_openblas64__64.so
2025-11-04T07:08:22.7332106Z ++ grep openblas_get_config
2025-11-04T07:08:22.7982530Z 000000000035d870 T scipy_openblas_get_config64_
2025-11-04T07:08:22.7986191Z ++ nm -D lib/libscipy_openblas64__64.so
2025-11-04T07:08:22.7988151Z ++ grep openblas_get_config
2025-11-04T07:08:22.8476829Z 000000000035d870 T openblas_get_config

In short: when building OpenBLAS for x64 on Linux with CMake, objcopy renames the symbol only in the regular symbol table (the nm output) and leaves the dynamic symbol table (the nm -D output) unchanged. As a result, Python raises libscipy_openblas64__64.so: undefined symbol: scipy_openblas_get_config64_ when it loads this .so file and looks up the symbol.

3. My Main Questions

Actually, this bug has already been reported in OpenBLAS, see OpenMathLib/OpenBLAS#3998.

However, although that issue is marked as closed, I still hit this bug today. I think we should ask the people who maintain the CMake build of OpenBLAS on Windows how it handles this problem there, and look for a way to support Linux as well, so that Linux builds can also use CMake.

Even if this symbol-related issue is resolved, we still cannot switch to CMake directly, because the CMake and Make builds differ in several ways. I deliberately left those differences out here, but they will need to be considered before providing the result to numpy/scipy.

If anything above is incorrect, please point it out and I will correct it. Thank you in advance to everyone who helps.

4. Other info

Co-authored by: nijincheng@iscas.ac.cn
