@icfaust
Contributor

@icfaust icfaust commented Oct 23, 2025

Description

dpnp and dpctl now support the __array_namespace__ attribute of the array API. scikit-learn's get_namespace validates that all of its array inputs belong to the same namespace. get_namespace is called at several points: before and after validate_data, and before and after sklearnex._device_offload.dispatch. This change enables that array namespace verification when array_api_dispatch is enabled.
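
For illustration only, here is a minimal sketch of the kind of same-namespace check described above. It is not the scikit-learn or sklearnex implementation: FakeArray, ns_a, ns_b and common_namespace are hypothetical stand-ins, and the only real convention used is the array API's __array_namespace__ method.

```python
# Sketch only: hypothetical stand-ins, not scikit-learn or sklearnex code.
import types

# Two placeholder namespaces, standing in for e.g. dpnp and
# array_api_compat.numpy.
ns_a = types.ModuleType("ns_a")
ns_b = types.ModuleType("ns_b")


class FakeArray:
    """Minimal object exposing the array API `__array_namespace__` hook."""

    def __init__(self, namespace):
        self._namespace = namespace

    def __array_namespace__(self, api_version=None):
        return self._namespace


def common_namespace(*arrays):
    """Return the single namespace shared by all inputs, else raise."""
    namespaces = {
        a.__array_namespace__()
        for a in arrays
        if hasattr(a, "__array_namespace__")
    }
    if not namespaces:
        raise TypeError("No array API inputs found")
    if len(namespaces) > 1:
        # Mixed inputs (e.g. fit on one array type, predict on another)
        # end up here.
        raise TypeError(f"Multiple namespaces for array inputs: {namespaces}")
    (xp,) = namespaces
    return xp
```

Calling common_namespace on two FakeArray(ns_a) objects returns ns_a, while mixing FakeArray(ns_a) with FakeArray(ns_b) raises a TypeError.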

@david-cortes-intel I think your reviews with intermixed array types should be formalized and integrated to improve the quality of the codebase. I would also recommend covering device intermixing. This way we can accelerate #2209, #2201, #2700 and #2654 and minimize maintenance without getting bogged down in a bug hunt that affects the larger array API scope (i.e. it's likely that other estimators are failing in similar ways and need follow-up work).


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@david-cortes-intel
Contributor

@icfaust I'm still seeing the same error message after merging this PR in your RF branch:

import os, sys
os.environ["SCIPY_ARRAY_API"] = "1"
import numpy as np
import dpnp
from sklearnex import config_context, set_config
from sklearnex.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(random_state=123)
Xd = dpnp.array(X, dtype=np.float32, device="cpu")
yd = dpnp.array(y, dtype=np.float32, device="cpu")

set_config(array_api_dispatch=True)
model = RandomForestClassifier(n_estimators=1, max_depth=5).fit(Xd, yd)
model.predict(X[:5])
ValueError: `mode` must be `wrap` or `clip`.Got `raise`.

I cannot reproduce the error with any of the other current array API classes, although I see that previous ones all use arrays of the given class to store fitted attributes like coefficients, whereas forests work quite differently.

codecov bot commented Oct 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 80.48% <83.33%> (+<0.01%) ⬆️
github 82.08% <100.00%> (?)

Files with missing lines Coverage Δ
sklearnex/utils/_array_api.py 91.66% <100.00%> (+0.23%) ⬆️

... and 3 files with indirect coverage changes

@icfaust
Contributor Author

icfaust commented Oct 23, 2025

Please make sure to apply 665b903 as well. My reproduction of this now raises:

TypeError: Multiple namespaces for array inputs: {<module 'dpnp' from 'dpnp/__init__.py'>, <module 'array_api_compat.numpy' from 'array_api_compat/numpy/__init__.py

How does this behave with scikit-learn itself? See https://github.com/scikit-learn/scikit-learn/blob/c60dae2060/sklearn/decomposition/_base.py#L167 and try inverse_transform with NumPy data (that is one of the major array API estimators in their codebase).

@david-cortes-intel
Contributor

I was able to trigger the new error message after merging that commit.

But if this is the intended behavior, then please modify the docs too, since they currently state that it should work:

# Fitted models can be passed array API inputs of a different class

(Note that when that doc was written, fitting on array API inputs on CPU and then predicting on NumPy inputs worked for the other estimators that support array API.)

@icfaust
Contributor Author

icfaust commented Oct 23, 2025

/intelci: run

@icfaust
Contributor Author

icfaust commented Oct 24, 2025

/intelci: run
