ENH: stats.quantile: add support for frequency weights #23941

mdhaber · 2025-11-06T04:28:28Z

Reference issue

Closes gh-22794
Supersedes gh-22894. This PR addresses the concerns expressed in gh-22794, is more general (adds weights support for all methods except 'harwell-davis' rather than just inverted_cdf), and include more extensive tests.
data-apis/array-api-extra#494

What does this implement/fix?

This adds support for frequency weights to scipy.stats.quantile.

Results agree with NumPy for method='inverted_cdf'

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
rng = np.random.default_rng(23458924473496)
x = rng.random(100)
weights = rng.random(100)
p = np.linspace(0., 1., 300)
res = stats.quantile(x, p, weights=weights, method='inverted_cdf')
ref = np.quantile(x, p, weights=weights, method='inverted_cdf')
plt.plot(p, res, '-', label='SciPy')
plt.plot(p, res, '--', label='NumPy')
plt.xlabel('p')
plt.ylabel('quantile(p)')
plt.legend()
plt.show()

For non-negative integer weights, stats.quantile(x, p, weights=counts, ...) is equivalent to stats.quantile(np.repeat(x, counts), p, ...).

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
rng = np.random.default_rng(23458924473496)
x = rng.random(10)
counts = rng.integers(10, size=10)
p = np.linspace(0., 1., 10000)
res = stats.quantile(x, p, weights=counts, method='linear')
ref = stats.quantile(np.repeat(x, counts), p, method='linear')
plt.plot(p, res, '-', label='Using weights')
plt.plot(p, res, '--', label='Using repetitions')
plt.xlabel('p')
plt.ylabel('quantile(p)')
plt.legend()
plt.show()

The documentation specifies that weights that do not satisfy the requirements (non-negative integers) are accepted but may not have valid statistical interpretations.

Additional information

Support for weights in scipy.stats has come up several times before, most notably in gh-6931. As mentioned in #22794 (comment), an important discussion point in gh-6931 was the different meanings of "weights" in statistics (e.g., see here), in other statistical software, and possibly within scipy.stats. Here are my responses to my own concerns.

I found twelve functions/classes in scipy.stats that accept a weights argument: gmean, hmean, pmean, cumfreq, relfreq, wasserstein_distance_nd, wasserstein_distance, energy_distance, expectile, combine_pvalues, kde, Mixture. While it's not always documented clearly, it seems that the first nine of these produce valid results when used for frequency weights¹.

import numpy as np
from scipy import stats

rng = np.random.default_rng(285494985394583)
nx, ny = 100, 100
x = rng.integers(1, 11, size=nx) / 10.
y = rng.integers(1, 11, size=ny) / 10.
x_vals, x_weights = np.unique_counts(x)
y_vals, y_weights = np.unique_counts(y)

ref = stats.gmean(x)
res = stats.gmean(x_vals, weights=x_weights)
np.testing.assert_allclose(res, ref)

ref = stats.hmean(x)
res = stats.hmean(x_vals, weights=x_weights)
np.testing.assert_allclose(res, ref)

ref = stats.pmean(x, 3)
res = stats.pmean(x_vals, 3, weights=x_weights)
np.testing.assert_allclose(res, ref)

ref = stats.cumfreq(x)
res = stats.cumfreq(x_vals, weights=x_weights)
np.testing.assert_equal(res, ref)

ref = stats.relfreq(x)
res = stats.relfreq(x_vals, weights=x_weights)
# np.testing.assert_equal(res, ref)  # see gh-23931

ref = stats.wasserstein_distance(x, y)
res = stats.wasserstein_distance(x_vals, y_vals, x_weights, y_weights)
np.testing.assert_equal(res, ref)

ref = stats.energy_distance(x, y)
res = stats.energy_distance(x_vals, y_vals, x_weights, y_weights)
np.testing.assert_equal(res, ref)

ref = stats.expectile(x, 0.7)
res = stats.expectile(x_vals, 0.7, weights=x_weights)
np.testing.assert_equal(res, ref)

The weights parameter of combine_pvalues does not support frequency weights. (I haven't looked into what it is doing.)

ref = stats.combine_pvalues(x[x != 1], method='stouffer')
res = stats.combine_pvalues(x_vals[x_vals != 1], method='stouffer', 
                           weights=x_weights[x_vals != 1])
np.testing.assert_equal(res, ref)  # fails - both statistic and p-value are different
# AssertionError: 
# Items are not equal:
#  ACTUAL: np.float64(0.4734226456871095)
#  DESIRED: np.float64(0.41556093437208813)

I haven't dug into it, but the results of using weights and repeating observations affects the smoothing of gaussian_kde. I'm not sure whether this is intentional - it might be.

import matplotlib.pyplot as plt
x_ = np.linspace(-1, 2, 300)
ref = stats.gaussian_kde(x)(x_)
res = stats.gaussian_kde(x_vals, weights=x_weights)(x_)
# np.testing.assert_allclose(res, ref)
plt.plot(x_, res, x_, ref)  # affects smoothing

Mixture only accepts relative weights that sum to one, and they don't really represent weights of observations at all, so I don't think a "frequency weights" interpretation would be valid.

That said, it is pretty common in statistical software (e.g. SAS, SPSS, Stata) to support different types of weights in different functions, and we are not alone in being imperfect about documenting what kind of weights are implemented².

Frequency weights are currently the most widely supported in scipy.stats, and they will probably remain the most common in the future because:

they are conceptually simple,
it is always possible (although probably not advisable) to add a trivial implementation for counting number-valued weights using repeats,
a more efficient implementation is often easy, and
correctness is easy to verify with property-based testing.

With all that in mind, I think it's OK to implement frequency weights in quantile using the keyword argument weights. I don't think it needs to have a more specific name (Stata uses fweights to distinguish them from pweights, aweights, and iweights.) because we can give the other kinds of weights more specific names if/when the need arises. As for combine_pvalues and other places where weights does NOT refer to frequency weights, we at least need to document how the weights can be interpreted, and we can consider renaming the parameters².

In some cases (e.g. sample mean), multiply weight types may lead to the same weighted version of a statistic. I know that these functions support frequency weights, but the results might still be valid for other types of weighting. ↩
I'll submit a separate PR to improve this, though. ↩ ↩²

mdhaber

This relies on _xp_searchsorted (gh-23930), so that should merge first. I'd also much prefer to get gh-22644 merged first and resolve the merge conflicts here rather than the other way around. There is also a third PR (yet another ECDF function) that I will post shortly; I'd also like to wait until that's in to merge this.

mdhaber · 2025-11-06T06:37:03Z

scipy/stats/_quantile.py

+
+
+@xp_capabilities(skip_backends=[("dask.array", "No take_along_axis yet.")])
+def _xp_searchsorted(x, y, *, side='left', xp=None):


This is being reviewed separately in gh-23930.

mdhaber · 2025-11-06T06:40:26Z

scipy/stats/_quantile.py

+        n_int = xp.asarray(n, dtype=xp.int64)
+        n_int = xp.broadcast_to(n_int, cumulative_weights.shape[:-1] + (1,))
+        total_weight = xp.take_along_axis(cumulative_weights, n_int-1, axis=-1)


For nan_policy='omit' and marray, n is an array the is the number of valid entries in a slice; that's why we can't just take the last element. (marray tests are skipped right now because I'm running into a mutability issue. I think it's an marray issue and the implementation will probably "just work" when that gets resolved because we're already accounting for n depending on the slice.)

mdhaber · 2025-11-06T06:43:39Z

scipy/stats/tests/test_quantile.py

        with pytest.raises(ValueError, match=message):
            stats.quantile(x, xp.asarray([0.5, 0.6]), keepdims=False)

+    def _get_weights_x_rep(self, x, axis, rng):


The purpose of this function is to generate integral weights and use them to produce an x_rep array that is x with elements repeated weights times. The somewhat tricky thing is to ensure that the weights have the same sum along axis so we don't have a ragged x_rep array. I thought about using random_table... maybe that or another function would help here? It felt very clumsy, but I didn't really try to optimize this.

mdhaber · 2025-11-06T06:58:44Z

scipy/stats/_quantile.py

+    Note that for methods ``inverted_cdf`` and ``averaged_inverted_cdf``, only the
+    relative proportions of tied observations (and relative weights) affect the
+    results; for all other methods, the total number of observations (and absolute
+    weights) matter.


I was thinking that if I were to include weights in an example, this is what I'd show.

I think I'll be asked to include the information about coverage conditions, etc., that appears in the NumPy documentation, so I'll address that now: I would prefer not to, but I can add references. I don't think the math underlying the weights needs to be spelled out because it's exactly the same math as repeated samples; it's only the implementation that is generalized. Now, that same generalized implementation happens to work for non-integer weights, too, so if there's a reference for valid statistical interpretations of non-integral weights, I'd be interested in including that.

scipy/stats/tests/test_quantile.py

ogrisel · 2025-11-06T14:12:54Z

I haven't dug into it, but the results of using weights and repeating observations affects the smoothing of gaussian_kde. I'm not sure whether this is intentional - it might be.

I think this is what caused this PR in scikit-learn to get stalled: https://github.com/scikit-learn/scikit-learn/pull/27971/files#r1745872881 and https://github.com/scikit-learn/scikit-learn/pull/27971/files#r1763478046 (I was worried about not being able to implement weight/repetition equivalence in our scikit-learn estimator).

mdhaber · 2025-11-06T14:23:07Z

@ogrisel thanks - replied there. Let's get that addressed.

ogrisel · 2025-11-07T13:54:53Z

I built scipy with this PR's branch and patched scikit-learn to use this instead of our embedded sklearn.utils.stats._weighted_percentile function:

diff --git a/sklearn/utils/stats.py b/sklearn/utils/stats.py
index 453b0ab122..c35a2b4523 100644
--- a/sklearn/utils/stats.py
+++ b/sklearn/utils/stats.py
@@ -90,7 +90,27 @@ def _weighted_percentile(
         If `array` is 2D and `percentile_rank` is 1D, returns a 2D array
             of shape `(array.shape[1], percentile_rank.shape[0])`
     """
+
+    from scipy.stats import quantile
     xp, _, device = get_namespace_and_device(array)
+
+    array = xp.asarray(array, device=device)
+    if not xp.isdtype(array.dtype, "real floating"):
+        array = xp.asarray(array, dtype=xp.asarray([0.0]).dtype, device=device)
+
+    p = xp.asarray(percentile_rank, dtype=array.dtype, device=device) / 100
+    sample_weight = xp.asarray(sample_weight, dtype=array.dtype, device=device)
+
+    results = quantile(
+        array.T if array.ndim == 2 else array,
+        p=p,
+        axis=-1 if array.ndim == 2 else None,
+        nan_policy="omit",
+        weights=sample_weight.T if sample_weight.ndim == 2 else sample_weight,
+        method="averaged_inverted_cdf" if average else "inverted_cdf",
+    )
+    return results
+
     # `sample_weight` should follow `array` for dtypes
     floating_dtype = _find_matching_floating_dtype(array, xp=xp)
     array = xp.asarray(array, dtype=floating_dtype, device=device)

Then I ran the full scikit-learn tests suite. Once #23930 (comment) is fixed, the remaining errors seems to be related to the fact that 0 weight data points are not ignored (as they should according to frequency weight semantics).

In particular, the following scikit-learn test fails:

https://github.com/scikit-learn/scikit-learn/blob/c8a5f807a365a069b4d96da1a84318ce6253eab7/sklearn/utils/tests/test_stats.py#L112-L132

There are two other tests failures, but I suspect they stem from the same cause.

EDIT: to make this more directly related to scipy, here is a minimal reproducer. The frequency semantics seem to be violated for the "(averaged_)inverted_cdf" methods:

>>> from scipy.stats import quantile
>>> import numpy as np
>>> data = np.array([0, 1, 2, 3, 4, 5, 6])
>>> weights = np.array([0, 0, 1, 1, 0, 1, 0])
>>> quantile(data, p=[0.0, 0.5, 1.0], weights=weights, method="averaged_inverted_cdf")
array([1. , 3. , 5.5])
>>> quantile(data[weights != 0], p=[0.0, 0.5, 1.0], method="averaged_inverted_cdf")
array([2., 3., 5.])
>>> quantile(data, p=[0.0, 0.5, 1.0], weights=weights, method="inverted_cdf")
array([0., 3., 5.])
>>> quantile(data[weights != 0], p=[0.0, 0.5, 1.0], method="inverted_cdf")
array([2., 3., 5.])

The "linear" method seems to not be impacted by this bug:

>>> quantile(data, p=[0.0, 0.5, 1.0], weights=weights, method="linear")
array([2., 3., 5.])

mdhaber · 2025-11-07T15:19:01Z

Hmm there is a test with zero weights that compares against NumPy; surprised it didn't find this. I'll take a look. Thanks for the MRE.

mdhaber · 2025-11-07T16:34:59Z

@ogrisel I thought zero-weights were being tested, but apparently the RNG gods spared me the ~~frustration of debugging~~ opportunity to fix that earlier ; ) The simple fix is to treat them like omitted NaNs - move them to the end of the sorted array and don't include them in the count. I didn't include your test case specifically, but updated an existing test. There's an argument for adding a special test for this, though, since it is not testing all the methods.
I noticed but didn't handle the case of all-zero weights yet. I think that will interpolate between the first two elements along the slice instead of returning NaN like it should. A full slice of NaNs with nan_policy='omit' works correctly, so I can generalize that code to fix zero weights, too. However, I'd like to address some other cleanups regarding nan handling at the same time, so if it doesn't break sklearn tests, I'd like to save that for a follow-up rather than complicating this PR.

scipy/stats/tests/test_quantile.py

scipy/stats/_quantile.py

[skip ci]

mdhaber · 2025-11-07T22:40:22Z

Added more extensive tests of zero-weights. Skipped CI for now - when gh-23930 merges, we can merge main and run again, and hopefully then the sklearn tests will pass, too.

mdhaber added 5 commits November 5, 2025 07:25

ENH: stats.quantile: draft support for weights

f82b9fa

TST: stats.quantile: begin to test weights

915083b

TST: stats: improve tests of weights

8d7b0e5

TST: stats.quantile: improve weights tests

ff80c06

ENH: stats.quantile: weights documentation and input validation

133d5b4

mdhaber added scipy.stats enhancement A new feature or improvement labels Nov 6, 2025

TST: stats.quantile: fix test failures

909d13f

mdhaber commented Nov 6, 2025

View reviewed changes

mdhaber mentioned this pull request Nov 6, 2025

ENH: add quantile function with weights support data-apis/array-api-extra#494

Draft

mdhaber commented Nov 6, 2025

View reviewed changes

scipy/stats/tests/test_quantile.py Outdated Show resolved Hide resolved

mdhaber added 2 commits November 6, 2025 08:16

Merge remote-tracking branch 'upstream/main' into quantile_weights

3fd961f

TST: stats.quantile: fix test failures

f3f1b86

ogrisel mentioned this pull request Nov 6, 2025

ENH: stats: frequency weights for gaussian_kde #23943

Open

MAINT: stats.quantile: fix zero weight handling

1684ef4

mdhaber commented Nov 7, 2025

View reviewed changes

scipy/stats/tests/test_quantile.py Show resolved Hide resolved

mdhaber mentioned this pull request Nov 7, 2025

ENH: stats._xp_searchsorted: private vectorized searchsorted #23930

Merged

mdhaber commented Nov 7, 2025

View reviewed changes

scipy/stats/_quantile.py Outdated Show resolved Hide resolved

mdhaber commented Nov 7, 2025

View reviewed changes

scipy/stats/_quantile.py Show resolved Hide resolved

TST: stats.quantile: test zero-weights specifically

2a30214

[skip ci]

Merge branch 'main' into quantile_weights

b7b13b2



		@xp_capabilities(skip_backends=[("dask.array", "No take_along_axis yet.")])
		def _xp_searchsorted(x, y, *, side='left', xp=None):

Uh oh!

ENH: stats.quantile: add support for frequency weights #23941

Are you sure you want to change the base?

ENH: stats.quantile: add support for frequency weights #23941

Conversation

mdhaber commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference issue

What does this implement/fix?

Additional information

Footnotes

Uh oh!

mdhaber left a comment

Choose a reason for hiding this comment

Uh oh!

mdhaber Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

mdhaber Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mdhaber Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mdhaber Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ogrisel commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdhaber commented Nov 6, 2025

Uh oh!

ogrisel commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdhaber commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdhaber commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mdhaber commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mdhaber commented Nov 6, 2025 •

edited

Loading

mdhaber Nov 6, 2025 •

edited

Loading

mdhaber Nov 6, 2025 •

edited

Loading

mdhaber Nov 6, 2025 •

edited

Loading

ogrisel commented Nov 6, 2025 •

edited

Loading

ogrisel commented Nov 7, 2025 •

edited

Loading

mdhaber commented Nov 7, 2025 •

edited

Loading

mdhaber commented Nov 7, 2025 •

edited

Loading

mdhaber commented Nov 7, 2025 •

edited

Loading