[WIP] ENH: Resample additional arrays apart from X and y #463

glemaitre · 2018-08-27T23:03:06Z

Implements the last point of #462 and should be merged after it.
Partially addresses #460.

pep8speaks · 2018-08-27T23:03:28Z

Hello @glemaitre! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 29, 2018 at 09:59 Hours UTC

glemaitre · 2018-08-28T06:28:02Z

Sampling extra arrays could be easy in the case of prototype selection.
One can just pick the weights of the selected samples.

However, what would be a good and meaningful default when new sample weights need to be created?

Right now, I created an `*arrays` sequence, but we might be interested in limiting this to `sample_weight` only, since creating new instances only makes sense if we know what we are creating. Up-sampling arrays that we don't know anything about could be weird.
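
To illustrate the prototype-selection case described above, here is a minimal sketch using only NumPy. The helper name `random_under_sample` and its signature are assumptions made for illustration; this is not the code of the PR nor the imbalanced-learn API. It shows that when the sampler only selects existing rows, the resampled weights are obtained by indexing with the same selected indices.

```python
import numpy as np


def random_under_sample(X, y, sample_weight=None, random_state=0):
    """Hypothetical helper (not imbalanced-learn API).

    Keeps, for every class, as many samples as the smallest class has.
    """
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    # Draw n_keep row indices per class, without replacement.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    idx.sort()
    # Prototype selection: the weights are picked with the same indices.
    if sample_weight is None:
        w_res = None
    else:
        w_res = np.asarray(sample_weight)[idx]
    return X[idx], y[idx], w_res
```

No such shortcut exists for samplers that synthesize new rows, which is the open question raised in this comment.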

glemaitre · 2018-08-28T07:17:31Z

Oops, I forgot to ping @jnothman in my last comment.


codecov · 2018-08-29T09:59:27Z

Codecov Report

Merging #463 into master will decrease coverage by 0.35%.
The diff coverage is 94.97%.

@@            Coverage Diff             @@
##           master     #463      +/-   ##
==========================================
- Coverage   98.92%   98.57%   -0.36%     
==========================================
  Files          85       75      -10     
  Lines        5324     4633     -691     
==========================================
- Hits         5267     4567     -700     
- Misses         57       66       +9

| Impacted Files | Coverage Δ | |
|---|---|---|
| imblearn/pipeline.py | `97.07% <ø> (+2.1%)` | ⬆️ |
| imblearn/utils/estimator_checks.py | `96.62% <100%> (-0.46%)` | ⬇️ |
| imblearn/ensemble/_balance_cascade.py | `100% <100%> (ø)` | ⬆️ |
| ...ling/_prototype_selection/_random_under_sampler.py | `100% <100%> (ø)` | ⬆️ |
| ...nder_sampling/_prototype_selection/_tomek_links.py | `100% <100%> (ø)` | ⬆️ |
| ...rototype_selection/_condensed_nearest_neighbour.py | `100% <100%> (ø)` | ⬆️ |
| imblearn/over_sampling/_random_over_sampler.py | `100% <100%> (ø)` | ⬆️ |
| imblearn/combine/_smote_tomek.py | `100% <100%> (ø)` | ⬆️ |
| ...rototype_selection/_neighbourhood_cleaning_rule.py | `100% <100%> (ø)` | ⬆️ |
| imblearn/combine/_smote_enn.py | `100% <100%> (ø)` | ⬆️ |

... and 73 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ffdde80...7a8fad0.

jnothman

Btw, I don't think returning non-X,y will work with the current handling of Pipeline.fit's kwargs. We really need sample prop routing to handle that case. Currently, the handling would be unambiguous if one of:

- the resampler is the last step, in which case we return any additional sample props like weights
- the resampler is the second-last step, and there is no fit_param called last_step_name__sample_weight, in which case we pass all sample props into the last step's fit, I think.

Otherwise, it's unclear where to pass the returned sample_weight given that **fit_params intends to prescribe this at the time Pipeline.fit is called.
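
The two unambiguous cases above can be written down as a small decision function. This is only a sketch of the reasoning in this comment: `route_resampled_weight` and its return labels are hypothetical and exist in neither scikit-learn nor imbalanced-learn, and the second case follows the tentative "I think" of the comment rather than any implemented behaviour.

```python
def route_resampled_weight(step_names, resampler_name, fit_params):
    """Hypothetical helper: decide where a resampled sample_weight would go.

    step_names     : list of pipeline step names, in order
    resampler_name : name of the step that resamples
    fit_params     : dict of 'step__param' keys given to Pipeline.fit
    """
    pos = step_names.index(resampler_name)
    last = step_names[-1]
    if pos == len(step_names) - 1:
        # Resampler is the last step: hand the weights back to the caller.
        return "return"
    if pos == len(step_names) - 2 and f"{last}__sample_weight" not in fit_params:
        # Resampler is second-last and the caller did not prescribe weights
        # for the last step: forward the resampled props to the last step's fit.
        return "pass_to_last_step"
    # Anything else conflicts with what **fit_params already prescribes.
    return "ambiguous"
```

For example, `route_resampled_weight(["sampler", "clf"], "sampler", {"clf__sample_weight": w})` falls into the ambiguous branch, because the caller has already fixed the weights for `clf`.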

jnothman · 2018-09-03T03:06:39Z

imblearn/base.py

+        y_resampled : ndarray, shape (n_samples_new,)
            The corresponding label of `X_resampled`.

+        sample_weight_resampled : ndarray, shape (n_samples_new,)


I would rather have a dict of non-X,y returned. (Optionally? In scikit-learn I would rather this be mandatory so we don't need to handle both cases.)

jnothman · 2018-09-03T03:08:21Z

imblearn/base.py

+            ``sample_weight`` was not ``None``.
+
+        idx_resampled : ndarray, shape (n_samples_new,)
+            Indices of the selected features. This output is optional and only


Do you mean the selected samples?

jnothman · 2018-09-03T03:08:27Z

imblearn/base.py

+            Resampled sample weights. This output is returned only if
+            ``sample_weight`` was not ``None``.
+
+        idx_resampled : ndarray, shape (n_samples_new,)


Could you explain why this should be returned from fit_resample, rather than stored as an attribute?

I think it was part of the original design (before it was in scikit-learn). But it would actually be better to keep it as an attribute, with a single fit_resample.
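
A minimal sketch of the attribute-based alternative discussed here, assuming a toy sampler written for illustration only. The class `FirstKPerClassSampler` and the attribute name `sample_indices_` are assumptions in this sketch, not code taken from this PR.

```python
import numpy as np


class FirstKPerClassSampler:
    """Hypothetical toy sampler: keeps the first k samples of each class."""

    def __init__(self, k=2):
        self.k = k

    def fit_resample(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        idx = np.concatenate(
            [np.flatnonzero(y == c)[: self.k] for c in np.unique(y)]
        )
        # Store the selected sample indices on the estimator instead of
        # returning them, so the return value is always (X, y).
        self.sample_indices_ = np.sort(idx)
        return X[self.sample_indices_], y[self.sample_indices_]
```

With this shape, callers that need the indices read `sampler.sample_indices_` after calling `fit_resample`, and the method signature does not change depending on an option.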

glemaitre · 2018-09-03T09:46:23Z

> Otherwise, it's unclear where to pass the returned sample_weight given that **fit_params intends to prescribe this at the time Pipeline.fit is called.

If I understand correctly, and from what I could see, Pipeline does not support sample_weight right now. In the meantime, do you recommend adding a fit_resample(X, y, **sample_props) signature and returning a dict sample_props?
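
A rough sketch of what the `fit_resample(X, y, **sample_props)` signature could look like, assuming a toy under-sampler that only selects rows. The class `EveryOtherSampler` and the choice to always return a dict (possibly empty) as a third value are assumptions for illustration, not the behaviour of this PR.

```python
import numpy as np


class EveryOtherSampler:
    """Hypothetical toy under-sampler: keeps every second sample."""

    def fit_resample(self, X, y, **sample_props):
        X, y = np.asarray(X), np.asarray(y)
        keep = np.arange(0, X.shape[0], 2)
        # Every per-sample property is resampled with the same indices and
        # returned in a dict, so callers always unpack three values.
        props_resampled = {
            name: np.asarray(values)[keep]
            for name, values in sample_props.items()
        }
        return X[keep], y[keep], props_resampled
```

Usage would be `X_res, y_res, props = EveryOtherSampler().fit_resample(X, y, sample_weight=w)`, with `props["sample_weight"]` holding the resampled weights and `props` being empty when no properties are passed.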

jnothman · 2018-09-03T11:18:55Z

You're right, Pipeline does not really support sample_weight now... I think supporting returning it from Pipeline.fit_resample makes sense.

chkoar · 2020-07-29T09:04:08Z

@glemaitre #460 is closed but #457 is still open and probably relevant. Could we close this PR in favor of a new one in the future? It is two years old.

glemaitre added 11 commits August 27, 2018 15:45

API: define fit_resample only without any fit

0ac2c92

PEP8

c872679

DOC: add whats new entry

bbecd30

DOC add issue number

f7120d8

PEP8 examples

00f8e44

DOC fix import

9046477

iter

8b3aa50

TST remove sample in pipeline

24fd62d

TST: make sure samplers common test are run

59725c7

PEP8

6189206

EHN: resample additional arrays apart from X and y

61f53a7

glemaitre changed the title ~~EHN: resample additional arrays apart from X and y~~ [WIP] EHN: resample additional arrays apart from X and y Aug 27, 2018

glemaitre added 5 commits August 28, 2018 14:25

FIX only consider sample_weight

8f86d98

Merge remote-tracking branch 'origin/master' into is/sample_weight_sa…

38526bc

…mpling

iter

9f600fd

iter

2b8ab83

EXA fix fake sampler in example

7a8fad0

jnothman reviewed Sep 3, 2018


glemaitre force-pushed the master branch 2 times, most recently from bbf2b12 to 513203c Compare September 7, 2018 13:26

glemaitre force-pushed the master branch from 3e305f8 to 59d9b4d Compare January 19, 2019 16:56

chkoar changed the title ~~[WIP] EHN: resample additional arrays apart from X and y~~ [WIP] ENH: Resample additional arrays apart from X and y Jun 12, 2019

glemaitre force-pushed the master branch 2 times, most recently from f1bc189 to 8f87307 Compare June 28, 2019 12:32

glemaitre force-pushed the master branch 3 times, most recently from eae6c6b to ffdde80 Compare June 28, 2019 13:52

glemaitre force-pushed the master branch from 65132db to 68123d0 Compare November 8, 2019 22:54

chkoar mentioned this pull request Feb 5, 2020

[MRG] Better in-out support #681

Merged

chkoar force-pushed the master branch from 4a201cd to 0eb9033 Compare June 20, 2020 02:58

glemaitre force-pushed the master branch from f8347ad to 56eefdf Compare September 29, 2021 16:10

glemaitre force-pushed the master branch from 3228f8a to 7e94390 Compare October 21, 2021 20:41
