-
Couldn't load subscription status.
- Fork 1.9k
Unified search string construction between albums and items. #6117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR standardizes search term handling between album and singleton matching in the autotag system. Previously, albums defaulted to metadata when no search query was provided, while singletons would overwrite metadata with any non-empty query. The changes introduce a unified _parse_search_terms_with_fallbacks function that applies consistent logic to both cases and adds explicit type hints to the candidates and item_candidates functions to catch type errors where str|None was accepted instead of the required str.
Key Changes:
- Introduced
_parse_search_terms_with_fallbacksfunction to unify search term parsing logic - Added explicit type hints to
candidatesanditem_candidatesfunctions to enforcestrparameters - Updated documentation to clarify that empty strings are passed when no metadata is available
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| docs/changelog.rst | Documents the standardization of search parameter handling |
| beets/metadata_plugins.py | Adds explicit type hints to candidates and item_candidates functions and updates documentation |
| beets/autotag/match.py | Introduces _parse_search_terms_with_fallbacks and refactors both tag_album and tag_item to use unified search term parsing |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Ensure that custom metadata plugins are updated to match the new named-argument signature for candidates and item_candidates to avoid breakage.
- Consider renaming _parse_search_terms_with_fallbacks to a more concise name (e.g. _choose_search_terms) for better readability.
- Clarify in the docstring how empty strings vs None values are treated by the fallback logic to avoid confusion.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Ensure that custom metadata plugins are updated to match the new named-argument signature for candidates and item_candidates to avoid breakage.
- Consider renaming _parse_search_terms_with_fallbacks to a more concise name (e.g. _choose_search_terms) for better readability.
- Clarify in the docstring how empty strings vs None values are treated by the fallback logic to avoid confusion.
## Individual Comments
### Comment 1
<location> `beets/autotag/match.py:252` </location>
<code_context>
def tag_album(
items,
search_artist: str | None = None,
search_album: str | None = None,
search_ids: list[str] = [],
) -> tuple[str, str, Proposal]:
"""Return a tuple of the current artist name, the current album
name, and a `Proposal` containing `AlbumMatch` candidates.
The artist and album are the most common values of these fields
among `items`.
The `AlbumMatch` objects are generated by searching the metadata
backends. By default, the metadata of the items is used for the
search. This can be customized by setting the parameters.
`search_ids` is a list of metadata backend IDs: if specified,
it will restrict the candidates to those IDs, ignoring
`search_artist` and `search album`. The `mapping` field of the
album has the matched `items` as keys.
The recommendation is calculated from the match quality of the
candidates.
"""
# Get current metadata.
likelies, consensus = get_most_common_tags(items)
cur_artist: str = likelies["artist"]
cur_album: str = likelies["album"]
log.debug("Tagging {} - {}", cur_artist, cur_album)
# The output result, keys are the MB album ID.
candidates: dict[Any, AlbumMatch] = {}
# Search by explicit ID.
if search_ids:
for search_id in search_ids:
log.debug("Searching for album ID: {}", search_id)
if info := metadata_plugins.album_for_id(search_id):
_add_candidate(items, candidates, info)
# Use existing metadata or text search.
else:
# Try search based on current ID.
if info := match_by_id(items):
_add_candidate(items, candidates, info)
rec = _recommendation(list(candidates.values()))
log.debug("Album ID match recommendation is {}", rec)
if candidates and not config["import"]["timid"]:
# If we have a very good MBID match, return immediately.
# Otherwise, this match will compete against metadata-based
# matches.
if rec == Recommendation.strong:
log.debug("ID match.")
return (
cur_artist,
cur_album,
Proposal(list(candidates.values()), rec),
)
# Manually provided search terms or fallbacks.
_search_artist, _search_album = _parse_search_terms_with_fallbacks(
(search_artist, cur_artist),
(search_album, cur_album),
)
log.debug("Search terms: {} - {}", _search_artist, _search_album)
# Is this album likely to be a "various artist" release?
va_likely = (
(not consensus["artist"])
or (_search_artist.lower() in VA_ARTISTS)
or any(item.comp for item in items)
)
log.debug("Album might be VA: {}", va_likely)
# Get the results from the data sources.
for matched_candidate in metadata_plugins.candidates(
items, _search_artist, _search_album, va_likely
):
_add_candidate(items, candidates, matched_candidate)
log.debug("Evaluating {} candidates.", len(candidates))
# Sort and get the recommendation.
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return cur_artist, cur_album, Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** Replace mutable default arguments with None ([`default-mutable-arg`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/default-mutable-arg/))
</issue_to_address>
### Comment 2
<location> `beets/autotag/match.py:340` </location>
<code_context>
def tag_item(
item: Item,
search_artist: str | None = None,
search_title: str | None = None,
search_ids: list[str] | None = None,
) -> Proposal:
"""Find metadata for a single track. Return a `Proposal` consisting
of `TrackMatch` objects.
`search_artist` and `search_title` may be used to override the item
metadata in the search query. `search_ids` may be used for restricting the
search to a list of metadata backend IDs.
"""
# Holds candidates found so far: keys are MBIDs; values are
# (distance, TrackInfo) pairs.
candidates = {}
rec: Recommendation | None = None
# First, try matching by the external source ID.
trackids = search_ids or [t for t in [item.mb_trackid] if t]
if trackids:
for trackid in trackids:
log.debug("Searching for track ID: {}", trackid)
if info := metadata_plugins.track_for_id(trackid):
dist = track_distance(item, info, incl_artist=True)
candidates[info.track_id] = hooks.TrackMatch(dist, info)
# If this is a good match, then don't keep searching.
rec = _recommendation(_sort_candidates(candidates.values()))
if (
rec == Recommendation.strong
and not config["import"]["timid"]
):
log.debug("Track ID match.")
return Proposal(_sort_candidates(candidates.values()), rec)
# If we're searching by ID, don't proceed.
if search_ids:
if candidates:
assert rec is not None
return Proposal(_sort_candidates(candidates.values()), rec)
else:
return Proposal([], Recommendation.none)
# Manually provided search terms or fallbacks.
_search_artist, _search_title = _parse_search_terms_with_fallbacks(
(search_artist, item.artist),
(search_title, item.title),
)
log.debug("Item search terms: {} - {}", _search_artist, _search_title)
# Get and evaluate candidate metadata.
for track_info in metadata_plugins.item_candidates(
item,
_search_artist,
_search_title,
):
dist = track_distance(item, track_info, incl_artist=True)
candidates[track_info.track_id] = hooks.TrackMatch(dist, track_info)
# Sort by distance and return with recommendation.
log.debug("Found {} candidates.", len(candidates))
candidates_sorted = _sort_candidates(candidates.values())
rec = _recommendation(candidates_sorted)
return Proposal(candidates_sorted, rec)
</code_context>
<issue_to_address>
**issue (code-quality):** We've found these issues:
- Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Remove unnecessary else after guard condition ([`remove-unnecessary-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/remove-unnecessary-else/))
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6117 +/- ##
=======================================
Coverage 66.98% 66.99%
=======================================
Files 118 118
Lines 18191 18193 +2
Branches 3079 3079
=======================================
+ Hits 12186 12189 +3
Misses 5346 5346
+ Partials 659 658 -1
🚀 New features to boost your workflow:
|
8f07788 to
7a250ef
Compare
| ) | ||
|
|
||
|
|
||
| def _parse_search_terms_with_fallbacks( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing that you still want to use this helper method after reading my comments, I'm now concerned about my ability to understand the code.
I honestly have an issue trying to understand what this function does. I failed to do this by reading the code, so I had to manually run search in order to see what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a simple mp3 track:
$ tag track.mp3
IDv2 tag info for track.mp3
TALB=album
TIT2=titlePreviously
| import | artist search | album/title search | resulting search terms |
|---|---|---|---|
| album | None |
None |
- album |
| album | "Artist" |
"" |
- album |
| album | "" |
"Album" |
- album |
| album | "Artist" |
"Album" |
Artist - Album |
| singleton | None |
None |
- title |
| singleton | "Artist" |
"" |
Artist - title |
| singleton | "" |
"Title" |
- Title |
| singleton | "Artist" |
"Title" |
Artist - Title |
I'm actually surprised to see that album search requires both terms. Apparently, I misunderstood even the simple if not (search_artist and search_album) condition 😁.
Now
| import | artist search | album/title search | resulting search terms |
|---|---|---|---|
| album | None |
None |
- album |
| album | "Artist" |
"" |
Artist - |
| album | "" |
"Album" |
- Album |
| album | "Artist" |
"Album" |
Artist - Album |
| singleton | None |
None |
- title |
| singleton | "Artist" |
"" |
Artist - |
| singleton | "" |
"Title" |
- Title |
| singleton | "Artist" |
"Title" |
Artist - Title |
I take back what I said regarding 'we're keeping the previous functionality'. What you've got here aligns with my expectation: we only need one of the search terms to be present for the search to override item data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I also had issue with understanding the logic especially since it was different in singletons and albums. Have you seen the test I added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did, see the rest of the comments
| search_artist: str | None = None, | ||
| search_album: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we handle only strings here:
search_artist: str,
search_album: str,And update the tag_item and tag_album calls in beets.importer.tasks to explicitly provide empty strings.
| _search_artist, _search_album = _parse_search_terms_with_fallbacks( | ||
| (search_artist, cur_artist), | ||
| (search_album, cur_album), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my comment above, we don't need to handle Nones any more, so this can be simplified to
if not search_artist and not search_album:
search_artist, search_album = cur_artist, cur_album| search_artist: str | None = None, | ||
| search_album: str | None = None, | ||
| search_ids: list[str] = [], | ||
| search_ids: list[str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice use case for a Sequence with a default of empty tuple:
| search_ids: list[str] | None = None, | |
| search_ids: Sequence[str] = (), |
This way, you can simplify the following:
- if search_ids:
- for search_id in search_ids:
+ for search_id in search_ids:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would than also allow for non list entries. I remember we opted to not use Sequence[str] in some other places because of this reason.
# both valid:
search_ids: Sequence[str] = "fooo"
search_ids: Sequence[str] = ["fooo", "baar"]| def candidates( | ||
| items: Sequence[Item], | ||
| artist: str, | ||
| album: str, | ||
| va_likely: bool, | ||
| ) -> Iterable[AlbumInfo]: | ||
| """Return matching album candidates from all metadata source plugins.""" | ||
| for plugin in find_metadata_source_plugins(): | ||
| yield from plugin.candidates(*args, **kwargs) | ||
| yield from plugin.candidates( | ||
| items=items, | ||
| artist=artist, | ||
| album=album, | ||
| va_likely=va_likely, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with making arguments explicit, but let's delegate them unchanged (using the same plugin.candidates call):
| def candidates( | |
| items: Sequence[Item], | |
| artist: str, | |
| album: str, | |
| va_likely: bool, | |
| ) -> Iterable[AlbumInfo]: | |
| """Return matching album candidates from all metadata source plugins.""" | |
| for plugin in find_metadata_source_plugins(): | |
| yield from plugin.candidates(*args, **kwargs) | |
| yield from plugin.candidates( | |
| items=items, | |
| artist=artist, | |
| album=album, | |
| va_likely=va_likely, | |
| ) | |
| def candidates( | |
| items: Sequence[Item], artist: str, album: str, va_likely: bool | |
| ) -> Iterable[AlbumInfo]: | |
| """Return matching album candidates from all metadata source plugins.""" | |
| for plugin in find_metadata_source_plugins(): | |
| yield from plugin.candidates(items, artist, album, va_likely) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we use trailing commas? https://docs.astral.sh/ruff/rules/missing-trailing-comma/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not use it if not needed, for verbosity reasons. Scrolling the code does take some time at the end of the day 🤷🏼
| :param item: Track item | ||
| :param artist: Track artist | ||
| :param title: Track title | ||
| :param artist: Track artist, either a search manually provided or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's undo this - the plugin is not supposed to know where this data comes from. This is going to get out of sync given how plugins may extend it (and it probably is given the fromfilename plugin).
| log.debug("Item search terms: {} - {}", search_artist, search_title) | ||
| # Manually provided search terms or fallbacks. | ||
| _search_artist, _search_title = _parse_search_terms_with_fallbacks( | ||
| (search_artist, item.artist), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, item.artist and item.title types are not present. Let's add them:
class LibModel(dbcore.Model["Library"]):
...
path: bytes
artist: str
...
class Item(LibModel):
title: str
Description
Matching albums and singletons has slightly different behavior between singletons and albums.
Albums: When a user does not specify any search query in the manual prompt, the system defaults to using the corresponding value from the metadata.
Singletons: Overwrite metadata by any given noneempty search query.
Additionally I noticed that we have an unintentional type error in the
metadata_plugins.candidatescall. Since*argand**kwargsare not typed they acceptedartistandtitle/albumof typestr|Nonewhile they should only acceptstr.Changes
This PR aligns both methods and uses a common function to parse the search terms. This intentionally couples these two occasions to hopefully prevent drift in the future and makes clear that we use the same logic in these two places.
Added explicit typehints to
candidatesanditem_candidatesfunction.