-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Refactor: centralize lookup and named type helpers in SemanticAnalyzer (#4157) #20136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
rchristof
wants to merge
3
commits into
python:master
Choose a base branch
from
rchristof:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+106
−61
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This refactor extracts the core logic for resolving fully qualified names
into a new private helper _lookup_fully_qualified(), and rewrites
lookup_fully_qualified() / lookup_fully_qualified_or_none() as thin
wrappers around it.
Before this change:
lookup_fully_qualified_or_none() in semanal.py directly implemented
a large amount of lookup logic with two separate code paths
("exact module match" vs "longest loaded prefix"). This made it harder
to maintain and reason about.
lookup_fully_qualified() just called the above and asserted non-None,
so the only authoritative behavior lived inside one big public method.
After this change:
_lookup_fully_qualified(fullname, allow_missing) is now the single
internal implementation of fully qualified lookup:
- It finds the longest prefix of the name that exists in self.modules.
- It walks the remaining components through module / class symbol tables.
- It preserves record_incomplete_ref() calls for incomplete namespaces.
- It validates input and handles partial / nested names consistently.
lookup_fully_qualified() now just calls the helper in "strict" mode
and asserts the result is not None.
lookup_fully_qualified_or_none() now just calls the helper in
"permissive" mode and returns the result (or None).
Why this matters:
The public API remains unchanged:
- lookup_fully_qualified(name) -> SymbolTableNode
- lookup_fully_qualified_or_none(name) -> SymbolTableNode | None
The duplicated traversal logic is no longer embedded in a public method.
Instead, there is a single internal source of truth with a documented
contract (_lookup_fully_qualified is explicitly not plugin API).
This makes it easier to migrate other ad-hoc lookup sites to use this
common logic, which is one of the goals in issue python#4157 ("too many lookup
helpers across the semantic analyzer / checker leads to confusion").
Behavioral note:
When the fully qualified name refers exactly to a module
(e.g. "pkg.mod"), _lookup_fully_qualified() now returns None instead
of attempting to treat the module object itself as a definitional
symbol via its parent package.
This matches the intent of the ongoing lookup cleanup work (avoid
treating modules like regular definitions), but I'm calling it out in
case external plugins depended on the old quirk.
This is an incremental, internal refactor toward python#4157. No plugin-facing
API signatures were changed.
This refactor extracts the common logic for constructing Instance
objects from fully qualified type names into a private helper
_build_named_instance(), and rewrites named_type(),
named_type_or_none(), and builtin_type() to delegate to it.
Before this change:
named_type() and named_type_or_none() each reimplemented the same
core steps:
- resolve a fully qualified name using either
lookup_fully_qualified() (strict) or
lookup_fully_qualified_or_none() (permissive),
- reject unresolved symbols / placeholders,
- optionally unwrap a TypeAlias to get to the underlying TypeInfo,
- build an Instance, synthesizing default type arguments with a
particular AnyType flavor (special_form vs unannotated).
builtin_type() just wrapped named_type(), obscuring the fact that
it's semantically "strict + use special_form AnyType defaults".
This meant the same idea ("turn a fully qualified name into an Instance")
was duplicated in several places, with subtle differences that were not
obvious at the call site.
After this change:
_build_named_instance(fullname, args, *, allow_missing, any_flavor,
unwrap_alias) becomes the single internal implementation of that logic:
- allow_missing controls whether we allow unresolved names
(mirrors strict vs permissive lookup).
- any_flavor selects which AnyType flavor to use for synthesized
type arguments when args is not provided.
- unwrap_alias controls whether a TypeAlias target should be
unwrapped to its underlying TypeInfo (this matches the historical
behavior of named_type_or_none()).
named_type() now calls the helper with
allow_missing=False, any_flavor=TypeOfAny.special_form,
unwrap_alias=False, and asserts that a result exists.
named_type_or_none() calls it with
allow_missing=True, any_flavor=TypeOfAny.unannotated,
unwrap_alias=True, and returns None if the symbol couldn't be
resolved or is still a placeholder.
builtin_type() now directly calls the helper with the same strict
parameters as named_type(), making its semantics explicit while
keeping the same behavior.
Why this matters:
The public helpers keep the same signatures and the same intended
contracts:
- named_type(fullname, args?) -> Instance
- named_type_or_none(fullname, args?) -> Instance | None
- builtin_type(fullname) -> Instance
The differences between them (strict vs permissive lookup, alias
unwrapping, AnyType flavor for default arguments) are now explicit in
how they call _build_named_instance() instead of being duplicated in
multiple ad hoc implementations.
Future changes to how we resolve names or synthesize default type
arguments only need to happen in one place.
This continues the cleanup called out in issue python#4157: reducing the number
of slightly different lookup/type helpers and making the remaining ones
more uniform. _build_named_instance() is internal-only and not part of
the plugin API surface.
for more information, see https://pre-commit.ci
|
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is an incremental cleanup in
SemanticAnalyzerto move toward issue #4157 ("too many lookup helpers, too many subtly different code paths"). It does two things:Centralizes fully qualified name lookup logic.
Centralizes
Instanceconstruction from named types.Both changes follow the same pattern:
Extract the real logic into a single private helper (not plugin API).
Turn the existing public helpers into thin, intention-revealing wrappers.
Keep public signatures and behavior stable as much as possible.
The goal is to reduce duplication and make future cleanup / movement into
mypy/lookup.pyeasier without breaking callers immediately.Part 1: Fully qualified lookup refactor
Previously:
lookup_fully_qualified_or_none()insemanal.pycontained a large block of logic that:tried to resolve a fully qualified name by walking
self.modules,handled partially loaded / incomplete namespaces and called
record_incomplete_ref(),walked into
TypeInfomembers for nested attributes,and had two different branches for "the module name matches directly" vs "peel components until we find the longest loaded prefix".
lookup_fully_qualified()just called that function and asserted the result was notNone.Now:
A new private helper
_lookup_fully_qualified(fullname, *, allow_missing)implements the traversal logic in one place.lookup_fully_qualified()becomes a thin "strict" wrapper: it calls the helper withallow_missing=False, asserts non-None, and returns the node.lookup_fully_qualified_or_none()becomes a thin "permissive" wrapper: it calls the helper withallow_missing=Trueand returns that result (possiblyNone).Key properties:
The logic that finds the longest loaded module prefix and then walks remaining name parts now lives in one place instead of being duplicated across branches.
The behavior around
is_incomplete_namespace()andrecord_incomplete_ref()is preserved and documented.These wrappers keep their original names and signatures, so existing code and plugins can continue to call them.
Behavioral note:
"pkg.mod"),_lookup_fully_qualified()now returnsNoneinstead of treating the module object itself as a symbol defined in the parent package. This matches the direction of the ongoing lookup cleanup (avoid treating modules like regular definitions), but I'm calling it out in case something relied on the old behavior.This puts us in a place where future work can move the core logic into
mypy/lookup.pyand update other ad-hoc lookups to use it.Part 2: Named type / Instance construction refactor
Previously:
named_type(),named_type_or_none(), andbuiltin_type()each reimplemented very similar logic:resolve the symbol with
lookup_fully_qualified()orlookup_fully_qualified_or_none(),handle
PlaceholderNode,sometimes unwrap
TypeAliastargets to aTypeInfo,build an
Instance, filling in default type arguments using a particularAnyTypeflavor (TypeOfAny.special_formvsTypeOfAny.unannotated).The differences between these helpers (strict vs permissive, unwrap alias or not, which AnyType flavor to use) were spread out and not very explicit at the call site.
Now:
A new private helper
_build_named_instance(fullname, args, *, allow_missing, any_flavor, unwrap_alias)is the single implementation of that logic.allow_missingmirrors strict (named_type) vs permissive (named_type_or_none) lookup.unwrap_aliascontrols whether we unwrap aTypeAliasto get the underlyingTypeInfo(only theor_nonevariant historically did this).any_flavorselects whichAnyTypeflavor to use when no explicit type arguments are provided.named_type()now calls this helper with strict parameters and asserts a result exists.named_type_or_none()calls it permissively and may returnNone.builtin_type()now also calls the helper directly instead of indirectly delegating throughnamed_type().Key properties:
The public APIs are unchanged:
named_type(fullname, args?) -> Instancenamed_type_or_none(fullname, args?) -> Instance | Nonebuiltin_type(fullname) -> InstanceThe differences between these helpers are now expressed as parameters to the shared implementation instead of being duplicated in three places.
Future fixes to how we synthesize type arguments or unwrap aliases now only need to happen in one function.
Why this helps #4157
Issue #4157 calls out that there are ~9 different lookup / type access helpers across the analyzer/checker, with slightly different semantics. That makes it easy for behavior to drift and hard for plugin authors (and even core devs) to know which one is "correct."
This PR moves us toward a world where:
there is one authoritative code path for fully qualified lookups,
there is one authoritative code path for turning a qualified name into an
Instance,the public helpers become stable, intention-revealing wrappers,
and we can start migrating other places in the codebase to use these helpers instead of reimplementing their own traversal logic.
This should make future steps (for example, moving more of this logic into
mypy/lookup.py) much easier to review and reason about.Scope / compatibility
No public helper was removed or renamed.
Signatures of
lookup_fully_qualified(),lookup_fully_qualified_or_none(),named_type(),named_type_or_none(), andbuiltin_type()are unchanged._lookup_fully_qualified()and_build_named_instance()are documented as internal-only and not part of the plugin API.The only semantic difference worth flagging is that exact-module names (like
"pkg.mod") now returnNonefrom the fully qualified lookup helper instead of trying to return the module symbol through the parent package. This aligns with the new lookup direction but is called out here explicitly for review.Follow-up work
With these helpers in place, future cleanups can:
Update other ad hoc lookup logic in
semanal.py/checker.pyto call these helpers instead of duplicating traversal.Potentially move the shared logic into
mypy/lookup.py, so the analyzer and other passes use a single centralized implementation.This PR intentionally keeps each step reviewable and does not attempt the full migration in one shot.