-
Notifications
You must be signed in to change notification settings - Fork 27
FEAT: Improved Connection String handling in Python #307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changes
Summary
mssql_python/connection_string_parser.pyLines 119-127 119 if incomplete_text:
120 errors.append(f"Incomplete specification: keyword '{incomplete_text}' has no value (missing '=')")
121 # Skip to next semicolon
122 while current_pos < str_len and connection_str[current_pos] != ';':
! 123 current_pos += 1
124 continue
125
126 # Extract and normalize the key
127 key = connection_str[key_start:current_pos].strip().lower()Lines 210-218 210 str_len = len(connection_str)
211
212 # Skip leading whitespace before the value
213 while start_pos < str_len and connection_str[start_pos] in ' \t':
! 214 start_pos += 1
215
216 # If we've consumed the entire string or reached a semicolon, return empty value
217 if start_pos >= str_len:
218 return '', start_posmssql_python/constants.pyLines 568-578 568 try:
569 from mssql_python.logging_config import get_logger
570 from mssql_python.helpers import sanitize_user_input
571 logger = get_logger()
! 572 except ImportError:
! 573 logger = None
! 574 sanitize_user_input = lambda x: str(x)[:50] # Simple fallback
575
576 filtered = {}
577
578 # The rejected list should ideally be empty when used in the normal connection📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.helpers.py: 66.9%
mssql_python.pybind.ddbc_bindings.cpp: 70.7%
mssql_python.pybind.connection.connection.cpp: 74.4%
mssql_python.pybind.connection.connection_pool.cpp: 78.9%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.cursor.py: 83.4%
mssql_python.auth.py: 87.1%
mssql_python.pooling.py: 87.7%
mssql_python.row.py: 90.9%
mssql_python.exceptions.py: 92.8%🔗 Quick Links
|
…ers and improving exception management
…nd normalizing test cases
…d updating filtering logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devskim found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
… to _ConnectionStringAllowList - Updated references in connection.py, connection_string_allowlist.py, connection_string_parser.py, and related test files to use the new class name. - Changed method names from `parse` to `_parse` in _ConnectionStringParser for internal consistency. - Adjusted unit tests to reflect the new class and method names, ensuring all tests pass with the updated structure. - Improved code clarity and maintainability by adhering to naming conventions for internal classes.
…nd improve error handling for empty values
…ate test cases for parameter normalization
… include synonym and clarify usage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a comprehensive connection string allow-list feature for mssql-python, adding strict validation and parsing for ODBC connection strings. The implementation includes a three-stage pipeline (parse → filter → build) that validates connection string parameters against an allow-list, normalizes synonyms, and properly handles ODBC escape sequences.
Key Changes:
- Implemented ODBC-compliant connection string parser with strict validation
- Added connection string parameter allow-list with synonym normalization
- Created connection string builder with proper escaping
- Updated connection construction to use the new validation pipeline
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| mssql_python/exceptions.py | Added ConnectionStringParseError exception for validation failures |
| mssql_python/connection_string_parser.py | New ODBC-compliant parser with strict validation and error batching |
| mssql_python/connection_string_allowlist.py | New allow-list manager for parameter validation and normalization |
| mssql_python/connection_string_builder.py | New builder for reconstructing connection strings with proper escaping |
| mssql_python/connection.py | Updated _construct_connection_string to use new validation pipeline |
| tests/test_010_connection_string_parser.py | Comprehensive parser unit tests (448 lines) |
| tests/test_011_connection_string_allowlist.py | Allow-list unit tests (245 lines) |
| tests/test_012_connection_string_integration.py | Integration tests for complete flow (664 lines) |
| tests/test_003_connection.py | Updated tests for new parameter name normalization |
| tests/test_006_exceptions.py | Updated to expect ConnectionStringParseError for invalid strings |
| eng/pipelines/*.yml | Removed Driver parameter from test connection strings |
Comments suppressed due to low confidence (1)
mssql_python/connection_string_allowlist.py:1
- [nitpick] The comment formatting is inconsistent. The comment on line 82 should be on a separate line above the code, matching the style used elsewhere in the file (e.g., lines 35-40).
"""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Removed unused 'List' and 'logging' imports from connection_string_parser.py - Removed unused 'sys', 'pooling', and 'threading' imports from test_003_connection.py - Removed unused variables and unnecessary pass statements in tests - Cleaned up test_012_connection_string_integration.py imports All tests passing: 226 passed, 4 skipped
- Moved _ConnectionStringAllowList class from connection_string_allowlist.py to constants.py - Updated all imports in connection.py and test files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some restructuring comments which might streamline circular imports etc.
Logic lgtm and test coverage for all scenarios is super nice!
| return RESERVED_PARAMETERS | ||
|
|
||
|
|
||
| class _ConnectionStringAllowList: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the ConnectionStringAllowList is now a part of constants.py, the below comment is in continuation of the same in lines of re-structuring:
Currently, _ConnectionStringAllowList is implemented as a class with only class methods and static methods, and no instance state. In Python, when a class doesn't need instance state or inheritance, it's generally more idiomatic to use module-level functions and constants instead.
Also, it'll be better to move RESERVED_PARAMETERS in this file to remove the lazy import circular dependency block, making constants.py the ultimate source of truth for all type constants
Current structure:
class _ConnectionStringAllowList:
ALLOWED_PARAMS = {...}
@classmethod
def normalize_key(cls, key: str) -> Optional[str]:
return cls.ALLOWED_PARAMS.get(key_lower)
@staticmethod
def _normalize_params(params: Dict[str, str], ...):
# ...Suggested refactoring:
Consider separating data (constants) from logic (validation functions) for better separation of concerns:
Step 1: Keep only constants inside constants.py (data only):
# mssql_python/constants.py
# Pure data/configuration - no logic
# Internal connection string constants (underscore prefix = internal use only)
_RESERVED_CONNECTION_STRING_PARAMS = ('Driver', 'APP')
_ALLOWED_CONNECTION_STRING_PARAMS = {
'server': 'Server',
'address': 'Server',
'uid': 'UID',
'pwd': 'PWD',
# ... etc (same dictionary content, just not inside a class)
}Step 2: Move validation logic to connection_string_parser.py (where it's used):
# mssql_python/connection_string_parser.py
# All connection string parsing and validation logic together
from mssql_python.constants import (
_RESERVED_CONNECTION_STRING_PARAMS,
_ALLOWED_CONNECTION_STRING_PARAMS
)
def _normalize_connection_string_key(key: str) -> Optional[str]:
"""Normalize a parameter key to its canonical form. (Internal use only)"""
key_lower = key.lower().strip()
return _ALLOWED_CONNECTION_STRING_PARAMS.get(key_lower)
def _validate_and_normalize_params(params: Dict[str, str], warn_rejected: bool = True) -> Dict[str, str]:
"""Validate and normalize parameters against the allow-list. (Internal use only)"""
# ... implementation (moved from _ConnectionStringAllowList._normalize_params)
filtered = {}
for key, value in params.items():
normalized_key = _normalize_connection_string_key(key)
if normalized_key and normalized_key not in _RESERVED_CONNECTION_STRING_PARAMS:
filtered[normalized_key] = value
return filtered
class _ConnectionStringParser:
"""Parser for ODBC connection strings."""
# ... existing parser implementationwhy I made this suggestion:
- all connection string validation lives in one place (
connection_string_parser.py) - constants.py contains just data, no logic
- no more lazy import
| return ATTRIBUTE_SET_TIMING.get(attribute, AttributeSetTime.AFTER_ONLY) | ||
|
|
||
|
|
||
| # Import RESERVED_PARAMETERS from parser module to maintain single source of truth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in the other comment, keeping RESERVED_PARAMETERS in this file itself would be a better choice, since parser module is technically not a single source of truth as of now. Whenever lazy imports are needed to avoid circular dependencies, it often indicates that dependents should be co-located.
| assert any("Empty value for keyword 'server'" in err for err in errors) | ||
| assert any("Empty value for keyword 'pwd'" in err for err in errors) | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checked for OCD purposes, adding these tests should cover up the remaining lines reported in codecoverage comment:
"Server= localhost;Database= mydb"(excessive whitespace)"Server=\t\tlocalhost;PWD=\t{pass}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha. Nice!!
| from mssql_python.helpers import sanitize_user_input | ||
| logger = get_logger() | ||
| except ImportError: | ||
| logger = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested in the restructuring comment, this whole method would be moved - and hence we wouldn't be needing logger at all in constants.py - and logger can be used readily in connection_string_parser.py
If not, this might be a problem since we should not set logger as none in case of circular dependencies. We should remove the try: except: block in imports and fix circular dependency imports, if any - by streamlining modules.
defensive block in this case would render the logger of no use, which might be a problem in future - siliently skipping failures.
Work Item / Issue Reference
Summary
Fixes #306 by introducing a connection string parser for mssql-python.
PR Title Guide
FEAT: Improve connection string parsing, honor the allow list, and improve error messaging.
Connection String Allowlist Feature - Review Guide
Overview
This PR introduces a comprehensive connection string validation and allowlist system that validates all ODBC connection parameters before passing them to the driver, improving usability and providing better error messages.
What This Changes
Before: Connection strings were passed directly to the ODBC driver with minimal or no validation. Unknown parameters were silently ignored by ODBC, malformed strings caused cryptic ODBC errors.
After: All connection strings are parsed, validated against an allowlist of ODBC Driver 18 parameters, and reconstructed with proper escaping. Clear error messages are provided for any issues.
Data Flow
File Structure & Review Order
Recommended Review Order
Review in this sequence to understand the architecture:
1. Core Components (understand the building blocks)
mssql_python/connection_string_parser.pyStart hereConnectionStringParseError- Custom exception_ConnectionStringParser- Main parser classparse(connection_str)- Main entry point_parse_value(str, pos)- Handles braced values & escaping{{,}})mssql_python/connection_string_allowlist.pyCriticalConnectionStringAllowList- Singleton allowlist managernormalize_key(key)- Maps synonyms to canonical namesfilter_params(params)- Validates and filters parametersmssql_python/connection_string_builder.py_ConnectionStringBuilder- Builder with escape logicadd_param(key, value)- Adds a parameter_needs_braces(value)- Determines if braces needed_escape_value(value)- Escapes special charactersbuild()- Constructs final string;,{,},=)2. Integration (see how it fits together)
mssql_python/connection.pyIntegration point_construct_connection_string()mssql_python/__init__.pyConnectionStringParseError3. Tests (validate functionality)
tests/test_010_connection_string_parser.pyParser teststests/test_011_connection_string_allowlist.pyAllowlist teststests/test_012_connection_string_integration.pyIntegration teststests/test_003_connection.py(Updated)test_construct_connection_string()test_connection_string_with_attrs_before()test_connection_string_with_odbc_param()tests/test_006_exceptions.py(Updated)test_connection_error()now expectsConnectionStringParseError4. Documentation (understand design decisions)
docs/connection_string_allow_list_design.mdRead thisdocs/parser_state_machine.mdAreas to Focus On
1. Security
2. Error Handling
3. Performance
4. Correctness
Testing Strategy
Test Coverage Map
test_010_connection_string_parser.pytest_011_connection_string_allowlist.pytest_012_connection_string_integration.pytest_012_connection_string_integration.pytest_003_connection.pytest_006_exceptions.pyBehavior Changes
Should be aware of these behavioral changes:
1. Unknown Parameters Now Raise Errors
Before:
After:
2. Malformed Strings Caught Early
Before:
After:
3. Reserved Parameters Blocked
Before:
After:
Key Concepts to Understand
ODBC Connection String Format
Braced Values
Used when value contains semicolons or special characters:
Escape Sequences
{{->{(escaped left brace)}}->}(escaped right brace)Example: