forked from aws/sagemaker-python-sdk
Mlx 1269 #1
Draft
HCharlie wants to merge 581 commits into master from MLX-1269
ef346b1 to 3112627
… and deployment configs (aws#1572)
* add in-process mode for DJL server
* fix format
* add inference_spec as a member of DJL
* add the validations for model server
* fix typo
* fix test assertion
* add unit-testing
* have a common server for inprocess mode
* fix failing tests
* add support to torchserve
* fix tests to include torchserve servers
* use custom inference_spec code instead of HF pipelines
* fix tests for app.py
* fix unit test failure
* fix format
* use schema_builder for serialization and deserialization
* remove task field
* remove unused import
* Base model trainer (aws#1521)
* Base model trainer
* flake8
* add testing notebook
* add param validation & set defaults
* Implement simple train method
* feature: support script mode with local train.sh (aws#1523)
* feature: support script mode with local train.sh
* Stop tracking train.sh and add it to .gitignore
* update message
* make dir if not exist
* fix docs
* fix: docstyle
* Address comments
* fix hyperparams
* Revert pydantic custom error
* pylint
* Image Spec refactoring and updates (aws#1525)
* Image Spec refactoring and updates
* Unit tests and update function for Image Spec
* Fix hugging face test
* Fix Tests
* Add unit tests for ModelTrainer (aws#1527)
* Add unit tests for ModelTrainer
* Flake8
* format
* Add example notebook (aws#1528)
* Add testing notebook
* format
* use smaller data
* remove large dataset
* update
* pylint
* flake8
* ignore docstyle in directories with test
* format
* format
* Add environment variable bootstrapping script (aws#1530)
* Add environment variables scripts
* format
* fix comment
* add docstrings
* fix comment
* feature: add utility function to capture local snapshot (aws#1524)
* local snapshot
* Update pip list command
* Remove function calls
* Address comments
* Address comments
* Change to make Model Trainer return a Model Object
* Fix
* Cleanup
* Support intelligent parameters (aws#1540)
* Support intelligent parameters
* fix codestyle
* Revert Image Spec (aws#1541)
* Cleanup ModelTrainer (aws#1542)
* General image builder (aws#1546)
* General image builder
* General image builder
* Fix codestyle
* Fix codestyle
* Move location
* Add warnings
* Add integ tests
* Fix integ test
* Fix integ test
* Fix region error
* Add region
* Latest Container Image (aws#1545)
* Latest Container Image
* Test Fixes
* Parameterized tests and some logic updates
* Test fixes
* Move to Image URI
* Fixes for unit test
* Fixes for unit test
* Fix codestyle error checks
* Cleanup ModelTrainer code (aws#1552)
* Updates
* feat: add pre-processing and post-processing logic to inference_spec (aws#1560)
* add pre-processing and post-processing logic to inference_spec
* fix format
* make accept_type and content_type optional
* remove accept_type and content_type from pre/post processing
* correct typo
* Add Distributed Training Support Model Trainer (aws#1536)
* Add path to set Additional Settings in ModelTrainer (aws#1555)
* Updates
* Mask Sensitive Env Logs in Container (aws#1568)
* Cleanup PR
* Codestyle fixes
* Update logic to use model parameter instead of model_path
* Fixes
* Fixes
* Tests
* Codestyle Fixes
* Codestyle Fixes
* Codestyle Fixes
* Codestyle Fixes
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: pintaoz-aws <167920275+pintaoz-aws@users.noreply.github.com>
Co-authored-by: Pravali Uppugunduri <46845440+pravali96@users.noreply.github.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
* Base model trainer (aws#1521)
* Base model trainer
* flake8
* add testing notebook
* add param validation & set defaults
* Implement simple train method
* feature: support script mode with local train.sh (aws#1523)
* feature: support script mode with local train.sh
* Stop tracking train.sh and add it to .gitignore
* update message
* make dir if not exist
* fix docs
* fix: docstyle
* Address comments
* fix hyperparams
* Revert pydantic custom error
* pylint
* Image Spec refactoring and updates (aws#1525)
* Image Spec refactoring and updates
* Unit tests and update function for Image Spec
* Fix hugging face test
* Fix Tests
* Add unit tests for ModelTrainer (aws#1527)
* Add unit tests for ModelTrainer
* Flake8
* format
* Add example notebook (aws#1528)
* Add testing notebook
* format
* use smaller data
* remove large dataset
* update
* pylint
* flake8
* ignore docstyle in directories with test
* format
* format
* Add environment variable bootstrapping script (aws#1530)
* Add environment variables scripts
* format
* fix comment
* add docstrings
* fix comment
* feature: add utility function to capture local snapshot (aws#1524)
* local snapshot
* Update pip list command
* Remove function calls
* Address comments
* Address comments
* Support intelligent parameters (aws#1540)
* Support intelligent parameters
* fix codestyle
* Revert Image Spec (aws#1541)
* Cleanup ModelTrainer (aws#1542)
* General image builder (aws#1546)
* General image builder
* General image builder
* Fix codestyle
* Fix codestyle
* Move location
* Add warnings
* Add integ tests
* Fix integ test
* Fix integ test
* Fix region error
* Add region
* Latest Container Image (aws#1545)
* Latest Container Image
* Test Fixes
* Parameterized tests and some logic updates
* Test fixes
* Move to Image URI
* Fixes for unit test
* Fixes for unit test
* Fix codestyle error checks
* Cleanup ModelTrainer code (aws#1552)
* feat: add pre-processing and post-processing logic to inference_spec (aws#1560)
* add pre-processing and post-processing logic to inference_spec
* fix format
* make accept_type and content_type optional
* remove accept_type and content_type from pre/post processing
* correct typo
* Add Distributed Training Support Model Trainer (aws#1536)
* Add path to set Additional Settings in ModelTrainer (aws#1555)
* Support building image from Dockerfile
* Fix test
* Fix test
* Rename functions
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
Co-authored-by: Pravali Uppugunduri <46845440+pravali96@users.noreply.github.com>
* Base model trainer (aws#1521)
* Base model trainer
* flake8
* add testing notebook
* add param validation & set defaults
* Implement simple train method
* feature: support script mode with local train.sh (aws#1523)
* feature: support script mode with local train.sh
* Stop tracking train.sh and add it to .gitignore
* update message
* make dir if not exist
* fix docs
* fix: docstyle
* Address comments
* fix hyperparams
* Revert pydantic custom error
* pylint
* Image Spec refactoring and updates (aws#1525)
* Image Spec refactoring and updates
* Unit tests and update function for Image Spec
* Fix hugging face test
* Fix Tests
* Add unit tests for ModelTrainer (aws#1527)
* Add unit tests for ModelTrainer
* Flake8
* format
* Add example notebook (aws#1528)
* Add testing notebook
* format
* use smaller data
* remove large dataset
* update
* pylint
* flake8
* ignore docstyle in directories with test
* format
* format
* Add environment variable bootstrapping script (aws#1530)
* Add environment variables scripts
* format
* fix comment
* add docstrings
* fix comment
* feature: add utility function to capture local snapshot (aws#1524)
* local snapshot
* Update pip list command
* Remove function calls
* Address comments
* Address comments
* Support intelligent parameters (aws#1540)
* Support intelligent parameters
* fix codestyle
* Revert Image Spec (aws#1541)
* Cleanup ModelTrainer (aws#1542)
* Initial Prototype
* General image builder (aws#1546)
* General image builder
* General image builder
* Fix codestyle
* Fix codestyle
* Move location
* Add warnings
* Add integ tests
* Fix integ test
* Fix integ test
* Fix region error
* Add region
* Unified deploying in ModelBuilder
* Latest Container Image (aws#1545)
* Latest Container Image
* Test Fixes
* Parameterized tests and some logic updates
* Test fixes
* Move to Image URI
* Fixes for unit test
* Fixes for unit test
* Fix codestyle error checks
* Address PR comments
* Address Codestyle errors
* Cleanup ModelTrainer code (aws#1552)
* Black format
* Codestyle changes
* Codestyle changes
* from __future__ import absolute_import
* DocString formatting
* Black formatting
* Address PR comments
* Notebook changes and fixes
* feat: add pre-processing and post-processing logic to inference_spec (aws#1560)
* add pre-processing and post-processing logic to inference_spec
* fix format
* make accept_type and content_type optional
* remove accept_type and content_type from pre/post processing
* correct typo
* Add Distributed Training Support Model Trainer (aws#1536)
* Add path to set Additional Settings in ModelTrainer (aws#1555)
* Checkstyle Fixes
* Address PR comments
* Fixes
* Merge Fixes
* Codestyle Fixes
* Codestyle Fixes
* Codestyle Fixes
* Codestyle Fixes
* Codestyle Fixes
* Update Docstring
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: pintaoz-aws <167920275+pintaoz-aws@users.noreply.github.com>
Co-authored-by: Pravali Uppugunduri <46845440+pravali96@users.noreply.github.com>
* Base model trainer (aws#1521)
* Base model trainer
* flake8
* add testing notebook
* add param validation & set defaults
* Implement simple train method
* feature: support script mode with local train.sh (aws#1523)
* feature: support script mode with local train.sh
* Stop tracking train.sh and add it to .gitignore
* update message
* make dir if not exist
* fix docs
* fix: docstyle
* Address comments
* fix hyperparams
* Revert pydantic custom error
* pylint
* Image Spec refactoring and updates (aws#1525)
* Image Spec refactoring and updates
* Unit tests and update function for Image Spec
* Fix hugging face test
* Fix Tests
* Add unit tests for ModelTrainer (aws#1527)
* Add unit tests for ModelTrainer
* Flake8
* format
* Add example notebook (aws#1528)
* Add testing notebook
* format
* use smaller data
* remove large dataset
* update
* pylint
* flake8
* ignore docstyle in directories with test
* format
* format
* Add environment variable bootstrapping script (aws#1530)
* Add environment variables scripts
* format
* fix comment
* add docstrings
* fix comment
* feature: add utility function to capture local snapshot (aws#1524)
* local snapshot
* Update pip list command
* Remove function calls
* Address comments
* Address comments
* Support intelligent parameters (aws#1540)
* Support intelligent parameters
* fix codestyle
* Revert Image Spec (aws#1541)
* Cleanup ModelTrainer (aws#1542)
* General image builder (aws#1546)
* General image builder
* General image builder
* Fix codestyle
* Fix codestyle
* Move location
* Add warnings
* Add integ tests
* Fix integ test
* Fix integ test
* Fix region error
* Add region
* Latest Container Image (aws#1545)
* Latest Container Image
* Test Fixes
* Parameterized tests and some logic updates
* Test fixes
* Move to Image URI
* Fixes for unit test
* Fixes for unit test
* Fix codestyle error checks
* Cleanup ModelTrainer code (aws#1552)
* Single container local mode training
* Add wait argument
* Implement helper functions
* Add helper functions
* Fix bugs
* Fix codestyle
* feat: add pre-processing and post-processing logic to inference_spec (aws#1560)
* add pre-processing and post-processing logic to inference_spec
* fix format
* make accept_type and content_type optional
* remove accept_type and content_type from pre/post processing
* correct typo
* Fix test and codestyle
* Add Distributed Training Support Model Trainer (aws#1536)
* Add tests
* Add path to set Additional Settings in ModelTrainer (aws#1555)
* Added example notebook
* Fix codestyle
* Address comments
* resolve merge conflict
* Support multi container local training (aws#1576)
* Fix codestyle
* Mask Sensitive Env Logs in Container (aws#1568)
* Fix bug in script mode setup ModelTrainer (aws#1575)
* Support multi container local training
* Merge branch 'single_container_local_training' into multi_container_local_training
* Update unit tests
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
* Remove LocalTrainingJob class
* Bypass pydantic check
* Add example
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
Co-authored-by: Pravali Uppugunduri <46845440+pravali96@users.noreply.github.com>
* add inference morpheus nbs
* update the in process notebook
…#1583)
* Fix: move the functionality from latest_container_image to retrieve
* address some comments from Gokul and add unit test
* remove extra functions and rewrite the test
* fix unit test
* fix for other unit test
* unit test fix
* fix unit test: add one more condition
* more unit tests fix
* remove redundant files
---------
Co-authored-by: Chad Chiang <chadchc@amazon.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
* Fix: move the functionality from latest_container_image to retrieve
* address some comments from Gokul and add unit test
* remove extra functions and rewrite the test
* fix unit test
* fix for other unit test
* unit test fix
* fix unit test: add one more condition
* more unit tests fix
* remove redundant files
* remove the special condition and fix the unit test
---------
Co-authored-by: Chad Chiang <chadchc@amazon.com>
Co-authored-by: Gokul Anantha Narayanan <166456257+nargokul@users.noreply.github.com>
* Notebooks update for Bugbash
* Testing and updates
* Testing and updates
* Addressed comments
* Fix
* Fix
* Fix deepdiff dependencies
* trigger tests
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about the Python version requirements of the training script, which could mislead users. The training script can use the latest version supported by the framework_version of the container
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
* fix: Added handler for pipeline variable while creating process job
* fix: Added handler for pipeline variable while creating process job
* Revert the PR changes: aws#5122, due to issue https://t.corp.amazon.com/P223568185/overview
* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication
---------
Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
* fix: tgi image uri unit tests
* fix: black-format and flake8 failures
* fix: parse
* fix: print statement
---------
Co-authored-by: Erick Benitez-Ramos <141277478+benieric@users.noreply.github.com>
…aws#5123)
* clean up
* bump maxdepth for doc/api/training to fix readthedocs
* change maxdepth for readthedocs rendering doc/api/training page
* change maxdepth for readthedocs rendering doc/api/training page
* change maxdepth for readthedocs rendering doc/api/training page
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* change: Allow telemetry only in supported regions
* documentation: Removed a line about the Python version requirements of the training script, which could mislead users. The training script can use the latest version supported by the framework_version of the container
* feature: Enabled update_endpoint through model_builder
* fix: fix unit test, black-check, pylint errors
* fix: fix black-check, pylint errors
* fix: Added handler for pipeline variable while creating process job
* fix: Added handler for pipeline variable while creating process job
* Revert the PR changes: aws#5122, due to issue https://t.corp.amazon.com/P223568185/overview
* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication
* Revert PR 5122 changes, due to issues with other processor codeflows
---------
Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>
Co-authored-by: Zhaoqi <jzhaoqwa@amazon.com>
…ws#5144)
* add s3 uri check to modeltrainer data source
* update ModelTrainer to support s3 uri and tar.gz file as source_dir
* black-format
* add unit and integ tests
* update logic and unit test to raise ValueError if the file is not .tar.gz
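The `source_dir` handling described in the commit above could look roughly like the following. This is a minimal sketch, not the SDK's code: the helper name `validate_source_dir` and its return labels are hypothetical, and it assumes that an S3 URI is accepted without further inspection while a local file must end in `.tar.gz`.

```python
import os
from urllib.parse import urlparse


def validate_source_dir(source_dir: str) -> str:
    """Classify source_dir as 's3', 'local_dir' or 'tar', or raise ValueError.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    # Assumption: any s3:// URI is accepted as-is, with no remote lookup.
    if urlparse(source_dir).scheme == "s3":
        return "s3"
    if os.path.isdir(source_dir):
        return "local_dir"
    if os.path.isfile(source_dir):
        # A single local file is only accepted when it is a gzipped tarball.
        if not source_dir.endswith(".tar.gz"):
            raise ValueError(
                f"source_dir must be a directory, an S3 URI, or a .tar.gz file: {source_dir}"
            )
        return "tar"
    raise ValueError(f"source_dir does not exist: {source_dir}")
```

Raising early, before any job is created, keeps the error close to the bad argument instead of surfacing later as a failed upload.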
…image. (aws#5143)
* feature: support custom workflow deployment in ModelBuilder using SMD image. (aws#1661)
* feature: support custom workflow deployment in ModelBuilder using SMD inference image.
* Rename test case and pass session.
* Address PR comments.
* Tweak resource cleanup logic in integ test.
* Fixing CodeBuild integ test failures.
* Renamed integ test.
* Remove unused integ test, restore once GA.
---------
Co-authored-by: Joseph Zhang <cjz@amazon.com>
* Cache client as instance attribute in @property decorator. (aws#1668)
* Remove @property decorator from ABC definition.
* Cache client as instance attribute in @property.
* Fix flake8 issue.
---------
Co-authored-by: Joseph Zhang <cjz@amazon.com>
* Bugfixes from e2e testing. (aws#1670)
* Fix Albatross Inference component tests
* trigger integ tests
---------
Co-authored-by: cj-zhang <32367995+cj-zhang@users.noreply.github.com>
Co-authored-by: Joseph Zhang <cjz@amazon.com>
Co-authored-by: Pravali Uppugunduri <upravali@amazon.com>
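The "cache client as instance attribute" change mentioned above (aws#1668) follows a common Python pattern. A minimal sketch, assuming a hypothetical `ClientHolder` class and an injected `factory` callable (neither name comes from the PR):

```python
from typing import Any, Callable, Optional


class ClientHolder:
    """Builds a client lazily and reuses it on every later access.

    Hypothetical class for illustration; the PR's real class is not shown here.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None  # cache slot on the instance

    @property
    def client(self) -> Any:
        # Create the client on first access only, then return the cached one.
        if self._client is None:
            self._client = self._factory()
        return self._client
```

Without the cached attribute, each `holder.client` access would construct a new client, which is wasteful when construction involves session or credential setup.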
…ws#5149)
Co-authored-by: Namrata Madan <nmmadan@amazon.com>
Co-authored-by: adishaa <adishaa@amazon.com>
…5146)
* Fix Flake8 Violations
* Add Owner ID check for bucket with path when prefix is provided

  **Description**
  Previously we called head_bucket to perform the owner ID check, but this doesn't account for cases where the S3 path is provided through the prefix. This change makes sure that directory-level permissions are supported.

  **Testing Done**
  Tested through unit tests, integ tests and manual testing through the installation file. Yes
* Address PR comment
* Codestyle fixes
* Minor fix
* Codestyle fixes
* Fix Unit tests
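One plausible reading of the owner ID check described above is sketched below; the actual change in the PR may differ. The helper name `check_bucket_owner` is hypothetical, and `s3_client` is assumed to behave like a boto3 S3 client, whose `head_bucket` and `list_objects_v2` calls both accept `ExpectedBucketOwner`. The sketch also assumes that a caller whose permissions are limited to a prefix can list under that prefix even when a bucket-level `head_bucket` is denied.

```python
from typing import Any, Optional


def check_bucket_owner(
    s3_client: Any,
    bucket: str,
    expected_owner: str,
    prefix: Optional[str] = None,
) -> None:
    """Verify bucket ownership, scoping the probe to `prefix` when one is given.

    Hypothetical helper for illustration. With a real boto3 client, either
    call raises an error if the owner does not match or access is denied.
    """
    if prefix:
        # Assumption: prefix-scoped permissions allow a one-key listing
        # under the prefix even if the bucket-level check is not allowed.
        s3_client.list_objects_v2(
            Bucket=bucket,
            Prefix=prefix,
            MaxKeys=1,
            ExpectedBucketOwner=expected_owner,
        )
    else:
        # No prefix: the original bucket-level check is sufficient.
        s3_client.head_bucket(Bucket=bucket, ExpectedBucketOwner=expected_owner)
```

Passing the client in as an argument keeps the helper easy to unit test with a stub, without touching AWS.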
* chore: add huggingface images
* chore: add tei 1.6 image
* chore: add tei 1.6.0 to tei mapping in tests
aws#5098)
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 2.20.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.3.2 to 1.5.1.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](scikit-learn/scikit-learn@1.3.2...1.5.1)

---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-version: 1.5.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve error logging and documentation for issue 4007
* Add hyperlink to RTDs
Issue #, if available:
Description of changes:
Testing done:
Merge Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
`unique_name_from_base` to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.