I'm James, an engineer / data scientist from Chicago. My time on GitHub is mostly spent writing Python, R, and shell scripts on projects for data scientists and data engineers. My time off GitHub is spent with family, at hip hop shows, and watching reality TV.
- LightGBM: a lightweight gradient boosting machine
- lightgbm-dask-testing: containerized setup for testing LightGBM's Dask interface locally and on Amazon ECS
- pkgnet: R package for analyzing an R package's dependencies
- pydistcheck: linter that finds portability issues in Python package distributions (wheels, sdists, and conda packages)
- uptasticsearch: an R data frame client for Elasticsearch
- hamilton: a "micro-framework" for feature engineering in Python
- prefect: a workflow management thing in Python that plays nicely with Dask
- xgboost: another gradient boosting machine
The pull requests and non-code contributions below were chosen to showcase the types of software work I've done. This list is not exhaustive.
- adapting lightgbm and xgboost to scikit-learn 1.6:
- setting up conda packages for legate-boost, legate-dataframe, and legate-raft: rapidsai/legate-boost#115
- replacing LightGBM's setup.py with scikit-build-core for PEP 517/518 compatibility: microsoft/LightGBM#5759
- upstreaming dask-lightgbm into LightGBM and guiding community discussion with Dask, XGBoost maintainers
- adding Webhook storage to prefect: PrefectHQ/prefect#3000
- adding autoconf-based builds of LightGBM's R package: microsoft/LightGBM#3188
- making snowflake-connector-python compatible with pyjwt 1.x and 2.x: snowflakedb/snowflake-connector-python#604
- allow tight control over ports in LightGBM distributed training with Dask: microsoft/LightGBM#3994
- cut compiled size of {lightgbm} by ignoring CLI-only objects: microsoft/LightGBM#3566
- allow use of multiple image pull secrets in prefect kubernetes agent: PrefectHQ/prefect#3596
- replace single-shot HTTP requests with httr::RETRY() in various R packages
  - project I led at Chi R Collab 2020: chircollab/chircollab20#1
  - {sergeant} (one example): hrbrmstr/sergeant#42
- tutorial on distributed LightGBM training with Dask: microsoft/LightGBM#4030
- early stopping example in XGBoost Dask docs: dmlc/xgboost#6501
- detailed information on how LightGBM parameters affect training speed: microsoft/LightGBM#3628
- guide on how to find valid memory and CPU combinations for ECS / Fargate clusters in dask-cloudprovider: dask/dask-cloudprovider#156
- fixing OpenMP conflicts in lightgbm:
- detecting debug symbols in pandas 2.0 wheels: pandas-dev/pandas#51900
- prevent conda from "downgrading" Python from CPython to PyPy, while also reducing the risk of a subtle networking error made worse by unpredictability in when Dask garbage collects objects (microsoft/LightGBM#5510)
- create a reproducible example for lightgbm loading failing with GLIBCXX compatibility errors: microsoft/LightGBM#5106 (comment)
- fix jupyter_server conda-forge feedstock recipe to prevent broken environments: conda-forge/jupyter_server-feedstock#84
- make multioutput behavior of dask-ml regression metrics consistent with scikit-learn: dask/dask-ml#820
- fix saving Dask Random Forest models in cuml: rapidsai/cuml#3388
- fix checks for availability of mm_malloc in {lightgbm} autoconf-based builds: microsoft/LightGBM#3510
- fix broken plots in {lightgbm}'s docs site: microsoft/LightGBM#3508
- factor out dependency on gendef.exe for compiling XGBoost and LightGBM R packages with Visual Studio compilers and R 4.0:
  - {xgboost}: dmlc/xgboost#5764
  - {lightgbm}: microsoft/LightGBM#3065
- helping with various migrations for all of the RAPIDS libraries:
  - updating to newer fmt/spdlog: rapidsai/build-planning#56
  - Dropping Python 3.9: rapidsai/build-planning#88
  - CUDA 12.5: rapidsai/build-planning#73
  - Adding Python 3.12: rapidsai/build-planning#40
  - Adding Python 3.11: rapidsai/build-planning#3
- switching LightGBM's Python package jobs to manylinux_2_28: microsoft/LightGBM#5580
- automatically publish prefect-saturn to PyPI when a new release is created: saturncloud/prefect-saturn#7
- moving LightGBM CI jobs from Travis to GitHub Actions:
- move {uptasticsearch} CI to GitHub Actions: uptake/uptasticsearch#217
- add CI job testing {lightgbm} within ASAN and UBSAN sanitizers: microsoft/LightGBM#3439
- reduce data loading work in LightGBM tests by caching data loading calls: microsoft/LightGBM#3486
- add Dockerfile to build an image for testing the Apache Arrow R package: apache/arrow#2770
- Sr. Software Engineer at NVIDIA, working on RAPIDS (https://github.com/rapidsai)
- adjunct instructor at Marquette University, where I teach "Intro to R Programming"
I've given talks on Dask, LightGBM, R, Python packaging, and other random stuff. For a full list and links to videos, see https://github.com/jameslamb/talks#gallery.
My DMs are open if you want to talk about open source, data science careers, Bravo shows, or anything else.
- LinkedIn: https://www.linkedin.com/in/jameslamb1/
- Bluesky: https://bsky.app/profile/jameslamb.bsky.social







