- Extract metadata of pull requests related to a GitHub repo (pointed at by the environment
variable REPO_NAME or defined by the command-line --repo flag) in the form "<owner>/<repo>".
- If MAIN_BRANCH is set, it will be used as the main branch to compare against. Otherwise, the default is main.
- You will need to set the environment variable GITHUB_TOKEN to your GitHub Personal Access Token (PAT).
- This token should have access to the repository you are trying to analyze.
- The script will fetch all open pull requests, analyze them, and visualize the results.
- The results will be saved to a CSV file if the --csv <filename> flag is provided.
If you are in a corporate environment, the script is proxy-aware. Set the environment
variables HTTP_PROXY and HTTPS_PROXY appropriately and the script will take them
into account.
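The settings described above could be resolved along the following lines. This is only a sketch: the function name resolve_settings is hypothetical, and the rule that a command-line flag overrides the matching environment variable is an assumption, since the precedence is not stated here.

```python
import argparse
import os


def resolve_settings(argv=None, env=None):
    """Combine command-line flags and environment variables into one dict.

    Assumption: a flag given on the command line wins over the
    environment variable of the same meaning.
    """
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser(description="Analyze pull requests")
    parser.add_argument("--repo", default=env.get("REPO_NAME"))
    parser.add_argument("--branch", default=env.get("MAIN_BRANCH") or "main")
    parser.add_argument("--csv", metavar="FILENAME", default=None)
    args = parser.parse_args(argv)

    # The repository must look like "<owner>/<repo>".
    if not args.repo or args.repo.count("/") != 1:
        parser.error('repository must be given as "<owner>/<repo>"')

    token = env.get("GITHUB_TOKEN")
    if not token:
        parser.error("GITHUB_TOKEN is not set")

    return {
        "repo": args.repo,
        "branch": args.branch,
        "csv": args.csv,
        "token": token,
    }
```

Passing argv and env explicitly keeps the function easy to exercise without touching the real process environment.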
- Install the required libraries:
      pip install -r requirements.txt
  Or, using an explicit proxy setting like:
      pip install -r requirements.txt --proxy http://[user:passwd@]proxy.server:port
- Update the .env file:
  Ensure your account/token has the repo scope and appropriate access to the repository (e.g., it is fine for the repo to be private, as long as your account/token has access to it).
      GITHUB_TOKEN=your_github_token
      REPO_NAME=owner/repo_name
      MAIN_BRANCH=main # Replace with the name of your main branch if different
  Proxy Servers: If you have problems with corporate proxy servers, you might add:
      HTTP_PROXY=http://username:password@proxy.company.com:port
      HTTPS_PROXY=http://username:password@proxy.company.com:port
- Run the script:
  - For example: python main.py --repo x/y --branch z --csv pull_requests.csv
  - The script will save the pull request data from a repo called y under GitHub owner x to a CSV file named pull_requests.csv in the same directory. It will compare how far behind each pull request is from branch z.
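How the script measures the distance from the main branch is not shown here. One way to obtain it, sketched below, is the GitHub REST API compare endpoint, whose response carries ahead_by and behind_by counts. The helper names compare_url and divergence are hypothetical, and the sample payload in the usage lines is made up for illustration.

```python
API_ROOT = "https://api.github.com"


def compare_url(repo, base, head):
    """Build the compare endpoint URL for base...head in "<owner>/<repo>"."""
    return f"{API_ROOT}/repos/{repo}/compare/{base}...{head}"


def divergence(compare_payload):
    """Extract ahead/behind counts from a decoded compare response (a dict).

    With base set to the main branch and head set to the pull request
    branch, behind_by counts commits on main that the pull request lacks.
    """
    return {
        "commits_ahead": compare_payload.get("ahead_by", 0),
        "commits_behind": compare_payload.get("behind_by", 0),
    }


# Usage with a made-up payload (no network call is performed here).
url = compare_url("x/y", "z", "feature-branch")
counts = divergence({"ahead_by": 3, "behind_by": 12})
```

The request itself would need an Authorization header carrying GITHUB_TOKEN; that part is omitted so the sketch stays offline.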
 
- Lines of Code Changed: Sum of additions and deletions (additions + deletions). Larger changes may indicate higher complexity.
- Number of Changed Files
- Number of Comments: Sum of comments and review_comments. More comments may indicate more discussion or contention.
- Time Open: The duration between created_at and closed_at (or the current date if still open). Longer durations may indicate complexity or delays.
- Merge Status: Whether the pull request is merged (merged_at is not None).
- Labels: Analyze labels (e.g., "bug", "enhancement", "critical") to prioritize based on importance.
- Commits Behind/Ahead Main: How many commits the pull request is behind or ahead of the main branch. Larger numbers may indicate more divergence from the main branch.
- Author Activity: Number of pull requests created by the author (to identify experienced contributors).
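The metrics listed above can be derived from a pull request record roughly as follows. This is a minimal sketch, not the script's actual code: the function names are hypothetical, and the field names (additions, deletions, changed_files, comments, review_comments, created_at, closed_at, merged_at, labels, user.login) are assumed to follow the GitHub REST API pull request object.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2024-01-31T12:00:00Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pr_metrics(pr, now=None):
    """Compute per-pull-request metrics from one decoded JSON record."""
    now = now or datetime.now(timezone.utc)
    created = parse_ts(pr["created_at"])
    # Pull requests that are still open are measured up to the current time.
    closed = parse_ts(pr["closed_at"]) if pr.get("closed_at") else now
    return {
        "lines_changed": pr.get("additions", 0) + pr.get("deletions", 0),
        "changed_files": pr.get("changed_files", 0),
        "num_comments": pr.get("comments", 0) + pr.get("review_comments", 0),
        "days_open": (closed - created).total_seconds() / 86400,
        "merged": pr.get("merged_at") is not None,
        "labels": [label["name"] for label in pr.get("labels", [])],
    }


def author_activity(prs):
    """Count how many pull requests each author has created."""
    return Counter(pr["user"]["login"] for pr in prs)
```

Accepting now as a parameter keeps the time-open calculation reproducible when the same data is analyzed again later.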
The script fetches all pull requests (open, closed, and merged) and includes metadata like labels, comments, and file changes.
If you have problems with corporate proxy servers, you might try the following (using set on Windows, or export on Linux/Bash):
set HTTP_PROXY=http://username:password@proxy.company.com:port
set HTTPS_PROXY=http://username:password@proxy.company.com:port
Or use an explicit proxy setting like:
pip install -r requirements.txt --proxy http://[user:passwd@]proxy.server:port
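How the script itself picks up HTTP_PROXY and HTTPS_PROXY is not shown here; many Python HTTP libraries read them automatically. As an illustration only, a hypothetical helper that collects the two variables into a scheme-to-URL mapping might look like this:

```python
import os


def proxies_from_env(env=None):
    """Return a {"http": ..., "https": ...} mapping for the proxies that are set.

    Both the upper-case and lower-case spellings of each variable are
    checked; preferring the upper-case one is an arbitrary choice here.
    """
    env = os.environ if env is None else env
    proxies = {}
    for scheme in ("http", "https"):
        value = env.get(f"{scheme.upper()}_PROXY") or env.get(f"{scheme}_proxy")
        if value:
            proxies[scheme] = value
    return proxies
```

Calling proxies_from_env() with no arguments inspects the current process environment, which is a quick way to confirm that the set or export commands above took effect.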
