Mozilla Data Collective Python API Library

Python library for interfacing with the Mozilla Data Collective REST API.

Installation

Install the package using pip:

pip install datacollective

Quick Start

Get your API key from the Mozilla Data Collective dashboard.
Set up your environment:

If you have cloned the repository, you can run the following command:

# Copy the example environment file
cp .env.example .env

Otherwise, paste the following into a file named .env in your current working directory:

MDC_API_KEY=<MDC_API_KEY> # change to your MDC API Key
MDC_API_URL=https://datacollective.mozillafoundation.org/api # change to MDC API URL endpoint
MDC_DOWNLOAD_PATH=~/.mozdata/datasets # change to where you want to download datasets

Configure your API key by editing .env:

# Required: Your MDC API key
MDC_API_KEY=your-api-key-here

# Optional: Download path for datasets (defaults to ~/.mozdata/datasets)
MDC_DOWNLOAD_PATH=~/.mozdata/datasets
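
To make the .env format above concrete, here is a minimal sketch of how KEY=VALUE lines with # comments could be read. The parse_env function is a hypothetical helper written for illustration, not part of datacollective, and the library may well parse the file differently (for example, this sketch does not handle quoted values).

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines.

    Simplified illustration only: quoted values and escapes are not handled.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment introduced by " #".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


sample = """\
# Required: Your MDC API key
MDC_API_KEY=your-api-key-here # change to your MDC API Key

MDC_DOWNLOAD_PATH=~/.mozdata/datasets
"""
print(parse_env(sample))
```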

Start using the library:

from datacollective import DataCollective

# Initialize the client
client = DataCollective()

# Download a dataset
client.get_dataset('mdc-dataset-id')

Configuration

The client loads configuration from environment variables or .env files:

MDC_API_KEY - Your Mozilla Data Collective API key (required)
MDC_API_URL - API endpoint (defaults to production)
MDC_DOWNLOAD_PATH - Where to download datasets (defaults to ~/.mozdata/datasets)
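
The three settings above can be checked before creating a client. The following is a rough sketch using only the standard library; read_mdc_settings is a hypothetical helper, not the library's own loader, and it assumes the production default for MDC_API_URL is the URL shown elsewhere in this README.

```python
import os
from pathlib import Path

# Assumed defaults, taken from the example values in this README.
DEFAULT_API_URL = "https://datacollective.mozillafoundation.org/api"
DEFAULT_DOWNLOAD_PATH = "~/.mozdata/datasets"


def read_mdc_settings(env=None):
    """Hypothetical helper: resolve the MDC_* settings from a mapping.

    Falls back to os.environ when no mapping is given.
    """
    env = os.environ if env is None else env
    api_key = env.get("MDC_API_KEY", "").strip()
    if not api_key:
        # MDC_API_KEY is the only setting documented as required.
        raise ValueError("MDC_API_KEY is required but not set")
    download_path = env.get("MDC_DOWNLOAD_PATH", DEFAULT_DOWNLOAD_PATH)
    return {
        "api_key": api_key,
        "api_url": env.get("MDC_API_URL", DEFAULT_API_URL),
        # Expand "~" so the path can be used directly.
        "download_path": Path(download_path).expanduser(),
    }
```

Calling read_mdc_settings() with no arguments reads the current process environment and raises ValueError if the key is missing, which gives an early, readable failure.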

Environment Files

Create a .env file in your project root:

# MDC API Configuration
MDC_API_KEY=your-api-key-here
MDC_API_URL=https://datacollective.mozillafoundation.org/api
MDC_DOWNLOAD_PATH=~/.mozdata/datasets

Note: Never commit .env files to version control as they contain sensitive information.
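
As a quick safeguard for the note above, a project can check that its .gitignore actually lists .env. This is a simplified sketch: env_is_ignored is a hypothetical helper that recognises only a few common literal patterns and does not implement full gitignore matching.

```python
from pathlib import Path

# Literal .gitignore entries that cover a file named ".env".
ENV_PATTERNS = {".env", "/.env", ".env*", "*.env"}


def env_is_ignored(gitignore_text):
    """Return True if any non-comment line is one of the known .env patterns."""
    for line in gitignore_text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#") and entry in ENV_PATTERNS:
            return True
    return False


if __name__ == "__main__":
    gitignore = Path(".gitignore")
    if not gitignore.exists() or not env_is_ignored(gitignore.read_text()):
        print("Warning: .env does not appear to be listed in .gitignore")
```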

Basic Usage

from datacollective import DataCollective

# Initialize client (loads from .env automatically)
client = DataCollective()

# Verify your configuration
print(f"API URL: {client.api_url}")
print(f"Download path: {client.download_path}")

# Download a dataset
dataset = client.get_dataset('your-dataset-id')
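
After a download, it can be useful to see what landed on disk. The sketch below lists files under the download directory with pathlib. list_downloads is a hypothetical helper, and the layout of downloaded files inside that directory is not documented here, so the function simply walks whatever it finds.

```python
from pathlib import Path


def list_downloads(download_path="~/.mozdata/datasets"):
    """Hypothetical helper: return all files under the download directory, sorted.

    Returns an empty list if the directory does not exist yet.
    """
    root = Path(download_path).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_downloads():
        print(path)
```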

Load and query datasets

Note: This feature currently works only with Mozilla Common Voice datasets.

from datacollective import DataCollective

client = DataCollective()

dataset = client.load_dataset("<dataset-id>")  # Load the dataset into memory
df = dataset.to_pandas()  # Convert to a pandas DataFrame for querying
print(dataset.splits)  # List all splits available in the dataset

Multiple Environments

You can use different environment configurations:

# Production environment (default, uses .env)
client = DataCollective()

# Development environment (uses .env.development)
client = DataCollective(environment='development')

# Staging environment (uses .env.staging)
client = DataCollective(environment='staging')
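
The file naming in the comments above follows a simple pattern: .env for the default environment and .env.<name> otherwise. A minimal sketch of that mapping follows; env_file_for is a hypothetical helper for illustration, and treating "production" as the name of the default environment is an assumption, not something the library is confirmed to do.

```python
from pathlib import Path


def env_file_for(environment="production", root="."):
    """Hypothetical helper: map an environment name to its .env file path."""
    # Assumption: "production" is the default and maps to the plain .env file.
    name = ".env" if environment == "production" else f".env.{environment}"
    return Path(root) / name


if __name__ == "__main__":
    # Report which of the expected files exist in the current directory.
    for env_name in ("production", "development", "staging"):
        path = env_file_for(env_name)
        status = "found" if path.exists() else "missing"
        print(f"{env_name}: {path} ({status})")
```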

License

This project is released under the Mozilla Public License (MPL) 2.0.
