OSF Datasets
Setup
pip install osfclient
Authentication
# Option A: Personal Access Token (preferred) -- generate at osf.io/settings/tokens
export OSF_TOKEN="<pat>"
# Option B: credentials
export OSF_USERNAME=user@example.com
export OSF_PASSWORD=pass
Or per-directory .osfcli.config:
[osf]
username = user@example.com
project = abc12
Project ID
The 5-character alphanumeric slug from any osf.io/<id> URL. Example: osf.io/abc12 -> project ID is abc12.
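When a script receives a full URL rather than a bare ID, the rule above can be applied mechanically. The following is a minimal sketch; the function name osf_project_id is hypothetical, and the validation assumes IDs are exactly five alphanumeric characters, as described above.

```sh
# Hypothetical helper: print the project ID for an OSF URL or a bare ID.
# The body runs in a subshell so the working variable does not leak.
osf_project_id() (
  id=$1
  id=${id#*://}      # drop the scheme, if any
  id=${id#www.}      # drop an optional "www."
  id=${id#osf.io/}   # drop the host
  id=${id%%/*}       # keep only the first path segment
  id=${id%%\?*}      # drop a query string attached to the ID

  # Assumption: a valid ID is exactly 5 alphanumeric characters.
  case $id in
    ''|*[!A-Za-z0-9]*) return 1 ;;
  esac
  [ "${#id}" -eq 5 ] || return 1
  printf '%s\n' "$id"
)
```

For example, osf_project_id "https://osf.io/abc12/" prints abc12, and the function returns a non-zero status when no ID can be found.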
Core Commands
| Command | Purpose |
|---|---|
| osf -p <id> ls | List all files |
| osf -p <id> clone [dir] | Download entire project |
| osf -p <id> fetch osfstorage/path.csv local.csv | Download single file |
| osf -p <id> upload local.csv osfstorage/path.csv | Upload file |
| osf -p <id> geturl osfstorage/path.csv | Get web URL |
Remote paths are prefixed with the storage provider, typically osfstorage/.
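In batch scripts it can help to avoid re-downloading files that are already present. A possible wrapper around the fetch command from the table is sketched below; osf_fetch_once is a hypothetical name, and the existence check is an assumption about the desired behavior, not something osfclient does itself.

```sh
# Hypothetical wrapper: fetch a single file unless the local copy already exists.
# Usage: osf_fetch_once <project_id> <remote_path> <local_path>
osf_fetch_once() {
  if [ -e "$3" ]; then
    # Nothing to do; report on stderr so stdout stays clean for pipelines.
    printf 'skip: %s already exists\n' "$3" >&2
    return 0
  fi
  osf -p "$1" fetch "$2" "$3"
}
```

A call such as osf_fetch_once abc12 osfstorage/path.csv local.csv then only contacts OSF when local.csv is missing.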
Direct API Access
Base URL: https://api.osf.io/v2/
Use --globoff with curl -- OSF query params use [] which curl interprets as glob ranges.
# List project files
curl -sL --globoff "https://api.osf.io/v2/nodes/<id>/files/osfstorage/" | jq '.data[].attributes.name'
# Get project metadata
curl -sL --globoff "https://api.osf.io/v2/nodes/<id>/" | jq '.data.attributes'
# Search public nodes by title
curl -sL --globoff "https://api.osf.io/v2/nodes/?filter[title]=keyword&page[size]=20" | jq '.data[] | {id, title: .attributes.title}'
# Browse subfolder (use folder ID from parent listing)
curl -sL --globoff "https://api.osf.io/v2/nodes/<id>/files/osfstorage/<folder_id>/" | jq '.data[].attributes.name'
# With auth
curl -sL --globoff -H "Authorization: Bearer $OSF_TOKEN" "https://api.osf.io/v2/nodes/<id>/files/osfstorage/"
File download links are in data[].links.download -- follow redirects and save the file with curl -L -o <local_file> <download_url>.
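The listing and download steps can be combined in a short script. This is a sketch under stated assumptions: the helper names are hypothetical, folders are assumed to be distinguishable by attributes.kind, the project is public, and only the first page of results is processed.

```sh
# Hypothetical helper: read a files-listing response on stdin and print
# "name<TAB>download URL" for each entry.
# Assumption: files carry attributes.kind == "file"; folders are skipped.
osf_links_from_json() {
  jq -r '.data[] | select(.attributes.kind == "file") | [.attributes.name, .links.download] | @tsv'
}

# Hypothetical helper: download the files at the top level of a public
# project's osfstorage into the current directory (first page only).
# Usage: osf_download_all <project_id>
osf_download_all() (
  tab=$(printf '\t')
  curl -fsSL --globoff "https://api.osf.io/v2/nodes/$1/files/osfstorage/" |
    osf_links_from_json |
    while IFS=$tab read -r name url; do
      # -L follows the redirect to the storage backend; -o names the output file.
      curl -fsSL -o "$name" "$url"
    done
)
```

Keeping the jq step in its own function makes it possible to check the parsing against a saved response without touching the network.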
Helper Scripts
- scripts/osf-search.sh <keyword> [page_size] -- search public OSF projects by keyword
- scripts/osf-browse.sh <project_id> [path] -- browse project files via API (no osfclient needed)
Guidelines
- Public projects need no auth; private ones require OSF_TOKEN or credentials
- clone mirrors the remote directory structure locally
- fetch requires both remote and local path arguments
- The API paginates at 10 items by default; use ?page[size]=100 for larger listings
- Rate limits apply -- add brief sleeps for batch operations
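The pagination and rate-limit guidelines can be combined into one loop. The sketch below assumes the API response carries the next page URL in a top-level links.next field (null on the last page); the function names and the one-second pause are illustrative choices, not values prescribed by OSF.

```sh
# Hypothetical helper: build the files-listing URL for a project.
# Usage: osf_files_url <project_id> [page_size]   (page size defaults to 100)
osf_files_url() {
  printf 'https://api.osf.io/v2/nodes/%s/files/osfstorage/?page[size]=%s\n' "$1" "${2:-100}"
}

# Hypothetical helper: GET a URL, sending the token only when OSF_TOKEN is set.
osf_get() {
  if [ -n "${OSF_TOKEN:-}" ]; then
    curl -fsSL --globoff -H "Authorization: Bearer $OSF_TOKEN" "$1"
  else
    curl -fsSL --globoff "$1"
  fi
}

# Hypothetical helper: print every file name in a project's osfstorage,
# following links.next and pausing briefly between requests.
# Usage: osf_list_all <project_id> [page_size]
osf_list_all() (
  url=$(osf_files_url "$1" "${2:-100}")
  while [ -n "$url" ]; do
    page=$(osf_get "$url") || exit 1
    printf '%s\n' "$page" | jq -r '.data[].attributes.name'
    # Assumption: links.next is null on the last page, which yields an empty string here.
    url=$(printf '%s\n' "$page" | jq -r '.links.next // empty')
    if [ -n "$url" ]; then
      sleep 1   # brief pause to stay clear of rate limits
    fi
  done
)
```

Running osf_list_all abc12 would then list all top-level file names in the project, one request per page.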