Added some docs for using the tool
142
GUIDE.md
Normal file
@@ -0,0 +1,142 @@
# Canvas Student Data Export — Usage Guide

## Shared Prerequisites (both scripts)

**1. Install Python dependencies**
```bash
pip install -r requirements.txt
```

**2. Configure `credentials.yaml`**

The base required fields for both scripts:
```yaml
API_URL: https://canvas.ucsd.edu
API_KEY: <your Canvas API token>
USER_ID: <your numeric Canvas user ID>
```

- **API_KEY**: Canvas → Account → Settings → Approved Integrations → "+ New Access Token"
- **USER_ID**: Visit `https://<your-canvas-url>/api/v1/users/self` while logged in and find the `id` field
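
The `USER_ID` lookup can also be done from a script instead of the browser. The following is a minimal sketch, not code from this repository: it assumes the token is sent as a `Bearer` authorization header, and `build_self_request` and `fetch_user_id` are hypothetical helper names.

```python
import json
import urllib.request


def build_self_request(api_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for /api/v1/users/self (hypothetical helper)."""
    url = api_url.rstrip("/") + "/api/v1/users/self"
    # Assumption: the Canvas token is passed as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_user_id(api_url: str, api_key: str) -> int:
    """Return the numeric `id` field of the current user (needs network access)."""
    request = build_self_request(api_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return int(json.load(response)["id"])
```

Calling `fetch_user_id("https://canvas.ucsd.edu", "<token>")` should return the value to put in `USER_ID`, provided the token is valid.
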

---

## `export.py` — Canvas Data Exporter

**What it does:** Exports assignments, announcements, discussions, pages, files, and modules from all your Canvas courses (active and completed) as JSON files, and optionally as HTML snapshots.

### Basic usage (JSON only — no extra setup)
```bash
python export.py
```

### Optional: HTML Snapshots (`--singlefile` flag)

**Additional prerequisites:**
1. Node.js 16+ installed
2. Run `npm install` in the project directory
3. Add `COOKIES_PATH` to `credentials.yaml`:
   ```yaml
   COOKIES_PATH: ./cookies.txt
   ```
4. Export browser cookies in **Netscape format** from `canvas.ucsd.edu` using a browser extension such as "Get cookies.txt Clean" (Chrome). Do this right before running the script so the session cookies are still valid.

**Command:**
```bash
python export.py --singlefile
```

### CLI options

| Flag | Description | Default |
|------|-------------|---------|
| `-c <path>` | Path to credentials YAML | `credentials.yaml` |
| `-o <path>` | Output directory | `./output` |
| `--singlefile` | Enable HTML snapshot capture | Disabled |
| `-v` / `--verbose` | Verbose debug output | Disabled |
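
For readers who want to see how these flags fit together, here is a rough `argparse` sketch that mirrors the table above. It is an illustration only and may differ from how `export.py` actually defines its parser; `build_parser` is a hypothetical name, and the long forms `--config` and `--output` are taken from the README's option table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="export.py")
    parser.add_argument("-c", "--config", default="credentials.yaml",
                        help="Path to credentials YAML")
    parser.add_argument("-o", "--output", default="./output",
                        help="Output directory")
    parser.add_argument("--singlefile", action="store_true",
                        help="Enable HTML snapshot capture")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Verbose debug output")
    return parser
```

With no arguments, every option falls back to the default listed in the table.
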

### Optional `credentials.yaml` fields for `export.py`
```yaml
COURSES_TO_SKIP:
  - 12345  # skip specific course IDs
CHROME_PATH: /usr/bin/chromium  # if Chrome/Chromium not auto-detected
SINGLEFILE_TIMEOUT: 180  # seconds; raise if you see "Capture timeout"
```

### Output structure
```
output/
  Fall 2024/
    CS 101/
      course files/
      assignments/
      modules/
      announcements/
      discussions/
      grades.html      # (singlefile only)
      homepage.html    # (singlefile only)
      CS 101.json
  all_output.json
```

---

## `kaltura_downloader.py` — Kaltura Media Gallery Downloader

**What it does:** Downloads all Kaltura lecture videos from your Canvas courses' Media Galleries using yt-dlp. Tracks progress with a pickle cache so it can resume if interrupted.
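
To illustrate the resume idea, the sketch below keeps a set of completed course IDs in a pickle file and rewrites it after each course finishes. The real layout of `cache.pickle` is not documented here, so the set-of-IDs format and the helper names `load_done` and `mark_done` are assumptions.

```python
import pickle
from pathlib import Path


def load_done(cache_path: Path) -> set:
    """Return the set of completed course IDs, or an empty set on first run."""
    if cache_path.exists():
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    return set()


def mark_done(cache_path: Path, done: set, course_id: int) -> None:
    """Record a finished course and persist the cache."""
    done.add(course_id)
    temp_path = cache_path.with_suffix(".tmp")
    with temp_path.open("wb") as handle:
        pickle.dump(done, handle)
    # Replace the old cache in one step so an interrupted write
    # does not leave a half-written file behind.
    temp_path.replace(cache_path)
```

On a re-run, a loop over courses would skip any ID already present in `load_done(...)`.
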

### Prerequisites

1. **yt-dlp** and **ffmpeg** must be installed:
   ```bash
   pip install yt-dlp
   # ffmpeg: sudo apt install ffmpeg  (or brew install ffmpeg on macOS)
   ```

2. **Two additional required fields** in `credentials.yaml`:

   **`COOKIES_PATH`**: Browser cookies in Netscape format. The file must include cookies from **both** `canvas.ucsd.edu` and `canvaskaf.ucsd.edu`. Export them with "Get cookies.txt Clean" or a similar extension.

   **`KAF_COOKIE`**: The `kms_ctamuls` session cookie from `canvaskaf.ucsd.edu`. To find it:
   - Log into Canvas first
   - Navigate to a course's **Media Gallery** page
   - Then open browser DevTools or a cookie inspector (e.g., EditThisCookie) on `canvaskaf.ucsd.edu`
   - Find and copy the value of the `kms_ctamuls` cookie

   Add to `credentials.yaml`:
   ```yaml
   COOKIES_PATH: ./cookies.txt
   KAF_COOKIE: "your_kms_ctamuls_value_here"
   ```
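
Before starting a long download, it can help to check that the exported cookie file really covers both hosts and contains `kms_ctamuls`. The sketch below is a hand-rolled reader for the tab-separated Netscape format, not part of the repository; the helper names are hypothetical, and treating an expiry of `0` as a session cookie is a common convention rather than something this guide specifies.

```python
import time
from pathlib import Path


def read_netscape_cookies(path):
    """Parse a Netscape-format cookies.txt into a list of dicts."""
    cookies = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]  # HttpOnly cookies carry this prefix
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie row
        domain, _subdomains, _path, _secure, expires, name, value = fields
        cookies.append({
            "domain": domain.lstrip("."),
            "expires": int(float(expires or 0)),
            "name": name,
            "value": value,
        })
    return cookies


def _covers(cookie_domain, host):
    """True if a cookie set for cookie_domain would apply to host."""
    return host == cookie_domain or host.endswith("." + cookie_domain)


def missing_hosts(cookies, hosts=("canvas.ucsd.edu", "canvaskaf.ucsd.edu")):
    """Return the hosts for which the file contains no cookie at all."""
    return [h for h in hosts if not any(_covers(c["domain"], h) for c in cookies)]


def find_kaf_cookie(cookies, host="canvaskaf.ucsd.edu", name="kms_ctamuls"):
    """Return the unexpired cookie value for `name` on `host`, or None."""
    now = time.time()
    for c in cookies:
        # Assumption: expiry 0 marks a session cookie and is treated as live.
        live = c["expires"] == 0 or c["expires"] > now
        if c["name"] == name and _covers(c["domain"], host) and live:
            return c["value"]
    return None
```

If `missing_hosts(...)` returns a non-empty list or `find_kaf_cookie(...)` returns `None`, re-export the cookies after visiting a Media Gallery page.
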

### Usage
```bash
python kaltura_downloader.py
```

### CLI options

| Flag | Description | Default |
|------|-------------|---------|
| `-c <path>` | Path to credentials YAML | `credentials.yaml` |
| `-o <path>` | Output directory | `./output` |
| `-v` / `--verbose` | Verbose debug output | Disabled |

### Behavior details
- Videos download to `output/{Term}/{CourseCode}/Lectures/Kaltura/`
- `kaltura-ytdl-history.txt` prevents re-downloading already downloaded videos
- `cache.pickle` tracks which courses have been fully processed — if interrupted, completed courses are skipped on re-run
- Failed downloads are logged to `failed.txt`
- Downloads best quality MP4 with embedded subtitles and thumbnail metadata
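
The behavior listed above roughly corresponds to a yt-dlp invocation like the one built below. This is only a sketch under assumptions: the exact options `kaltura_downloader.py` passes are not shown in this guide, the location of `kaltura-ytdl-history.txt` in the output root is a guess, and `kaltura_dir` and `build_ytdlp_command` are hypothetical names. The yt-dlp flags themselves (`--cookies`, `--download-archive`, `-f`, `--embed-subs`, `--embed-thumbnail`, `-P`) are standard.

```python
from pathlib import Path


def kaltura_dir(output_root, term, course_code):
    """Return output/{Term}/{CourseCode}/Lectures/Kaltura as a Path."""
    return Path(output_root) / term / course_code / "Lectures" / "Kaltura"


def build_ytdlp_command(url, output_root, term, course_code, cookies_path):
    """Assemble a yt-dlp argument list for one video (illustrative only)."""
    # Assumption: the download history file lives in the output root.
    archive = Path(output_root) / "kaltura-ytdl-history.txt"
    return [
        "yt-dlp",
        "--cookies", str(cookies_path),       # Netscape-format cookie file
        "--download-archive", str(archive),   # skip videos already recorded here
        "-f", "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        "--embed-subs",
        "--embed-thumbnail",
        "-P", str(kaltura_dir(output_root, term, course_code)),
        url,
    ]
```

The resulting list can be handed to `subprocess.run(...)`; a non-zero return code would be the natural trigger for appending the URL to `failed.txt`.
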

---

## Quick reference: which credentials fields are needed

| Field | `export.py` (basic) | `export.py --singlefile` | `kaltura_downloader.py` |
|---|---|---|---|
| `API_URL` | Required | Required | Required |
| `API_KEY` | Required | Required | Required |
| `USER_ID` | Required | Required | Required |
| `COOKIES_PATH` | Not needed | Required | Required |
| `KAF_COOKIE` | Not needed | Not needed | Required |
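
The table translates directly into a small pre-flight check. The sketch below is not taken from the scripts; the mode names (`"export"`, `"export-singlefile"`, `"kaltura"`) and `missing_fields` are made up for illustration, and `credentials` is assumed to be the already-parsed contents of `credentials.yaml`.

```python
BASE_FIELDS = ("API_URL", "API_KEY", "USER_ID")

# Required fields per run mode, following the quick-reference table.
REQUIRED_FIELDS = {
    "export": BASE_FIELDS,
    "export-singlefile": BASE_FIELDS + ("COOKIES_PATH",),
    "kaltura": BASE_FIELDS + ("COOKIES_PATH", "KAF_COOKIE"),
}


def missing_fields(credentials: dict, mode: str) -> list:
    """Return the required fields that are absent or empty for the given mode."""
    return [name for name in REQUIRED_FIELDS[mode] if not credentials.get(name)]
```

An empty list means the configuration is complete for that mode.
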
176
README.md
@@ -1,9 +1,16 @@
# Introduction
# Canvas Student Data Export

The Canvas Student Data Export Tool exports nearly all of a student's data from the Instructure Canvas Learning Management System (Canvas LMS).
This is useful when you are graduating or leaving your college or university, and would like to have a backup of all the data you had in canvas.
A set of tools to back up nearly all of a student's data from the Instructure Canvas LMS. Useful when graduating or leaving a university and wanting to preserve your course data.

The tool exports the following data:
For full setup and usage instructions, see [GUIDE.md](GUIDE.md).

## Tools

### `export.py` — Canvas Data Exporter

Exports course data from all active and completed Canvas courses as JSON files, with optional HTML snapshots.

Exports:
- Course Assignments (including submissions and attachments)
- Course Announcements
- Course Discussions
@@ -11,72 +18,55 @@ The tool exports the following data:
- Course Files
- Course Modules
- (Optional) HTML snapshots of:
  - Course Home Page
  - Grades Page
  - Assignments
  - Announcements
  - Discussions
  - Modules
  - Course Home Page
  - Grades Page
  - Assignments
  - Announcements
  - Discussions
  - Modules

Data is saved in JSON (and optionally HTML) format and organized into folders by academic term and course.
Data is organized into folders by academic term and course:

Example output structure:
- Fall 2023
  - CS 101
    - announcements/
      - First Announcement/
        - announcement_1.html
      - announcement_list.html
    - assignments/
      - Sample Assignment/
        - assignment.html
        - submission.html
      - assignment_list.html
    - course files/
      - file_1.docx
      - file_2.png
    - discussions/
      - Sample Discussion
        - discussion_1.html
      - discussion_list.html
    - modules/
      - Sample Module
        - Sample Assignment.html
        - Sample Discussion.html
        - Sample Page.html
        - Sample Quiz.html
      - modules_list.html
    - grades.html
    - homepage.html
    - CS 101.json
  - ENGL 101
    - ...
- Spring 2024
  - ...
- all_output.json
```
output/
  Fall 2023/
    CS 101/
      announcements/
      assignments/
      course files/
      discussions/
      modules/
      grades.html      # (--singlefile only)
      homepage.html    # (--singlefile only)
      CS 101.json
  all_output.json
```

# Getting Started
### `kaltura_downloader.py` — Kaltura Media Gallery Downloader

Downloads all Kaltura lecture videos from your Canvas courses' Media Galleries using yt-dlp. Tracks completed courses via a cache file so it can safely resume if interrupted. Videos are saved to `output/{Term}/{CourseCode}/Lectures/Kaltura/`.

## Getting Started

### Dependencies

## Dependencies
- Python 3.8 or newer
- Node.js 16 or newer (only needed for HTML snapshots)
- Node.js 16 or newer (only needed for `--singlefile` HTML snapshots)
- ffmpeg (only needed for `kaltura_downloader.py`)

1. **Install Python dependencies:**
```bash
pip install -r requirements.txt
```
**Install Python dependencies:**
```bash
pip install -r requirements.txt
```

2. **(Optional) Install SingleFile for HTML snapshots:**
This step requires Node.js.
```bash
npm install
```
**(Optional) Install SingleFile for HTML snapshots:**
```bash
npm install
```

## Configuration
### Configuration

To use the tool, you must create a `credentials.yaml` file in the project root. You can also specify a different path using the `-c` or `--config` command-line option.

Create the `credentials.yaml` file with the following content:
Create a `credentials.yaml` file in the project root:

```yaml
# The URL of your Canvas instance (e.g., https://your-school.instructure.com)
@@ -86,60 +76,42 @@ API_KEY: <Your Canvas API token>
# Your Canvas User ID
USER_ID: 123456
# Path to your browser cookies file (Netscape format).
# This is only required when using the --singlefile flag.
COOKIES_PATH: ./cookies.txt
# (Optional) Path to your Chrome/Chromium executable if SingleFile cannot find it.
# CHROME_PATH: C:\Program Files\Google\Chrome\Application\chrome.exe
# (Optional) Timeout in seconds for SingleFile to capture a page. Default: 60
# Increase this if you see "Capture timeout" errors during HTML snapshots.
# Required for --singlefile and kaltura_downloader.py.
# COOKIES_PATH: ./cookies.txt
# Required for kaltura_downloader.py only.
# KAF_COOKIE: <kms_ctamuls cookie value from canvaskaf.ucsd.edu>
# (Optional) Path to Chrome/Chromium if SingleFile cannot find it.
# CHROME_PATH: /usr/bin/chromium
# (Optional) SingleFile capture timeout in seconds. Default: 60
# SINGLEFILE_TIMEOUT: 180
# (Optional) A list of course IDs to skip when exporting data.
# (Optional) Course IDs to skip.
# COURSES_TO_SKIP:
# - 12345
# - 67890
```

### Finding Your Credentials
**Finding your credentials:**
- **`API_KEY`**: Canvas → Account → Settings → Approved Integrations → "+ New Access Token"
- **`USER_ID`**: Visit `https://<your-canvas-url>/api/v1/users/self` while logged in and find the `id` field
- **`COOKIES_PATH`**: Export cookies from your browser in Netscape format using an extension like "Get cookies.txt Clean"
- **`KAF_COOKIE`**: The `kms_ctamuls` cookie from `canvaskaf.ucsd.edu` — log into Canvas, navigate to a Media Gallery, then inspect cookies on `canvaskaf.ucsd.edu`

- **`API_URL`**: Your institution's Canvas URL.
- **`API_KEY`**: In Canvas, go to `Account` > `Settings`, scroll down to `Approved Integrations`, and click `+ New Access Token`.
- **`USER_ID`**: After logging into Canvas, visit `https://<your-canvas-url>/api/v1/users/self`. Your browser will show a JSON response; find the `id` field.
- **`COOKIES_PATH`**: Required **only if** you use the `--singlefile` flag. Browser cookies are needed to download complete HTML pages as if you were logged in. The script will now detect if your cookies are expired or invalid and will stop downloading HTML pages to prevent errors. For best results, log into Canvas and then export your cookies right before running the script. Use a browser extension like "Get cookies.txt Clean" for Chrome to export them in Netscape format.
- **`CHROME_PATH`** (Optional): The script attempts to auto-detect Chrome/Chromium on Windows, macOS, and Linux. If it fails, you can specify the path here.
- **`SINGLEFILE_TIMEOUT`** (Optional): Maximum time in seconds to wait for SingleFile to capture a single HTML page. Default is `60` seconds. If you have a slow connection or a busy computer and see "Capture timeout" errors, increase this value.
- **`COURSES_TO_SKIP`** (Optional): A list of course IDs to exclude from the export. To find a course ID, go to the course's homepage and look at the URL for the number that follows `/courses/`.
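
The course-ID rule for `COURSES_TO_SKIP` can be expressed as a short helper. This is an illustrative sketch rather than code from the tool, and `course_id_from_url` is a hypothetical name.

```python
import re
from typing import Optional


def course_id_from_url(url: str) -> Optional[int]:
    """Return the number that follows /courses/ in a Canvas URL, if any."""
    match = re.search(r"/courses/(\d+)", url)
    return int(match.group(1)) if match else None
```

Paste a course homepage URL into the function and add the returned number under `COURSES_TO_SKIP`.
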

## Running the Exporter

Once your `credentials.yaml` is set up, run the script:
### Running

```bash
python export.py [options]
```

**Options:**

| Flag | Description | Default |
| ----------------------- | --------------------------------------------- | ------------------ |
| `-c`, `--config <path>` | Path to your YAML credentials file. | `credentials.yaml` |
| `-o`, `--output <path>` | Directory to store exported data. | `./output` |
| `--singlefile` | Enable HTML snapshot capture with SingleFile. | Disabled |
| `-v`, `--verbose` | Enable verbose output for debugging. | Disabled |
| `--version` | Show the version of the tool and exit. | N/A |

**Example:**

```bash
# Run with default settings (uses ./credentials.yaml, outputs to ./output)
# Export all course data as JSON
python export.py

# Run with a custom output directory and enable HTML snapshots
python export.py -o /path/to/my-canvas-backup --singlefile
# Export with HTML snapshots
python export.py --singlefile

# Download Kaltura lecture videos
python kaltura_downloader.py
```

After the export is complete, the tool will display a detailed summary of all the data that was successfully extracted, including counts of assignments, files, and pages, as well as any warnings or errors encountered.
See [GUIDE.md](GUIDE.md) for all CLI options and detailed setup steps.

# Contribute
## Contribute

I would love to see this script's functionality expanded and improved! I welcome all pull requests 🙂
I would love to see this script's functionality expanded and improved! I welcome all pull requests 🙂
Thank you!