Syllabi (Canvas)
CMU’s Canvas instance hosts a “Syllabus Registry” course that aggregates instructor-uploaded syllabi across most departments and recent terms.
Source
Master course: https://canvas.cmu.edu/courses/sis_course_id:syllabus-registry (Canvas course id 3769). The course’s metadata reports is_public: true, but listing its files and pages still requires an authenticated Canvas API token.
Token
Authenticate API requests with Authorization: Bearer <token> where the token is generated from https://canvas.cmu.edu/profile/settings under “Approved Integrations” -> “+ New Access Token”.
The token currently in use for development expires August 25 at 12:00 AM. After that, a new token is needed.
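A minimal sketch of forming the header described above. The function name auth_header is hypothetical and not taken from the scraper; trimming whitespace and rejecting an empty token are assumptions about what a caller would want, not documented Canvas behavior.

```rust
/// Builds the (name, value) pair for the Canvas Authorization header.
/// Returns `None` when the token is empty after trimming whitespace
/// (an assumed convenience check, so a blank value fails early).
fn auth_header(token: &str) -> Option<(&'static str, String)> {
    let token = token.trim();
    if token.is_empty() {
        return None;
    }
    Some(("Authorization", format!("Bearer {}", token)))
}
```

The pair can be handed to whichever HTTP client is in use; nothing here contacts Canvas or checks that the token is still valid.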
Structure
The registry is a three-level Canvas hierarchy:
- The master course (3769) contains 30 modules, one per term, named like Spring 2026 (S26) or Summer 2024 (M24). Term codes follow <season><yy> where season is S (spring), M (summer-1), N (summer-2), or F (fall). Coverage runs from Summer 2019 (M19).
- Each term module’s items are 60 ExternalUrl entries, one per department, with titles like Architecture (48XXX). Each item’s external_url points to a per-(term, department) sub-course identified by sis_course_id:syllabus-registry-<TERM>-<DEPT> (e.g. syllabus-registry-S26-ARC). At the time of writing, the registry contains 1797 such sub-courses across all terms.
- Each dept sub-course has up to four modules: Notice to Users, Available Syllabi, Unavailable Syllabi, Individualized Experiences. Items inside are either File (a Canvas file object that resolves to a downloadable PDF/DOC) or Page (a Canvas page).
  - “Available Syllabi” File items are each a real downloadable file (e.g. 48649_S26_designleadership_mcnutt.pdf, content type application/pdf).
  - “Available Syllabi” Page items contain only a small JSON redirect blob like <div id="syllabus-source" style="display: none;">{"canvas_course_id":"52739",...}</div>. The actual syllabus lives on the regular Canvas course site for that class, which is generally enrollment-restricted, so most of these pointers are not retrievable from a normal student token.
  - “Unavailable Syllabi” pages are placeholders the registry generates for courses where the instructor never uploaded a syllabus. They contain no syllabus content.
  - “Individualized Experiences” entries are independent-study and similar courses. Most are placeholder pages.
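The term-code and sub-course naming scheme above can be sketched as two small helpers. The type and function names (Season, Term, parse_term, sub_course_sis_id) are hypothetical, and mapping a two-digit year to 20yy is an assumption that holds for the coverage window described here (M19 onward).

```rust
/// Season letters used in registry term codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Season {
    Spring,  // S
    Summer1, // M
    Summer2, // N
    Fall,    // F
}

/// A parsed term code such as "S26" (Spring 2026).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Term {
    season: Season,
    /// Four-digit year, assuming every two-digit year means 20yy.
    year: u16,
}

/// Parses a `<season><yy>` code; returns `None` for anything else.
fn parse_term(code: &str) -> Option<Term> {
    let mut chars = code.chars();
    let season = match chars.next()? {
        'S' => Season::Spring,
        'M' => Season::Summer1,
        'N' => Season::Summer2,
        'F' => Season::Fall,
        _ => return None,
    };
    // Whatever follows the season letter must be exactly two ASCII digits.
    let yy = chars.as_str();
    if yy.len() != 2 || !yy.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year = 2000 + yy.parse::<u16>().ok()?;
    Some(Term { season, year })
}

/// Builds the SIS id of a per-(term, department) sub-course.
fn sub_course_sis_id(term: &str, dept: &str) -> String {
    format!("syllabus-registry-{}-{}", term, dept)
}
```

For example, parse_term("S26") yields Spring 2026, and sub_course_sis_id("S26", "ARC") yields syllabus-registry-S26-ARC, matching the example in the list.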
Retrieval
To list every file in the registry, we walk:
- GET /api/v1/courses/3769/modules?include[]=items&per_page=100 to get every term module with its dept items.
- For each term-module item, the external_url ends in sis_course_id:<id>. Hit GET /api/v1/courses/sis_course_id:<id>/modules?include[]=items&per_page=20 for each.
- Within each sub-course’s Available Syllabi module, every File-typed item has a url like /api/v1/courses/<id>/files/<file_id>. Fetch that to get the file metadata, including a url field that is the actual downloadable URL with a verifier query parameter.
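The second step of the walk depends on pulling the SIS id out of each external_url and turning it into a modules request. A rough sketch follows. The helper names are hypothetical, and the full shape of external_url (host and path before sis_course_id:) is an assumption; only the trailing sis_course_id:<id> part comes from the description above.

```rust
/// API root for CMU's Canvas instance.
const API_BASE: &str = "https://canvas.cmu.edu/api/v1";

/// Returns the `<id>` that follows the last `sis_course_id:` in a URL.
fn sis_id_from_external_url(url: &str) -> Option<&str> {
    const MARKER: &str = "sis_course_id:";
    let start = url.rfind(MARKER)? + MARKER.len();
    let rest = &url[start..];
    // Stop at a trailing slash, query string, or fragment if one is present.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let id = &rest[..end];
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Builds the modules-with-items request URL for one sub-course.
fn sub_course_modules_url(sis_id: &str) -> String {
    format!(
        "{}/courses/sis_course_id:{}/modules?include[]=items&per_page=20",
        API_BASE, sis_id
    )
}
```

Both functions are pure string handling, so they can be exercised without a token; the actual GET still needs the Authorization header.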
Pagination is via the standard Canvas Link header: keep following the rel="next" URL until the response no longer includes one. Modules and items accept per_page up to 100, but the parameter is a soft hint and Canvas may return fewer items per page.
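Following pagination comes down to reading the rel="next" entry from the Link header. The sketch below is a simplified parser under one assumption: entries are comma-separated and the URLs themselves contain no commas. The function name next_page_url is hypothetical.

```rust
/// Returns the URL tagged rel="next" in a Link header value, if any.
/// Simplified: assumes no URL in the header contains a comma.
fn next_page_url(link_header: &str) -> Option<&str> {
    for part in link_header.split(',') {
        // Each entry looks like: <https://...>; rel="next"
        let mut pieces = part.split(';');
        let target = pieces.next()?.trim();
        let is_next = pieces.any(|p| p.trim() == "rel=\"next\"");
        if is_next {
            // Strip the angle brackets around the URL.
            return target.strip_prefix('<')?.strip_suffix('>');
        }
    }
    None
}
```

A caller would request the first page, process it, then loop on next_page_url until it returns None, which also covers pages that come back shorter than the requested per_page.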
Direct file listing on /api/v1/courses/3769/files returns 403, so the module-walk is the only way to enumerate syllabi from this course.
CLI
The scraper has a syllabi mode that walks the registry and saves every File and Page item from each sub-course’s Available Syllabi module:
cargo run --release -- --mode syllabi --canvas-token <token>
The token can also be supplied via the CANVAS_TOKEN environment variable. Output goes to data/syllabi/<term>/<dept>/<course_section>.<ext> (configurable via --syllabi-dir). File items are saved as the original PDF/DOC. Page items are saved as <course_section>.url, a plain-text file containing the Canvas page URL where the syllabus is rendered.
We do not try to dereference the page’s syllabus-source redirect or extract a single canonical URL from the target course’s syllabus_body. In a 100-item sample, 72% of pages had an empty syllabus_body, only 5% had a single href, and 24% had between 2 and 28 hrefs that mixed the actual syllabus with mailto links, in-page anchors, Zoom URLs, and supplementary readings. There is no reliable way to identify “the” syllabus URL from those, so we save the Canvas page link and let downstream consumers decide.
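The output layout can be illustrated with a small path builder. This is a sketch, not the scraper's actual code: ItemKind and output_path are hypothetical names, and the course_section value used in the usage note is a made-up placeholder, since its exact format is not specified here.

```rust
use std::path::{Path, PathBuf};

/// The two item types saved from an Available Syllabi module.
enum ItemKind<'a> {
    /// A File item; carries the original extension, e.g. "pdf".
    File { ext: &'a str },
    /// A Page item; saved as a plain-text `.url` file.
    Page,
}

/// Builds <root>/<term>/<dept>/<course_section>.<ext>.
fn output_path(
    root: &Path,
    term: &str,
    dept: &str,
    course_section: &str,
    kind: ItemKind,
) -> PathBuf {
    let ext = match kind {
        ItemKind::File { ext } => ext,
        ItemKind::Page => "url",
    };
    root.join(term)
        .join(dept)
        .join(format!("{}.{}", course_section, ext))
}
```

With root set to data/syllabi, a File item for term S26 and dept ARC with the placeholder section 48649-A and extension pdf maps to data/syllabi/S26/ARC/48649-A.pdf, and the same section as a Page item maps to data/syllabi/S26/ARC/48649-A.url.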