/cs-scrape
Run a gated extraction pipeline for `$ARGUMENTS` using `skills/universal-scraping-architect/SKILL.md`.
Pre-flight gates (stop if any fails)
- Target stated? If `$ARGUMENTS` is empty, ask for the URL or file path plus the desired output format; do not guess.
- Live-site etiquette: for URLs, check `robots.txt` and plan rate limits; refuse disallowed targets.
- Privacy: if the target is a local/sensitive file, do not send it to an external API; force Mode 2 (local Python).
- Secrets: Firecrawl key only via `os.getenv('FIRECRAWL_API_KEY')`; if a key appears inline anywhere, fix that first.
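
The gates above could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own code: the function names (`is_url`, `robots_allows`, `firecrawl_key`, `preflight`) and the return labels are hypothetical, and the `robots.txt` text is assumed to have been fetched already so the check itself runs offline. Rate-limit planning is not covered here.

```python
import os
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_url(target: str) -> bool:
    """Treat only http(s) targets as live sites; anything else is a local path."""
    return urlparse(target).scheme in ("http", "https")


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def firecrawl_key() -> str:
    """Read the key from the environment only; never accept it inline."""
    key = os.getenv("FIRECRAWL_API_KEY")
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")
    return key


def preflight(target: str, robots_txt: str = "") -> str:
    """Return 'local' or 'remote-ok' (hypothetical labels), or raise if a gate fails."""
    if not target.strip():
        raise ValueError("no target: ask for a URL or file path and an output format")
    if not is_url(target):
        return "local"  # privacy gate: local files stay in Mode 2
    if not robots_allows(robots_txt, target):
        raise PermissionError(f"robots.txt disallows {target}")
    return "remote-ok"
```

Raising on a failed gate, rather than returning a flag, mirrors the "stop if any fails" rule: the pipeline cannot continue by accident.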
Workflow
1. Route: state the mode and why, per the skill's routing rules. Mode 1 is Firecrawl (public/JS-heavy URL, bulk crawl); Mode 2 is local Python (local files, private data, simple static HTML); Mode 3 is hybrid (Firecrawl extract + pandas clean).
2. Budget: estimate API quota / token limits before multi-page jobs; add checkpointing + pagination.
3. Extract: start from the matching runner template. Run it from the plugin root; `--sample` previews the summary shape offline.
4. Validate (mandatory, exit-code gated):
   - exit 0 (`status: ok`) → continue
   - exit 1 (`warning` = empty output, `error` = malformed JSON) → fix and re-extract; never deliver unvalidated data

   Then check required fields and duplicates against the job spec.
5. Deliver: CSV (tabular) / JSON (nested) / Markdown (docs, chunked), per the user's requested format, with a summary of mode chosen, row counts, empty values, and the validation verdict.
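
As an illustration of the Validate step, the exit-code convention and the follow-up field checks might look like the sketch below. It is not the skill's actual validator: `validate` and `check_rows` are hypothetical names, and the sketch assumes that "empty output" covers both an empty string and an empty JSON list or object, and that extracted rows are dictionaries.

```python
import json


def validate(raw: str) -> tuple[int, str]:
    """Return (exit_code, status) following the exit 0 / exit 1 convention."""
    if not raw.strip():
        return 1, "warning"  # assumption: blank output counts as empty
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return 1, "error"    # malformed JSON
    if not data:
        return 1, "warning"  # assumption: [] or {} counts as empty output
    return 0, "ok"


def check_rows(rows: list[dict], required: list[str], key: str) -> dict:
    """Count rows missing a required field and rows that repeat the key field."""
    missing = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required)
    )
    seen, duplicates = set(), 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        seen.add(value)
    return {"rows": len(rows), "missing_required": missing, "duplicates": duplicates}
```

A caller would pass the exit code to `sys.exit` so that a shell or CI step stops on anything other than `ok`, and could reuse the counts from `check_rows` in the delivery summary.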