/cs-scrape¶

Slash Command Source

Run a gated extraction pipeline for $ARGUMENTS using skills/universal-scraping-architect/SKILL.md.

Pre-flight gates (stop if any fails)¶

Target stated? If $ARGUMENTS is empty, ask for the URL or file path plus the desired output format — do not guess.
Live-site etiquette: for URLs, check robots.txt and plan rate limits; refuse disallowed targets.
Privacy: if the target is a local/sensitive file, do not send it to an external API — force Mode 2 (local Python).
Secrets: Firecrawl key only via os.getenv('FIRECRAWL_API_KEY'); if a key appears inline anywhere, fix that first.

Workflow¶

Route — state the mode and why (per the skill's routing rules): Mode 1 Firecrawl (public/JS-heavy URL, bulk crawl) · Mode 2 local Python (local files, private data, simple static HTML) · Mode 3 hybrid (Firecrawl extract + pandas clean).
Budget — estimate API quota / token limits before multi-page jobs; add checkpointing + pagination.

Extract — start from the matching runner template (run from the plugin root; --sample previews the summary shape offline):

python3 skills/universal-scraping-architect/scripts/firecrawl_example.py --sample
python3 skills/universal-scraping-architect/scripts/local_bs4_example.py --sample

Validate (mandatory, exit-code gated):

python3 skills/universal-scraping-architect/scripts/validate_extraction.py extracted_output.json --json

exit 0 (status: ok) → continue
exit 1 (warning = empty output, error = malformed JSON) → fix and re-extract; never deliver unvalidated data

Then check required fields and duplicates against the job spec. 5. Deliver — CSV (tabular) / JSON (nested) / Markdown (docs, chunked), per the user's requested format, with a summary of mode chosen, row counts, empty values, and the validation verdict.