Pipeline

Version history and changelog of the publishing pipeline. Current version: v3.13.0

v3.10.2 — 2026-05-06

  • Source allowlist follow-up batch from the 2026-05-06 chief-editor review: 8 domains added (821 → 829). All 8 raised allowlist warnings in APPROVE_WITH_CORRECTIONS reviews during the day's 20-PR review batch, even though each is a clearly reputable primary or first-tier source
  • Cybersecurity additions: labs.watchtowr.com (the security-research firm whose write-up is the canonical primary source on the cPanel CVE-2026-41940 CRLF/saveSession primitive), rapid7.com (major commercial security vendor publishing CVE Emergency Threat Reports), csa.gov.sg (Singapore Cyber Security Agency, official government issuer of the related CVE alert)
  • Official institutional and primary sources: physics.ox.ac.uk (Oxford Department of Physics — primary source for the Băzăvan/Srinivas Nature Physics quadsqueezing paper), discuss.python.org (Python core developers' official forum, where the release manager and Steering Council communicate)
  • Tech-news outlets that recurred across approved articles: games.slashdot.org (moderated tech-news aggregator with verified user observations on NetHack 5.0 specifics), sci.news (independent science-news outlet covering the 2002 XV93 trans-Neptunian atmosphere paper), winbuzzer.com (Microsoft-focused tech outlet with consistent factual reporting on the Agent 365 GA launch)
  • No schema or behavior change. Same review-time consultation, same Zod validation. The chief editor verified each domain's claims verbatim against snapshots before recommending its addition
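
Because the allowlist is consulted at review time, the check reduces to a hostname lookup over each cited URL. The following is a minimal TypeScript sketch, not the pipeline's code. It assumes the allowlist is a flat set of hostnames and that matching is exact after dropping a leading "www."; both the data shape and the matching rule are assumptions. Exact matching is at least consistent with subdomains such as games.slashdot.org and physics.ox.ac.uk being added as separate entries.

```typescript
// Hypothetical in-memory allowlist: a flat set of bare hostnames.
const ALLOWLIST: ReadonlySet<string> = new Set([
  "labs.watchtowr.com",
  "rapid7.com",
  "csa.gov.sg",
  "physics.ox.ac.uk",
  "discuss.python.org",
]);

// Normalize a cited URL to the hostname form stored in the allowlist.
// Assumption: a leading "www." is ignored; everything else must match exactly.
function normalizedHost(url: string): string | null {
  try {
    const host = new URL(url).hostname.toLowerCase();
    return host.startsWith("www.") ? host.slice(4) : host;
  } catch {
    return null; // unparseable URL: treat as not allowlisted
  }
}

// Returns the cited URLs that would raise an allowlist warning at review time.
function allowlistWarnings(
  citedUrls: string[],
  allowlist: ReadonlySet<string> = ALLOWLIST,
): string[] {
  return citedUrls.filter((url) => {
    const host = normalizedHost(url);
    return host === null || !allowlist.has(host);
  });
}

const warnings = allowlistWarnings([
  "https://labs.watchtowr.com/some-post",
  "https://www.rapid7.com/blog/post/x",
  "https://slashdot.org/story/1",
]);
console.log(warnings); // [ 'https://slashdot.org/story/1' ]
```

Under this rule a parent domain (slashdot.org) does not inherit approval from a listed subdomain, which keeps each addition an explicit editorial decision.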

v3.10.1 — 2026-05-05

  • Source allowlist expanded from 744 to 821 domains in a single curated batch. The sweep walked all 1,055 submissions to date, ranked unique domains by citation count, and added the ones that recurred in chief-editor-verified articles. Citation coverage jumped from ~60% to ~80% — most submissions will now pass review without an allowlist warning, leaving the warning to do its real job: flagging genuinely unfamiliar sources for editorial scrutiny
  • Notable primary-source additions: github.com (29 unique project repos cited as primary sources for releases, security advisories, and source code), peps.python.org (the official Python Enhancement Proposal repository), academic.oup.com (Oxford University Press journals — MNRAS, etc.), pmc.ncbi.nlm.nih.gov (NIH PubMed Central peer-reviewed papers), spectrum.ieee.org (IEEE Spectrum), openssh.org (OpenSSH project), archaeology.org (Archaeological Institute of America), smithsonianmag.com (Smithsonian Magazine)
  • Deliberately excluded despite recurring citations: state-controlled outlets (cgtn.com, english.news.cn) for neutrality concerns; single-company IR pages and single-firm legal blogs whose content is inherently promotional/positional. The exclusions are not a quality judgment, just a recognition that primary-source neutrality is a separate axis from accuracy
  • Coverage by category, after this batch: established cybersecurity outlets (helpnetsecurity.com, therecord.media, cisecurity.org); official cloud and developer-tool channels (aws.amazon.com, about.gitlab.com, blog.jetbrains.com, postgresql.org, nodejs.org, nextjs.org, go.dev, blog.rust-lang.org, ruby-lang.org, deno.com, releases.llvm.org, ubuntu.com); major-tech-company official channels (apple.com, microsoft.com, opensource.microsoft.com, learn.microsoft.com, opensource.googleblog.com, deepmind.google, newsroom.ibm.com, newsroom.cisco.com, news.adobe.com); EU/US government (digital-strategy.ec.europa.eu, digital-markets-act.ec.europa.eu, commerce.senate.gov, governor.ny.gov); regional press primary sources (newsonair.gov.in, tribuneindia.com, sciencenorway.no, heise.de, calcalistech.com); science magazines (biospace.com, archaeologymag.com, arkeonews.net); plus consumer-tech and gaming outlets that recur across submissions
  • No schema or behavior change. Same Zod validation, same chief-editor workflow. Adding to a data file does not require re-signing any historical submission, since the allowlist is consulted at review time, not at signing time
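
The sweep described above ranks unlisted domains by citation count and measures coverage before and after the batch. Here is a small TypeScript sketch of both calculations. It assumes "citation coverage" is measured per citation, not per unique domain or per submission, and that the input is one hostname per citation; neither detail is stated in this entry. The sample hostnames are illustrative only.

```typescript
// One ranked candidate domain and how often it is cited.
type Ranked = { host: string; count: number };

// Fraction of citations whose hostname is already allowlisted.
// Assumption: coverage is counted per citation, not per unique domain.
function citationCoverage(citedHosts: string[], allowlist: ReadonlySet<string>): number {
  if (citedHosts.length === 0) return 1;
  const covered = citedHosts.filter((h) => allowlist.has(h)).length;
  return covered / citedHosts.length;
}

// Rank hostnames that are NOT yet allowlisted by citation count,
// most-cited first; ties broken alphabetically for a stable order.
function rankCandidates(citedHosts: string[], allowlist: ReadonlySet<string>): Ranked[] {
  const counts = new Map<string, number>();
  for (const h of citedHosts) {
    if (!allowlist.has(h)) counts.set(h, (counts.get(h) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([host, count]) => ({ host, count }))
    .sort((a, b) => b.count - a.count || a.host.localeCompare(b.host));
}

const allow = new Set(["spectrum.ieee.org", "openssh.org"]);
const cited = ["spectrum.ieee.org", "github.com", "github.com", "openssh.org", "heise.de"];
console.log(citationCoverage(cited, allow)); // 0.4
console.log(rankCandidates(cited, allow));
```

Frequency is only the shortlist. The entry above still applies editorial filters, such as verification by the chief editor and neutrality exclusions, before a domain is added.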

v3.10.0 — 2026-05-05

  • Source-snapshot HTML files are now gzipped on disk as source-N.html.gz (level-9 gzip). Going forward, chief:review writes .html.gz directly; reviewers decompress with gunzip -c when reading. The on-disk size of sources/ dropped from 1.03 GB → 220 MB (~80% reduction across 3,517 historical snapshots in 1,007 manifests)
  • Manifest sha256 field now refers explicitly to the uncompressed content, not to the on-disk file. Verifiers must gunzip -c <file> and rehash the result to validate. The migration recomputed sha256 for every historical snapshot from its actual disk bytes — about 30% of pre-3.10.0 manifests carried sha256 values that did not match disk (root cause unclear, likely mid-write race conditions or post-write rewrites); those are now self-consistent
  • New one-shot npm run gzip:snapshots script (scripts/gzip_source_snapshots.ts) walks every manifest, gzips referenced files, and updates manifest entries to .html.gz. Idempotent: re-running on already-migrated trees is a no-op. Default is dry-run; pass --apply to write changes
  • /review-submission skill updated with gunzip-based read idioms (Bash and Python) for keyword search across snapshots
  • Why this matters operationally: parallel /write-article agents create one git worktree add per agent, each checking out the full working tree. Before this change, 20 worktrees needed ~20 GB of disk just for sources. After 3.10.0 the same 20 worktrees fit in ~4.4 GB
  • /write-article Step 0.5 added: when running inside a git worktree, the agent runs git sparse-checkout init --cone followed by git sparse-checkout set src scripts config .claude .githooks .github docs public to drop sources/ from its working tree. /write-article never reads source snapshots (only /review-submission does), so the directory is dead weight. Per-worktree footprint drops from ~255 MB to ~36 MB; 20 worktrees fit in ~720 MB instead of ~5.1 GB
  • New resolveKeysDir() helper in scripts/lib/signing.ts: when running inside a worktree (where config/keys/ is absent because keys are gitignored), pipeline scripts now find the main repo via git rev-parse --git-common-dir and load keys from there automatically. Eliminates the manual key-copy step every parallel agent was performing on its own. The cwd shortcut requires a .key (private) file specifically — .pub files are committed and present in every worktree, so accepting them would have broken the fallback (caught by 5-agent verification batch on 2026-05-05)
  • /write-article Step 0.5 now also symlinks node_modules from the main repo: ln -sfn "$(cd "$(dirname "$(git rev-parse --git-common-dir)")" && pwd)/node_modules" node_modules. node_modules is gitignored, so the worktree starts without it; previously every agent worked out its own workaround (some symlinked, some ran npm install redundantly). The symlink approach saves ~290 MB per worktree compared to an isolated install. End-to-end worktree footprint with sparse-checkout + node_modules symlink: 36 MB
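
Since 3.10.0 the manifest sha256 describes the uncompressed HTML, so a verifier has to decompress before hashing. This is the in-process equivalent of piping gunzip -c into a hash tool. The TypeScript sketch below uses only Node's built-in zlib, crypto and fs modules. The function names are hypothetical and are not taken from scripts/gzip_source_snapshots.ts, and the manifest's own structure is not modeled.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex sha256 of a buffer.
function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compress a snapshot with level-9 gzip, as the entry above describes.
function compressSnapshot(html: Buffer): Buffer {
  return gzipSync(html, { level: 9 });
}

// Decompress first, then hash: the manifest records the sha256 of the
// uncompressed content, so hashing the .gz bytes directly will not match.
function verifySnapshotBytes(gz: Buffer, expectedSha256: string): boolean {
  return sha256Hex(gunzipSync(gz)) === expectedSha256.toLowerCase();
}

// File-based variant for a source-N.html.gz path on disk.
function verifySnapshotFile(path: string, expectedSha256: string): boolean {
  return verifySnapshotBytes(readFileSync(path), expectedSha256);
}

const html = Buffer.from("<html><body>snapshot</body></html>", "utf8");
const manifestSha = sha256Hex(html);
const gz = compressSnapshot(html);
console.log(verifySnapshotBytes(gz, manifestSha)); // true
console.log(sha256Hex(gz) === manifestSha); // false: compressed bytes hash differently
```

Defining the hash over the uncompressed content keeps manifests stable even if the compression level or gzip implementation changes later.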
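
The resolveKeysDir() fallback can be sketched as follows. Use config/keys locally only if it holds a private .key file. Otherwise, ask git for the common directory and look under the main checkout. This is a hypothetical re-creation, not the code in scripts/lib/signing.ts. It assumes a non-bare main repository whose common dir is its .git folder, and it resolves a relative --git-common-dir result against the working directory. The key file names in the demo are made up.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, isAbsolute, join, resolve } from "node:path";

// True only if dir exists and holds at least one private key (*.key).
// Committed *.pub files exist in every worktree, so they must not count.
function hasPrivateKey(dir: string): boolean {
  return existsSync(dir) && readdirSync(dir).some((f) => f.endsWith(".key"));
}

// Hypothetical sketch of the worktree fallback.
function resolveKeysDirSketch(cwd: string = process.cwd()): string {
  const local = join(cwd, "config", "keys");
  if (hasPrivateKey(local)) return local; // main checkout: use local keys

  // In a linked worktree, --git-common-dir names the main repo's .git directory.
  const commonDir = execFileSync("git", ["rev-parse", "--git-common-dir"], {
    cwd,
    encoding: "utf8",
  }).trim();
  const absCommon = isAbsolute(commonDir) ? commonDir : resolve(cwd, commonDir);
  const mainKeys = join(dirname(absCommon), "config", "keys"); // assumes non-bare repo
  if (hasPrivateKey(mainKeys)) return mainKeys;
  throw new Error(`no private signing key in ${local} or ${mainKeys}`);
}

// Demo of the .key-only rule in a scratch directory.
const demo = mkdtempSync(join(tmpdir(), "keys-demo-"));
writeFileSync(join(demo, "signing.pub"), "public");
const pubOnly = hasPrivateKey(demo); // false: a .pub alone is not enough
writeFileSync(join(demo, "signing.key"), "private");
const withKey = hasPrivateKey(demo); // true
```

Requiring a .key file is what makes the fallback reachable. If .pub files were accepted, every worktree would look like it already had keys, and the lookup would stop before reaching the main repo.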

v3.9.0 — 2026-05-04

  • Editorial workflow simplified to three terminal verdicts: APPROVE (clean publish), APPROVE_WITH_CORRECTIONS (publish + file a corrections record), and REJECT (close PR; work discarded). The legacy REQUEST_CHANGES verdict is deprecated — there is no rewrite cycle. The schema retains the historical value only so pre-3.9.0 reviews still validate
  • New APPROVE_WITH_CORRECTIONS path uses the existing corrections collection (src/content/corrections/<YYYY-MM>/<article-slug>.json): a minor recoverable issue that can be honestly summarized in one or two correction notes is published alongside the article; readers see the original and the correction. Issues that cannot be honestly covered by a corrections note (fabricated headlines, broken provenance chain, multiple unrelated fabrications) result in REJECT
  • Bidirectional source check added to chief:review: every Markdown link target in body_markdown must be in article.sources. Orphan citations break the provenance chain because the source-snapshot fetcher only downloads URLs in the sources array. The new body_sources_match checklist item flags orphans as blocking errors, and /write-article Step 5a now requires the bot to run a jq + comm diff to verify both directions before saving the JSON
  • /write-article Step 5c specifics audit strengthened: every numeric value, name, version, code, and date in the article must be located by exact-token search against the research log's verbatim notes. The bot writes a "Specifics audit" checklist into the research log before saving the JSON, and any token marked "DELETED from article" must actually be removed before submission
  • New /review-submission decision rule: prefer APPROVE_WITH_CORRECTIONS over REJECT only if a single corrections note can honestly inform readers of what's wrong. Issues in the headline, summary, or Overview lead, multiple unrelated fabrications, or a broken provenance chain default to REJECT
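
The bidirectional body_sources_match check compares two sets. One is the link targets found in body_markdown and the other is the article.sources array. This TypeScript sketch is illustrative, not chief:review's implementation. It assumes sources is an array of URL strings compared verbatim, without URL normalization. It handles only inline [text](url) links: reference-style links, autolinks and URLs containing parentheses are not parsed. It reports both directions, though only orphans are described above as blocking.

```typescript
// Collect inline Markdown link targets: [text](url) or [text](url "title").
// Simplification: reference-style links and autolinks are ignored, and
// image links ![alt](src) are matched too.
function bodyLinkTargets(bodyMarkdown: string): Set<string> {
  const targets = new Set<string>();
  const linkRe = /\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;
  let m: RegExpExecArray | null;
  while ((m = linkRe.exec(bodyMarkdown)) !== null) targets.add(m[1]);
  return targets;
}

interface SourceCheck {
  orphans: string[]; // linked in the body but missing from sources (blocking)
  unused: string[]; // listed in sources but never linked (the reverse direction)
}

function checkBodySources(bodyMarkdown: string, sources: string[]): SourceCheck {
  const linked = bodyLinkTargets(bodyMarkdown);
  const listed = new Set(sources);
  return {
    orphans: Array.from(linked).filter((u) => !listed.has(u)).sort(),
    unused: Array.from(listed).filter((u) => !linked.has(u)).sort(),
  };
}

const body = "Per [the advisory](https://example.com/a) and [notes](https://example.com/b).";
const result = checkBodySources(body, ["https://example.com/a", "https://example.com/c"]);
console.log(result.orphans); // [ 'https://example.com/b' ]
console.log(result.unused); // [ 'https://example.com/c' ]
```

An orphan breaks provenance because the snapshot fetcher only downloads URLs listed in sources. A link that never appears in sources therefore never gets a snapshot a reviewer can check.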
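
The three-verdict workflow and its default rule can be expressed as a small decision function. In practice the rule is a reviewer's judgment, so this TypeScript sketch only encodes the stated defaults. The Findings fields are hypothetical, and the "defaults to REJECT" conditions are treated as hard rules for simplicity.

```typescript
// Verdicts that can be issued since 3.9.0.
type Verdict = "APPROVE" | "APPROVE_WITH_CORRECTIONS" | "REJECT";
// The schema still accepts the legacy value so pre-3.9.0 reviews validate.
type HistoricalVerdict = Verdict | "REQUEST_CHANGES";

// Hypothetical summary of what a reviewer found.
interface Findings {
  issueCount: number; // distinct problems found
  touchesHeadlineSummaryOrLead: boolean;
  unrelatedFabrications: number; // independent fabricated claims
  brokenProvenance: boolean; // e.g. orphan citations
  coverableByCorrectionNote: boolean; // reviewer judgment
}

function decideVerdict(f: Findings): Verdict {
  if (f.issueCount === 0) return "APPROVE";
  // Stated defaults to REJECT, treated here as hard rules.
  if (f.touchesHeadlineSummaryOrLead || f.unrelatedFabrications > 1 || f.brokenProvenance) {
    return "REJECT";
  }
  return f.coverableByCorrectionNote ? "APPROVE_WITH_CORRECTIONS" : "REJECT";
}

const none: Findings = {
  issueCount: 0,
  touchesHeadlineSummaryOrLead: false,
  unrelatedFabrications: 0,
  brokenProvenance: false,
  coverableByCorrectionNote: false,
};
const legacy: HistoricalVerdict = "REQUEST_CHANGES"; // valid only as historical data
console.log(decideVerdict(none)); // APPROVE
console.log(decideVerdict({ ...none, issueCount: 1, coverableByCorrectionNote: true })); // APPROVE_WITH_CORRECTIONS
```

Because the function's return type excludes REQUEST_CHANGES, the compiler prevents new reviews from issuing the deprecated verdict while the wider type can still describe archived ones.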

v3.8.0 — 2026-05-04

  • /write-article rewritten around a mandatory research log: every fact, quote, number, name, version, date, and code in an article must trace to a verbatim or paraphrased note in tmp/<slug>-research.md before it can appear in the body. The log is built source-by-source as the journalist reads, and inline links must point to the URL whose log entry actually contains the cited claim
  • Eight new anti-failure rules in the writing step, drawn from analysis of 163 historical REQUEST_CHANGES reviews (~13% of the archive): one claim / one source / verified, no fabrication, quote marks are sacred (verbatim only), speaker attribution must match source, headline / summary / lead must each be sourced, no editorial speculation, no misspelled names, verified internal cross-references
  • New Pre-submission Verification step performs an inline-link audit, quote audit, specifics audit (every number / name / version / date), headline-summary-lead audit, bot-block-risk audit, internal-link verification, and duplicate sanity check before the JSON is saved
  • New bot-block awareness rule: when a critical claim rests only on an outlet known to return HTTP 403 to the Chief Editor's snapshot fetcher (Bloomberg, FiercePharma, FierceBiotech, Fox Business, WSJ, Yahoo Finance, etc.), a second source must be added or the claim must be removed — and the article's headline / summary / lead must never depend on a single bot-blocked URL
  • Strengthened archive duplicate check: multi-keyword grep on the candidate topic's distinguishing nouns is now mandatory before any writing begins. Re-covering an already-published event is grounds for rejection even if every fact is correct
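
The specifics audit rests on exact-token search. Every number, version, date or code in the article should appear as the same token somewhere in the research log. The TypeScript sketch below could serve as a mechanical pre-filter for that audit; it is an assumption, not the pipeline's tooling. It tokenizes both texts the same way and treats only digit-bearing tokens as specifics, so names still need a separate check. The sample texts are made up.

```typescript
// Split on whitespace and trim leading/trailing punctuation, keeping
// internal separators so "3.10.0", "2026-05-05" and "1,055" stay intact.
function tokens(text: string): string[] {
  return text
    .split(/\s+/)
    .map((t) => t.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, ""))
    .filter((t) => t.length > 0);
}

// Specifics = tokens containing at least one digit (numbers, versions, dates, codes).
// Simplification: names are not extracted here.
function specifics(text: string): string[] {
  return Array.from(new Set(tokens(text).filter((t) => /\d/.test(t))));
}

// Article specifics that never occur as an exact token in the research log.
function unsupportedSpecifics(article: string, researchLog: string): string[] {
  const logTokens = new Set(tokens(researchLog));
  return specifics(article).filter((t) => !logTokens.has(t));
}

const article = "Release 2.4.1 landed on 2026-03-02 with 1,200 fixes.";
const researchLog = 'Notes: "2.4.1 was tagged on 2026-03-02"';
console.log(unsupportedSpecifics(article, researchLog)); // [ '1,200' ]
```

Token matching rather than substring search is what keeps "80" from being "found" inside "1080". Any token the function returns would either be traced to a log entry or marked "DELETED from article".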
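
The bot-block rule flags any critical claim whose sources all sit on outlets that return HTTP 403 to the snapshot fetcher. In this TypeScript sketch the hostnames are assumed mappings of the outlets named above, and the list's "etc." is left open. The Claim shape, including its critical flag, is hypothetical. A critical claim with no sources at all is also flagged, because Array.prototype.every returns true on an empty array.

```typescript
// Assumed hostnames for the outlets named above; the real list is longer.
const BOT_BLOCKED: ReadonlySet<string> = new Set([
  "bloomberg.com",
  "fiercepharma.com",
  "fiercebiotech.com",
  "foxbusiness.com",
  "wsj.com",
  "finance.yahoo.com",
]);

// Hypothetical claim record: headline/summary/lead claims would be critical.
interface Claim {
  text: string;
  critical: boolean;
  sourceUrls: string[];
}

// Lowercased hostname with any leading "www." removed (throws on malformed URLs).
function hostOf(url: string): string {
  const h = new URL(url).hostname.toLowerCase();
  return h.startsWith("www.") ? h.slice(4) : h;
}

// Critical claims with no source outside the bot-blocked set.
// every() is true for an empty array, so unsourced critical claims are flagged too.
function botBlockRisks(claims: Claim[], blocked: ReadonlySet<string> = BOT_BLOCKED): Claim[] {
  return claims.filter((c) => c.critical && c.sourceUrls.every((u) => blocked.has(hostOf(u))));
}

const risky = botBlockRisks([
  { text: "claim A", critical: true, sourceUrls: ["https://www.bloomberg.com/news/x"] },
  { text: "claim B", critical: true, sourceUrls: ["https://www.wsj.com/a", "https://example.com/pr"] },
  { text: "claim C", critical: false, sourceUrls: ["https://www.wsj.com/b"] },
]);
console.log(risky.map((c) => c.text)); // [ 'claim A' ]
```

A flagged claim then needs a second, fetchable source or has to be removed. Claim B passes because one of its sources can be snapshotted.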