OpenGov summary
* If there's disagreement or low confidence, the item flags for quick human review (one-click approve/reject/retry on the Clearance Dashboard).


=== Why This Two-Step Approach? ===
 
* A single AI can sometimes confidently produce wrong or invented details (a common issue in LLMs).
* By having one model create and a different model critically review, the system catches more errors—studies on multi-agent or dual-LLM verification show significant reductions in hallucinations (often 60-90% in similar pipelines).
* Verifier → Double-checks it rigorously before anything goes live.


=== Zero-base burden model ===
The zero-base burden model is the core operational philosophy of OpenGov Encyclopedia: design the system so that human effort is minimized to near-zero for routine operations, while still maintaining full federal control, compliance, and accountability. This approach draws from federal priorities for efficient, low-touch AI governance (as emphasized in OMB guidance like M-25-21 on accelerating AI adoption through innovation and reduced bureaucracy, and related 2025-2026 directives promoting agile, cost-effective AI deployment without unnecessary administrative overhead).
 
* Leverages Grok (via the existing GSA OneGov agreement at $0.42/agency through March 2027) in a dual-AI pipeline: Grok as primary generator plus a complementary model (e.g., ChatGPT Enterprise or Gemini for Government) as verifier. Pairing models with offsetting perceived biases supports neutrality, and the system populates only verifiable fields (progressive completeness: core entity data always included, optional details filled in as confidence grows).
* '''Zero-base burden model''': ~80-95% automated (event-driven updates from source changes, daily gap scans); the human role is limited to <5% escalations (one-click approve/reject via the Clearance Dashboard). No manual writing; full audit trails and immutable revisions ensure compliance with FOIA (easy retrieval of historical versions and decision trails) and NARA records-management requirements (permanent, auditable preservation of changes without manual intervention).
 
This model delivers high freshness and broad coverage with virtually no ongoing manual workload — aligning with federal goals for efficient AI use (e.g., reducing bureaucratic barriers while preserving safeguards). It lets limited staff focus on strategic oversight rather than day-to-day maintenance, making OpenGov Encyclopedia sustainable and scalable across agencies. If piloted successfully, it could serve as a blueprint for other low-touch federal knowledge initiatives.
 
In practice, this means:
 
==== ~80–95% fully automated processing   ====
The vast majority of content creation, updates, and maintenance happens without any human intervention. The active generator AI in the dual pipeline monitors official sources continuously. When a change is detected (e.g., a new program announcement in the Federal Register, an updated funding figure on USAspending.gov, or a revised agency page on a whitelisted .gov subdomain), the system automatically:
 
* Triggers re-processing of the affected entity/page.
* Retrieves the fresh content via RAG.
* Generates a draft (filling Cargo fields and narrative text).
* Runs it through the verifier AI for cross-check.
* Publishes approved changes as a new MediaWiki revision if confidence thresholds are met.
 
This event-driven architecture ensures the knowledge graph stays current in near-real-time for high-signal changes (e.g., major legislation or funding updates), without scheduled batch jobs overwhelming resources.
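The steps above can be sketched as a single handler. This is a minimal illustration only: the function names, the retrieval/generation/verification callables, and the 0.95 confidence cutoff are all assumptions, not the actual system.

```python
# Illustrative sketch of the event-driven update flow (all names and the
# threshold are hypothetical assumptions, not the real implementation).

PUBLISH_THRESHOLD = 0.95  # assumed confidence cutoff for auto-publish


def handle_source_change(entity_id, source_url,
                         retrieve, generate, verify, publish, escalate):
    """Re-process one entity when a watched source changes."""
    evidence = retrieve(source_url)           # RAG retrieval of fresh content
    draft = generate(entity_id, evidence)     # generator AI fills fields + narrative
    confidence = verify(draft, evidence)      # verifier AI cross-checks claims
    if confidence >= PUBLISH_THRESHOLD:
        publish(entity_id, draft)             # new MediaWiki revision
        return "published"
    escalate(entity_id, draft, confidence)    # route to Clearance Dashboard
    return "escalated"
```

Because every stage is passed in as a callable, the same skeleton covers both the auto-publish path and the human-review path without special cases.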
 
==== Daily gap scans for completeness   ====
A lightweight nightly automated scan identifies "missing" entities or gaps in existing ones (e.g., a new sub-agency subdomain appears in the CISA .gov inventory, or a program referenced in multiple sources but lacking a dedicated page). The pipeline proactively creates or enhances pages for these, starting with core verifiable fields. This builds out the inventory progressively without manual queues.
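At its core, the nightly scan is a set difference between an authoritative inventory and the pages that already exist. A minimal sketch, assuming both inputs are available as plain lists of identifiers:

```python
# Hypothetical sketch of the nightly gap scan: compare an authoritative
# inventory (e.g., domains from the CISA .gov list) against existing pages.

def find_gaps(inventory_domains, existing_pages):
    """Return inventory entries with no page yet, sorted for a stable work queue."""
    return sorted(set(inventory_domains) - set(existing_pages))
```

New entries found this way would then flow into the same generation pipeline, starting with core verifiable fields only.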
 
==== Progressive completeness   ====
Not every field needs to be perfect on day one. The system prioritizes:
 
* Core fields (always populated if verifiable): Entity name, sponsoring agency, primary .gov link, status, and basic relationships — these form the reliable backbone of the knowledge graph.
* Optional/enhanced fields (e.g., detailed eligibility criteria, historical funding trends, cross-program links): These fill in over time as additional source evidence emerges and verification confidence grows (e.g., from 80% → 98%). Low-confidence or unverified details are clearly marked (e.g., "Pending confirmation" or blank with a note), ensuring transparency rather than forcing incomplete rejection.
 
This "good enough to start, improve over time" strategy maximizes coverage quickly while upholding accuracy.
 
==== Human involvement strictly limited to <5% escalations   ====
Humans (authorized federal staff) are only involved in exceptional cases:
 
* Verifier flags a discrepancy or low confidence on a high-impact item (e.g., a major program change affecting public services).
* Random audit samples for oversight.
* Edge cases like ambiguous source data.
 
Escalations route to a simple Clearance Dashboard — a custom MediaWiki special page or integrated tool — where staff review side-by-side diffs (draft vs. sources), then click one button: Approve, Reject, or Request Retry (with optional note). No writing, editing, or content creation is required from humans. This keeps the burden minimal (often 1-2 minutes per case) and scalable even as the graph grows to thousands of entities.
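The one-click decision model can be sketched as a tiny state transition. The decision names match the dashboard buttons described above; the item structure and next-step strings are hypothetical:

```python
# Sketch of Clearance Dashboard resolution (structure is an assumption):
# each escalated item resolves with exactly one decision and an optional note.

VALID_DECISIONS = {"approve", "reject", "retry"}


def resolve_escalation(item, decision, note=""):
    """Apply a one-click decision to an escalated item; returns the updated item."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {decision}")
    item = dict(item, decision=decision, note=note)
    if decision == "approve":
        item["next_step"] = "publish revision"
    elif decision == "reject":
        item["next_step"] = "discard draft"
    else:  # retry
        item["next_step"] = "re-run generator with note as guidance"
    return item
```

Keeping the action space this small is what holds review time to a couple of minutes per case.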
 
==== No manual writing required   ====
Federal staff never draft, rewrite, or curate text/narrative. All content originates from AI synthesis of official sources, verified through the dual pipeline. This eliminates the traditional "content team" workload that plagues many government wikis or databases.
 
==== Built-in compliance and traceability features   ====
 
* Immutable MediaWiki revisions: Every published version is permanently stored with timestamps, attribution (e.g., "GrokBot" or "GeminiBot" username for AI contributions), and diffs.
* Cargo data snapshots: Structured fields are versioned alongside pages, preserving historical states for queries or audits.
* Signed audit logs: All automated actions (ingestion, generation, verification, publish) are logged with digital signatures, timestamps, and source references — fully queryable and exportable.
* MediaWiki page history: Open to public view (or restricted as needed), allowing anyone to see the complete change timeline, compare versions, and understand evolution over time.
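As a concrete illustration of signed, queryable audit entries, here is one possible shape using an HMAC over a canonical JSON payload. The real system's signing scheme is not specified, so the key handling, field names, and algorithm choice are all assumptions:

```python
import hashlib
import hmac
import json
import time

# Hypothetical signed audit-log entry: every automated action is recorded
# with a timestamp, source references, and a tamper-evident signature.


def log_action(key: bytes, action: str, page: str, sources: list) -> dict:
    """Create one signed audit entry for an automated pipeline action."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "action": action,        # e.g., ingestion | generation | verification | publish
        "page": page,
        "sources": sources,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["sig"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return entry


def verify_entry(key: bytes, entry: dict) -> bool:
    """Recompute the signature over everything except 'sig' and compare."""
    body = {k: v for k, v in entry.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry["sig"], expected)
```

Because the payload is canonicalized (sorted keys) before signing, any later edit to the entry invalidates the signature, which is the property FOIA/NARA auditability depends on.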
 
=== Always defers to originals ===
OpenGov Encyclopedia is built on the principle that it is never the authoritative source. Every page explicitly directs users back to the original federal .gov site(s) for verification, actions, applications, or any official purpose. This respects agency ownership, prevents confusion or duplication, and builds citizen trust through transparency.
 
In short, OpenGov Encyclopedia enhances discovery and understanding while always honoring the single source of truth on the original .gov—exactly what citizens and agencies need in an AI-powered era.
 
Every page incorporates clear, consistent, USWDS-styled elements that are impossible to miss. These use proven patterns like alerts for notices and footers for reassurance, adapted to MediaWiki's capabilities:
 
==== Top banner (prominent notice) ====
 
* A high-visibility alert-style box appears at the top of every page. 
* Text: This material is provided for background context and relationships only and is not the official record.
* Styled using a custom MediaWiki template with USWDS-inspired classes (added via modified skin CSS in MediaWiki:Common.css or the USWDS-integrated skin file). Includes ARIA attributes for accessibility (role="status" or role="region" with aria-label).
 
==== Direct source link ====


* Placed immediately below the banner or integrated into the Cargo infobox/header. 
* Text: Official source of truth: [Primary .gov URL] — complete actions and verify details there.
* Rendered as a large, clickable USWDS-style button or link (e.g., usa-button usa-button--primary). Multiple sources are listed cleanly if applicable.


==== Action buttons ====

Every relevant section ends with directive buttons:

* Apply Now →
* Learn More / Eligibility Details →
* Take Action / Submit Application →

All are styled as USWDS usa-button (primary or outline variants) and point exclusively to the official agency site, USA.gov task page, or form endpoint. There are no internal completion paths.

==== Footer ====
(persistent reassurance with dynamic Cargo data)

* At the bottom of every page, styled in the USWDS identifier/footer pattern.
* Uses a Cargo-powered template to display live, queryable metadata:
** Dual-AI verified from official sources
** Last verified: [timestamp from Cargo field]
** Confidence: [score from Cargo field, e.g., 98%]
** Always check primary .gov for authoritative information.

Cargo integration:

Verification fields (e.g., LastVerified=Date, Confidence=Float) are declared in the Cargo table. The footer template queries the current page’s data (using |where=_pageName={{PAGENAME}}) to populate values dynamically.

This enables site-wide queries, such as sorting articles by verification date or confidence rating:

<pre>{{#cargo_query:
tables=VerificationMetadata
|fields=_pageName=Page, LastVerified=Last Verified, Confidence=Confidence
|order by=LastVerified DESC
|limit=50
}}</pre>

(Can be embedded on a dashboard or oversight page.)

All elements are automatic (baked into page templates and skin), consistent across the site, and meet WCAG 2.1 AA / Section 508 standards (high-contrast, keyboard-navigable, ARIA roles).

This design turns a potential concern into a strength.

* Citizens see plain-English, repeated messaging that makes the supplemental role crystal clear from the first second.
* Agencies retain full ownership: OpenGov Encyclopedia never competes; it acts as a helpful map that sends users straight to the right .gov destination, better informed and ready to act.
* Traffic funnels to primary sites: Prominent, task-oriented buttons improve completion rates on official pages.
* Trust is reinforced: Dynamic verification details in the footer show real-time quality (timestamp + confidence score), while the Cargo backend allows easy auditing and reporting on freshness and accuracy across thousands of pages.

=== Choosing the dual-AI pair ===
For the dual-AI verification (Generator + Verifier), pairing models with complementary strengths and potentially offsetting biases is smart. It enhances neutrality, reduces single-model hallucinations, and aligns with OMB's 2025-2026 emphasis on "unbiased AI" (e.g., via NIST AI RMF updates). Grok (via GSA OneGov) is a strong base as the Generator: it is fast, truth-seeking, and handles real-time orchestration well.

'''Perceived political biases (based on 2025-2026 studies)''':

* '''Grok (xAI)''': Often rated as right-leaning or center-right (e.g., conservative nationalism, libertarian on speech/elites). Some analyses place it slightly left-of-center but less so than peers; it shows "dramatic swings" but favors the right on economy/government issues.
* '''ChatGPT/GPT series (OpenAI)''': Consistently left-leaning (liberal bias on social/economic topics; perceived as four times more left-slanted than Google's models in user studies).
* '''Gemini (Google)''': Centrist to somewhat left-leaning; lowest overall bias in many tests (e.g., 97% "evenhanded" score), but slanted left on progressive values/environmentalism. Most balanced per multiple quizzes.
* '''Claude (Anthropic)''': Strong left-liberal bias (e.g., progressive-activist lean; refused politically charged questions in tests). Currently unavailable due to the directive.
* '''Perplexity''': More conservative/center-right; an outlier amid the left-leaning majority of AI models.
* '''Llama (Meta)''': Moderate left bias, especially on social issues; mixed but often centrist.
* General trend: Most major models lean left-of-center, reflecting their training data and creators' ideologies (e.g., Western companies favor liberal values). Grok and Perplexity are the most right-leaning exceptions.

'''Recommendation: Pick two with offsetting biases'''

* '''Primary pair: Grok (right-leaning) + ChatGPT (left-leaning)''': Excellent balance. Grok serves as Generator for dynamic synthesis, ChatGPT as Verifier for rigorous fact-checking. Both are available via GSA (OneGov for Grok, MAS/FedRAMP for ChatGPT at $1/agency until Aug 2026). This mitigates ideological skew (Grok's conservatism offset by GPT's liberalism) while ensuring high accuracy.
* '''Alternative: Grok + Gemini (centrist/left-leaning)''': For more neutrality, Gemini's evenhandedness (highest in Anthropic's 2025 eval) complements Grok. Available via USAi.gov and FedRAMP 20x Low ($0.47 until Sep 2026).
* '''Why two with different biases?''' Studies show diverse models reduce collective blind spots (e.g., one catches the other's slant on policy topics). Avoid same-family models; aim for cross-company diversity. Test in the pilot: run sample drafts through candidate pairs and measure neutrality via tools like Anthropic's open-source "evenhandedness" evaluator.
* '''Implementation''': Alternate roles (e.g., swap Generator/Verifier per run) or use ensemble scoring. All via APIs in a FedRAMP-compliant environment.

== Truth layer ==
OpenGov Encyclopedia would serve as the authoritative ground-truth layer the federal government needs to feed clean, structured data into the dozens of agency LLMs and chatbots that are currently hallucinating on messy websites and PDF archives.

It is not another public-facing website competing with USA.gov or any agency domain. It is a supplemental knowledge layer that:

* Respects agency ownership by always linking back to the original source as the single point of truth.
* Helps citizens accomplish real tasks by providing context and then directing them to official .gov destinations.
* Provides the structured “fuel” agencies need for more accurate AI-driven services.

== AI layer ==
Powered by Grok through the GSA OneGov agreement (with inherited security controls), the platform uses a zero-base burden strategy:

* Grok + automated orchestration handle ~80-95% of content creation, extraction, and verification.
* Federal staff perform only exception-based, one-click attestation.

This is high-octane fuel for federal AI: a single, standardized semantic layer that reduces fragmentation, improves accuracy across government digital services, and delivers measurable value with virtually no ongoing manual workload.
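As a concrete illustration of the "structured fuel" idea, a structured entity record could be serialized into a compact grounding snippet for an agency chatbot, instead of that chatbot scraping messy pages or PDFs. The record shape and field names here are hypothetical:

```python
# Hypothetical: turn one structured entity record into a plain-text grounding
# context an agency LLM can cite, always ending with the defer-to-.gov rule.

def grounding_snippet(record: dict) -> str:
    """Serialize non-empty fields as 'Key: value' lines plus a deferral notice."""
    lines = [f"{key}: {value}" for key, value in sorted(record.items()) if value]
    lines.append("Authoritative source: always defer to the .gov link above.")
    return "\n".join(lines)
```

Feeding retrieval-time context from verified structured fields, rather than raw HTML, is the mechanism by which the layer reduces downstream hallucination.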


== Why Now? ==