OpenGov summary

From USApedia
Revision as of 22:52, 4 March 2026

OpenGov Encyclopedia - Executive Summary / Sales Pitch

The U.S. federal government manages one of the world's largest and most complex organizational landscapes: thousands of agencies, sub-agencies, programs, authorizing statutes, funding flows, and cross-cutting initiatives. Existing public assets like USA.gov (citizen front door), Search.gov (federated search), and USAspending.gov (spending transparency) provide essential services—but they don't deliver the unified, machine-readable semantic layer that modern agency AI systems desperately need.

Agency LLMs and chatbots are currently "starving" for reliable ground truth. Most rely on scraping inconsistent .gov websites or parsing unstructured PDFs, leading to frequent hallucinations, fragmented answers, and reduced public trust in government digital services.

OpenGov Encyclopedia closes this critical gap as a supplemental, lightweight knowledge infrastructure — never a replacement or competitor to existing .gov platforms.

Core Dual Purpose

  • Citizen-centric interface: Wikipedia-style narrative pages (MediaWiki base + USWDS federal skin) organized around real tasks people want to accomplish (e.g., "Prepare for a Wildland Fire," "Access Housing Assistance," "Navigate Federal AI Opportunities & Regulations"). Each page provides clear context, relationships, and eligibility hints — then immediately directs users to the official agency or USA.gov destination to act.
  • API-first knowledge graph: Structured, queryable data (via Cargo extension) capturing precise typed relationships (e.g., "which agency sponsors this program?", "what legislation authorizes it?", "what funding connects them?"). This becomes high-quality "fuel" for agency RAG pipelines, reducing hallucinations and enabling parametric searches (e.g., "all active programs >$50M related to climate resilience").

Knowledge graph

OpenGov Encyclopedia uses MediaWiki (the same software that powers Wikipedia) combined with the Cargo extension to function as a lightweight knowledge graph. This setup provides both human-readable pages (like Wikipedia articles) and machine-readable, structured, queryable data — perfect for serving as a "truth layer" that feeds clean information to agency AI systems while helping citizens understand federal structures.

Here's a simple, step-by-step explanation for someone not familiar with these tools:

1. What is a Knowledge Graph? (Quick Basics)

A knowledge graph is like a smart map of information:

  • Nodes = things (entities), e.g., "FEMA", "Disaster Assistance Program", "Stafford Act".
  • Edges = relationships between them, e.g., "FEMA sponsors → Disaster Assistance Program", "Disaster Assistance Program is authorized by → Stafford Act", "Stafford Act funds flow to → multiple agencies".
  • The power comes from being able to ask questions across connections, like "Show me all programs authorized by laws passed after 2010 that FEMA sponsors and that have >$100M funding."

Traditional databases or spreadsheets can store facts, but knowledge graphs excel at revealing connections and enabling complex, relationship-based searches.

2. How MediaWiki + Cargo Creates This

  • MediaWiki handles the "Wikipedia-like" part:
    • Each federal entity gets its own page (e.g., a page called "Federal Emergency Management Agency" or "Wildfire Mitigation Grant Program").
    • Pages have readable narrative text, history, discussion tabs, and look official with a USWDS (U.S. Web Design System) skin.
    • This makes it citizen-friendly — people read summaries, see context, and get directed to official .gov links.
  • Cargo turns those pages into structured, graph-like data:
    • Templates act like fill-in-the-blank forms. For example, a "Federal Program" template might have fields like:
      • Program Name
      • Sponsoring Agency (links to the Agency page)
      • Authorizing Legislation (links to the Statute page)
      • Annual Funding Amount
      • Eligibility Summary
      • Primary Official URL (always points back to the real .gov site)
      • Status (Active / Proposed / Expired)
    • When someone (or the AI pipeline) fills in the template on a page, Cargo automatically stores the answers in database tables — one table per template type.
      • Example: All "Federal Program" templates feed into a single "Programs" table in the background, with columns matching the fields.
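
As a concrete sketch, the template-to-table wiring could look like the wikitext below. Table and field names here are illustrative, not the project's actual schema. `#cargo_declare` defines the table (it goes in the template's noinclude section), and `#cargo_store` writes one row each time the template is used on a page. Declaring a field with type `Page` is what records an edge to another entity:

```wikitext
<noinclude>
{{#cargo_declare:_table=Programs
|Program_Name=String
|Sponsoring_Agency=Page
|Authorizing_Legislation=Page
|Annual_Funding=Integer
|Eligibility_Summary=Text
|Official_URL=URL
|Status=String
}}
</noinclude><includeonly>
{{#cargo_store:_table=Programs
|Program_Name={{{name|}}}
|Sponsoring_Agency={{{agency|}}}
|Authorizing_Legislation={{{legislation|}}}
|Annual_Funding={{{funding|}}}
|Eligibility_Summary={{{eligibility|}}}
|Official_URL={{{url|}}}
|Status={{{status|}}}
}}
</includeonly>
```

With this in place, every page that transcludes the "Federal Program" template contributes one row to the Programs table automatically.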

This is where the knowledge graph emerges:

  • Because fields can link to other pages (e.g., "Sponsoring Agency" contains a link to the "FEMA" page), Cargo knows relationships exist.
  • The system doesn't need fancy triple-store tech (like RDF/OWL in heavier graphs); it uses simple relational tables but supports joins, list fields, and hierarchy traversal to mimic graph behavior.

3. Querying the Graph — The Real Power

Cargo lets anyone (humans or machines via API/JSON exports) run queries across the data. These queries reveal connections automatically.

Examples in OpenGov Encyclopedia context:

  • Simple lookup: "List all active programs sponsored by NOAA with funding >$50 million."
    • Cargo scans the "Programs" table, filters on Sponsor = "NOAA", Funding > 50M, Status = "Active".
  • Relationship traversal (graph-like):
    • "Show every program authorized by legislation containing '42 U.S.C.' that is sponsored by an agency in the Department of the Interior."
      • Joins the Programs table to Agencies table (via Sponsor field) and checks the Authorizing Legislation field.
  • Cross-entity discovery:
    • "Find all disaster-related programs connected to FEMA, including their authorizing laws and related agencies."
      • Uses joins and list fields (e.g., if a program has multiple linked agencies or statutes).
  • Dynamic pages:
    • A task-oriented page like "Prepare for Wildland Fire" can embed live query results: a table of relevant programs, pulled fresh from Cargo data, with links back to official sites.
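
The queries described above can be sketched in Cargo's wikitext query syntax (table, field, and value names are illustrative, matching the hypothetical schema rather than the project's actual one). The first is a flat filter over one table; the second uses Cargo's `join on` parameter to traverse from a program to its sponsoring agency:

```wikitext
{{#cargo_query:
tables=Programs
|fields=_pageName=Program, Sponsoring_Agency, Annual_Funding
|where=Sponsoring_Agency='NOAA' AND Annual_Funding > 50000000 AND Status='Active'
|format=table
}}

{{#cargo_query:
tables=Programs, Agencies
|join on=Programs.Sponsoring_Agency = Agencies._pageName
|fields=Programs._pageName=Program, Agencies._pageName=Agency
|where=Agencies.Parent_Department='Department of the Interior'
       AND Programs.Authorizing_Legislation LIKE '%42 U.S.C.%'
|format=table
}}
```

Embedding either call on a task-oriented page (such as "Prepare for Wildland Fire") renders a live table that refreshes as the underlying Cargo data changes.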

Queries can output as:

  • Tables/lists on wiki pages
  • Maps (if coordinates are stored)
  • JSON/CSV exports (for feeding agency LLMs or dashboards)
  • Inline dynamic content (updates whenever source data changes)

4. Why This Counts as a Knowledge Graph (Even If Lightweight)

  • It has entities (pages/nodes) and typed relationships (via template fields and links).
  • You can traverse connections using joins, HOLDS (for lists), WITHIN (for hierarchies), etc.
  • It's queryable at scale — supports parametric searches agencies need for AI (e.g., "programs related to arid land agriculture with >$50M funding").
  • Data stays fresh and attested — tied to official .gov sources via the AI pipeline, with confidence scores and always-link-back banners.
  • Unlike heavier graphs (e.g., Neo4j or RDF stores used in some federal pilots), it's simple, open-source, low-maintenance, and integrated with readable wiki pages.
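
HOLDS and WITHIN are genuine Cargo query operators: HOLDS matches one member of a field declared as a list (e.g., `List (,) of Page`), and WITHIN filters a Hierarchy-typed field by subtree. A hypothetical query combining them (the field names are assumptions for illustration, not the actual schema):

```wikitext
{{#cargo_query:
tables=Programs
|fields=_pageName=Program
|where=Related_Agencies HOLDS 'FEMA' AND Topic WITHIN 'Disaster Resilience'
|format=ul
}}
```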

Safeguards

  • Authoritative sourcing only: Pulls exclusively from official federal sources — Federal Register API, eCFR, USAspending APIs, agency databases, and the full inventory of .gov domains/subdomains from CISA's current-federal.csv whitelist (~1,000+ executive branch domains, including major agencies like EPA, NASA, FEMA, NOAA, and their subdomains).
  • AI-first with government-grade rigor: Leverages Grok (via existing GSA OneGov agreement at $0.42/agency through March 2027) in a dual-AI pipeline (Grok as primary generator + complementary model like ChatGPT Enterprise or Gemini for Government as verifier). Offsets perceived biases for neutrality; populates only verifiable fields (progressive completeness — core entity data always included, optional details filled as confidence grows).
  • Zero-base burden model: ~80-95% automated (event-driven updates from source changes, daily gap scans); human role limited to <5% escalations (one-click approve/reject via Clearance Dashboard). No manual writing; full audit trails and immutable revisions for FOIA/NARA compliance.
  • Always defers to originals: Prominent banners, footers, and action buttons on every page: "Supplemental context only. Official source of truth: [direct .gov link]. Complete actions there." Builds trust and funnels traffic to primary sites.

For the dual-AI verification (Generator + Verifier), pairing models with complementary strengths—and potentially offsetting biases—is smart. It enhances neutrality, reduces single-model hallucinations, and aligns with OMB's 2025-2026 emphasis on "unbiased AI" (e.g., via NIST AI RMF updates). Grok (via GSA OneGov) is a great base as the Generator—it's fast, truth-seeking, and handles real-time orchestration well. Perceived political biases (based on 2025-2026 studies):

  • Grok (xAI): Often rated as right-leaning or center-right (e.g., conservative nationalism, libertarian on speech/elites). Some analyses place it slightly left-of-center, though less so than peers; it shows "dramatic swings" but favors the right on economy and government issues.
  • ChatGPT/GPT series (OpenAI): Consistently left-leaning (liberal bias on social/economic topics; perceived as four times more left-slanted than Google's models in user studies).
  • Gemini (Google): Centrist to somewhat left-leaning; lowest overall bias in many tests (e.g., 97% "evenhanded" score), but slanted left on progressive values/environmentalism. Most balanced per multiple quizzes.
  • Claude (Anthropic): Strong left-liberal bias (e.g., progressive activist lean; refused politically charged questions in tests). Unavailable now due to the directive.
  • Perplexity: More conservative/center-right; an outlier amid the generally left-leaning field of AI models.
  • Llama (Meta): Moderate left bias, especially on social issues; mixed but often centrist.
  • General trend: Most major models lean left-of-center, reflecting training data/creators' ideologies (e.g., Western companies favor liberal values). Grok and Perplexity are the most right-leaning exceptions.

Recommendation: Pick two with offsetting biases

  • Primary pair: Grok (right-leaning) + ChatGPT (left-leaning): Excellent balance—Grok as Generator for dynamic synthesis, ChatGPT as Verifier for rigorous fact-checking. Both available via GSA (OneGov for Grok, MAS/FedRAMP for ChatGPT at $1/agency until Aug 2026). This mitigates ideological skew (e.g., Grok's conservatism offset by GPT's liberalism) while ensuring high accuracy.
  • Alternative: Grok + Gemini (centrist/left-leaning): If you want more neutrality, Gemini's evenhandedness (highest in Anthropic's 2025 eval) complements Grok. Available via USAi.gov and FedRAMP 20x Low ($0.47 until Sep 2026).
  • Why two with different biases? Yes—studies show diverse models reduce collective blind spots (e.g., one catches the other's slant on policy topics). Avoid same-family models; aim for cross-company diversity. Test in pilot: Run sample drafts through pairs and measure neutrality via tools like Anthropic's open-source "evenhandedness" evaluator.
  • Implementation: Alternate roles (e.g., swap Generator/Verifier per run) or use ensemble scoring. All via APIs in a FedRAMP-compliant environment.

Truth layer

OpenGov Encyclopedia would serve as the authoritative ground truth layer the federal government needs to feed clean, structured data into the dozens of agency LLMs and chatbots that are currently hallucinating on messy websites and PDF archives.

It is not another public-facing website competing with USA.gov or any agency domain. It is a supplemental knowledge layer that:

  • Respects agency ownership by always linking back to the original source as the single point of truth.
  • Helps citizens accomplish real tasks by providing context and then directing them to official .gov destinations.
  • Provides the structured “fuel” agencies need for more accurate AI-driven services.

AI layer

Powered by Grok through the GSA OneGov agreement (with inherited security controls), the platform uses a zero-base burden strategy:  

  • Grok + automated orchestration handle ~80% of content creation, extraction, and verification.  
  • Federal staff perform only exception-based, one-click attestation.

This is high-octane fuel for federal AI: a single, standardized semantic layer that reduces fragmentation, improves accuracy across government digital services, and delivers measurable value with virtually no ongoing manual workload.

Why Now?  

LLMs are starving for clean, structured “ground truth.” Most RAG pipelines today scrape inconsistent .gov websites or parse PDFs, leading to frequent hallucinations and unreliable outputs.  

OpenGov Encyclopedia closes this gap by providing:

  • Precise, typed relationships (which agency sponsors which program? Which legislation authorizes it? What funding flows connect them?)
  • Parametric search (e.g., “all active programs with >$50M funding related to arid land agriculture”)
  • Real-time freshness from monitored sources
  • Full audit trail and human attestation for compliance

Federal AI adoption is accelerating under OMB guidance and America's AI Action Plan — but fragmentation and poor data quality undermine it. Existing federal knowledge graph efforts (e.g., CDO Council's Fuels Knowledge Graph for wildland fire metrics, USGS GeoKB for geospatial semantics, NASA people/mission graphs) prove the power of structured relationships but remain domain-siloed or internal. OpenGov Encyclopedia unifies this at low cost: open-source stack (MediaWiki + Cargo), near-zero new infra, phased pilot in high-value areas (e.g., disaster resilience, small business/housing assistance — cross-agency priorities).

Complements the ecosystem

| Platform        | Primary Role                                        | How OpenGov Encyclopedia Complements It |
|-----------------|-----------------------------------------------------|-----------------------------------------|
| Agency websites | Primary authoritative content and services          | Creates concise summaries + relationship maps; always links back to the original agency page as the source of truth |
| USA.gov         | Citizen front door for navigation & task completion | Provides deep context and task-oriented discovery so users reach USA.gov (or agency sites) better informed and ready to act |
| Search.gov      | On-site search across federal domains               | Supplies structured entities and JSON-LD sitemaps for richer, more accurate results |
| USAspending.gov | Raw spending & award data                           | Adds narrative explanations and program relationships around the numbers |


Practical, Task-Oriented Value  

The platform is organized around real tasks people want to accomplish, not just agency names. Examples of built-in topic/task pages include:

- Prepare for a disaster (links relevant FEMA, NOAA space weather, and HHS programs + direct USA.gov action links)

- Understand AI regulations and opportunities (cross-agency view of NIST standards, grant programs, and policy updates)

- Find housing or small-business assistance (structured program finder with sponsor, eligibility hints, and official application links)

- Research space weather impacts (connects NOAA monitoring, research programs, and emergency response frameworks)

Every task page gives context and relationships, then immediately directs users to the official agency or USA.gov page to complete the action.

Zero-Base Burden Strategy (80/20 Governance Model)  

  • 80% Automated Orchestration — Grok monitors a small, curated list of high-signal sources. When a change is detected, it auto-drafts the update and populates Cargo fields.
  • 20% Human Attestation — Staff simply click “Approve” on the Clearance Dashboard. No writing required.
  • Result — 90%+ reduction in manual labor while maintaining full federal control and compliance.

Cost & Next Steps  

- Near-zero new infrastructure cost (MediaWiki + Cargo open-source; Grok already available via OneGov).

- Phased pilot on high-value task areas (AI initiatives, space weather, disaster preparedness, housing assistance) in weeks.

Next Steps (Executive-Actionable)  

1. Proof of Concept Review — View the live “AI Policy & Space Weather Task” prototype running on MediaWiki + Cargo.  

2. Feasibility Brief — 30-minute technical call with GSA OneGov leads to confirm Grok-to-wiki pipeline interoperability.  

3. Governance Workshop — Define “Single Source of Truth” protocols that respect agency content ownership and prevent duplication.

Closing

OpenGov Encyclopedia is not another website — it is the clean, structured fuel for the federal AI ecosystem and a helpful map that respects agency ownership while helping citizens accomplish real tasks.

We are prepared to demonstrate the prototype and tailor the approach to your priorities.