NEPATEC 2.0 - The Next Step in AI-Driven Permitting Innovation

The tl;dr - The Pacific Northwest National Laboratory (PNNL) has released a massive collection of environmental documents spanning the last 50 years and more than 60 agencies. The dataset, NEPATEC 2.0, is a significant advancement over the 1.0 release. More importantly, the documents have been structured with 14 metadata properties, making them searchable in ways that previously demanded extensive human hours (and eye strain). In addition to powering a new round of permitting tools from PNNL, the dataset is being released for the public to experiment with or incorporate into their own projects, a leap forward in the availability of verifiable, accurate data in the public sphere. This data can be used in many useful ways, for example:

  • An applicant or agency in one state can see what happened for similar projects in other states to jump-start their research on overlapping regional and federal regulations.

  • More complex searches combining multiple variables such as “all waste management projects that mention air quality,” “any mention of the black-footed ferret in New Mexico, Texas, or Arizona,” or “projects involving the Department of Energy in the last ten years.” 

  • With the extensive catalog of Categorical Exclusions (CEs), applicants can now search across the CEs, leading to better chances of finding a CE applicable to their work or combining CEs in new ways.

Read PNNL’s full report on how they constructed the data. (PDF)


The push for better, faster permitting is underway across the US, with a wealth of books, podcasts, articles, and legislative attention, as well as a new K-POP movie. (OK, not that big, but work with me here.) 

Permitting has historically been hindered at the system level by outdated platforms, fragmented data, and disconnected digital tools, and at the content level by the inability to effectively review past work and learn from previous experiences. A team at the Pacific Northwest National Laboratory (PNNL), collaborating with the Council on Environmental Quality (CEQ) and experts across the federal agency landscape, is working to address both challenges.  

Enter NEPATEC (the National Environmental Policy Act Text Corpus, a testament to the enduring popularity of acronyms), a structured dataset, along with a set of tools built to take advantage of it, starting with the PermitAI project. The new dataset provides a foundation for researching and analyzing NEPA documents in ways that benefit anyone involved or interested in permitting, not just at the Federal / NEPA level.

The PNNL, NEPATEC, and PermitAI

PNNL developed PermitAI and its components in accordance with the CEQ’s NEPA and Permitting Data and Technology Standard, as well as the broader Permitting Technology Action Plan, supporting government-wide efforts to digitize and streamline environmental reviews. (Agencies were to begin adoption and implementation of these standards by August 28, 2025, meaning more data searchable and shareable via applications and tools in the coming quarters.)

It’s important to clarify the relationship between NEPATEC and PermitAI, as they sit in related but distinct places in this growing ecosystem of data and tools from the scientists at the PNNL.

NEPATEC is a structured dataset: a collection of documents with metadata layered on top to help humans and AI models search, combine, and add context to the data. The 2.0 release of NEPATEC contains approximately 150,000 documents and roughly 7 million pages of data.

PermitAI is an initiative and a collection of tools that take advantage of the NEPATEC data, as well as other data sources to come. PermitAI’s tools are currently in beta testing by Federal agencies, with more details to come on their public availability as they are tested and refined.

The key here is that NEPATEC 2.0 is open-source, high-quality, and structured, so the data can be readily used in projects or experiments by anyone. You can download it and begin tinkering in the time it takes to write some code or drop it into an existing application’s dataset.
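For instance, if the release includes a flat metadata table, a first look could be a few lines of pandas. This is a minimal sketch: the file name and CSV format below are assumptions for illustration, not confirmed details of the public distribution.

```python
# Minimal sketch of a first look at the NEPATEC 2.0 metadata.
# Assumes a flat CSV export named "nepatec_metadata.csv" (a placeholder,
# not the actual file name or format of the public release).
import pandas as pd

df = pd.read_csv("nepatec_metadata.csv")

print(df.shape)                                    # (documents, metadata fields)
print(df["Document Type"].value_counts())          # EISs, EAs, CEs, ...
print(df["Lead Agency"].value_counts().head(10))   # most common lead agencies
```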

The Sheer Volume of Documenting NEPA 

It’s no secret that NEPA produces a lot of documentation. (Google “NEPA report length” for a fun afternoon read.) However, the sheer scale of the totals can surprise those not immersed in the work. For example, over the last ten years, between Environmental Impact Statements (EIS), Environmental Assessments (EA), and Categorical Exclusions (CE), the Federal government has produced, on average, over 100,000 documents totaling more than 1,000,000 pages a year - and that can be considered a conservative estimate. The challenge is how to learn from those millions of pages and the stories, facts, and figures within them.

It’s also important to note that the majority of past NEPA documents share few standards or common attributes. A page from PNNL’s paper on the release of NEPATEC 2.0 shows how even the cover pages vary from agency to agency, report type to report type, and year to year.

 
A collection of scanned cover pages of various NEPA documents (CEs, EAs, FONSIs, and others), each with a different design, layout, and set of information on the page.

Figure 1 from the PNNL_PermitAI_NEPATECv2_Public_Release_20_08_25.pdf

 

So, the challenge isn’t just to gather all these NEPA documents; it’s to organize and assemble them so that they can be grouped, sorted, and the information within them structured, allowing a reader to find specific facts or details without having to read the entire document.

What is NEPATEC 2.0?

This latest release is an expanded, machine-readable dataset designed to unify NEPA documents that were previously siloed in numerous federal systems. NEPATEC 1.0 contained data from 28,000 documents across 2,917 projects that underwent Environmental Impact Statement review. The new iteration expands that to over 120,000 documents, including EISs, EAs, CEs, and other key NEPA planning documents from 60,000 projects prepared by more than 60 agencies, with decisions spanning over 50 years.

The growth in the dataset comes primarily from the Department of Energy, the Bureau of Land Management, and the USDA (with documents mainly from the Forest Service). The PolicyAI team enhanced and expanded its methods for gathering documents and extracting data from PDFs, including web scraping from websites and databases, programmatic retrieval via application programming interfaces (APIs), and direct file transfers via email.

Beyond the size of the new dataset, NEPATEC 2.0 implements standardized metadata that aligns with CEQ’s recommendations made as part of the Executive Order and establishment of the Permitting Innovation Center earlier this year. These standards provide a common language and structure for entities, such as projects, processes, and documents. Previous attempts to search across NEPA documents suffered from inconsistencies in format, structure, and the lack of metadata. This new release kick-starts the transformation of documents into data.


Why Metadata Is Important

The metadata fields (such as lead agency, location, and project type) enable quick filtering and make it easy to find documents relevant to a specific search. Instead of scanning hundreds of documents, a user can search for “environmental impact statements that mention the Red-headed Woodpecker” or “categorical exclusions that mention energy transmission in Colorado.” This ability to search and compare is the big step we’ve been waiting for when it comes to permitting data. Read more about our thoughts on improving permitting.
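To make that concrete, the “categorical exclusions that mention energy transmission in Colorado” query could look something like the hypothetical sketch below against a flattened metadata table. The column names mirror the metadata attributes listed in the next paragraph; the “Full Text” column is an assumption made for the sake of the example, not a confirmed part of the release.

```python
# Hypothetical sketch of the "CEs that mention energy transmission in Colorado"
# query. "Document Type" and "Location" mirror the published metadata attributes;
# "Full Text" is an assumed column holding the document body.
import pandas as pd

df = pd.read_csv("nepatec_metadata.csv")   # placeholder file name

matches = df[
    (df["Document Type"] == "Categorical Exclusion")
    & (df["Location"].str.contains("Colorado", case=False, na=False))
    & (df["Full Text"].str.contains("energy transmission", case=False, na=False))
]
print(len(matches), "matching CE records")
```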


The new release utilizes metadata attributes, including Lead Agency, Process Family, Process Type, Project Title, Location, Project Sponsor, Project Sector, Project Type, Document Type, Document Title, Prepared By, CE Category*, Action Description*, Section or Volume Title, and Main Document. (See the details for each below.)

 
A table of the metadata attributes named above, with descriptions and data types. All types are “text” save for Main Document, which is “boolean.”

Table 2 from the PNNL_PermitAI_NEPATECv2_Public_Release_20_08_25.pdf
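For readers who think in code, here is one way to picture a single record under that schema. The field names come straight from the list above; the Python types are an interpretation of the table’s “text”/“boolean” labels, not an official definition from PNNL.

```python
# One NEPATEC 2.0 metadata record, sketched as a plain dataclass.
# Field names follow the attributes listed above; types interpret the
# report's "text"/"boolean" labels and are not an official schema.
from dataclasses import dataclass

@dataclass
class NepatecRecord:
    lead_agency: str
    process_family: str
    process_type: str
    project_title: str
    location: str
    project_sponsor: str
    project_sector: str
    project_type: str
    document_type: str
    document_title: str
    prepared_by: str
    ce_category: str            # marked with an asterisk in the report
    action_description: str     # marked with an asterisk in the report
    section_or_volume_title: str
    main_document: bool         # the only non-text attribute
```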

 

Finally, this release emphasized the inclusion of NEPA experts in the process to ensure accurate categorization and tagging. Without this “human-in-the-loop” step, the process would continue to be error-prone and require constant updates and changes, without guaranteeing validity. We are excited to see the virtuous loop of collaboration between technology and subject matter experts, benefiting all.

The Benefits Of This New, Organized Data

NEPATEC 2.0 is the most significant collection of NEPA documents in one place, organized, structured, and ready for cross-referencing - not just a drive full of PDFs with a search box. There is also now one location to find all this information (redundant copies saved by multiple agencies have been removed). The cleaned-up metadata also enables better geographic analysis, with an improved ability to examine past actions by location, region, and other factors.

Releasing this dataset as open source allows any group to access the data and conduct their own experiments. State agencies, academic institutions, and private companies alike are welcome to use the data in their work and see the value of standardized data in practice - ideally leading to less litigation, more CEs, and faster permits, because the history of what has and hasn’t worked is now visible.

For example, imagine this scenario: an applicant or agency in Colorado can easily see what happened for a similar project in another state and push the boundaries of what might be possible for them. Applicants seeking precedents for their types of projects can now refine their search to specific regions in a way that was not previously possible.

The establishment of a new, single dataset is also crucial because multiple copies of NEPA documents exist across agencies. Slight variations between copies can lead to issues with “sources of truth” (EPIC’s term, not PNNL’s). While redundancy has its benefits, a lead agency may produce NEPA documents that differ from those of a cooperating agency. PNNL also identified instances of NEPA documents reused for purposes that weren’t immediately apparent. This highlights the need for continued participation by environmental professionals in sorting and organizing documents that are unclear or lack a one-to-one match for the criteria.

PNNL and the PermitAI Initiative

The work PNNL is doing with NEPATEC has been immediately rolled into active projects in development. The biggest is the collection of tools in the PermitAI (previously PolicyAI) project. PermitAI’s goal is to use AI algorithms to process and categorize data quickly, streamlining labor-intensive work such as comment tagging, processing, and summarizing. The hope is that automating this early work frees staff to focus on analytical tasks earlier in the process. AI can also synthesize key points from supporting literature and revise text to meet target lengths and reading levels, leading to higher accuracy and consistency in documents.

With this work and new data, the PermitAI team is currently beta-testing SearchNEPA, an interactive AI-driven toolkit for federal NEPA reviewers, and ChatNEPA, which provides AI-generated responses to questions based on the NEPATEC corpus. Future applications under development include EngageNEPA and CommentNEPA. (Note to self: make new stickers that say “Put a NEPA on it.”)

The New Dataset In Other Tools

NEPATEC 2.0 is slated to be added to CEQ’s CE Explorer Tool, where it will provide examples of completed Categorical Exclusions (CEs) that apply to specific categories or have similar project descriptions. The dataset contains over 50,000 CE decisions from DOE, BLM, and USDA, serving as a valuable reference for drafting future CE documents. The ability to search across the CEs gives applicants a better chance of finding a CE applicable to their proposed project, or of combining CEs in new ways. CEs have long been seen as an area ripe for development, with untapped potential for innovation and acceleration in the world of permitting and NEPA.
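One simple (and decidedly unofficial) way to surface past CE decisions with similar project descriptions is plain text similarity over the Action Description field. This is not how CE Explorer works under the hood - it is just a sketch, under the same placeholder file and column assumptions as above, of the kind of search the structured data makes possible.

```python
# Sketch: rank past CE decisions by textual similarity to a draft action
# description using TF-IDF. Not CE Explorer's actual method; file name and
# columns are assumptions consistent with the metadata attributes above.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

ces = pd.read_csv("nepatec_metadata.csv")
ces = ces[ces["Document Type"] == "Categorical Exclusion"].dropna(subset=["Action Description"])

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(ces["Action Description"])

query = "replace an aging transmission line within an existing utility corridor"
scores = cosine_similarity(vectorizer.transform([query]), matrix).ravel()

# Show the five most similar past CE decisions.
top = ces.iloc[scores.argsort()[::-1][:5]]
print(top[["Lead Agency", "Project Title", "CE Category"]])
```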

As mentioned earlier, the dataset also allows for unique analysis of NEPA review distribution by geography and project type. This information can identify past environmental issues, baseline conditions, relevant literature, and impacts for areas considered for new actions. Any project that wants to find and compare NEPA information by location can benefit from this release.
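A quick-and-dirty version of that distribution analysis might look like the sketch below, again assuming the placeholder metadata table from the earlier examples; real Location values would likely need normalization or geocoding before grouping like this is truly meaningful.

```python
# Sketch: where, and for what kinds of projects, NEPA reviews have been
# concentrated. Assumes the placeholder metadata table used above; Location
# values may need cleanup before this grouping is reliable.
import pandas as pd

df = pd.read_csv("nepatec_metadata.csv")
by_area = (
    df.groupby(["Location", "Project Type"])
      .size()
      .sort_values(ascending=False)
)
print(by_area.head(20))   # the most common location / project-type pairings
```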

AI tools, using NEPATEC 2.0, can survey large numbers of EISs and provide concise summaries of differences in proposed actions, alternatives, resources analyzed, and mitigation requirements. The new tools can help preparers understand existing documents more efficiently and inform new drafting approaches. For instance, AI could generate lists of potential mitigation measures for energy projects that affect specific species or provide an annotated list of alternatives for oil and gas projects in a particular region.

The Road Ahead

As with all projects of this nature, there’s still work to be done and improvements to be made. PNNL plans to release a 3.0 data set before the end of the year, further expanding the database. The database can also extend beyond NEPA documents to include other permitting review documents.

As the data set grows, we hope to see more permits and review processes added - not just NEPA. We also hope states looking to improve their own permitting processes will consider a similar approach to organizing and structuring their own vast collections of documents. The connection between Federal and State actions could be smoothed dramatically with shared standards. This dataset is a perfect example of high-quality, reliable, structured data for efforts outside of the immediate NEPA landscape. We’d love to see technology teams and researchers grab it and add to their own experiments and efforts. There are countless insights to be found from sifting and sorting this data with confidence in the results. 

If this release focuses on the Department of Energy, the Bureau of Land Management, and the USDA, imagine how many more thousands of documents there are across the agency landscape. There are millions more pages to be added and more details to layer on as the PNNL team iterates on their approach and gathers feedback from testers in the field.

NEPATEC 2.0 is a critical step toward a more transparent, efficient, and data-driven future for federal environmental permitting. By transforming how environmental review data is accessed, analyzed, and applied, agencies, states, and applicants will be able to make more informed decisions, reduce delays, and ultimately contribute to building critical infrastructure more quickly and effectively. It doesn’t have the star power of a K-pop movie but, really, who can compete with that?
