Imagine you’ve been asked to help put together a puzzle. It’s a very large puzzle with a lot of pieces—and once you put it together, it will help you resolve an important problem. You’re told that the vast majority of the puzzle pieces you’ll need are kept behind a particular door. Naturally, you head inside and turn on the light. Before you sits a vast warehouse of cardboard boxes. Some are labeled while others are not, some are within your grasp and many others are stacked on towering shelves.

Swap the puzzle pieces in this analogy for environmental data, and you can begin to grasp the situation facing data leaders across the federal government who work on evidence-based environmental policy-making, especially Chief Data Officers (CDOs).

The origins of that status quo aren’t complicated: the 2018 Evidence Act created the CDO position across federal agencies, and designed the role to lead efforts to find, organize, and deliver the many “puzzle pieces” that will help us answer vital national policy questions, such as:

Is our water safe to drink, and how can we make it safer?
Are our infrastructure investments reaching communities that need them most?
How do we manage invasive species in a changing climate?

Back at the warehouse, the good news for CDOs is that there are a ton of people who work with puzzle pieces all day, every day, and who know parts of the warehouse very well because they were involved in packing and storing the boxes to begin with. The bad news is that these experts don’t report to the CDOs, and for many years there hasn’t been a clear method for labeling and organizing the boxes. Worse still, some of the puzzle pieces are for older puzzles that no one is working on anymore.

With this challenging context in mind, EPIC set out to learn more about the capacity issues environmental data leaders like CDOs face today—and to define strategies that might help them build or bolster the capacity they need when it comes to environmental data.

What We Learned

To learn more, we interviewed data leaders at environmental agencies (e.g., EPA, NOAA, the Forest Service, DOI), including leaders involved in setting up data intermediaries. Here are the key things we learned from those conversations:

Getting buy-in on data work from every corner of the agency is challenging. Environmental agencies create “fractals of silos”—both because regional and field offices tend to have a lot of autonomy, and because scientific work itself often pushes in the direction of specialization. If you need to gain awareness of what’s spread across your warehouse—and eventually label it—you first need to get the attention of many different teams, all with different cultures and ways of operating. That process is important but time-consuming.
A lot of time is spent navigating cascading and overlapping policies. The Evidence Act, the Geospatial Data Act, the Nelson Memo, and other data-related policies have created layers of requirements that are not always straightforward for agency staff to navigate and implement in day-to-day work. Keeping with the warehouse analogy, this challenge is like focusing on relabeling existing cardboard boxes only to be told late in the effort that you actually need to use transparent boxes of a certain size for most puzzle pieces.
Dedicated communications staffing is one of the most pressing needs. Interviewees named great communicators as one of the top capacity needs for advancing data work. Data leaders are asked to communicate with many audiences, and it’s almost impossible to overcommunicate about data policies and processes or about the value of data governance for agency policy goals. Unfortunately, few CDOs have dedicated communications capacity at their disposal. That needs to change.
People with other job titles frequently end up stepping into data management roles. For example, many staff originally hired as biologists end up migrating into data roles, with or without formal training in data science or management. Moreover, titles and job descriptions often don’t reflect data-related roles. That matters internally (for accurate work and role tracking, agency talent investments, etc.) and externally, for signaling accurately to the talent market. In our analogy, people who know their way around the warehouse probably know something about driving a forklift or delivery truck, but risks and inefficiencies will crop up if they haven’t been trained to take on those tasks. The same is true for data management.
Partnerships aren’t yet a major part of the equation. There is so much to figure out across our warehouse that there’s often little time to think about how other organizations may be able to help with the sorting, labeling, and delivery of key data. But emerging examples, such as eBird and the Trails Stewardship Initiative, show that opportunities exist for federal agencies to tap into private and non-profit capacity in tackling data governance challenges. More partnerships will help the government take advantage of these models of innovation and emerging best practices.

What Strategies Can Help?

Across our interviews, we identified several strategies that are already being employed across federal agencies to tackle these challenges. These include:

Communities of Practice (CoPs). Establishing these collaboration spaces, with a data focus, can help break down silos across regions, disciplines, and offices. Successful CoPs also allow for a more “bottom up” approach that draws people in, helps build capacity, saves time and resources, and reinforces cultural change by making data champions less isolated and more engaged with each other about their work on the ground. In other words, when done right, CoPs can enable tunneling to happen from both ends, not just from the top down. The CoPs we heard about in our research benefited from clear leadership support, but also generally had a lot of leeway to figure out their own solutions and best practices. Great facilitation, openness to external participants, and funding for in-person meetings were also cited as major enablers of success.
Competitive Internal Requests for Proposals (RFPs) or Federal prize competitions. Some agencies use tools like these to support a variety of data projects, including collecting, reporting, integrating, and managing data, as well as studying and sharing relevant best practices and innovations. Flexible resources like Challenges.gov and targeted RFPs, which help agencies adapt to changing policies and needs within the government, can also draw in non-governmental innovators and encourage participation in new communities of practice.
Self-service software tools. Interviewees mentioned better digital tools for data-related tasks such as analytics and metadata entry, along with greater awareness and use of those tools, as another strategy for increasing capacity. Such tools can also help with the creation of data management plans and help ensure there’s a clear chain between creators and users. We also see potential in this context for artificial intelligence (AI) to help streamline data workflows in the future, assuming that privacy and security hurdles can be overcome.

We likewise discovered examples of training that could address specific needs. These include:

Scientists in data roles. Training for scientists who migrate into data roles was mentioned as a need. ESIP currently operates a Data Management Training Clearinghouse with many training options and resources.
Leadership. Leaders’ ability to ask informed (but not technical) questions was described as an essential skill at the top and mid levels for enabling progress on data work. Most agency leaders do not need to be trained data scientists or managers, but they do need to know enough to have productive conversations with them. GovLab Academy and Non-Invasive Data Governance are examples of training providers specifically focused on data governance in large organizations.
New staff. Hiring and training new staff is an opportunity to set a foundation. One past course cited as particularly useful focused on risk and decision-making; it had participants play other roles from across the agency to better understand those colleagues’ points of view.

Looking Ahead: Where We See Opportunities

We see three major solution areas in this space, and looking ahead, we intend to explore where EPIC can do our part to help environmental data stewards do their work more effectively:

Understanding and communicating the value of data improvements by more explicitly linking specific policies to specific data needs or issues. Capacity usually follows environmental policy goals, not data policy goals, and so making these links is crucial.
Tools that harness generative AI both require good data and can help alleviate some data governance capacity issues over the long term, for example by enabling much faster, more accurate, and more comprehensive metadata entry. We’re kicking the tires to see how they might be helpful, as in our recent test of AI-assisted data extraction from Army Corps of Engineers public notices.
Leveraging partnerships for better data collection, governance, and use by distilling and sharing lessons from collaborative environmental data projects—and by supporting or facilitating the creation of data intermediaries where they’re needed.

Have ideas about where we can improve data governance capacity? Did we miss something? Don’t hesitate to reach out!

Our environmental data is hidden away in unmarked boxes; we need capacity to open them up.

Environmental Policy Innovation Center


