The call to Angela Rasmussen came out of the blue and posed a troubling question. Had she heard the rumour that key data sets would be removed from the U.S. Centers for Disease Control and Prevention’s website the next day?
It’s something Rasmussen had thought could never happen.
“It had never really been thought of before that CDC would actually start deleting some of these crucial public health data sets,” said the University of Saskatchewan virologist. “These data are really, really important for everybody’s health — not just in the U.S. but around the world.”
The following day, Jan. 31, Rasmussen started to see data disappear. She knew she needed to take action.
Rasmussen reached out to a bioinformatician friend, who knew how to preserve data and make backup copies of websites. With others, they scrambled to preserve the data in case it was deleted.
“We set about archiving the entire CDC website,” said Rasmussen.
Since then, Rasmussen and her colleague have teamed up with others like American health-care data analyst Charles Gaba and turned their attention to other sites with health data, preserving information from departments and agencies like the Food and Drug Administration (FDA) and the Centers for Medicare & Medicaid Services.
Rasmussen said the publication of some studies, such as three that would shed light on H5N1 bird flu, also appears to have been affected by the change of administration.
Rasmussen is just one of several Canadian residents who have joined what has become an international guerilla archiving effort to preserve copies of U.S. government web pages and data being rapidly taken offline by U.S. President Donald Trump’s administration.
An analysis by the New York Times identified thousands of pages taken down in the days following Trump’s inauguration, in part as a result of Trump’s executive order targeting diversity initiatives.
Among the pages observers have seen disappear are ones that monitor HIV infections, deal with health risks for youth and contain census data, education data and information about assisted reproduction technologies. A website containing the names of those charged in connection with the Jan. 6, 2021, attack on the Capitol was also removed.
A comparison of the U.S. data.gov home page on Jan. 17, before Trump’s inauguration, with the page on Wednesday shows 522 fewer data sets.
Some commenters on social media liken the disappearing data to book burning in the 1930s.
Asked about the changes to the CDC’s website, the agency said they are part of changes across the Department of Health and Human Services (HHS).
“All changes to HHS and HHS division websites/manuscripts are in accordance with President Trump’s Jan. 20 executive orders,” senior press officer Rosa Norman said in an emailed response.
The Environmental Protection Agency (EPA) has yet to respond to questions from CBC News.
It is not known whether the data still exists on government servers.
Those archiving the data argue that it was paid for with U.S. tax dollars and should be in the public domain, accessible to researchers and everyone else.
The government has argued that the deletions are not necessarily final and that the information can be accessed via the Internet Archive’s Wayback Machine.
On Tuesday, a U.S. federal judge granted a temporary order directing the CDC and the FDA to restore public information on their websites while the courts hear a lawsuit challenging the Trump administration’s decision to remove it.
Internet archives sometimes miss data
Brewster Kahle is the founder of the Internet Archive (IA), which crawls the web and archives copies of websites. His non-profit organization is part of the End of Term Web Archive project, which has documented U.S. government websites at the end of each administration since 2004. It has also launched the Democracy’s Library project, a collection of government research and publications from around the world.
However, the Internet Archive’s crawlers don’t always pick up data sets and databases.
Those working to preserve U.S. government data sets are downloading them and, in many cases, storing them with the help of the Internet Archive.
“The efforts of these co-operating entities has yielded much, much more data being archived this time than other times,” said Kahle. “I think that’s an indication of people being extremely enthusiastic about trying to make sure that the government record is kept whole.”
Kahle said to date, the U.S. government hasn’t gone after government data stored by the Internet Archive.
“That would be highly unusual. We’ve never had anything like that happen,” Kahle said.
However, should that occur, the Internet Archive’s U.S. data centre is backed up in British Columbia by Internet Archive Canada, and vice versa. Kahle said the Democracy’s Library project is also housed in Canada.
“That’s what libraries do. We’re there to keep a record of what has happened — that’s a role that we play,” said Kahle. “Canada is always there to help out the United States Internet Archive.”
At the University of Guelph, geography professor Eric Nost is working with the Environmental Data Governance Initiative (EDGI) to preserve data from the EPA — particularly related to climate change and environmental justice.
“This data has a lot of importance in terms of being able to track environmental changes, to identify, for instance, what places are most burdened by pollution in the U.S., where the pollution is, where climate hazards exist,” Nost said. “That’s obviously very important to Americans, but it also has real relevance to Canadians as well.”
For example, some Canadian cities are downwind from American factories, he said.
“Having access to what’s coming out of the smokestacks is also really important for us.”
Nost said he knows of at least three other people in Canada also working to archive environmental data. He said his group has prioritized 60 data sets or tools, archived most of them and reconstructed tools like the EPA’s EJScreen.
Nost said his group is also finding that some websites, such as the Federal Emergency Management Agency’s national risk index map, are currently blocked to anyone accessing them from outside the U.S.
Matt Price, an associate professor at the University of Toronto who is also working with EDGI, says preserving the data is important because the U.S. is the biggest scientific powerhouse in the world.
“We should care about American data because the American federal government has been the default custodian of large quantities of data that the whole world needs,” Price said.
Jessica Mahr is a Toronto-based employee of the Environmental Policy Innovation Center who is helping co-ordinate different groups trying to archive U.S. government environmental data. She says the removal of data and tools affects research that informs policy to improve quality of life.
“Without those tools you’re not able to have an informed understanding of who is suffering and then where to provide them with funding or programs that would improve their lives,” Mahr said.