Digitally catacombed within the University of California Riverside’s Center for Bibliographical Studies and Research (CBSR) lies an archive with an estimated 100 million pages of preserved California newspapers.
Spearheading the conservation effort is Brian Geiger, CBSR director. His largest project, the California Digital Newspaper Collection (CDNC), is a repository of historical California newspapers published from 1846 to the present. Publications, such as the Californian, the state’s first newspaper, and the Daily Alta California, its first daily newspaper, call the CDNC home – archived alongside current California newspapers in PDF format to preserve the Golden State’s history.
“The big hole from my perspective in preservation is the fact that contemporary papers aren’t generally preserved in any way,” says Geiger. “Historians and family researchers, decades from now, looking back at the 2000s and 2010s are gonna see this big gap in coverage.”
To try to close that gap, the project is collecting newspaper PDFs from publishers, and “dozens and dozens” of news media publishers have signed on, he says, including two well-known Inland Empire publications, the Press-Enterprise in Riverside and the Desert Sun in Palm Springs.
“If they [publishers] wanna participate, then we can put those in the CDNC, or if they don’t want them accessible,” Geiger says, “we can embargo them for a certain amount of time and just preserve the PDF so that they’re around for future researchers.”
As a public service and at no cost to the publishers, Geiger and his team can preserve old editions as well.
He and his team have created a portal where publishers log in and upload their PDFs. Every six months, the team processes all the PDFs collected over the previous six months.
“If you were going to the CDNC and just limit your browsing from 2020 to the present, pretty much everything you see there is born digital. All the recent papers,” Geiger says.
Copyright issues are always a concern when it comes to digitization, says Geiger, especially because ownership in the industry changes so quickly that it can be a challenge to find out who owns the copyright.
“We can digitize up through 1963 without infringing on copyright, and then after 1963 it becomes a little bit more complicated,” says Geiger.
Regardless of copyright status, Geiger’s team tries to work with publishers.
“If we digitize beyond ’63, you know, at least reach out to them, and say ‘We’re interested in digitizing your title. Would you be okay with this?’ I don’t wanna step on people’s toes if I can avoid it.”
Browsing the CDNC is free and requires only an online account.
“If there’s a specific topic, you can search by keyword and limit by title, date, whatever you wanna do,” says Geiger. “We have a map of California. You can find titles in a particular county or region of the state.”
A team of four full-time staff members and students works meticulously to ensure the pages are properly preserved, he says.
Geiger says they usually work with microfilm but also with original newsprint. They scan the film to create high-resolution TIF images, which they save as the preservation copies.
“Then you run those TIF images through specialized software that both OCR’s it, so you have texts there that people can search and read, and it also basically zones the page, so you know that this is a paragraph, this is a article, a column, and so forth. It goes through the software and then creates the deliverables that we then ingest into the CDNC and host.”
UCR’s first foray into preserving California newspapers began in the early 1990s with federal project funding to identify all surviving California newspapers, says Geiger.
“That lasted from the early 90s to the early 2000s – before I came on board,” says Geiger, citing the archive’s origin story:
“In the early 2000s, we got wind of the fact that a number of microfilm companies were either going out of business or looking to get rid of their stock of film,” he says.
So, more than 10 years ago, Geiger says, the archive acquired newspaper microfilm from about half a dozen vendors, the biggest being BMI and Sunnyvale. Those holdings still exist and eventually became what is called the California Newspaper Microfilm Archive.
It was no small task preserving all the master negatives housed within the Microfilm Archive, but access to all that microfilm paid off a few years later, when the CBSR received another federal grant, from the Library of Congress, to digitize California newspapers. The project is called Chronicling America.
“It’s still ongoing,” the director says. “We have a grant from them right now.”
Geiger and his team used to digitize about 50,000 pages a year. Now they are doing probably 5 million pages annually.
“So, the project it’s continued, and it’s grown over the years,” he says. “I’m guessing there’s about 150 million pages of California newspapers that we could do.”
Beyond the access the CDNC provides to researchers, academics, and documentarians, Geiger believes preserving the papers ensures they’ll be around for generations to come:
“The CDNC is kind of the access side of what we do, but we very much continue to be in the work of preservation and trying to preserve both historic and contemporary papers as best we can. It’s a very big state. There’s a lot to do.”
The CDNC is currently accepting new uploads. Geiger says several high school and college papers have uploaded editions or still upload regularly. To start uploading files to the collection, users first need to create a free online account.