California news coalition launches database of police misconduct records

Seven years in the making, a database of police records on misconduct, shootings and use of force causing serious injury or death is now public on the websites of LAist and KQED in San Francisco.
The database is free to access and doesn’t require an account. Readers can search by officer name or keyword, filter by county or agency, and narrow searches to date ranges.
It’s a product of The California Reporting Project, a coalition of news organizations that joined forces to coordinate requests for documents that had been added to the public record under a 2018 law. A 2021 law further expanded the scope of what falls under public record. KQED, LAist and CapRadio in Sacramento were among the project’s founding members.
“[The law] didn’t magically make the documents public or something,” said the project’s director of research, Lisa Pickoff-White. “And so reporters were talking to each other around the state saying, ‘Hey, are you going to be sending these requests?’ … And people quickly realized that we could do this together and be more effective because we would have to send out hundreds of requests.”

Pickoff-White said the database currently holds records on more than 12,000 cases, which span 1965 through 2024. These cases make up about 1.5 million pages of records.
Because it relies on requested records, the database does not include every police record that falls within its scope.
Additionally, documents with details like Social Security numbers and graphic imagery are withheld from the public database, said Pickoff-White. The project is currently considering whether to publish redacted versions of those documents and “assessing the feasibility of how to best balance transparency and people’s privacy,” she added.
And if readers find information they think shouldn’t be public, they can submit feedback asking for it to be removed.
Pickoff-White added that it’s important to realize that an officer’s name appearing in the database doesn’t automatically mean they used force or engaged in misconduct.
Since 2018, the project has received state funding and changed hands. It’s now maintained by The Investigative Reporting Program at UC Berkeley, Stanford’s Big Local News program and UC Berkeley’s Institute for Data Science. Dozens of news organizations are members now.
The project originally got funding from Roc Nation and the Sony Foundation, and it’s now entirely funded by the state. In 2023, the state set aside nearly $7 million over three years for the project to build the public database. That money runs out in June 2026.
David Barstow, chair of UC Berkeley’s Investigative Reporting Program, said the project’s leaders don’t yet know how they will sustain the database once the state funding ends.
“We are, of course, exploring various options, including seeking more state funding to sustain the database or donations,” he wrote in an email to Current.

The funding covers the operational costs of software engineering and database design, requesting records from nearly 700 agencies across the state, efforts to vet and convert the documents for the database, and efforts to train journalists to use it effectively. It also covers reporting efforts from within Stanford and UC Berkeley, though all other member organizations fund their own reporting efforts.
As records came in, reporters from organizations across the state used them to uncover patterns of police misconduct and use of force. Since 2018, the coalition has published hundreds of stories about the records and the battles to access them.
KQED Editor in Chief Ethan Toven-Lindsey said the data has been, and will continue to be, essential for reporters to “uncover incredibly important stories and issues that’s relevant to our audience.”
“So the reason we will continue to be invested and involved in it is that it’s an incredibly important tool for our criminal justice reporters to have access to,” he added.
Building the database
Pickoff-White said the goal was always to release many of the documents to the public in a database, but the project worked to make documents available to member organizations as it received them.
At the start, reporters made their own records requests and pooled what they received with the project. They started by filing requests at hundreds of agencies, asking for all police records that fell under the new law. Requests cited incidents logged in the California Department of Justice’s Use of Force Incident Reporting database and asked for files related to them.
“Sometimes we disagree with the agency on what incidents qualify,” Pickoff-White said. “We may think that more incidents qualify. And so in some places, like Richmond, California, for instance, we had to sue for those records … to release these [records on] 73 dog bites. And that’s because the law specifies that we can receive cases about great bodily injury. So the question in that case is, ‘Do dog bites count as great bodily injury?’”
The court ruled in the news organizations’ favor, prompting the release of those files.
Most of that data was stored within KQED’s system in the earlier years. In 2021, Stanford’s Big Local News centralized the requests and hired a requests manager. Now, a full-time requests manager files requests on behalf of reporters and follows up on each annually in order to keep acquiring more recent records.
AI was key to building the database from the start, but large language models and generative AI have become far more efficient, allowing the project to work through records much more quickly, Pickoff-White said.
In particular, AI sped up the task of sorting files by case. Initially, reporters used AI to transcribe audio and to run optical character recognition programs to convert documents into searchable text. Then reporters would read files and group them based on incident dates and types.
“You can get more information from a case if you’re able to read the coroner’s report and the police report and not just one police report — but for a case, you might get police reports from three different people,” Pickoff-White said. “You also might get a psychology report. We might get a report from a district attorney.”
AI could summarize files, but it wasn’t until the project started work with UC Berkeley’s Institute for Data Science in 2023 that the LLMs were advanced enough to effectively extract information about the types of incidents and the dates when they occurred. The project is currently using OpenAI’s GPT-3.5 Turbo and GPT-4o mini, Pickoff-White said.
This made it easier for human reviewers to determine which documents belonged with which cases. Once the files were grouped, the project would run the model again to re-extract the data and verify the incident date for the public database. Human reviewers also verify incident types “to ensure that they found a sustained policy violation,” Pickoff-White said.
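The grouping step described above can be pictured with a short Python sketch. It is an illustration only, not the project's code: the `ExtractedDoc` fields, the `propose_case_groups` helper and the sample records are all invented here, and the sketch assumes the incident date and type have already been extracted from each document. Documents that share an agency, date and incident type land in the same candidate case, which a human reviewer would still need to confirm.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExtractedDoc:
    """Fields assumed to be extracted from one document (hypothetical schema)."""
    doc_id: str
    agency: str
    incident_date: date
    incident_type: str


def propose_case_groups(docs):
    """Bucket documents that share agency, incident date and incident type.

    The result is only a suggestion for a human reviewer to confirm.
    """
    groups = defaultdict(list)
    for doc in docs:
        key = (doc.agency, doc.incident_date, doc.incident_type)
        groups[key].append(doc.doc_id)
    return dict(groups)


# Invented sample records, for illustration only.
docs = [
    ExtractedDoc("police-report-1", "Example PD", date(2019, 3, 2), "use of force"),
    ExtractedDoc("coroner-report", "Example PD", date(2019, 3, 2), "use of force"),
    ExtractedDoc("police-report-2", "Example PD", date(2020, 7, 15), "dog bite"),
]

case_groups = propose_case_groups(docs)
for key, doc_ids in case_groups.items():
    print(key, doc_ids)
```

Exact matching on the date is the simplest possible rule. Real records may disagree on dates or describe the same incident differently, which is one reason a human check would remain necessary.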
Pickoff-White and others also worked to use computer vision to flag documents that included graphic images or personal details like Social Security numbers. Reporters in the California Reporting Project can access those documents, which are currently withheld from the public database, if they need other information in the files. Journalistic ethics provide guidelines about dealing with personal details and publishing graphic material.
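As a rough illustration of flagging personal details, the Python sketch below checks OCR text for strings shaped like Social Security numbers. This is a simpler, text-based stand-in and not the computer vision approach the project describes. The `needs_privacy_review` function, the pattern and the sample strings are assumptions made for this example.

```python
import re

# Matches the common written form of a U.S. Social Security number, e.g. 123-45-6789.
# Assumption: the document has already been converted to text by OCR.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def needs_privacy_review(ocr_text: str) -> bool:
    """Return True if the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(ocr_text) is not None


# Invented sample strings, for illustration only.
flagged = needs_privacy_review("Subject SSN: 123-45-6789")
clean = needs_privacy_review("Incident date: 2019-03-02")
print(flagged, clean)
```

A pattern like this would miss numbers written without hyphens and cannot detect graphic images at all, so it could only ever be a first pass ahead of other checks.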
Reporting from the database
Toven-Lindsey said the collaboration that built the database extended to the reporting that came out of it.
Early on, KQED reporter Sukey Lewis and LAist reporter Annie Gilbertson teamed up with Thomas Peele from the Bay Area News Group and Maya Lau from the Los Angeles Times to report on obstacles to receiving requested records under the new law.
Six months after it went into effect, the group reported in the Times that some agencies hadn’t provided a single record. Others were charging high fees or ignoring court orders to provide documents. Police unions were attempting — and largely failing — to get requests blocked by judges.
Some agencies had also destroyed records days or weeks after the law went into effect. Many records had been years past their retention date requirements, but advocates say the agencies had responsibilities to preserve them as they may have already been subject to pending requests.
Other stories investigated repeated canine bites, patterns of sexual harassment and broken bones resulting from use of force.
Telling these stories, said Toven-Lindsey, is more important than worrying about which news website gets to publish them.
“For public media, our most important and highest use of our journalism … is convincing our members and convincing Californians that becoming members is valuable,” he said. “And so if the journalism we do appears on the LA Times, that isn’t problematic as long as — as this partnership has done — it identifies those reporters and credits KQED correctly.”
In 2021, KQED launched On Our Watch, an investigative reporting podcast exploring cases of police misconduct and use of force incidents. Together, those cases show a larger picture of how the police accountability system works, and doesn’t work, in the state of California.
The second season, “On Our Watch: New Folsom,” focused on California’s most dangerous prison. The 2024 series explored the fates of two whistleblowers in the prison’s elite investigative unit: one killed by a fentanyl overdose and the other dying by suicide after taking a leave from his job to write a book about his decade at the prison he came to see as corrupt.
Toven-Lindsey said the series, which won the 2024 Investigative Reporters and Editors award for best longform journalism in audio, was “illustrative of the incredibly important journalism that can be done out of this database.”
Making the database public is a key step in “continuing to hold departments accountable and continuing the efforts to convince lawmakers and the public that accountability is critical to the operations of a democracy,” he said.