The Daily San Francisco


SF City Records Buried in Duplicate Scans: What Officials and Experts Are Saying

A growing backlog of duplicate digital images in the city's public records systems is drawing scrutiny from archivists, open-government advocates, and tech specialists who say the problem is larger than anyone has admitted.

By San Francisco News Desk · Published 4 July 2026, 11:57 am

Photo by Gildo Cancelli on Pexels

San Francisco's Department of Technology and the City Clerk's office are facing mounting pressure to address a years-long accumulation of duplicate scanned images clogging the city's public records infrastructure. Archivists say the problem is consuming server capacity, slowing the fulfillment of public records requests, and undermining the integrity of digital document archives that residents and journalists rely on daily.

The issue surfaced publicly this spring when the San Francisco Public Library's History Center on Larkin Street flagged inconsistencies in its digitized collection, discovering that hundreds of scan batches uploaded between 2021 and 2024 contained redundant image files that had never been deduplicated. The library's digital preservation team noted that the problem was not isolated to one department.

A Problem That Runs Across City Systems

City Hall's public-facing records portal, which handles everything from Planning Department permit documents to Board of Supervisors meeting archives, has long relied on third-party document management software. Technology specialists who work with municipal archives say that agencies that batch-scan paper documents, as San Francisco departments routinely did during the COVID-19 office closures of 2020 and 2021, tend to generate large volumes of identical or near-identical image files when no automated deduplication step is built into the ingestion pipeline. Those files then stack up undetected.
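For readers curious what an automated deduplication check might look like, the following is a minimal sketch in Python, not a description of any software the city uses. It assumes scanned files sit in an ordinary folder and that "duplicate" means byte-for-byte identical. The function names `sha256_of_file` and `find_exact_duplicates` are hypothetical, chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; return only groups of 2 or more."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    # Hypothetical policy: a hash shared by several files marks them as duplicates.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A check like this catches only exact copies. Near-identical scans of the same page, which differ by a few pixels or by compression settings, produce different hashes and would need a separate image-similarity technique.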

Open-government advocates at the San Francisco chapter of the League of Women Voters have raised the issue in public comment sessions before the Government Audit and Oversight Committee, arguing that inflated file counts make it harder for the public to locate authoritative versions of city documents. The problem is practical: when a permit record exists in four near-identical scanned versions, a resident searching the Planning Department's online portal cannot easily determine which version is the official one.

Digital archivists consulted on background — professionals who work with city and county records systems across California — say the San Francisco situation reflects a statewide gap. California's Government Code mandates retention schedules for public records but does not specify technical standards for deduplication or file integrity verification during digitization. That leaves individual departments to set their own practices, with uneven results.

What City Officials Are Being Asked to Do

The Board of Supervisors' Budget and Legislative Analyst office received a formal request in May 2026 to examine the cost of a citywide duplicate-image remediation project. Estimates circulating among city IT staff place the scope of the problem at several terabytes of redundant data across the Department of Building Inspection, the Assessor-Recorder's office at City Hall, and the City Attorney's document repository.

The San Francisco Department of Technology, which oversees the DataSF platform, has not publicly released a timeline or cost figure for remediation. Experts in municipal records management say that for a city of San Francisco's size, with more than 800,000 residents and dozens of departments generating documents daily, a serious deduplication audit typically runs between $400,000 and $1.2 million, depending on the depth of the review and whether legacy systems require manual inspection.

The Lawyers' Committee for Civil Rights of the San Francisco Bay Area, which regularly files California Public Records Act requests on behalf of clients, has noted in public filings that delayed or confused responses to records requests are becoming more frequent. Advocates stop short of attributing those delays solely to duplicate image problems, but say the overall state of city digitization needs independent review.

Mayor Daniel Lurie's administration, which took office in January 2025, has not yet issued a formal policy position on the records infrastructure question. The Department of Technology's next public presentation to the Board of Supervisors' Government Audit and Oversight Committee is scheduled for September 2026, and open-government groups say they intend to push for a specific line item addressing duplicate-image cleanup in the next budget cycle. For residents trying to pull permit histories or planning documents in the meantime, archivists recommend cross-referencing the DataSF open data portal with the specific department's own records counter, and requesting a certified copy when document authenticity matters.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
