San Francisco's effort to move its sprawling municipal archives online has run into a stubborn technical problem: thousands of duplicate images clogging the city's document management systems, slowing public records requests and frustrating the departments that depend on clean data to do their jobs. The issue surfaced publicly this spring when the Controller's Office flagged data quality problems during a routine audit of the city's enterprise content management platform, which handles records for agencies ranging from the Department of Building Inspection to the Office of the Assessor-Recorder.
The timing is uncomfortable. City Hall has staked significant political capital on its open-government agenda, and the backlog of duplicate scans cuts directly against that. San Francisco has processed more than 1.2 million public records requests since 2019, according to figures the Controller's Office has cited in budget presentations, and any drag on that system has real consequences for residents waiting on permit histories, property documents, and case files.
Why It's Happening Now
The duplication problem is partly a product of speed. Several departments hurried through scanning operations during the COVID-19 shutdowns, when physical access to City Hall at 1 Dr. Carlton B. Goodlett Place was restricted. Different staff members working in parallel scanned some batches of documents more than once, and because there was no unified deduplication protocol, the redundant files were ingested into Laserfiche, the records platform used by multiple city departments, instead of being flagged and discarded.
Technology specialists who work with municipal records systems say this is not unique to San Francisco. But they also say the city's fragmented departmental IT structure makes it harder to fix. Since late 2025, the San Francisco Department of Technology, which oversees the city's core infrastructure, has been working with individual departments to implement image-hash matching, a technique that compares files at the binary level to identify exact copies. The rollout has been uneven.
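For readers curious how exact-match hashing works in practice, here is a minimal sketch. It is an illustration only, not the Department of Technology's tooling, which has not been published. It assumes scanned files sit in an ordinary local folder, uses SHA-256 from Python's standard library as the hash, and the function names `file_digest` and `find_exact_duplicates` are hypothetical. Files with identical bytes always produce identical digests, so grouping files by digest surfaces exact copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes (hypothetical helper).

    The file is read in 1 MiB chunks so large scans need not fit in memory.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` (recursively) whose bytes are identical.

    Hypothetical helper: only byte-for-byte copies share a digest, so two
    separate scans of the same paper page are NOT reported here.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("scans"))` on such a folder would return lists of file paths that share identical contents. One limitation is worth noting: because the comparison is byte-level, it catches only true copies, such as the same file ingested twice. Two separate scans of the same paper page almost never produce identical bytes, so catching those would require a different technique, such as perceptual image hashing, which scores visual similarity rather than exact equality.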
Lawyers' Committee for Civil Rights of the Bay Area, a Tenderloin-based community legal clinic that regularly submits public records requests on behalf of clients in housing and permitting disputes, has noted longer turnaround times on document-heavy requests over the past several months. Staff there have described cases where the same page appears multiple times in a records response, complicating review and adding time to already slow processes. The clinic did not attribute the delays solely to duplicate images but flagged the pattern as a concern.
What Experts and Officials Are Saying
Archivists and records management professionals who spoke at a March 2026 panel hosted by the California State Association of Counties argued for front-end quality controls, meaning standards applied before a document ever enters a system, rather than back-end cleanup. That cleanup, they warned, is costly: one widely cited benchmark in the field holds that correcting a data error after ingestion costs roughly ten times as much as preventing it at the point of capture.
Members of the Board of Supervisors' Budget and Appropriations Committee have asked the Department of Technology to report back by the end of the third quarter of 2026 with a plan for clearing the backlog and hardening intake procedures. The committee did not attach additional funding to that request at its June session, leaving the department to work within its existing fiscal year 2026-27 allocation.
The San Francisco Public Library's History Center on Larkin Street, which manages a separate tranche of digitized historical records, has so far avoided the worst of the problem because it adopted an in-house deduplication workflow before its mass scanning phase began. Records librarians there have been asked informally to share their documentation with other city departments.
For residents and businesses waiting on documents — particularly contractors pulling permit histories for properties in SoMa, the Mission, or the Sunset — the practical advice from city technology staff is to specify file dates and document types as precisely as possible when submitting requests through the city's online portal at sf.gov. Narrower requests generate smaller result sets and are less likely to surface duplicate materials. The Department of Technology says a public-facing update on the deduplication timeline is expected before Labor Day.