SF City Agencies Push to Replace Duplicate Images in Digital Records This Week
A coordinated effort to clean up redundant photo files across municipal databases is quietly reshaping how San Francisco stores and retrieves public records.

San Francisco's Department of Technology moved this week to accelerate a long-delayed cleanup of duplicate images embedded across the city's digital infrastructure, with a target of clearing tens of thousands of redundant files from public-facing portals before the end of the third quarter. The effort, which spans multiple city departments and touches everything from permitting records at the Department of Building Inspection to case files at the Human Services Agency, has been in planning since late 2025 but only entered active execution in the days surrounding the July 4 holiday weekend.
The timing matters. San Francisco is in the middle of a broader push to modernize its data systems ahead of a looming deadline tied to state digital-records compliance requirements taking effect January 1, 2027. Duplicate and orphaned image files — scanned permits, case photos, ID documents — have accumulated for years inside legacy databases, inflating storage costs and slowing retrieval times for staff who already work under strained conditions. The city's homelessness response teams, which rely on the Department of Homelessness and Supportive Housing's case management software, have been among the most vocal internal users pushing for the fix, according to city budget documents reviewed this spring.
The Department of Building Inspection, headquartered at 49 South Van Ness Avenue, holds some of the densest image archives in city government — years of scanned permit applications, inspection photos, and contractor license records. Staff there have been testing a duplicate-detection tool developed under a contract with a Mission District software firm since March. The tool uses perceptual hashing, a technique that identifies visually identical or near-identical images even if file names differ, to flag redundant records before a human reviewer confirms deletion.
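To illustrate the idea behind perceptual hashing, here is a minimal sketch of one simple variant, the average hash. The article does not say which algorithm the city's tool uses, so this is an assumption for illustration only. It also assumes each image has already been reduced to a small grayscale grid, and the function names and the distance threshold are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (assumed pre-shrunk, e.g. 8x8)


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images for human review when their hashes are close.

    max_distance is an illustrative threshold, not a value from the article.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# A gradient image, a slightly brighter copy, and an inverted copy.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(255, v + 3) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(is_near_duplicate(original, brighter))  # brightness shift keeps the same hash
print(is_near_duplicate(original, inverted))  # inversion flips every bit
```

Because the hash depends on pixel content rather than file names or exact bytes, a re-saved or slightly adjusted scan still lands close to the original. A match only flags a candidate; as in the workflow described above, a human reviewer would still confirm any deletion.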
The San Francisco Public Library's Digital Services unit, operating out of the main branch on Larkin Street at Civic Center, is running a parallel process for its digitized historical photograph collection. Librarians there estimate that about 12 percent of the roughly 85,000 scanned images in one archival dataset, or around 10,200 images, were flagged as potential duplicates after an initial audit completed in May. The library is not deleting flagged images outright but instead consolidating them into a single canonical record with standardized metadata, a distinction that archivists consider important for provenance.
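The consolidate-rather-than-delete approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual process: the field names (`id`, `image_hash`, `scanned_on`, `title`) and the rule of treating the earliest scan as canonical are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def consolidate(records: List[dict]) -> List[dict]:
    """Merge records that share an image hash into one canonical record each.

    No record is discarded: every original ID is kept in source_ids so the
    provenance of each duplicate remains traceable.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record["image_hash"]].append(record)

    canonical_records = []
    for image_hash, members in groups.items():
        # Assumed rule: the earliest scan (ISO date string) becomes canonical.
        members.sort(key=lambda r: r["scanned_on"])
        first = members[0]
        canonical_records.append({
            "image_hash": image_hash,
            "canonical_id": first["id"],
            "title": first["title"].strip(),  # simple metadata standardization
            "source_ids": [m["id"] for m in members],
        })
    return canonical_records


# Hypothetical sample records.
sample = [
    {"id": "A1", "image_hash": "h1", "scanned_on": "2001-03-02", "title": " Market St "},
    {"id": "A2", "image_hash": "h1", "scanned_on": "1999-05-10", "title": "Market St"},
    {"id": "B1", "image_hash": "h2", "scanned_on": "2010-01-01", "title": "Ferry Building"},
]
for row in consolidate(sample):
    print(row)
```

Keeping the list of source IDs alongside the canonical record is what separates consolidation from deletion: the duplicates stop cluttering search results, but an archivist can still trace where each copy came from.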
Meanwhile, the city's 311 service system — which processes tens of thousands of photo-attached complaint submissions each year from residents across neighborhoods from the Tenderloin to the Outer Sunset — has its own duplicate-image backlog. Photos of the same pothole, encampment, or broken streetlight submitted by multiple residents within hours of each other pile up in the system, cluttering dashboards used by street crews dispatched from the Department of Public Works yard on Cesar Chavez Street.
Storage is not free. The city spends real money on cloud and on-premises storage contracts, and redundant files compound that cost over time. City budget documents from the current fiscal year, which began July 1, 2026, list the Department of Technology's enterprise data storage line at approximately $4.2 million — a figure that department officials have said publicly they expect to reduce by renegotiating vendor contracts once the deduplication work lowers the city's actual storage footprint.
The work is not glamorous, and it rarely surfaces in the press. But for the residents filing building permits in the Mission, accessing social services in SoMa, or searching digitized photographs at the main library branch, the practical effect of a cleaner, faster records system is tangible. Slow retrieval times during peak hours were a documented complaint in user surveys the Department of Technology published in early 2026.
City staff say the deduplication effort will move department by department through the summer, with a status report expected before the Board of Supervisors' Budget and Finance Committee in September. Residents who rely on city digital services — particularly those navigating the permitting process or accessing social services case records — can expect incrementally faster load times by fall. The harder work of updating legacy database architecture to prevent future duplication is a separate project, currently without a funded timeline.
About this article
Published by The Daily San Francisco