San Francisco's municipal agencies are sitting on tens of thousands of duplicate digital images spread across Planning Department servers, SFMTA databases, and the city's centralized DataSF repository — and the problem, according to people who work with those systems, has reached a point where it is actively slowing down public records requests and routine operations.
The issue surfaced publicly this spring when the city's Department of Technology flagged duplicate imagery as a contributing factor in storage cost overruns during a budget review period ending June 30. Redundant files, including duplicated street-condition photos, permit inspection images, and homeless encampment documentation, were identified as consuming a disproportionate share of cloud storage capacity across multiple departments.
Why This Matters Right Now
San Francisco's digital infrastructure push accelerated sharply after 2022, when Mayor London Breed's administration prioritized moving city records to cloud platforms. That shift brought efficiency gains but also a sprawl problem: departments uploading the same images through different workflows, with no automated system flagging the redundancy. The Planning Department on Seventh Street and the Department of Public Works both use imagery extensively for permit and inspection documentation, and cross-departmental uploads of the same site photos are common.
Archivists at the San Francisco History Center at the Main Library on Larkin Street have been working through their own version of this challenge in the analog-to-digital conversion of historical records. Librarians there have noted that digitization projects from the early 2010s produced multiple scans of the same photographs at different resolutions, creating version-control headaches that have persisted for more than a decade.
The cost dimension is concrete. Cloud storage pricing for government contracts, while lower than consumer rates, still means that every gigabyte of redundant imagery represents real budget dollars. Analysts working on civic technology projects have estimated, in general terms, that duplicate file elimination in mid-size city agencies can reduce storage footprints by 20 to 40 percent. At that rate, every 100 terabytes of stored imagery would shrink to between 60 and 80 terabytes. Applied to San Francisco's scale, that range could translate to meaningful annual savings at a time when the city is managing a multi-hundred-million-dollar budget shortfall.
What Experts and Officials Are Recommending
People familiar with the city's technology operations point to several approaches being discussed. One centers on deploying perceptual hashing tools across department servers. Such software reduces each image to a compact visual fingerprint and compares fingerprints, so it can identify visually similar images even when file names, metadata, resolution, or compression differ. This kind of automated deduplication has been used by large tech firms based in SoMa and the Civic Center corridor for years, but municipal adoption has lagged.
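For readers curious about what perceptual hashing looks like in practice, the following is a minimal sketch of one common variant, the average hash, written in plain Python. It is an illustration only, not a description of any tool the city is evaluating. The function names, the 8-by-8 grid, the distance threshold of 5, and the synthetic test images are all assumptions made for the example, and it expects grayscale images whose dimensions divide evenly into the grid.

```python
def average_hash(pixels, hash_size=8):
    """Fingerprint a grayscale image given as a list of rows of 0-255 ints.

    Simplifying assumption: width and height are multiples of hash_size.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // hash_size, width // hash_size

    # Shrink the image to hash_size x hash_size by averaging each block.
    cells = []
    for row in range(hash_size):
        for col in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(row * block_h, (row + 1) * block_h)
                for x in range(col * block_w, (col + 1) * block_w)
            ]
            cells.append(sum(block) / len(block))

    # One bit per cell: 1 if the cell is brighter than the overall mean.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


def looks_like_duplicate(image_a, image_b, max_distance=5):
    """Treat images as likely duplicates if few bits differ (threshold is hypothetical)."""
    return hamming_distance(average_hash(image_a), average_hash(image_b)) <= max_distance


# Synthetic examples: a 16x16 gradient, the same picture upscaled to 32x32,
# and an inverted (visually different) version.
small = [[x * 16 for x in range(16)] for _ in range(16)]
large = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[255 - value for value in row] for row in small]

print(looks_like_duplicate(small, large))     # same picture, different resolution
print(looks_like_duplicate(small, inverted))  # different picture
```

Because the fingerprint comes from relative brightness rather than raw bytes, two scans of the same photo at different resolutions land on the same or nearly the same hash, which is the property that makes the approach attractive for archives holding multiple versions of one image.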
The nonprofit Code for San Francisco, which runs volunteer civic tech projects out of coworking spaces in the Mid-Market neighborhood, has previously flagged data hygiene as a foundational issue in open government work. Volunteers there have documented cases where the same public dataset image appears under multiple catalog entries on the DataSF portal, making automated analysis unreliable.
The SFMTA, which maintains an extensive archive of street and transit infrastructure photography used for everything from Muni stop planning to Vision Zero documentation, has been in discussions with the Department of Technology about standardizing upload protocols as part of a broader IT reform effort tied to the fiscal year 2026-27 budget cycle. No specific program launch date has been confirmed publicly.
Practically speaking, city vendors and staff photographers working on projects from the Tenderloin to the Bayview have been advised, informally, to adopt consistent file-naming conventions and to check existing databases before uploading new images. That guidance is not yet codified in a formal city policy, according to public meeting records from the city's IT governance committee.
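The "check before uploading" advice can, in principle, be automated for byte-for-byte copies. The sketch below assumes a hypothetical in-memory index (the class name UploadIndex and its method are invented for illustration and do not correspond to any city system). It uses a SHA-256 digest of the file contents, so it catches identical files saved under different names but not resized or recompressed versions.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class UploadIndex:
    """Hypothetical registry mapping content digests to the first file seen."""

    def __init__(self):
        self._first_seen = {}  # digest -> name of the first upload

    def check_and_register(self, name: str, data: bytes):
        """Return the earlier file's name if this content already exists, else None."""
        digest = content_digest(data)
        existing = self._first_seen.get(digest)
        if existing is not None:
            return existing
        self._first_seen[digest] = name
        return None


index = UploadIndex()
index.check_and_register("site_photo_planning.jpg", b"example image bytes")
duplicate_of = index.check_and_register("IMG_0042.jpg", b"example image bytes")
print(duplicate_of)  # name of the earlier upload with identical contents
```

A real deployment would need a shared, persistent index across departments; the dictionary here only stands in for that idea.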
For San Franciscans who interact with public records — journalists, researchers, neighborhood groups filing complaints about street conditions — the practical upshot is that cleanup efforts, if funded and executed, should eventually mean faster responses and more reliable search results on platforms like DataSF. The Department of Technology is expected to present formal deduplication proposals to the Board of Supervisors' Government Audit and Oversight Committee later this summer.