San Francisco's public agencies are sitting on millions of duplicate image files across aging digital systems — and city technology officials say the redundancy is eating into storage budgets, slowing down public records requests, and complicating the shift to modernized databases across departments.
The issue isn't abstract. At the San Francisco Planning Department's offices at 49 South Van Ness Avenue, staff processing permit applications have long flagged duplicate project photographs clogging shared drives: files uploaded multiple times by applicants, scanned twice by clerks, or migrated imperfectly during past system upgrades. Similar complaints have surfaced at the Department of Public Works and within BART's internal asset-management infrastructure, where maintenance crews photograph tunnel sections, escalators, and station components as part of routine inspection logs.
The timing matters. The city is midway through a multi-year digital infrastructure overhaul anchored partly by the San Francisco Department of Technology's DataSF initiative, which has pushed agencies to centralize and open their records. Duplicate image files, some appearing dozens of times in the same archive, create classification errors, inflate cloud-storage costs, and undermine the integrity of records that lawyers, journalists, and residents request under the California Public Records Act.
What the Experts Are Saying
Data management specialists who work with Bay Area government clients describe duplicate image replacement — the process of identifying redundant files using hash-matching or perceptual algorithms, then substituting a single authoritative copy — as one of the least glamorous but most consequential forms of digital housekeeping a public agency can undertake. The process typically involves running automated deduplication software against an archive, flagging matches for human review, and then replacing or deleting confirmed duplicates while preserving metadata chains for legal continuity.
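The hash-matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not any agency's actual tooling: the function names (sha256_of_file, group_duplicates, review_plan, scan_directory) are hypothetical, SHA-256 is an assumed choice of hash, and the "keeper" is simply the first file name in sorted order. The sketch only proposes a review list and deletes nothing, in line with the human-review step specialists describe.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of an in-memory file body."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(digests: dict[str, str]) -> list[list[str]]:
    """Map of file name -> digest in; sorted groups of byte-identical files out."""
    groups = defaultdict(list)
    for name, digest in digests.items():
        groups[digest].append(name)
    # Only digests shared by two or more files count as duplicates.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def review_plan(groups: list[list[str]]) -> list[dict]:
    """Propose one keeper per group; everything else is flagged for human review."""
    return [{"keep": group[0], "review": group[1:]} for group in groups]


def scan_directory(root: Path) -> list[list[str]]:
    """Hash every file under root and return the duplicate groups."""
    digests = {str(p): sha256_of_file(p) for p in root.rglob("*") if p.is_file()}
    return group_duplicates(digests)
```

Calling review_plan(scan_directory(Path("some_archive"))) on a hypothetical archive folder would return a list of proposed keepers and flagged copies. Exact hashing catches only byte-identical files; a rescanned or recompressed photograph produces a different digest and needs a perceptual method instead.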
For city archivists, the stakes are higher than they might appear. California Government Code Section 34090 governs how municipalities retain and destroy public records. Any automated deletion of image files — even duplicates — must be validated against retention schedules or risk running afoul of state law. That legal constraint is one reason many San Francisco departments have let redundant files accumulate rather than tackle deduplication without a formal policy framework.
The San Francisco Public Library's digital collections team at the Main Library in Civic Center has dealt with a version of this problem for years, particularly within its historical photograph archive, which spans tens of thousands of images of the city's neighborhoods from the Mission District to the Sunset. Librarians there have piloted open-source deduplication tools to identify near-identical scans of the same physical photograph, a common artifact of multi-pass digitization projects run in the 2000s and early 2010s.
Cost and the Path Forward
Cloud storage is not cheap at municipal scale. Standard enterprise-tier object storage used by government agencies runs roughly $0.02 to $0.023 per gigabyte per month, and city systems can hold petabytes of unstructured data. Image files — particularly high-resolution permit photos, surveillance exports, or infrastructure inspection imagery — are among the largest contributors to that footprint. Cutting duplicate image volume by even 20 percent across a large department translates to meaningful recurring savings in a fiscal environment where Mayor Daniel Lurie's administration has inherited a structural budget deficit.
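To put rough numbers on that claim, the arithmetic can be worked through directly. The sketch below assumes a hypothetical department holding 1 petabyte of image data, counted in decimal units (1 PB = 1,000,000 GB), and applies the 20 percent reduction and the per-gigabyte price range cited above. The archive size and the function name monthly_savings are illustrative assumptions, and request, retrieval, and egress fees are left out.

```python
from decimal import Decimal

GB_PER_PB = 1_000_000  # decimal units (1 PB = 1,000,000 GB); an assumption


def monthly_savings(image_pb: str, duplicate_share: str, price_per_gb: str) -> Decimal:
    """Dollars saved per month by removing the duplicate share of an image archive."""
    removed_gb = Decimal(image_pb) * GB_PER_PB * Decimal(duplicate_share)
    return removed_gb * Decimal(price_per_gb)


# Hypothetical 1 PB archive, 20 percent duplicates, at both ends of the price range.
low = monthly_savings("1", "0.20", "0.02")
high = monthly_savings("1", "0.20", "0.023")

print(f"Monthly: ${low:,.0f} to ${high:,.0f}")
print(f"Annual:  ${low * 12:,.0f} to ${high * 12:,.0f}")
```

Under those assumptions the savings come to $4,000 to $4,600 a month, or $48,000 to $55,200 a year, for each petabyte of image data. The figure scales linearly, so a department with a smaller or larger footprint can adjust the first argument accordingly.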
City technology officers have pointed to the expanded use of AI-assisted deduplication tools as part of the broader push to modernize back-end systems before the Department of Technology's next major platform migration, currently planned for the 2027 fiscal year. Those tools can match not just pixel-identical files but visually similar images — useful for catching photographs of the same pothole on Van Ness Avenue uploaded separately by two inspectors using different devices.
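How a tool might catch visually similar images can be illustrated with a difference hash, a simple perceptual-hashing technique. This is a stand-in chosen for clarity, not the AI-assisted software city officials have in mind, and the names dhash_bits, hamming_distance, and looks_similar are hypothetical. The sketch assumes each image has already been shrunk to a small grayscale grid, a step an imaging library would normally handle, and the max_distance threshold of 2 is an arbitrary illustrative value.

```python
def dhash_bits(gray_rows: list[list[int]]) -> list[int]:
    """Difference hash: one bit per adjacent pixel pair in each row.

    A bit is 1 when a pixel is brighter than its right-hand neighbour, so the
    hash records the pattern of brightness changes, not exact pixel values.
    """
    bits = []
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must be the same length")
    return sum(x != y for x, y in zip(a, b))


def looks_similar(a_rows: list[list[int]], b_rows: list[list[int]],
                  max_distance: int = 2) -> bool:
    """Flag two grids as likely the same scene if their hashes nearly match."""
    return hamming_distance(dhash_bits(a_rows), dhash_bits(b_rows)) <= max_distance
```

Because the hash depends only on whether each pixel is brighter than its neighbour, a uniformly brighter exposure of the same scene yields the same bits, while a mirrored or unrelated image does not. Matches from a method like this are candidates for human review, not confirmed duplicates.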
For agencies that want to move ahead now, data managers advise starting with a complete file inventory before touching anything, establishing clear retention policy guidance with the City Attorney's Office, and piloting deduplication on a single department archive rather than attempting a citywide sweep. The San Francisco Ethics Commission and the City Administrator's Office have both signaled interest in updated digital records guidance before the end of calendar year 2026. Departments that act first are likely to be better positioned when that guidance lands — and better equipped to handle the next records request without hunting through three copies of the same photograph.