SF's Duplicate Image Problem: What Happens Next and the Key Decisions Ahead
City agencies and nonprofits sitting on redundant digital archives face mounting storage costs and a shrinking window to act before the problem compounds.

San Francisco's public agencies and major cultural institutions are staring down a growing crisis in their digital archives: tens of thousands of duplicate image files clogging servers, inflating cloud storage bills, and making it harder for staff to find accurate, up-to-date visual records. The problem has reached a tipping point in mid-2026, with several departments quietly launching internal audits after storage invoices climbed well past what budget managers had projected for the fiscal year ending June 30.
The timing matters. City Hall is already under pressure to trim operational spending amid a projected General Fund shortfall, and IT departments across the Civic Center complex are being asked to justify every line of infrastructure expenditure. Redundant image files — duplicated across shared drives, backup servers, and third-party cloud platforms — represent one of the cleaner places to cut without touching frontline services. But cutting them carelessly can destroy the only surviving copy of a document or photograph, which is why the decisions made in the next 60 to 90 days carry real stakes.
The San Francisco Public Library system, which operates the main branch on Larkin Street and 27 neighborhood branches, maintains digitized historical photograph collections that archivists say have been copied repeatedly as staff migrated between storage platforms over the past decade. The San Francisco Municipal Transportation Agency, headquartered on Van Ness Avenue, faces a similar tangle inside its project documentation folders, where engineering photos from Muni Metro tunnel inspections have been duplicated across departmental drives without a unified file-naming protocol.
The San Francisco Arts Commission, which manages public art installations from the Tenderloin to the Dogpatch, has been piloting a deduplication workflow since March 2026 using open-source tools to flag identical or near-identical image files before archivists make final deletion decisions. That pilot covers roughly 40,000 image files across the Commission's permanent collection database. Staff have described the process internally as slower than expected, partly because automated tools flag visually similar — but not identical — images that still require human review.
The cost pressure is real. Cloud storage pricing from major providers has edged upward through 2025 and into 2026, and agencies running hybrid on-premise and cloud setups are paying for the same bytes twice. For a mid-sized city department managing 500,000 image files with a duplication rate of even 15 percent, that redundancy can translate to thousands of dollars in avoidable annual storage costs — money that department heads are increasingly reluctant to defend in budget hearings.
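For readers who want to check the arithmetic against their own invoices, the estimate can be sketched in a few lines of Python. The file count and duplication rate come from the example above; the average file size, the per-gigabyte price, and the function name are placeholder assumptions, not figures from any agency or provider.

```python
def duplicate_storage_cost(
    total_files: int,
    duplication_rate: float,
    avg_file_mb: float,
    price_per_gb_month: float,
    copies_billed: int = 1,
) -> float:
    """Estimate the yearly cost of storing redundant files.

    avg_file_mb and price_per_gb_month are inputs the reader must supply
    from their own archive and invoices; nothing here is a real rate.
    copies_billed is 2 when the same bytes are paid for both on-premise
    and in the cloud.
    """
    duplicate_files = total_files * duplication_rate
    duplicate_gb = duplicate_files * avg_file_mb / 1000  # decimal gigabytes
    monthly_cost = duplicate_gb * price_per_gb_month
    return monthly_cost * 12 * copies_billed


# Placeholder inputs: 100 MB per archival scan, $0.02 per GB per month.
single = duplicate_storage_cost(500_000, 0.15, 100, 0.02)
hybrid = duplicate_storage_cost(500_000, 0.15, 100, 0.02, copies_billed=2)
print(f"one billed copy: ${single:,.0f} per year")
print(f"two billed copies: ${hybrid:,.0f} per year")
```

Under those placeholder inputs the redundancy works out to roughly $1,800 a year for one billed copy and $3,600 when the same files are billed twice. Smaller average file sizes or cheaper storage tiers would lower the figure considerably, so the result depends heavily on what a department actually holds.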
Three questions will define how this plays out across San Francisco's public sector over the remainder of 2026.
First, will agencies adopt a shared deduplication standard, or proceed independently? The Department of Technology, based on Seventh Street, has the authority to issue citywide digital asset management guidelines but has not yet done so for image archives specifically. A unified protocol would prevent agencies from solving the problem in incompatible ways that create new headaches down the line.
Second, how will institutions handle near-duplicate images — photographs taken seconds apart, or scans of the same document at different resolutions? Automated tools can identify pixel-perfect duplicates with high confidence, but near-duplicates require curatorial judgment. The San Francisco History Center at the Main Library, for instance, holds multiple scans of the same 19th-century photographs made at different points in time. Deleting the lower-resolution version might seem logical, but earlier scans sometimes carry metadata that later ones lost.
Third, who signs off on permanent deletion? Several city departments currently lack a clear chain of authority for that decision. Without a named responsible official and an audit trail, deletion decisions become legally murky — particularly for records subject to the California Public Records Act.
Archivists and IT managers who want to get ahead of this problem should move now to inventory what they have, flag duplicates without deleting them, and escalate the governance question to department heads before the next budget cycle locks spending priorities for fiscal year 2026-27. The window for a deliberate, well-documented approach is open — but not indefinitely.
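The "flag, don't delete" step can be illustrated with a short Python sketch. It is not the workflow any city agency is known to use; it is a minimal example, assuming files sit in an ordinary folder tree, that groups byte-identical files by SHA-256 hash and reports them for human review. The function names are hypothetical. It will not catch near-duplicates such as rescans or resized copies, and two files with identical pixels but different embedded metadata will hash differently.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def flag_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash, keeping groups of two or more.

    Nothing is deleted or moved; the result is a report for review.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical folder name; point this at a copy of the archive.
    for digest, paths in flag_exact_duplicates(Path("archive")).items():
        print(digest[:12], *paths, sep="\n  ")
```

Keeping the output as a report, not an action, leaves the deletion decision and the audit trail with a named official, which is the governance gap described above.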
Published by The Daily San Francisco