SF's Duplicate Image Problem: The Key Decisions That Will Define the City's Digital Records Overhaul
City departments are sitting on a sprawling backlog of redundant digital assets — and how they handle it will shape public records access for years.

San Francisco's municipal technology offices are facing a defining choice over how to purge, preserve, or consolidate tens of thousands of duplicate digital images stockpiled across at least a dozen city departments, a problem that has quietly grown alongside two decades of uncoordinated scanning drives and cloud migrations. The question now is not whether to act, but who decides what gets kept.
The issue has sharpened this summer because the Department of Technology's citywide cloud contract — routed through the San Francisco Digital Services division at City Hall — comes up for renegotiation before the end of fiscal year 2026. Storage costs tied to redundant assets are a line item that budget analysts at the Controller's Office on Van Ness Avenue have flagged in internal reviews. Duplicated image files inflate storage bills, slow retrieval for public records requests, and complicate the city's move toward a unified open-data portal that the Mayor's Office of Civic Innovation has been piloting since 2024.
The duplication problem is largely structural. When the San Francisco Planning Department digitized its permit archive — a project that ran across multiple fiscal years starting around 2018 — files were saved in overlapping batches by different contractors, with no single deduplication standard applied. The same pattern played out at the San Francisco Public Library's San Francisco History Center at the Civic Center branch, where photographic collections were scanned by volunteer cohorts and staff teams working from different metadata schemas. Staff there have noted the challenge publicly at open community archive meetings, though the library has not released an official count of duplicate records.
The San Francisco Municipal Transportation Agency faces a parallel version of the problem in its engineering asset database. Traffic-signal inspection photographs taken by field crews on Potrero Avenue and along the Transbay Transit Center corridor have accumulated as duplicate copies, filed under different work-order numbers that reference the same physical inspection event. SFMTA has not disclosed a specific figure for redundant image files, but the agency's 2025 technology audit, a public document, identified image-asset management as an area requiring standardized protocols.
Across city government, estimates from the Digital Services team — cited in a March 2026 budget presentation to the Board of Supervisors' Government Audit and Oversight Committee — put the share of potentially duplicated files across municipal cloud storage at roughly 18 percent of total image assets. At current enterprise storage rates, that fraction represents a non-trivial recurring expense each budget cycle, though the Controller's Office has not published a final dollar figure pending the contract review.
Three choices are now unavoidable. First, city technology leadership must decide which department owns the deduplication standard. Digital Services has drafted a proposed policy, but both the Planning Department and the City Administrator's Office have argued they need department-level control over retention rules, particularly for legally sensitive permit imagery that carries its own records-retention obligations under state law.
Second, the city must settle on a timeline. A phased approach — starting with the Planning Department's Mission District permit archive and the Public Library's photographic collection before expanding citywide — would spread costs but delay the efficiency gains that budget analysts say are needed by the start of fiscal year 2027 in July of next year.
Third, and most consequential for ordinary San Franciscans, is whether deduplicated records will be surfaced on DataSF, the city's open-data platform, in a way that preserves public access. Archivists at the History Center have consistently argued that deletion without a preservation review risks erasing contextual material — a photograph of the Tenderloin in 1978 labeled as a duplicate of another frame from the same roll may carry distinct evidentiary value.
The Board of Supervisors' Technology Committee is expected to hold a hearing on the Digital Services storage proposal before the August recess. Department heads have until mid-July to submit written responses to the draft policy. Whatever framework emerges will set the template not just for image files, but for how San Francisco manages the broader explosion of unstructured digital data accumulating across city government — a problem that will only compound as AI-driven documentation tools produce more visual records faster than any manual review process can handle.
Published by The Daily San Francisco