San Francisco's municipal technology infrastructure is sitting on a sprawling archive of duplicate digital images — redundant photographs, scanned permits, and replicated visual records spread across at least a dozen city departments — and the people responsible for cleaning it up are now facing a hard deadline and harder choices.
The problem matters now because city agencies including the Department of Building Inspection and the Planning Department have been accelerating their document digitization efforts since 2024, when the Board of Supervisors approved an expanded digital-records mandate tied to the housing production emergency. More documents flowing in faster means duplicate imagery compounds faster too, bloating storage costs and slowing down staff searches at a moment when permit processing speed is under intense public scrutiny.
At the San Francisco Department of Technology's offices at 1 South Van Ness Avenue, administrators have been working with contracted vendors to assess the scope of the redundancy problem. The Planning Department's public portal, which covers projects from the Tenderloin to Dogpatch, currently surfaces duplicate image records in search results frequently enough that planning staff flag it as a workflow issue in internal review cycles. The San Francisco Public Library's digital collections program, headquartered at the Main Branch on Larkin Street, faces the same structural challenge with its historical photograph archive, where automated ingestion tools have created overlapping entries in the city's linked open-data catalog.
What the Backlog Actually Costs
Cloud storage is not free. San Francisco's citywide IT budget, approved in the current fiscal cycle, allocates funds for data storage across departments, and redundant image files directly inflate those line items. Industry benchmarks suggest that municipal digital archives with poor deduplication protocols can carry storage overhead of 20 to 40 percent above what a cleaned dataset would require — meaning the city may be paying for a significant fraction of storage it does not need. The Department of Technology has not released a specific figure for San Francisco's redundancy rate, but the scale of the digitization push since 2024 makes the exposure substantial.
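To put the benchmark range in concrete terms: an overhead figure measured against the cleaned dataset is not the same as the share of the storage bill that is wasted. The short Python sketch below converts one to the other. The function name is illustrative, and the 20 and 40 percent inputs are the industry benchmark range cited above, not figures for San Francisco, which the Department of Technology has not released.

```python
def redundant_share(overhead: float) -> float:
    """Fraction of all stored data that is redundant.

    `overhead` is expressed relative to the cleaned dataset size,
    e.g. 0.20 means the archive is 20 percent larger than it needs to be.
    """
    if overhead < 0:
        raise ValueError("overhead must be non-negative")
    # Total stored = cleaned * (1 + overhead); redundant = cleaned * overhead.
    return overhead / (1.0 + overhead)


# Benchmark range only; not a measured San Francisco figure.
for overhead in (0.20, 0.40):
    share = redundant_share(overhead)
    print(f"{overhead:.0%} overhead -> {share:.1%} of stored data is redundant")
```

On those benchmark inputs, roughly 17 to 29 percent of what is stored would be redundant, which is the "significant fraction" at issue.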
Two approaches are on the table. The first is a retroactive deduplication sweep, which would run automated hash-matching tools across existing archives to identify and quarantine duplicate files before human review. The second is a prospective fix: updating ingest protocols at the point of upload so that future scans and photographs are checked against existing records before they enter the system. Most technology administrators favor the prospective approach as cheaper and less disruptive, but it leaves the existing backlog unresolved, meaning staff at agencies like the Department of Building Inspection at 49 South Van Ness Avenue will continue pulling redundant records from searches in the near term.
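For readers who want to see what the two approaches look like in practice, here is a minimal Python sketch, not a description of any tool the city actually uses. It assumes duplicates are byte-identical files and matches them by SHA-256 hash. The record IDs, the function names, and the `IngestIndex` class are all hypothetical. Exact hash matching would not catch the same document scanned twice at different settings; that would require a similarity-based technique not shown here.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def content_hash(data: bytes) -> str:
    """SHA-256 digest of a file's bytes; identical files share a digest."""
    return hashlib.sha256(data).hexdigest()


def find_duplicate_groups(archive: dict[str, bytes]) -> list[list[str]]:
    """Retroactive sweep: group existing record IDs by content hash.

    Returns only the groups with more than one member, which are the
    candidates to quarantine for human review.
    """
    by_hash: dict[str, list[str]] = defaultdict(list)
    for record_id, data in archive.items():
        by_hash[content_hash(data)].append(record_id)
    return [sorted(ids) for ids in by_hash.values() if len(ids) > 1]


class IngestIndex:
    """Prospective fix: check each upload against hashes already seen."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # digest -> first record ID

    def submit(self, record_id: str, data: bytes) -> Optional[str]:
        """Return the ID of an existing identical record, or None if new."""
        digest = content_hash(data)
        existing = self._seen.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can reject or link it
        self._seen[digest] = record_id
        return None


# Hypothetical records, for illustration only.
archive = {
    "permit-001.jpg": b"scan of permit 001",
    "permit-001-copy.jpg": b"scan of permit 001",
    "site-photo-17.jpg": b"site photo 17",
}
print(find_duplicate_groups(archive))

index = IngestIndex()
print(index.submit("permit-001.jpg", archive["permit-001.jpg"]))
print(index.submit("reupload.jpg", archive["permit-001.jpg"]))
```

The sketch also shows why the prospective fix alone leaves the backlog in place: an ingest-time index only knows about files that have passed through it, so existing archives must still be swept once, if only to seed the index.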
The Decisions Ahead
Three choices will define how this plays out over the next 12 months. First, the Department of Technology needs to decide by the end of the current fiscal year — June 30, 2027 — whether to issue a new request for proposals for a citywide deduplication platform or extend existing vendor arrangements. Second, individual departments need to determine which image categories get priority: permit photographs tied to active housing projects carry a stronger urgency argument than historical library scans, but the library's digitization team has its own grant deadlines. Third, the city needs to settle on a governance model — either a centralized image registry that all departments write to, or a federated approach where each department maintains its own archive with shared metadata standards.
The centralized registry model has the stronger technical argument. It eliminates duplication at the source and makes cross-agency searches more reliable, which matters for projects spanning both Planning and DBI, as is common in the Mission District and along the Central SoMa Plan corridor. The federated model is politically easier, because departments guard their data systems jealously, but it pushes the deduplication burden permanently onto staff time rather than automation.
For residents and developers watching permit timelines, the practical takeaway is simple: if the city chooses the cheaper, less disruptive prospective fix without clearing the existing backlog, search results on public portals will remain cluttered with duplicate images well into 2028. The harder path, a full retroactive sweep combined with a new ingest protocol, costs more upfront but produces a clean system sooner. The Board of Supervisors has not yet scheduled hearings on the matter, but the housing production pressure alone may force the timeline.