San Francisco's municipal digital infrastructure is carrying a hidden weight: thousands of duplicate images buried inside city databases, department servers, and public-facing portals — the product of more than a decade of siloed record-keeping, rushed digitization projects, and the absence of any citywide standard for how photographs and scanned documents get filed, named, or verified before upload. The problem has reached a point where several departments are now actively auditing their holdings and, in some cases, contracting outside vendors to run deduplication sweeps.
The timing matters. The city's Department of Technology has been pushing since early 2025 to consolidate municipal cloud storage contracts under a single framework, partly in response to ballooning annual licensing costs. When agencies independently store redundant assets (the same permit photo appearing five times across three systems, or a property scan duplicated between the Planning Department on South Van Ness Avenue and the Assessor-Recorder's office at City Hall), those bytes add up to real dollars. Storage waste is not an abstraction; it shows up in renewal invoices.
How the Duplication Built Up Over Time
The roots go back to the early 2010s, when San Francisco made a major push to digitize paper records held in physical archives across the city. That effort was decentralized by design — each department managed its own scanning contracts, its own vendor relationships, its own naming conventions. The Recreation and Parks Department, the Public Works bureau, and the Human Services Agency each built their own document management workflows with minimal cross-talk. Files migrated between platforms when departments upgraded software, and each migration created opportunities for copies to propagate without anyone noticing or caring enough to clean house.
The city's shift toward cloud storage accelerated the problem rather than solving it. When departments moved onto platforms like Microsoft Azure and Google Cloud between 2018 and 2022, legacy files were often bulk-uploaded rather than catalogued. A 2023 internal review by the Controller's Office, whose findings were summarized in a publicly available budget report, flagged redundant digital asset storage as a contributor to unanticipated IT cost overruns across at least four major departments. The Controller's Office did not break out a dollar figure for image duplication, but the broader storage inefficiency finding prompted the Department of Technology to begin scoping a remediation program.
The San Francisco Public Library's digitization program, which covers historical photographs held at the main branch on Larkin Street in the Civic Center, encountered its own version of the issue. Librarians discovered that batches of photographs from the Western Neighborhoods Project and other community archiving efforts had been uploaded multiple times as volunteers and staff worked in parallel without a shared asset registry. The library has been working since 2024 to reconcile those holdings, according to publicly posted project documentation on its digital collections page.
What Deduplication Actually Involves — and What Comes Next
Replacing or removing a duplicate image is not as simple as hitting delete. In government systems, images are often attached to records — a building permit, an incident report, a park maintenance log — meaning a duplicate file may be the only copy linked to a particular entry in a database. Delete the wrong instance and you break the record chain. The correct approach requires first identifying which copy is the canonical version, then updating all database references to point to that single file before the redundant copies are safely purged. That process, applied at scale, requires either significant staff time or a third-party tool capable of running hash-matching algorithms across disparate storage environments.
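For readers who want to see the order of operations concretely, here is a minimal Python sketch of the three steps described above: group files by content hash, pick a canonical copy, and repoint record references before anything is purged. It is an illustration only, not any city department's or vendor's actual tool. The function names (sha256_of, plan_dedup), the use of SHA-256, the rule of treating the alphabetically first path as canonical, and the idea of representing record links as a simple dictionary are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large scans are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_dedup(paths, references):
    """Build a deduplication plan; nothing is deleted here.

    paths: iterable of file paths to compare.
    references: dict mapping a (hypothetical) record ID to the file path
        that record currently points at.
    Returns (updated_references, purgeable_paths).
    """
    # Step 1: group byte-identical files by their content hash.
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(Path(p))].append(str(p))

    # Step 2: choose one canonical copy per group.
    # Assumption: the alphabetically first path wins. A real policy might
    # prefer the oldest file or the copy in the system of record.
    canonical_for = {}
    purgeable = []
    for members in groups.values():
        members.sort()
        canonical = members[0]
        for member in members:
            canonical_for[member] = canonical
        purgeable.extend(members[1:])

    # Step 3: repoint every record at the canonical copy, so purging the
    # remaining copies cannot orphan a record.
    updated = {rec: canonical_for.get(path, path) for rec, path in references.items()}
    return updated, sorted(purgeable)
```

The sketch returns a plan rather than deleting files, which mirrors the caution in the process itself: references are updated first and the purge list is reviewed afterward. Exact hash matching only catches byte-identical copies; a photo that was rescanned or re-saved at a different quality setting would produce a different hash and would need a separate similarity check.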
Several city departments are expected to include deduplication line items in their fiscal year 2026-27 budget requests, which go before the Board of Supervisors this fall. For residents and businesses that interact with city permitting and records systems — particularly in high-volume neighborhoods like SoMa, the Mission, and the Tenderloin, where development and social services generate large document loads — a cleaner backend should eventually mean faster response times on records requests filed under the California Public Records Act. The practical payoff will take time, but the audits now underway represent the first systematic attempt to count the cost of years of uncoordinated growth.