San Francisco's Office of Digital Services confirmed this spring that a systematic audit of the city's public-facing web infrastructure had uncovered tens of thousands of duplicate image files spread across more than a dozen department websites, a problem that officials trace back to a decade of uncoordinated digitization efforts and at least three separate content management system migrations. The finding, which emerged from a broader review of the city's DataSF platform and department-level portals, has forced a reckoning with how municipal technology decisions get made — and who pays when they don't cohere.
The timing matters. San Francisco is in the middle of an aggressive push to consolidate city services under a unified digital infrastructure, part of Mayor Daniel Lurie's broader mandate to cut administrative overhead following the fiscal shortfalls that defined the post-pandemic Breed years. Duplicate image storage is not glamorous, but it is expensive. Cloud hosting costs for redundant assets have been folded into departmental IT budgets that are already under review ahead of the fiscal year 2026-27 appropriations cycle, which the Board of Supervisors is expected to finalize later this month.
How the Redundancy Built Up
The roots of the problem run through three distinct periods. The first came between 2014 and 2018, when individual departments — including the San Francisco Department of Public Health, the Planning Department on South Van Ness Avenue, and the Recreation and Parks Department — each contracted separately for website overhauls. There was no centralized image library, no shared asset management standard, and no requirement to cross-reference uploads against existing files. Staff at each agency uploaded photos independently, often duplicating images of City Hall, the Ferry Building, and neighborhood streetscapes that already existed in other departmental folders.
The second wave hit during the COVID-19 emergency period beginning in 2020. Departments pivoted rapidly to digital-first communications, often using contractors hired on emergency procurement terms. The city's 311 portal, the Department of Emergency Management's public dashboards, and the SF.gov redesign project — which launched in phases beginning in late 2020 — each imported legacy image libraries without deduplication protocols. By some internal estimates cited in the spring audit summary, the SF.gov migration alone carried forward files from at least four predecessor systems.
The third and most recent layer came from the AI content boom that reshaped city communications between 2024 and early 2026. Several departments began using AI-assisted tools to generate and publish content at higher volume, including images resized or reformatted for different display contexts. Without automated hashing or perceptual duplicate detection baked into the publishing workflow, near-identical images accumulated rapidly — the same photograph of, say, Dolores Park or the Caltrain station at 4th and King appearing in multiple resolutions across multiple pages.
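For readers unfamiliar with the two techniques named above, the following is a minimal sketch, not the city's tooling. Exact hashing catches byte-identical copies, while a perceptual hash (here a simple "average hash") catches the same picture saved at a different resolution. The function names, the 4x4 hash size, and the grayscale-grid input are all assumptions made for illustration; a production workflow would decode real image files with an imaging library first.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose byte content is identical (same SHA-256 digest)."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels: list[list[int]], size: int = 4) -> int:
    """Illustrative perceptual hash over a grayscale grid (rows of 0-255 values).

    The grid is shrunk to size x size by block averaging, then one bit is set
    for each cell brighter than the overall mean. Assumes the grid is at least
    size pixels tall and wide.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            rows = range(r * height // size, (r + 1) * height // size)
            cols = range(c * width // size, (c + 1) * width // size)
            block = [pixels[y][x] for y in rows for x in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


# Hypothetical data: an 8x8 gradient and the same picture upscaled to 16x16.
small = [[(x + y) * 16 for x in range(8)] for y in range(8)]
large = [[small[y // 2][x // 2] for x in range(16)] for y in range(16)]
print(hamming(average_hash(small), average_hash(large)))
```

The two files in the example would have different SHA-256 digests, since their bytes differ, yet their perceptual hashes can be compared bit by bit. That is why a resized copy slips past exact hashing alone.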
What the Cleanup Looks Like — and What It Costs
The Office of Digital Services is now working with vendors to implement automated deduplication across the city's primary content management system, which runs on a Drupal-based platform. The contract for that work, awarded in late May 2026, is valued at roughly $340,000 according to the city's public procurement portal — a figure that city technology staff have described as modest relative to the ongoing hosting overhead it is meant to eliminate.
The San Francisco Public Library system, which maintains its digital collections through a separate archive at the main branch on Larkin Street, began a deduplication project of its own in January 2025 after identifying redundancy in its digitized photograph collection. That collection includes historical images of the Tenderloin, the Fillmore, and the Embarcadero waterfront dating to the early twentieth century. Library staff completed the project in approximately six months using open-source tools.
For city departments that have not yet been audited, the Office of Digital Services is expected to issue updated asset management guidelines by September 2026, requiring all new image uploads to pass through a centralized library check before publication. Departments that contract for external web development will be required under the new rules to include deduplication compliance as a deliverable. It is a bureaucratic fix, but the kind that — had it existed a decade ago — might have saved the city a significant chunk of what it is now spending to clean up the mess.
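What a "centralized library check" might look like in practice can be sketched in a few lines. This is an assumption-laden illustration, not the rule the Office of Digital Services is drafting: the `AssetLibrary` class, its in-memory storage, and the file paths are all hypothetical, and it catches only byte-identical uploads.

```python
import hashlib


class AssetLibrary:
    """Hypothetical in-memory stand-in for a centralized image library."""

    def __init__(self) -> None:
        # Maps a SHA-256 digest to the path of the first file registered with it.
        self._by_digest: dict[str, str] = {}

    def check_and_register(self, path: str, data: bytes) -> tuple[bool, str]:
        """Return (is_new, canonical_path) for an attempted upload.

        If identical bytes are already registered, the upload is flagged as a
        duplicate and the existing path is returned so it can be reused.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return False, existing
        self._by_digest[digest] = path
        return True, path


library = AssetLibrary()
print(library.check_and_register("parks/dolores-park.jpg", b"example image bytes"))
print(library.check_and_register("planning/park-photo.jpg", b"example image bytes"))
```

In this sketch the second upload is rejected as a duplicate and pointed to the first file's path, which is the behavior a pre-publication check would need in order to stop redundant copies from accumulating.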