San Francisco's public institutions are sitting on a growing backlog of duplicate digital images — redundant photographs, scanned documents, and archival visuals spread across city servers — and the decisions made in the next several months will determine whether the problem gets fixed or quietly festers into a much larger budget headache. The issue has surfaced as the San Francisco Public Library's San Francisco History Center on Larkin Street and the city's Department of Technology have each flagged ballooning cloud storage expenditures tied in part to duplicated media files across municipal databases.
The timing matters. San Francisco is in the middle of an aggressive push to digitize civic records, part of the broader open-government drive accelerated under the city's DataSF program, and the AI boom has supercharged both the volume of image generation and the demand for clean, well-catalogued datasets. A duplicated file is more than a storage cost: it degrades search accuracy, slows retrieval, and undermines the integrity of public archives that researchers, journalists, and city planners rely on daily.
Where the Problem Lives
Three city entities have the clearest stake in getting this right. The San Francisco Public Library system, which operates 28 locations citywide (the Main Library and 27 branches), maintains digitized photo collections through its SF Digital Collections portal. The San Francisco Arts Commission, headquartered at 401 Van Ness Avenue, manages an image library tied to its Civic Art Collection, a publicly owned portfolio that includes works installed everywhere from the Embarcadero to Glen Park. And the Planning Department's environmental review division holds thousands of scanned site photographs attached to permits and CEQA filings going back decades.
Each of these systems grew largely in isolation. When the city moved aggressively toward cloud infrastructure between 2019 and 2022, files were migrated without systematic deduplication. The result is a patchwork: the same historic photograph of, say, the Fillmore District in the 1960s can exist in four separate repositories under slightly different filenames, tagged inconsistently, and billed to three separate departmental budgets.
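To make the filename problem concrete, here is a minimal sketch of how byte-identical copies can be found regardless of what they are called: hash each file's contents and group paths that share a digest. The repository paths and file contents below are hypothetical, and nothing here reflects a tool the city has actually chosen.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return a SHA-256 digest of the file's bytes; filenames play no part."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group paths whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_digest[content_digest(data)].append(path)
    # Keep only digests that more than one path maps to.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical inventory: one photograph stored three times under different names.
sample = {
    "library/fillmore_1964.jpg": b"\xff\xd8same-bytes",
    "planning/IMG_0042.jpg": b"\xff\xd8same-bytes",
    "arts/fillmore-st.jpg": b"\xff\xd8same-bytes",
    "arts/mural.jpg": b"\xff\xd8other-bytes",
}
duplicates = group_exact_duplicates(sample)
print(duplicates)
```

A content hash only catches exact copies. A rescan, a crop, or a re-saved JPEG of the same photograph produces different bytes and would not be grouped by this approach.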
Municipal cloud storage contracts in cities of comparable scale typically run between $2 million and $6 million annually, with duplicate and orphaned files accounting for a meaningful share of avoidable spend — though the precise figure for San Francisco's holdings has not been made public. What is clear is that city IT officials have identified digital asset management as a priority line item in the Fiscal Year 2026–27 budget cycle, which the Board of Supervisors is scheduled to finalize before August 1.
The Decisions That Will Define the Outcome
The most consequential near-term choice is whether the city adopts a centralized deduplication tool applied across all departments, or allows each agency to run its own cleanup independently. A centralized approach through the Department of Technology would produce consistent metadata standards and reduce redundancy across the whole system, but it requires interagency cooperation that has historically been difficult to achieve in San Francisco's balkanized civic bureaucracy. The Planning Department alone operates on a different content management platform than the Library system.
Vendors offering AI-assisted image deduplication — a category that has expanded rapidly since 2024 — have already pitched the Department of Technology, according to procurement filings posted to the city's public contract portal. Several tools can flag near-duplicate images, not just exact copies, which matters for photographic archives where the same scene was shot multiple times from slightly different angles.
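The procurement filings do not say how any vendor's product works, so the following is only an illustrative sketch of one well-known way to flag near-duplicates: an "average hash" that reduces an image to a bit pattern, compared by counting differing bits. It assumes each image has already been decoded and shrunk to a tiny grayscale grid; the 4x4 grids and the distance threshold are hypothetical values chosen for the example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit pattern: 1 where a pixel is brighter than the grid's mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance: int = 2) -> bool:
    """Treat two grids as near-duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 4x4 grayscale grids (0 = black, 255 = white).
scan_a = [[10, 10, 200, 200]] * 4   # dark left half, bright right half
scan_b = [[12, 15, 190, 210]] * 4   # same scene, slightly different exposure
other = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]  # bright top, dark bottom

print(is_near_duplicate(scan_a, scan_b))
print(is_near_duplicate(scan_a, other))
```

The threshold is the policy lever: set it too loose and distinct photographs are merged, too tight and rescans slip through, which is one reason archival cleanups of this kind usually keep a human review step.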
The SF Digital Services team, which oversees the city's technology modernization efforts out of City Hall, is expected to release draft guidelines for a unified digital asset policy by September. Whether those guidelines will carry enforcement weight, or simply serve as recommendations that agencies can ignore, remains the central unresolved governance question.
For residents who care about civic transparency — and for researchers at institutions like the UCSF Library or the California Historical Society on Jackson Street, which frequently cross-reference city holdings — the practical stakes are real. A cleaner, deduplicated archive is faster to search, cheaper to maintain, and more useful as a public resource. The decisions made between now and the end of summer will set the baseline for how well San Francisco manages its digital heritage for years to come.