The Daily San Francisco

San Francisco news, every day

San Francisco's Digital Archive Faces a Reckoning Over Duplicate Images: What Happens Next

City agencies and cultural institutions must now decide how to audit, deduplicate, and preserve thousands of redundant digital files before storage costs spiral further out of control.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Airam Dato-on on Pexels

San Francisco's public institutions are sitting on a growing backlog of duplicate digital images: redundant photographs, scanned documents, and archival visuals spread across city servers. Decisions made over the next several months will determine whether the problem gets fixed or quietly grows into a much larger budget headache. The issue surfaced after the San Francisco Public Library's San Francisco History Center on Larkin Street and the city's Department of Technology each flagged ballooning cloud storage costs tied in part to duplicated media files across municipal databases.

The timing matters. San Francisco is in the middle of an aggressive push to digitize civic records, part of the broader open-government drive accelerated under the city's DataSF program, and the AI boom has increased both the volume of generated images and the demand for clean, well-catalogued datasets. A duplicated file is more than a storage cost: it degrades search accuracy, slows retrieval, and undermines the integrity of public archives that researchers, journalists, and city planners rely on daily.

Where the Problem Lives

Three city entities have the clearest stake in getting this right. The San Francisco Public Library system, which operates 28 branch locations citywide, maintains digitized photo collections through its SF Digital Collections portal. The San Francisco Arts Commission, headquartered at 401 Van Ness Avenue, manages an image library tied to its Civic Art Collection — a publicly owned portfolio that includes works installed everywhere from the Embarcadero to Glen Park. And the Planning Department's environmental review division holds thousands of scanned site photographs attached to permits and CEQA filings going back decades.

Each of these systems grew largely in isolation. When the city moved aggressively toward cloud infrastructure between 2019 and 2022, files were migrated without systematic deduplication. The result is a patchwork: the same historic photograph of, say, the Fillmore District in the 1960s can exist in four separate repositories under slightly different filenames, tagged inconsistently, and billed to three separate departmental budgets.
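For readers curious how byte-identical copies filed under different names can be found, one common approach is to group files by a hash of their contents rather than by filename. The sketch below is illustrative only: the repository paths and file contents are invented, and nothing here describes a tool the city actually uses.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group paths whose contents are byte-for-byte identical.

    `files` maps a path to the file's raw bytes. Returns a list of
    groups (sorted lists of paths), keeping only groups with 2+ members.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        # Identical bytes always produce the same SHA-256 digest,
        # regardless of filename or tags.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


# Hypothetical sample: one photograph stored in three repositories
# under different names, plus one unrelated file.
sample = {
    "library/fillmore_1964.jpg": b"\xff\xd8 same image bytes",
    "arts/FILLMORE-ST-1964.jpeg": b"\xff\xd8 same image bytes",
    "planning/site_photo_0192.jpg": b"\xff\xd8 same image bytes",
    "planning/site_photo_0193.jpg": b"\xff\xd8 a different image",
}

duplicate_groups = group_exact_duplicates(sample)
print(duplicate_groups)
```

Content hashing of this kind only catches exact copies. A file that has been resized, re-compressed, or edited produces a different digest and would not be grouped.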

Municipal cloud storage contracts in cities of comparable scale typically run between $2 million and $6 million annually, with duplicate and orphaned files accounting for a meaningful share of avoidable spend — though the precise figure for San Francisco's holdings has not been made public. What is clear is that city IT officials have identified digital asset management as a priority line item in the Fiscal Year 2026–27 budget cycle, which the Board of Supervisors is scheduled to finalize before August 1.

The Decisions That Will Define the Outcome

The most consequential near-term choice is whether the city adopts a centralized deduplication tool applied across all departments or lets each agency run its own cleanup independently. A centralized approach through the Department of Technology would produce consistent metadata standards and reduce redundancy across the whole system, but it requires interagency cooperation that has historically been difficult to achieve in San Francisco's balkanized civic bureaucracy. The Planning Department, for one, runs on a different content management platform from the Library system.

Vendors offering AI-assisted image deduplication — a category that has expanded rapidly since 2024 — have already pitched the Department of Technology, according to procurement filings posted to the city's public contract portal. Several tools can flag near-duplicate images, not just exact copies, which matters for photographic archives where the same scene was shot multiple times from slightly different angles.
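How the pitched tools work has not been disclosed. As a rough illustration of the general idea, one widely used technique is a "difference hash": shrink each image to a tiny grayscale grid, record whether each pixel is brighter than its right-hand neighbour, and count how many of those bits differ between two images. The sketch below assumes the shrinking step has already been done and uses made-up 3-by-4 grids and an arbitrary threshold; it is not any vendor's method.

```python
def difference_hash(gray_rows):
    """Return a list of bits: 1 where a pixel is brighter than the
    pixel immediately to its right, else 0."""
    bits = []
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


def is_near_duplicate(gray_a, gray_b, max_distance=2):
    """Treat two grids as near-duplicates when few hash bits differ.

    `max_distance` is an illustrative threshold, not a recommended value.
    """
    distance = hamming_distance(difference_hash(gray_a), difference_hash(gray_b))
    return distance <= max_distance


# Hypothetical grayscale grids (0-255), standing in for downscaled images.
original = [[10, 50, 40, 90], [200, 180, 190, 100], [30, 30, 60, 20]]
# Same picture, uniformly brightened: different bytes, same gradients.
brighter = [[value + 12 for value in row] for row in original]
# A different picture with a different brightness pattern.
unrelated = [[90, 40, 50, 10], [100, 190, 180, 200], [20, 60, 30, 30]]

print(is_near_duplicate(original, brighter))
print(is_near_duplicate(original, unrelated))
```

A hash like this tends to catch re-saved, resized, or lightly adjusted copies of one image. Photographs of the same scene taken from a different angle are a harder case and generally call for more sophisticated comparison than this sketch provides.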

The SF Digital Services team, which oversees the city's technology modernization efforts out of City Hall, is expected to release draft guidelines for a unified digital asset policy by September. Whether those guidelines will carry enforcement weight or serve only as recommendations that agencies can ignore remains the central unresolved governance question.

For residents who care about civic transparency — and for researchers at institutions like the UCSF Library or the California Historical Society on Jackson Street, which frequently cross-reference city holdings — the practical stakes are real. A cleaner, deduplicated archive is faster to search, cheaper to maintain, and more useful as a public resource. The decisions made between now and the end of summer will set the baseline for how well San Francisco manages its digital heritage for years to come.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
