The Daily San Francisco


SF's Digital Archives Have a Duplicate Image Problem. Here's What Officials and Experts Are Saying About Fixing It

City agencies and preservationists are grappling with bloated, redundant photo databases — and debating who should pay to clean them up.

By San Francisco News Desk · Published 4 July 2026, 11:51 am



San Francisco's public agencies are sitting on tens of thousands of duplicate digital images across government databases, archival servers, and permit-tracking systems — and a growing chorus of city technologists, archivists, and open-government advocates says the redundancy is costing money, slowing access to public records, and undermining the reliability of the city's digital infrastructure.

The problem has been building for years, but it has sharpened lately as the city's Department of Technology pushes a broader cloud-migration initiative and as the Office of the City Clerk works to digitize decades of paper records held at City Hall, at 1 Dr. Carlton B. Goodlett Place. Storage costs for unmanaged, redundant image files have drawn scrutiny from the Budget and Legislative Analyst's office, which reviews departmental IT spending as part of the annual appropriations cycle.

Why This Is Landing on Desks Now

The immediate pressure comes from two directions. First, SFMTA's Muni transit division has been building out a real-time camera monitoring network across its bus and rail lines as part of a safety modernization program, a process that generates enormous volumes of image data and has exposed the absence of any citywide deduplication standard. Second, the San Francisco Public Library's San Francisco History Center, housed in the Main Library on Larkin Street, is midway through a multi-year effort to digitize its photographic collections and has flagged that inconsistent file-naming conventions across contributing agencies are causing the same images to be ingested more than once, at scale.

Experts in digital asset management say the city is far from alone. Municipal governments across the country have struggled with the same problem since the shift to digital photography in the 2000s, which produced image files that were cheap to create, easy to copy, and rarely audited. But San Francisco's particular challenge is the sheer number of siloed systems (planning, public works, police, the port), each running a separate document management platform with no shared deduplication layer on top.

Representatives of the city's Department of Technology have described the problem in general terms during public IT governance meetings, characterizing it as a known technical debt issue tied to legacy procurement decisions made before cloud storage became standard. No department head has publicly committed to a specific remediation timeline.

What the Experts and Advocates Are Recommending

Digital preservation specialists consulting with the San Francisco Public Library Foundation, a nonprofit that funds programs at all 28 library locations, have pushed for the adoption of a perceptual hashing standard. Perceptual hashing is a technique that identifies visually identical or near-identical images without requiring exact file matches. The approach is already used by large media organizations and social platforms to flag redundant assets automatically.
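For readers curious how perceptual hashing works in practice, here is a minimal sketch of one common variant, the "average hash." It is an illustration only: the city has not named a specific algorithm, and the function names, the 8x8 grid size, and the distance threshold of 5 are all assumptions made for this example. Images are represented as plain grids of grayscale values so the sketch runs without any imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed input format)


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Downsample to size x size cells by averaging the pixels in each cell.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Build a fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Treat two images as duplicates when their fingerprints differ in few bits.

    The threshold of 5 is a hypothetical value chosen for illustration.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends on each region's brightness relative to the image's overall mean, a uniformly brightened copy (with no clipping) produces the same fingerprint, even though the two files would differ byte for byte. A production system would also need to decode real image formats and tune the threshold against known duplicates.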

OpenSF, a civic-tech advocacy group that monitors city data transparency, has argued in public comment sessions that any deduplication project must be paired with a public audit of what images are being retained, how long they are kept, and under what legal authority, particularly for images captured by city surveillance systems. The group notes that California's Public Records Act, reinforced by the constitutional right of access to public records that voters approved as Proposition 59 in 2004, gives residents standing to challenge opaque retention practices.

The price tag for a serious remediation effort is not trivial. Cloud-based deduplication and metadata standardization projects for mid-sized municipal archives have run between $400,000 and $1.2 million depending on scope, according to published procurement records from comparable jurisdictions including Denver and Boston. San Francisco's situation is complicated by the number of departments involved, each of which would require separate negotiation and potentially separate contract vehicles under the city's purchasing rules.

The Board of Supervisors' Government Audit and Oversight Committee has not yet scheduled a formal hearing on the matter, but staff for at least two supervisors have requested briefings from the Department of Technology before the fall budget-implementation cycle begins in September 2026. If those conversations produce a formal directive, departments could be required to submit deduplication plans as part of their annual technology assessments — a process that already touches every major city agency. Without that mandate, advocates say, the problem will keep compounding one file at a time.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
