The Daily San Francisco

San Francisco's Aging Digital Archives Are Riddled With Duplicate Images — Here's What Comes Next

City agencies and cultural institutions face a fork in the road as they decide whether to invest in AI-driven deduplication tools or keep patching a fragmented system that wastes staff time and public storage dollars.

By San Francisco News Desk · Published 4 July 2026, 11:40 am

San Francisco's public digital infrastructure has a quiet but costly problem: thousands of duplicate images clogging municipal photo archives, city planning databases, and the collections of publicly funded cultural institutions — and no unified policy yet for cleaning them up. The issue came into sharper focus this spring when staff at the San Francisco Public Library's main branch on Larkin Street began a systematic audit of its digitized photograph collection, discovering that some historical images had been ingested as many as four or five times across different catalog systems over the past decade.

The timing matters. The city is in the middle of a broader push to modernize its digital infrastructure, and several key contracts — including one covering data storage services for the Department of Technology — are up for renewal before the end of fiscal year 2027. How officials decide to handle duplicate image removal will shape those procurement decisions and set a precedent for how San Francisco manages its exploding volume of visual data.

The Scale of the Problem Across City Institutions

The San Francisco Public Library is not alone. Staff at the San Francisco Arts Commission, which maintains a public art registry covering more than 4,000 works citywide, have flagged similar redundancy issues in the archive of installation photographs that document murals, sculptures, and temporary installations from the Tenderloin to the Dogpatch. The Planning Department's environmental review database, which holds photographic evidence attached to thousands of permit applications filed through the Permit Center at 49 South Van Ness Avenue, has also accumulated duplicate files at a rate that strains its allocated cloud storage.

Duplicate images are not just an aesthetic nuisance. Redundant files consume server space that costs real money, slow down search and retrieval functions used daily by city employees and members of the public, and complicate compliance with California Public Records Act requests. When a records request arrives and staff must manually sift through multiple near-identical versions of the same photograph to determine which is the authoritative copy, the labor costs add up fast. The city's Department of Technology did not respond to a request for comment by publication time.

Several San Francisco-based technology firms — including companies operating out of the cluster of AI startups along Market Street and in SoMa — now offer automated deduplication pipelines that use perceptual hashing and machine-learning classifiers to identify visually similar images at scale. These tools can typically process tens of thousands of images in hours rather than the weeks it would take a human archivist. Licensing costs for enterprise-grade solutions generally run between roughly $15,000 and $80,000 annually depending on volume, according to publicly available pricing from vendors in the space, though municipal contracts are usually negotiated at custom rates.
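For readers curious how perceptual hashing works in principle, the following is a minimal sketch of one simple variant, an "average hash," and not a description of any vendor's product. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid of brightness values; the function names and the `max_distance` threshold are illustrative choices, not taken from any tool mentioned in this story.

```python
from typing import Sequence

# Assumption: an image is represented as a small grid of grayscale values (0-255),
# already decoded and downscaled by some other step not shown here.
Grid = Sequence[Sequence[int]]


def average_hash(pixels: Grid) -> int:
    """Build a hash with one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits.

    max_distance is a hypothetical threshold; a real archive would tune it.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because the hash records only whether each region is brighter or darker than the image's own average, a copy that has been uniformly lightened tends to produce the same hash, while a genuinely different photograph usually differs in many bits. Flagged pairs would still need human review before anything is deleted.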

Key Decisions Ahead — and Who Gets to Make Them

The most consequential near-term decision is whether the city will centralize its image archive governance under a single agency or continue letting each department run its own solution. The Mayor's Office of Civic Innovation has convened at least one working session this year on broader data standardization, but no formal policy covering image deduplication has been announced publicly as of July 4, 2026.

The San Francisco Public Library's digital services team is expected to present findings from its Larkin Street audit to the Library Commission sometime before September. That report will likely become the clearest public benchmark the city has for understanding the scope of the problem across all departments — and it could set the agenda for what procurement officials prioritize when the Department of Technology's storage contracts come up for bid.

For institutions watching from the outside, the Prelinger Library in SoMa — a privately held research archive with deep ties to San Francisco's documentary history — has already moved through its own deduplication process and may offer a practical model for smaller collections. City archivists and department IT leads who want to get ahead of the curve have until the fall budget amendment window to flag deduplication projects for funding. After that, the next realistic entry point is the full budget cycle beginning in January 2027 — which means the decisions made in the next 90 days will define the city's digital housekeeping for years to come.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
