The Daily San Francisco

San Francisco's Digital Archive Has Thousands of Duplicate Images. Here's What Happens Next.

City agencies and cultural institutions face a mounting backlog of redundant digital files—and the decisions they make this summer could shape how San Francisco's visual history is stored and accessed for decades.

By San Francisco News Desk · Published 4 July 2026, 11:58 am

Photo: Deane Bayas on Pexels

San Francisco's public digital infrastructure has a clutter problem. Across city-managed repositories, libraries, and civic tech platforms, duplicate image files have accumulated for years—redundant photographs, scanned documents, and archival visuals that eat up server space, slow down search functions, and complicate public access to records. The question now is who decides what gets deleted, what gets kept, and who pays for the cleanup.

The issue has sharpened this summer as the city's Department of Technology enters a budget cycle under pressure. Housing agencies digitizing Mission District planning records, Muni uploading surveillance and infrastructure photos, and the San Francisco Public Library's San Francisco History Center on Larkin Street all feed into shared or parallel storage systems where duplicate image management has largely been handled ad hoc, if at all.

Why This Moment Matters

The timing is pointed. San Francisco's broader push to modernize its civic tech stack has accelerated since 2024, driven partly by federal infrastructure grants and partly by pressure from the Controller's Office to cut operational costs. Cloud storage is not free. Enterprise-grade storage for large image libraries can run well above $50,000 annually for a mid-sized city agency, depending on volume and redundancy protocols, and duplicates compound those costs directly.

At SF Digital Services, the team that manages the city's resident-facing web infrastructure at City Hall and beyond, engineers have flagged duplicate image handling as a structural issue in internal working documents reviewed by this reporter. The San Francisco Recreation and Parks Department, which maintains image libraries for more than 220 parks and facilities, has separately acknowledged a backlog in its digital asset management system, though the department has not issued a public timeline for resolving it.

The San Francisco Public Library's History Center holds one of the most consequential collections at stake. Digitized photographs dating to the Gold Rush era live alongside more recent scans, and volunteers and staff have flagged duplicate entries that create confusion in the online catalog. Librarians there have been working with the Internet Archive, based in the Richmond District on Funston Avenue, on protocols for deduplication—but a finalized policy has not been adopted as of this week.

The Decisions Ahead

Three questions will define the outcome over the next six months. First: who has deletion authority? In most city agencies, no single office holds clear jurisdiction over purging image files from shared systems. The City Administrator's Office and the Department of Technology have overlapping roles, and without explicit policy, individual departments default to keeping everything—which is how duplicates accumulate in the first place.

Second: will the city invest in automated deduplication software, or rely on manual review? Commercial tools from vendors that work with municipal governments can identify exact duplicates through hash matching and near-duplicates through perceptual hashing algorithms, but licensing costs vary widely. A pilot program at the Planning Department, which processes thousands of permit-related property photographs annually at its offices on Mission Street, could serve as a test case for citywide rollout.
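For readers curious about what those two techniques look like in practice, here is a minimal Python sketch. It is an illustration only, not any vendor's product or the city's method. The function names are hypothetical, and the tiny pixel grids stand in for photographs that a real tool would first shrink to a small grayscale thumbnail. The "average hash" shown is one simple perceptual algorithm among several.

```python
import hashlib
from typing import Sequence


def content_digest(data: bytes) -> str:
    """Exact-match fingerprint: identical bytes always give the same digest."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Perceptual fingerprint of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 2x2 "thumbnails": the second is the first, slightly brightened.
original = [[10, 200], [220, 30]]
brightened = [[20, 210], [230, 40]]
different = [[200, 10], [30, 220]]

# The brightened copy has different pixel values but the same perceptual hash.
same_look = hamming_distance(average_hash(original), average_hash(brightened))
other_look = hamming_distance(average_hash(original), average_hash(different))
```

An exact digest changes if even one byte of a file changes, so it only catches true copies. The perceptual fingerprint tolerates small changes such as brightness, which is why a distance threshold, chosen by the agency, would decide what counts as a near-duplicate.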

Third: what counts as a true duplicate versus a meaningful variant? A photograph taken from the same angle on different dates may look identical but carry distinct evidentiary value—particularly for infrastructure documentation, police records, or environmental assessments. Cultural institutions like the History Center are especially cautious here. Deleting the wrong file in an archival context is not a recoverable error.
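One way to picture the distinction is a rule that treats two files as duplicates only when both their image fingerprint and their capture date match. The Python sketch below is a simplified illustration under that assumption. The `ImageRecord` fields, the file names, and the `flag_for_review` function are hypothetical, and the fingerprint is assumed to have been computed elsewhere. The sketch flags candidates for human review rather than deleting anything.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class ImageRecord:
    name: str
    fingerprint: str   # assumed precomputed (exact or perceptual hash)
    captured_on: date  # assumed to be available in the file's metadata


def flag_for_review(
    records: List[ImageRecord],
) -> Tuple[List[ImageRecord], List[ImageRecord]]:
    """Split records into (kept, review_candidates).

    Files are grouped only when fingerprint AND capture date both match,
    so look-alike photos from different dates are never flagged.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record.fingerprint, record.captured_on)].append(record)

    kept, candidates = [], []
    for members in groups.values():
        kept.append(members[0])         # first copy seen is retained
        candidates.extend(members[1:])  # later copies go to a reviewer
    return kept, candidates


records = [
    ImageRecord("pier_a.jpg", "abc", date(2024, 1, 5)),
    ImageRecord("pier_a_copy.jpg", "abc", date(2024, 1, 5)),
    ImageRecord("pier_b.jpg", "abc", date(2025, 3, 10)),  # same view, later date
]
kept, candidates = flag_for_review(records)
```

Here only `pier_a_copy.jpg` is flagged. The 2025 photograph survives even though it looks the same, because its date gives it separate evidentiary value.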

The Board of Supervisors' Government Audit and Oversight Committee is scheduled to hear a broader digital infrastructure update later this summer, which city technology staff say could include discussion of storage efficiency. Advocacy groups focused on open government, including the San Francisco chapter of the civic tech organization Code for America, have pushed for transparent retention policies that give the public a voice before mass deletions occur. The Fourth of July holiday gives agencies a brief pause before the real work resumes next week, and the decisions made in the weeks that follow will be difficult to undo.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
