The Daily San Francisco

SF's Duplicate Image Problem: The Key Decisions That Will Define the City's Digital Record

City agencies and cultural institutions are sitting on backlogs of redundant digital assets—and the choices they make this summer will shape how San Francisco's history gets preserved and accessed for decades.

By San Francisco News Desk · Published 4 July 2026, 11:40 am

San Francisco's public institutions are confronting a growing crisis hiding in plain sight: tens of thousands of duplicate digital images clogging the city's archival systems, municipal databases, and cultural repositories, with no unified policy yet in place to decide what gets kept, what gets deleted, and who gets to make that call.

The issue has sharpened this year as several major Bay Area institutions—including the San Francisco Public Library's San Francisco History Center at Larkin Street and the San Francisco Arts Commission—have moved to modernize their digital asset management systems. The upgrades are exposing just how badly redundant files have accumulated, and are forcing administrators to make permanent decisions about irreplaceable visual records.

The stakes are high in a city that has experienced catastrophic losses of physical archives before, most notably in the 1906 earthquake and fire. When something gets deleted from a digital record without a proper deduplication review, there is often no paper backup waiting in a Civic Center storage room.

What the Backlog Actually Looks Like

The problem is not unique to San Francisco, but the city's particular mix of legacy government systems and rapidly scaled tech-adjacent infrastructure has made the backlog unusually acute. The San Francisco Municipal Transportation Agency, which manages Muni and operates out of offices near 1 South Van Ness Avenue, has been migrating decades of photo documentation—fleet records, infrastructure surveys, incident photography—into consolidated servers since early 2025. Sources familiar with public records processes say deduplication protocols during migrations of that scale routinely surface duplicate rates of 20 to 40 percent in legacy image libraries, though no official figure for the SFMTA's specific migration has been published.

At the California Historical Society on Mission Street in SoMa, staff have been working through a similar challenge with digitized photograph collections. The institution holds one of the most significant visual archives of California and San Francisco history in existence. When duplicate scans exist—sometimes created during multiple digitization passes of the same physical item—the wrong deletion decision can erase metadata, resolution quality, or color calibration information that distinguishes one version from another.

The broader Bay Area cultural sector has been closely watching how San Francisco handles these questions. The Internet Archive, headquartered on Funston Avenue in the Richmond District, has long argued that aggressive deletion policies carry more risk than reward for public institutions managing historical materials, because storage costs are dropping fast: cloud storage now runs well under $0.03 per gigabyte per month on major platforms.
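To give a sense of the scale behind that argument, the arithmetic can be sketched in a few lines of Python. The collection size, average file size, and duplicate rate below are hypothetical figures chosen for illustration, not numbers reported by any institution; only the $0.03 per gigabyte ceiling comes from the pricing cited above.

```python
def monthly_duplicate_cost(total_images, avg_size_mb, duplicate_rate,
                           price_per_gb_month=0.03):
    """Estimate the monthly cloud storage cost of keeping duplicate images.

    All inputs are illustrative assumptions; price_per_gb_month defaults to
    the $0.03 upper bound cited in the article.
    """
    duplicate_count = total_images * duplicate_rate
    duplicate_gb = duplicate_count * avg_size_mb / 1024  # MB -> GB
    return duplicate_gb * price_per_gb_month


# Hypothetical library: 50,000 images averaging 25 MB, 30 percent duplicates.
cost = monthly_duplicate_cost(total_images=50_000, avg_size_mb=25,
                              duplicate_rate=0.30)
print(f"${cost:.2f} per month")
```

Under those assumed inputs, the duplicates would cost roughly $11 a month to keep. The estimate leaves out staff time, backups, and retrieval fees, so it is a floor on the real cost rather than a budget figure.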

The Decisions Coming This Fall

The San Francisco City Administrator's Office is expected to circulate draft guidelines for municipal digital asset retention before the end of the third quarter of 2026. Those guidelines will likely set retention thresholds, define what qualifies as a true duplicate versus a variant, and establish whether deletion decisions require human sign-off or can be automated by software.

Each of those choices carries consequences. Automated deduplication tools work well on commercial photography but perform poorly on archival materials, where two images that look identical may carry different provenance metadata. Institutions that have adopted fully automated workflows, including some in Chicago and New York, have later reported accidental losses that required costly recovery efforts.
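The distinction between a true duplicate and a variant can be illustrated with a minimal Python sketch. It is not any agency's actual workflow: the `ImageRecord` type, the `classify_duplicates` function, and the sample file names are hypothetical, and the sketch assumes duplicates have byte-identical image data, which separate scanning passes of the same item often will not.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical stand-in for one file in an image library."""
    name: str
    pixels: bytes         # placeholder for decoded image data
    metadata: tuple = ()  # e.g. (("scan_pass", "2"),)


def classify_duplicates(records):
    """Group records by a SHA-256 hash of their image data.

    Groups whose members also share identical metadata are returned as
    true duplicates; groups whose metadata differs are returned as
    variants that a person should review before anything is deleted.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[hashlib.sha256(rec.pixels).hexdigest()].append(rec)

    true_duplicates, variants = [], []
    for recs in groups.values():
        if len(recs) < 2:
            continue  # a unique image, nothing to decide
        names = sorted(rec.name for rec in recs)
        if len({rec.metadata for rec in recs}) == 1:
            true_duplicates.append(names)
        else:
            variants.append(names)
    return true_duplicates, variants


records = [
    ImageRecord("muni_001.tif", b"AAAA", (("scan_pass", "1"),)),
    ImageRecord("muni_001_copy.tif", b"AAAA", (("scan_pass", "1"),)),
    ImageRecord("pier_7.tif", b"BBBB", (("scan_pass", "1"),)),
    ImageRecord("pier_7_rescan.tif", b"BBBB", (("scan_pass", "2"),)),
    ImageRecord("unique.tif", b"CCCC"),
]
true_duplicates, variants = classify_duplicates(records)
print("True duplicates:", true_duplicates)
print("Needs human review:", variants)
```

In this sample, the two Muni files count as a true duplicate, while the two pier scans share image data but differ in metadata and are routed to review. A fully automated tool that compared image data alone would treat both pairs the same way.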

For city officials, the practical question is whether to invest in supervised review workflows, which are slower and require trained staff, or to push through automated solutions that fit tighter budget timelines. The Department of Technology, based on South Van Ness, has been the lead agency on cross-departmental digitization since 2023, and its posture on this question will carry significant weight in the upcoming policy draft.

Community advocates, including groups focused on preserving Mission District and Chinatown neighborhood histories, have separately pushed for public comment periods before any deletion protocols are finalized. Their concern: that visual records of communities historically underrepresented in official archives are disproportionately at risk when deletion decisions get made quickly and without cultural context review. The City Administrator's office has not yet confirmed whether a public comment window will be included in the fall process.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
