The Daily San Francisco

San Francisco news, every day

News

SF's City Archives Face a Decision Point on Duplicate Image Cleanup: What Happens Next

A sprawling backlog of duplicate photographs in San Francisco's municipal digital collections is forcing a choice between costly manual review and AI-assisted sorting — and the clock is ticking.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Karam Alani on Pexels

San Francisco's Department of Technology holds tens of thousands of digitized photographs across municipal archives, planning records, and public works documentation, and a significant share of those files are duplicates. The problem is not new, but an infrastructure migration scheduled for the fourth quarter of 2026 has put the question of how to handle redundant images directly in front of city administrators. The decision they make in the next few months will shape how public records are stored, searched, and retrieved for years.

The stakes are practical, not bureaucratic. When the San Francisco Planning Department or the Office of Community Investment and Infrastructure needs to pull project photos from the Mission District or the Tenderloin for an environmental review, duplicate-laden archives slow retrieval, inflate storage costs, and create version-control problems that can complicate legal and compliance filings. The city's digital transformation push, which accelerated under Mayor Daniel Lurie after he succeeded London Breed, has exposed these backlogs rather than solved them.

The Options on the Table

City IT staff and departmental records managers are weighing three approaches. The first is a full manual audit: labor-intensive, slow, and expensive, but considered the most legally defensible option for permanent public records. The second is perceptual hashing, a technique that computes a short fingerprint for each image from its visual content and automatically flags files whose fingerprints are nearly identical. The third, which has gained traction since the city's broader AI pilot programs launched in early 2026, uses machine-learning classifiers trained to distinguish meaningful variations, such as a building photographed at different construction stages, from true duplicates that add no informational value.
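
For readers curious how perceptual hashing works in principle, here is a minimal sketch in Python. The article does not name the software the city or the library used; this example implements one common variant, an "average hash," under stated assumptions: each image is already a grid of 0-255 grayscale values, it is shrunk to 8x8 cells, and each cell becomes one bit depending on whether it is brighter than the image's overall mean. The function names are illustrative, not drawn from any city system. Two fingerprints are compared by counting the bits that differ (the Hamming distance).

```python
"""Minimal average-hash (aHash) sketch on a grayscale pixel grid.

Assumptions, for illustration only: images arrive as lists of rows of
0-255 grayscale values, and fingerprints use an 8x8 grid (64 bits).
"""
from typing import List

Grid = List[List[int]]


def downscale(pixels: Grid, size: int = 8) -> List[List[float]]:
    """Shrink an image to size x size cells by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for r in range(size):
        top, bottom = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            left, right = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(sum(block) / len(block))
        cells.append(row)
    return cells


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    values = [v for row in downscale(pixels, size) for v in row]
    mean = sum(values) / len(values)
    fingerprint = 0
    for v in values:
        fingerprint = (fingerprint << 1) | (1 if v > mean else 0)
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Usage: a synthetic 32x32 left-to-right gradient, a uniformly brighter
# "rescan" of it, and an unrelated image (the gradient reversed).
original = [[x * 8 for x in range(32)] for _ in range(32)]
rescan = [[min(255, v + 5) for v in row] for row in original]
reversed_img = [[255 - v for v in row] for row in original]

h_original = average_hash(original)
print(hamming(h_original, average_hash(rescan)))        # 0
print(hamming(h_original, average_hash(reversed_img)))  # 64
```

In this toy example the uniformly brighter rescan produces an identical fingerprint, while the reversed gradient differs in every bit. Average hashes tolerate small brightness changes well, but cropping, rotation, or a skewed scan can shift many bits at once, which is one plausible reason hashing tools struggle with multiple scans of the same original print.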

San Francisco Public Library's History Center on Larkin Street, which maintains one of the city's most heavily accessed photographic collections, has already piloted a limited version of the hashing approach for its digitized San Francisco History Association holdings. Staff there found the tool effective at flagging obvious duplicates but less reliable when dealing with slightly different scans of the same original print — a common problem with legacy collections digitized in multiple batches over the past two decades.

The San Francisco Municipal Transportation Agency faces a parallel version of the same issue. SFMTA's engineering and maintenance divisions accumulate inspection photographs of Muni Metro stations, overhead wire infrastructure, and street-level signals. Internal document management guidelines require inspection records to be retained but do not clearly define whether two nearly identical photos of the same broken signal head count as one record or two. That ambiguity has fed storage bloat, which contributes to cloud storage spending that city budget documents for fiscal year 2025-2026 put at millions of dollars annually across departments, though the share attributable to duplicate images has not been publicly broken out.

Key Decisions Ahead

The most consequential choice is governance, not technology. City Archivist staff and the Department of Technology must agree on a retention threshold: how similar must two images be before one can be flagged for deletion without a human sign-off? Set the threshold too low and legitimate variations get tossed. Set it too high and so few files qualify that the cleanup accomplishes little.
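
The threshold question can be made concrete with a short sketch. It assumes each image has already been reduced to a 64-bit perceptual fingerprint and sorts each pair of files into three tiers by how many bits differ. The tier limits (AUTO_MAX, REVIEW_MAX), the file names, and the three-tier design itself are hypothetical, chosen only to illustrate the trade-off the working group faces; they are not figures from any city draft.

```python
"""Hypothetical two-tier threshold for perceptual-hash duplicate triage."""
from typing import Dict, List, Tuple

AUTO_MAX = 2     # hypothetical: at most this many differing bits, flag without sign-off
REVIEW_MAX = 10  # hypothetical: up to this many, route to a human records manager


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def classify(a: int, b: int, auto_max: int = AUTO_MAX,
             review_max: int = REVIEW_MAX) -> str:
    """Place one pair of fingerprints into a handling tier."""
    distance = hamming(a, b)
    if distance <= auto_max:
        return "auto-flag"
    if distance <= review_max:
        return "human review"
    return "distinct"


def triage(pairs: List[Tuple[str, str, int, int]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (file_a, file_b, hash_a, hash_b) tuples by tier."""
    tiers: Dict[str, List[Tuple[str, str]]] = {
        "auto-flag": [], "human review": [], "distinct": []}
    for name_a, name_b, hash_a, hash_b in pairs:
        tiers[classify(hash_a, hash_b)].append((name_a, name_b))
    return tiers


# Usage with made-up file names and fingerprints.
base = 0x0F0F0F0F0F0F0F0F
pairs = [
    ("signal_001.jpg", "signal_001_copy.jpg", base, base),                 # 0 bits apart
    ("pier_scan_a.tif", "pier_scan_b.tif", base, base ^ 0b11111),          # 5 bits apart
    ("site_2019.jpg", "site_2024.jpg", base, ~base & 0xFFFFFFFFFFFFFFFF),  # 64 bits apart
]
result = triage(pairs)
for tier, files in result.items():
    print(tier, files)
```

Measured as a distance, the dial runs opposite to the similarity framing above: raising AUTO_MAX is the equivalent of setting the similarity threshold too low, letting more distant pairs be deleted without sign-off, while dropping it to 0 leaves only near-exact copies in the automatic tier and pushes everything else onto human reviewers.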

A working group that includes representation from the Planning Department, SFMTA, and the City Attorney's office is expected to present draft guidelines by September 2026, ahead of the Q4 migration window. If those guidelines don't land on time, the migration proceeds with the duplicate problem intact, pushing the issue into 2027 and adding complexity to any future public records requests filed under the California Public Records Act.

For departments that interact directly with San Francisco neighborhoods — particularly those managing visual documentation of housing projects in the Western Addition or infrastructure work along the Central Freeway corridor — the outcome affects how quickly staff can respond to records requests and how accurately they can reconstruct project histories. The Tenderloin Housing Clinic and similar organizations that routinely file records requests for planning documentation have a direct stake in whether city archives are clean and searchable or cluttered with redundant files that slow response times.

The September deadline is the number to watch. If the working group delivers clear retention rules on schedule, the city has a viable path to a cleaner archive before the year ends. If it slips, the Q4 migration becomes a missed opportunity that administrators will spend 2027 trying to undo.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.