San Francisco's Department of Technology holds tens of thousands of digitized photographs across municipal archives, planning records, and public works documentation, and a significant share of those files are duplicates. The problem is not new, but an infrastructure migration scheduled for the fourth quarter of 2026 has put the question of how to handle redundant images directly in front of city administrators. The decision they make in the next few months will shape how public records are stored, searched, and retrieved for years.
The stakes are practical, not bureaucratic. When the San Francisco Planning Department or the Office of Community Investment and Infrastructure needs to pull project photos from the Mission District or the Tenderloin for an environmental review, duplicate-laden archives slow retrieval, inflate storage costs, and create version-control problems that can complicate legal and compliance filings. The city's digital transformation push, which accelerated under Mayor Daniel Lurie's administration following London Breed's tenure, has exposed these backlogs rather than solved them.
The Options on the Table
City IT staff and departmental records managers are weighing three approaches. The first is a full manual audit: labor-intensive, slow, and expensive, but considered the most legally defensible for permanent public records. The second uses perceptual hashing software, a technique that generates a short fingerprint for each image and automatically flags files whose fingerprints nearly match. The third, gaining traction after the city's broader AI pilot programs launched in early 2026, involves machine-learning classifiers trained to distinguish meaningful variations (a building at different construction stages, say) from true duplicates that offer no additional informational value.
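To make the fingerprinting idea concrete, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). It is an illustration only, not the tool the city or the library piloted, which the reporting does not name. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid of brightness values; the names `dhash` and `hamming` and the sample grids are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values, 0-255 (assumed pre-shrunk)


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels.

    A bit is 1 when the left pixel is brighter than its right neighbor.
    """
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4x5 grids standing in for shrunk photographs.
original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 50, 10, 50, 10],
    [90, 90, 80, 80, 70],
]
# The same picture, uniformly 5 levels brighter (e.g. a different scan exposure).
brighter = [[value + 5 for value in row] for row in original]
# An unrelated picture.
different = [
    [50, 40, 30, 20, 10],
    [10, 20, 30, 40, 50],
    [50, 10, 50, 10, 50],
    [70, 80, 80, 90, 90],
]

print(hamming(dhash(original), dhash(brighter)))   # 0: fingerprints match
print(hamming(dhash(original), dhash(different)))  # 14 of 16 bits differ
```

Because the hash records only whether brightness rises or falls between neighbors, a uniform brightness shift leaves the fingerprint unchanged, while an unrelated image lands far away. Cropping, rotation, or a skewed rescan would change the grid itself and can move the fingerprint, which is one reason such tools need a tolerance rather than an exact-match rule.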
San Francisco Public Library's History Center on Larkin Street, which maintains one of the city's most heavily accessed photographic collections, has already piloted a limited version of the hashing approach for its digitized San Francisco History Association holdings. Staff there found the tool effective at flagging obvious duplicates but less reliable when dealing with slightly different scans of the same original print — a common problem with legacy collections digitized in multiple batches over the past two decades.
The San Francisco Municipal Transportation Agency faces a parallel version of the same issue. SFMTA's engineering and maintenance divisions accumulate inspection photographs from Muni Metro stations, overhead wire infrastructure, and street-level signals. Internal document management guidelines require retention of inspection records, but do not clearly define when two nearly identical photos of the same broken signal head count as one record or two. That ambiguity has led to storage bloat. According to city budget documents reviewed for fiscal year 2025-2026, the bloat contributes to cloud storage expenditures that total millions of dollars annually across departments, though the precise figure attributable to duplicate images has not been publicly broken out.
Key Decisions Ahead
The most consequential choice is governance, not technology. City Archivist staff and the Department of Technology must agree on a retention threshold: how similar must two images be before one can be flagged for deletion without a human sign-off? Set the similarity bar too low and legitimate variations get tossed. Set it too high and almost nothing qualifies, so the cleanup effort becomes meaningless.
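The trade-off can be shown with a short sketch. Everything in it is hypothetical: the `Pair` type, the function name, the example pairs, and the similarity scores (on an assumed scale from 0.0 for unrelated images to 1.0 for identical ones) are invented for illustration and do not come from city data or draft guidelines.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pair:
    label: str
    similarity: float  # hypothetical score: 0.0 unrelated, 1.0 identical


def flagged_without_signoff(pairs: List[Pair], threshold: float) -> List[str]:
    """Return the pairs similar enough to be flagged with no human review."""
    return [pair.label for pair in pairs if pair.similarity >= threshold]


# Made-up examples of the kinds of pairs described in this article.
pairs = [
    Pair("same scan, two filenames", 1.00),
    Pair("same print, rescanned in a later batch", 0.93),
    Pair("same building, later construction stage", 0.80),
]

# A low bar sweeps in the construction-stage photo, a legitimate variation.
print(flagged_without_signoff(pairs, 0.75))

# A stricter bar spares it and still catches the rescan.
print(flagged_without_signoff(pairs, 0.90))

# A very high bar catches only exact copies, leaving most of the bloat.
print(flagged_without_signoff(pairs, 0.999))
```

The code is trivial by design. The hard part is the policy question of where the cutoff sits and who reviews the pairs that fall just below it.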
A working group that includes representation from the Planning Department, SFMTA, and the City Attorney's office is expected to present draft guidelines by September 2026, ahead of the Q4 migration window. If those guidelines don't land on time, the migration proceeds with the duplicate problem intact, pushing the issue into 2027 and adding complexity to any future public records requests filed under the California Public Records Act.
For departments that interact directly with San Francisco neighborhoods — particularly those managing visual documentation of housing projects in the Western Addition or infrastructure work along the Central Freeway corridor — the outcome affects how quickly staff can respond to records requests and how accurately they can reconstruct project histories. The Tenderloin Housing Clinic and similar organizations that routinely file records requests for planning documentation have a direct stake in whether city archives are clean and searchable or cluttered with redundant files that slow response times.
The September deadline is the date to watch. If the working group delivers clear retention rules on schedule, the city has a viable path to a cleaner archive before the year ends. If it slips, the Q4 migration becomes a missed opportunity that administrators will spend 2027 trying to make up for.