The Daily San Francisco

SF City Agencies Push to Overhaul Duplicate Photo Records as Digital Archives Balloon Out of Control

Officials, archivists, and tech specialists are warning that San Francisco's government databases are drowning in redundant imagery — and the cleanup bill is only going up.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm

San Francisco's municipal agencies are sitting on tens of thousands of duplicate digital images spread across Planning Department servers, SFMTA databases, and the city's centralized DataSF repository. According to people who work with those systems, the problem is now slowing down public records requests and routine operations.

The issue surfaced publicly this spring when the city's Department of Technology flagged duplicate imagery as a contributing factor in storage cost overruns during a budget review period ending June 30. Redundant files, including duplicated street-condition photos, permit inspection images, and homelessness encampment documentation, were identified as consuming a disproportionate share of cloud storage capacity across multiple departments.

Why This Matters Right Now

San Francisco's digital infrastructure push accelerated sharply after 2022, when Mayor London Breed's administration prioritized moving city records to cloud platforms. That shift brought efficiency gains but also a sprawl problem: departments uploading the same images through different workflows, with no automated system flagging the redundancy. The Planning Department on Seventh Street and the Department of Public Works both use imagery extensively for permit and inspection documentation, and cross-departmental uploads of the same site photos are common.

Archivists at the San Francisco History Center at the Main Library on Larkin Street have been working through their own version of this challenge in the analog-to-digital conversion of historical records. Librarians there have noted that digitization projects from the early 2010s produced multiple scans of the same photographs at different resolutions, creating version-control headaches that have persisted for more than a decade.

The cost dimension is concrete. Cloud storage pricing for government contracts, while lower than consumer rates, still means that every gigabyte of redundant imagery represents real budget dollars. Analysts working on civic technology projects have estimated, in general terms, that duplicate file elimination in mid-size city agencies can reduce storage footprints by 20 to 40 percent — a range that, applied to San Francisco's scale, could translate to meaningful annual savings at a time when the city is managing a multi-hundred-million-dollar budget shortfall.
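The arithmetic behind that range is simple enough to sketch. The Python snippet below is illustrative only: the function name, the stored volume, and the per-gigabyte price are placeholder assumptions, not figures from the city or its vendors. Only the 20 to 40 percent range comes from the estimate above.

```python
from typing import Tuple


def annual_savings_range(
    stored_gb: float,
    price_per_gb_month: float,
    low_reduction: float = 0.20,   # lower bound of the estimate cited above
    high_reduction: float = 0.40,  # upper bound of the estimate cited above
) -> Tuple[float, float]:
    """Return (low, high) yearly savings if deduplication shrinks storage."""
    annual_cost = stored_gb * price_per_gb_month * 12
    return annual_cost * low_reduction, annual_cost * high_reduction


# Placeholder inputs, chosen only to show the calculation (NOT city data).
HYPOTHETICAL_STORED_GB = 100_000
HYPOTHETICAL_PRICE_PER_GB_MONTH = 0.02

low, high = annual_savings_range(
    HYPOTHETICAL_STORED_GB, HYPOTHETICAL_PRICE_PER_GB_MONTH
)
print(f"Estimated annual savings: ${low:,.0f} to ${high:,.0f}")
```

Swapping in an agency's real storage volume and contract rate would give a first-order estimate; it ignores retrieval fees, backups, and staff time.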

What Experts and Officials Are Recommending

People familiar with the city's technology operations point to several approaches being discussed. One centers on deploying perceptual hashing tools across department servers: software that identifies visually similar images even when file names, metadata, or resolution differ. This kind of automated deduplication has been used for years by large tech firms based in SoMa and the Civic Center corridor, but municipal adoption has lagged.
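For readers curious how perceptual hashing works, here is a minimal Python sketch of one common variant, the "average hash." It is not the tool the city is considering, which has not been named. It assumes images are already decoded into grids of grayscale values at least 8 pixels on each side, and the function names and the threshold of 5 bits are illustrative choices.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(img: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to size x size by averaging equal blocks of pixels.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(img), len(img[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """Set one bit per cell: 1 if the cell is brighter than the mean."""
    cells = [v for row in shrink(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Treat images as duplicates if few hash bits differ (threshold is a guess)."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


# Synthetic demo: a gradient, the same gradient at double resolution,
# and its photographic negative standing in for a different picture.
original = [[8 * x + 8 * y for x in range(16)] for y in range(16)]
upscaled = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
negative = [[255 - v for v in row] for row in original]

print(is_near_duplicate(original, upscaled))  # True
print(is_near_duplicate(original, negative))  # False
```

The two scans at different resolutions produce matching hashes even though their bytes differ, which is the case that file-name or checksum comparisons miss. A production system would typically decode real image files with an imaging library and tune the threshold against known duplicates.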

The nonprofit Code for San Francisco, which runs volunteer civic tech projects out of coworking spaces in the Mid-Market neighborhood, has previously flagged data hygiene as a foundational issue in open government work. Volunteers there have documented cases where the same public dataset image appears under multiple catalog entries on the DataSF portal, making automated analysis unreliable.

The SFMTA, which maintains an extensive archive of street and transit infrastructure photography used for everything from Muni stop planning to Vision Zero documentation, has been in discussions with the Department of Technology about standardizing upload protocols as part of a broader IT reform effort tied to the fiscal year 2026-27 budget cycle. No specific program launch date has been confirmed publicly.

Practically speaking, city vendors and staff photographers working on projects from the Tenderloin to the Bayview have been advised, informally, to adopt consistent file-naming conventions and to check existing databases before uploading new images. That guidance is not yet codified in a formal city policy, according to public meeting records from the city's IT governance committee.
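The "check before uploading" advice can also be automated for byte-identical files. The Python sketch below is an assumption about how such a check might look, not a description of any city system: the `UploadIndex` class and the file paths are hypothetical. It fingerprints file contents with SHA-256 from the standard library's `hashlib`, so a re-upload is caught even when the file name differs.

```python
import hashlib
from typing import Dict, Optional


def content_digest(data: bytes) -> str:
    """Fingerprint a file by its bytes, ignoring its name."""
    return hashlib.sha256(data).hexdigest()


class UploadIndex:
    """Hypothetical in-memory index mapping content digests to stored names."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def find_existing(self, data: bytes) -> Optional[str]:
        """Return the stored name holding identical bytes, or None."""
        return self._by_digest.get(content_digest(data))

    def register(self, name: str, data: bytes) -> str:
        """Index the file unless identical bytes are already stored.

        Returns the name that holds this content after the call.
        """
        return self._by_digest.setdefault(content_digest(data), name)


index = UploadIndex()
photo = b"example image bytes"  # stand-in for a real file's contents

print(index.register("dpw/site_0412.jpg", photo))       # first upload is stored
print(index.register("planning/IMG_0042.jpg", photo))   # same bytes, already held
```

This only catches exact copies. A resized or re-compressed version of the same photo has different bytes and would need a similarity-based comparison instead.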

For San Franciscans who interact with public records — journalists, researchers, neighborhood groups filing complaints about street conditions — the practical upshot is that cleanup efforts, if funded and executed, should eventually mean faster responses and more reliable search results on platforms like DataSF. The Department of Technology is expected to present formal deduplication proposals to the Board of Supervisors' Government Audit and Oversight Committee later this summer.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
