The Daily San Francisco

San Francisco news, every day


How San Francisco's City Agencies Ended Up Drowning in Duplicate Digital Images — and What They're Doing About It

Years of siloed tech systems, pandemic-era digitization sprints, and a chronic shortage of records staff left city departments with redundant photo archives costing real money to store and maintain.

By San Francisco News Desk · Published 4 July 2026, 12:27 pm


Photo: Pixabay on Pexels

San Francisco's Department of Technology has been quietly working through a sprawling housekeeping problem that built up over more than a decade: tens of thousands of duplicate digital images clogging storage servers across at least a dozen city agencies, from the Planning Department to the Municipal Transportation Agency's headquarters at 1 South Van Ness Avenue. The redundancy is not trivial. Storage costs for city government infrastructure have climbed steadily, and duplicate files inflate those bills without adding any functional value.

The problem did not appear overnight. It is the product of at least three distinct historical forces that converged on San Francisco's public-sector IT infrastructure and left it looking, in the words of one city budget document from fiscal year 2024-25, like a system that had grown "organically rather than by design."

Three Waves That Created the Mess

The first wave came in the mid-2000s, when agencies began digitizing paper records independently, each using whatever scanning software their individual IT contractors recommended. The Planning Department, the Department of Building Inspection at 49 South Van Ness, and the Recreation and Parks Department all stood up separate document management systems with no shared taxonomy and no deduplication protocols. A single permit-related photograph could end up saved in three places — the inspector's local drive, a departmental server, and a citywide archive — with slightly different file names each time.

The second wave hit during the COVID-19 pandemic. Between 2020 and 2022, city agencies accelerated digitization efforts to support remote work, often under emergency procurement rules that bypassed standard IT review. The Controller's Office noted in its fiscal year 2022 performance report that departments had onboarded new cloud storage subscriptions at a pace that outstripped the city's ability to audit what was being uploaded. Images from field inspections, housing surveys, and public health outreach campaigns piled into cloud buckets without any systematic check for existing copies.

The third wave is the AI transition. Starting in 2023, several San Francisco agencies began piloting machine-learning tools, including programs run through the Mayor's Office of Housing and Community Development, that required large labeled image datasets for training. Staff compiled those datasets by pulling from whatever archives were accessible, often duplicating files yet again in the process. By early 2025, the Department of Technology estimated that duplicate and near-duplicate images accounted for a measurable share of the city's total unstructured data storage load, though no precise citywide figure appears in any document reviewed for this article.

The Reckoning Arrives at City Hall

The push to fix the problem traces to a Board of Supervisors budget hearing in March 2025, where supervisors pressed the Department of Technology on why storage line items in several departmental budgets had grown faster than the agencies' workloads. That hearing prompted an internal audit, whose results were circulated to agency heads by June 2025.

The audit found that the MTA alone held multiple copies of traffic-camera stills and bus-camera footage frames that had been processed and archived but never purged. The agency has since begun a rolling deduplication project using open-source tooling, starting with its oldest archives: files dating from before 2015.

San Francisco's experience mirrors what happened in other large American cities that digitized fast without building governance frameworks to match. The difference here is the scale of the AI buildout happening simultaneously. The city's Digital Services team, based at City Hall, is now drafting a records retention policy specifically for image files, with a public comment period expected to open in fall 2026.

For residents, the practical upshot is straightforward: public records requests involving photographs — common in planning disputes and building-code enforcement cases in neighborhoods like the Mission and the Outer Sunset — should become faster to fulfill once duplicates are cleared and archives are indexed properly. The deduplication work is unglamorous, slow, and largely invisible. But it is the necessary ground-clearing before any of the city's more ambitious digital government projects can be built on solid footing.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
