The Daily San Francisco

SF Officials and Tech Experts Say Duplicate Image Problem Is Costing City Agencies Real Money

From the Department of Building Inspection to Muni's maintenance yards, redundant digital files are quietly draining storage budgets and slowing down the city's AI ambitions.

By San Francisco News Desk · Published 4 July 2026, 11:35 am

Photo: Committee on Ways and Means / Public domain (Wikimedia Commons)

San Francisco's push to digitize public records — accelerated sharply after the pandemic — has produced an unexpected mess: city agencies are sitting on millions of duplicate image files, inflating cloud storage costs and complicating the AI-readiness projects that Mayor Daniel Lurie's administration has been championing since January. The problem, long dismissed as routine data housekeeping, is now drawing serious attention from city technology officers and private-sector consultants who work with municipal governments.

The timing matters. San Francisco is midway through a broader digital infrastructure overhaul tied to the 2025 budget cycle, and the Department of Technology has been under pressure to justify cloud spending that has grown substantially since the city began migrating records off aging on-premises servers. Duplicate image files, including document scans, permit photos and inspection records, compound that cost at every renewal cycle with major cloud vendors.

Where the Problem Shows Up

The Department of Building Inspection on Otis Street in SoMa is one of the sharpest examples. Permit applications routinely generate multiple image attachments — site photos, architectural drawings, compliance records — and when applicants resubmit corrected filings, earlier versions frequently remain in the system rather than being flagged or purged. The same dynamic plays out at SF Public Works, which manages tens of thousands of infrastructure inspection records tied to streets, sidewalks and drainage systems across neighborhoods from the Excelsior to the Richmond District.

At Muni, technicians working with the Transit Management Center near Cesar Chavez Street deal with a parallel version of the challenge. Camera stills pulled from vehicle incident reports get stored across multiple departmental folders without a unified deduplication protocol, according to a Department of Technology briefing document circulated to the SF Board of Supervisors' Government Audit and Oversight Committee earlier this year. The document did not name specific cost figures but described the duplication issue as a contributing factor to storage overruns.
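
For readers curious what a unified check across departmental folders could look like, the sketch below groups byte-for-byte identical files by content hash. It is an illustration only, not a description of any city system: the folder names are hypothetical, and it catches exact copies only, not resized or re-encoded images.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(roots):
    """Group files under the given folders by content hash.

    Returns only the groups that contain more than one file,
    keyed by digest. File names and folder locations are ignored.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                groups[file_digest(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical folder names, used here purely for illustration.
    for digest, paths in find_exact_duplicates(["incidents", "maintenance"]).items():
        print(digest[:12], [str(p) for p in paths])
```

Hashing file contents rather than comparing names means a still saved under two different file names in two folders still lands in the same group.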

Private technology firms that have pitched AI-assisted records management to the city say duplicate images are a front-line obstacle. The reason is mechanical: machine-learning models trained on city data perform worse when the same image appears dozens of times with slightly different metadata, because the repeats skew the training dataset and make the outputs less reliable. Companies competing for city contracts through the Office of Contract Administration's procurement portal have flagged the issue in formal responses to requests for proposals issued in the first half of 2026.

What Experts Are Recommending

The consensus among records management specialists who have worked with Bay Area municipal governments is that deduplication needs to happen before, not after, agencies commit to AI tool deployments. The general approach involves perceptual hashing, a technique that identifies visually identical or near-identical images even when file names and metadata differ. Several vendors have demonstrated the method to the Department of Technology's innovation team at City Hall in Civic Center.
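
To make the idea concrete, here is a minimal sketch of one of the simplest perceptual hashes, often called an average hash. It is not the method any vendor demonstrated to the city, which the briefing does not specify. The sketch assumes each image has already been decoded, converted to grayscale and downscaled to a small grid (8x8 here) by an imaging library, and the distance threshold is an illustrative value rather than a recommended one.

```python
def average_hash(pixels):
    """Compute an average hash from a small grayscale grid.

    `pixels` is a list of rows of 0-255 brightness values, for example
    an image already downscaled to 8x8. Each bit of the result records
    whether the corresponding pixel is brighter than the grid's mean.
    """
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Treat two grids as near-duplicates if their hashes are close.

    max_distance is an assumed, illustrative threshold.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


if __name__ == "__main__":
    # A synthetic 8x8 gradient and a slightly brighter copy of it.
    original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
    brighter = [[v + 3 for v in row] for row in original]
    print(is_near_duplicate(original, brighter))
```

Because the hash depends on the pattern of light and dark areas rather than on exact bytes, a rescanned or slightly brightened copy tends to produce the same or a very similar hash, while file names and metadata play no role at all.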

The San Francisco Public Library system, which digitized roughly 1.2 million historical photographs through its San Francisco History Center on Larkin Street, undertook its own deduplication effort in 2023 and found that approximately 18 percent of the files in its digitized image catalog were exact or near-exact copies. That figure, drawn from a library system presentation to the City Librarian's advisory council, gives a rough benchmark for the scale other departments might expect when they begin auditing their own holdings.
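
As a back-of-the-envelope check on that benchmark, the snippet below applies the reported rate to the reported collection size. Both inputs are the rounded figures quoted above, so the result is an approximation, and it counts files identified as copies rather than files that could necessarily be deleted.

```python
def estimated_duplicates(total_files: int, duplicate_rate: float) -> int:
    """Estimate how many files in a collection are duplicates."""
    return round(total_files * duplicate_rate)


# Rounded figures from the library's reported audit.
library_duplicates = estimated_duplicates(1_200_000, 0.18)
print(f"{library_duplicates:,}")  # prints 216,000
```

Another department could substitute its own file count to get a first rough estimate, on the untested assumption that its duplication rate resembles the library's.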

Cost is the practical lever. Cloud storage pricing from major providers has not dropped as steeply as the industry projected five years ago, and city IT budgets have not kept pace with the volume of files being generated by expanded permitting, transit monitoring and social services documentation. Every percentage point reduction in stored image volume translates to a direct budget saving at the next contract renewal.
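
The arithmetic behind that claim can be sketched as follows. The storage volume and price used in the example are hypothetical placeholders, not city figures, and the sketch assumes flat per-terabyte pricing, ignoring tiered rates, retrieval fees and negotiated discounts.

```python
def annual_saving(stored_tb: float, price_per_tb_month: float, reduction_pct: float) -> float:
    """Estimate yearly savings from shrinking stored volume.

    Assumes a flat monthly price per terabyte (a simplification).
    """
    removed_tb = stored_tb * (reduction_pct / 100)
    return removed_tb * price_per_tb_month * 12


if __name__ == "__main__":
    # Hypothetical inputs: 500 TB stored at $20 per TB per month.
    saving = annual_saving(stored_tb=500, price_per_tb_month=20, reduction_pct=1)
    print(f"${saving:,.2f} per year for a one-point reduction")
```

Under flat pricing the saving scales linearly, so an 18-point reduction would be worth 18 times the one-point figure.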

City officials have not announced a formal deduplication program as of July 4, 2026, but the Department of Technology is expected to include image-management standards in an updated data governance policy due to the Mayor's Office before the end of the third quarter. Departments that begin internal audits now, cataloging what they hold, where duplicates cluster, and which vendors have access to redundant files, will be better positioned to comply quickly and avoid the scramble that typically accompanies new citywide mandates.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
