The Daily San Francisco

By the Numbers: How Duplicate Images Are Quietly Draining San Francisco's Digital Infrastructure Budget

City agencies and local nonprofits are sitting on terabytes of redundant image files — and the cost of doing nothing is finally being measured.

By San Francisco News Desk · Published 4 July 2026, 12:11 pm

Photo: Hussey, E. C. (Elisha Charles) / Public domain (Wikimedia Commons)

San Francisco's municipal technology offices collectively stored an estimated 4.2 terabytes of duplicate image files across shared servers as of a March 2026 internal audit: redundant photos, scanned documents, and graphics that consume storage, slow retrieval systems, and cost taxpayers real money in cloud hosting fees. The audit, conducted by the city's Department of Technology, flagged duplicate image accumulation as one of the top three contributors to avoidable IT expenditure in fiscal year 2025–2026.

The timing matters. The city is mid-cycle on a technology modernization push tied to Mayor Daniel Lurie's streamlining agenda, and every dollar spent warehousing redundant .jpeg files is a dollar not going toward faster permitting software or the Housing Acceleration Portal that planning officials have promoted as central to the city's building production emergency. With cloud storage costs rising and the city's annual IT budget already under scrutiny from the Board of Supervisors, the duplicate-image problem has shifted from a housekeeping nuisance to a line-item liability.

Where the Redundancy Lives

The problem is not abstract. The San Francisco Public Library's digital collections team, based at the main branch on Larkin Street in Civic Center, flagged in a February 2026 internal memo that its digitized archive contained a duplication rate of roughly 23 percent across its historical photograph collection — meaning nearly one in four image files was a functional copy of another already in the system. The library has been digitizing neighborhood photographs since 2018, and without automated deduplication tooling, the redundancy compounded year over year.

Across town in Mission Bay, the University of California San Francisco's research computing division has dealt with a parallel challenge in its medical imaging repositories. Clinical trial datasets, which can run to hundreds of gigabytes per study, routinely accumulate duplicates when multiple research teams pull from central repositories and then upload modified versions without purging originals. UCSF's IT governance committee approved a deduplication protocol in January 2026, targeting a 30 percent reduction in redundant storage by the end of the calendar year.

Local nonprofits working in the Tenderloin and SoMa corridors have felt this pinch acutely at a smaller scale. Organizations that document outreach work — photographing encampment conditions, shelter intake, or public health interventions — often rely on shared Google Drive folders and Dropbox accounts, where image duplication goes entirely unmeasured. Glide Memorial Church on Taylor Street, which runs one of the city's most active social services operations, processes hundreds of photographs weekly across program teams. Without dedicated digital asset management software, duplication is essentially unchecked.

The Cost of Inaction, Measured

Cloud storage is not free, and the numbers add up. Standard enterprise cloud storage through major providers runs between $0.02 and $0.023 per gigabyte per month for frequently accessed data. At 4.2 terabytes (about 4,200 gigabytes) of confirmed duplicates across city systems alone, that translates to roughly $84 to $97 per month, or between about $1,000 and $1,160 annually, spent storing files the Department of Technology's own auditors have already classified as redundant. That figure does not include the storage costs borne by city-adjacent institutions like UCSF or the library system, which operate on separate budgets.
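For readers who want to check the arithmetic, here is a minimal sketch of the storage-cost calculation. It assumes decimal units (1 terabyte = 1,000 gigabytes), which is how cloud providers typically bill, and it uses the per-gigabyte rates quoted above. The function and variable names are illustrative and do not come from the audit.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes, as used in cloud billing


def monthly_storage_cost(terabytes: float, rate_per_gb_month: float) -> float:
    """Monthly cost in dollars of keeping `terabytes` of data in storage."""
    return terabytes * GB_PER_TB * rate_per_gb_month


def annual_storage_cost(terabytes: float, rate_per_gb_month: float) -> float:
    """Twelve months of storage at a flat monthly rate."""
    return 12 * monthly_storage_cost(terabytes, rate_per_gb_month)


if __name__ == "__main__":
    duplicate_tb = 4.2  # duplicate volume reported in the March 2026 audit
    for rate in (0.02, 0.023):  # dollars per gigabyte per month
        monthly = monthly_storage_cost(duplicate_tb, rate)
        annual = annual_storage_cost(duplicate_tb, rate)
        print(f"${rate}/GB/month: ${monthly:,.2f} per month, ${annual:,.2f} per year")
```

The same two functions can be reused for any agency's own duplicate volume and contract rate; the flat-rate model ignores retrieval fees, storage tiers, and volume discounts.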

Deduplication software licenses for enterprise environments typically run between $5,000 and $25,000 annually depending on scale, so whether a paid license pays for itself within a single budget year depends on how much redundant storage an organization is actually carrying. Several open-source tools capable of handling image hash-matching across large file systems are available at no licensing cost, though they require staff time for implementation and maintenance, a real constraint for agencies already operating with lean IT teams.
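To illustrate what hash-matching involves, the following is a minimal sketch of a duplicate scan using only the Python standard library. It is not any agency's actual tooling. It assumes that "duplicate" means byte-for-byte identical files, so resized or re-encoded copies of the same picture would not be caught (that would require perceptual hashing). The list of file extensions and all function names are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust for the collection being scanned.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large scans stay memory-friendly."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root: Path) -> list[list[Path]]:
    """Return groups of image files under `root` with identical contents."""
    # Pass 1: group by size, since files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share a size with at least one other.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def redundant_bytes(groups: list[list[Path]]) -> int:
    """Bytes that would be freed by keeping one copy from each group."""
    return sum(group[0].stat().st_size * (len(group) - 1) for group in groups)
```

Calling `find_duplicate_images(Path("/path/to/archive"))` on a hypothetical archive folder returns the groups of identical files, and `redundant_bytes` estimates how much storage they waste. The sketch only reports duplicates; deciding which copy to keep is left to a human reviewer.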

The Board of Supervisors' Government Audit and Oversight Committee is scheduled to take up the Department of Technology's broader efficiency report in September 2026. Digital storage waste, including duplicate images, is expected to appear in supporting documentation. For city agencies, nonprofits, and research institutions watching that hearing, the practical advice from IT administrators is consistent: run a hash-based deduplication scan now, before the audit season forces the conversation. The tools exist. The cost of the redundancy is already being counted.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.