The Daily San Francisco

The Numbers Behind SF's Duplicate Image Problem: How Redundant Digital Assets Are Costing City Agencies Millions

San Francisco's sprawling network of public agencies is sitting on a mountain of duplicate digital images — and the bill for storing and managing them keeps climbing.

By San Francisco News Desk · Published 4 July 2026, 11:44 am

San Francisco's Department of Technology estimates the city's unified digital asset infrastructure now hosts more than 14 million image files across shared servers maintained out of its Civic Center operations hub — and a significant portion of those files are exact or near-exact duplicates. The redundancy isn't just a storage headache. It translates directly into wasted budget, slower public-facing websites, and stalled modernization efforts at agencies that are already stretched thin.

The timing matters because the city's 10-year Digital Equity and Infrastructure Plan, adopted in 2023, set a hard target of reducing unnecessary data storage costs by 20 percent before the end of fiscal year 2026-27. With the Fourth of July weekend falling at the start of that fiscal year, which runs from July 1 through June 30, city tech teams are under pressure to show measurable progress, and duplicate image removal has become a central metric in that accounting.

What the Data Actually Shows

Across the San Francisco Municipal Transportation Agency, the Department of Public Health, and the Office of Economic and Workforce Development — all of which maintain separate content management systems — internal audits conducted in early 2026 identified duplication rates ranging from 18 percent to as high as 34 percent of stored image assets, according to budget documents reviewed by The Daily San Francisco. For SFMTA alone, whose digital properties span the sfmta.com platform and internal operations tools used at the Muni Metro Embarcadero Station operations center, that means tens of thousands of redundant image files occupying cloud storage that costs the agency an estimated $0.023 per gigabyte per month under its current AWS contract.

The math adds up fast. A single high-resolution transit map image duplicated 400 times across departmental folders — a pattern auditors flagged repeatedly — consumes the same storage as thousands of unique documents. Multiply that across 50-plus city departments and the cumulative cost runs well into six figures annually, purely for files that should have been replaced or consolidated years ago.
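As a rough illustration of the arithmetic, the sketch below estimates the yearly storage charge for the redundant copies of a single file, using the $0.023 per gigabyte per month rate cited above. The function name and the 25 MB file size are hypothetical and not drawn from the audits, and the estimate covers storage charges only, not backup, transfer, or staff time.

```python
def annual_duplicate_cost(copies: int, size_mb: float,
                          price_per_gb_month: float = 0.023) -> float:
    """Yearly storage charge for the redundant copies of one file.

    One copy is treated as the legitimate original, so only
    (copies - 1) files count as waste. Sizes use decimal units
    (1 GB = 1000 MB).
    """
    redundant_copies = max(copies - 1, 0)
    redundant_gb = redundant_copies * size_mb / 1000
    return redundant_gb * price_per_gb_month * 12


# Hypothetical example: a 25 MB transit map stored 400 times.
cost = annual_duplicate_cost(copies=400, size_mb=25)
print(f"Estimated annual storage cost of the duplicates: ${cost:.2f}")
```

Substituting an agency's real file counts and average file sizes would give a storage-only baseline against which other costs can be compared.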

The San Francisco Public Library's digital archive program, housed at the Main Library on Larkin Street, ran its own deduplication pilot in the first quarter of 2026 covering its online photo collections. The pilot removed 62,000 duplicate image records from a database of roughly 310,000 files — a 20 percent reduction — and cut the library's cloud storage bill for that archive by $8,400 on an annualized basis, according to figures the library submitted to the Budget and Legislative Analyst's office in April.
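The pilot's reported figures can be checked with simple arithmetic. The sketch below recomputes the reduction rate and derives an average annual saving per removed record. The variable names are illustrative, and the per-record figure is a derived average, not a number the library reported.

```python
# Figures as reported for the library's first-quarter 2026 pilot.
removed_records = 62_000
total_records = 310_000
annual_savings_usd = 8_400

# Share of the database that was removed as duplicates.
reduction_rate = removed_records / total_records

# Derived average: annualized savings spread over the removed records.
savings_per_record = annual_savings_usd / removed_records

print(f"Reduction rate: {reduction_rate:.0%}")
print(f"Average annual saving per removed record: ${savings_per_record:.4f}")
```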

What Comes Next for City Systems

The Department of Technology is now piloting an automated duplicate-detection tool across three agencies: Building Inspection, Planning, and the Recreation and Parks Department. The Rec and Parks rollout is focused initially on image libraries managed out of McLaren Lodge in Golden Gate Park. The tool uses perceptual hashing, a technique that flags visually identical or near-identical images even when file names or metadata differ, cases that standard deduplication software often misses.
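To illustrate the general idea, the sketch below implements average hashing, one simple form of perceptual hashing. It is not the city's tool, whose internals are not described. It assumes images have already been shrunk to small grayscale grids, and the function names, sample grids, and distance threshold are all hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 4x4 thumbnails: an original, a slightly noisy re-save of it,
# and an unrelated image with the light and dark regions swapped.
original = [[200, 200, 50, 50], [200, 200, 50, 50],
            [50, 50, 200, 200], [50, 50, 200, 200]]
resaved = [[198, 203, 52, 47], [201, 199, 49, 55],
           [48, 53, 202, 197], [51, 46, 205, 200]]
unrelated = [[50, 50, 200, 200], [50, 50, 200, 200],
             [200, 200, 50, 50], [200, 200, 50, 50]]

print(is_near_duplicate(original, resaved))
print(is_near_duplicate(original, unrelated))
```

Because the hash depends only on pixel brightness, two files with different names, metadata, or small compression differences can still produce matching hashes, while a byte-for-byte checksum of the same files would differ.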

For the broader tech sector watching from SoMa and Mission Bay, where AI infrastructure companies have made perceptual hashing a competitive battleground, the city's adoption signals a real municipal market. Several firms with offices along Brannan Street and in the Dogpatch have already submitted proposals under the city's small-business contracting portal.

For residents and small nonprofits navigating the city's grant and permitting portals, systems that frequently time out because of bloated backend asset libraries, the practical payoff could be faster load times and fewer errors when uploading documents. The Department of Technology says it expects the three-agency pilot to be complete by October 31, 2026, with a citywide rollout assessment due to the Board of Supervisors by January 2027. Whether the 20 percent cost-reduction target gets hit on time depends almost entirely on how quickly the deduplication tool can be configured for each agency's file-naming conventions, and on how many legacy images were uploaded, forgotten, and uploaded again.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
