The Daily San Francisco

San Francisco news, every day

By the Numbers: How Duplicate Images Are Quietly Draining SF City Websites and Agency Databases

Redundant digital assets cost San Francisco municipal agencies thousands of staff hours and real storage dollars each year — and a reckoning is overdue.

By San Francisco News Desk · Published 4 July 2026, 11:51 am

Photo: Swaysgood, Susan, Mrs / Public domain (Wikimedia Commons)

San Francisco's city agencies collectively manage tens of thousands of digital image files across dozens of public-facing websites, internal databases, and archival systems, and a significant share of those files are exact or near-exact duplicates. The Department of Technology, which oversees the city's IT infrastructure from its offices on South Van Ness Avenue, has flagged duplicate image accumulation as one of the leading causes of bloated storage costs in its annual infrastructure reviews. The problem is neither glamorous nor headline-grabbing, but the numbers behind it are hard to ignore.

The timing matters. San Francisco is in the middle of a citywide digital modernization push, partly driven by a broader effort to cut administrative overhead as the city confronts a projected structural budget deficit that the Budget and Legislative Analyst's Office has placed in the hundreds of millions of dollars over the next two fiscal cycles. Cleaning up redundant digital assets (images uploaded twice, stock photography saved in multiple folders, archival photos duplicated across SharePoint and legacy content management systems) is one of the lower-profile line items in that effort. But IT officials have argued internally that the redundancy compounds costs at scale.

What the Storage Bills Actually Show

Cloud storage pricing gives the problem a concrete shape. Standard object storage through major providers runs roughly $0.023 per gigabyte per month. That sounds trivial until you account for the scale at which city agencies operate. The San Francisco Municipal Transportation Agency alone manages photo libraries tied to dozens of active capital projects, from the Van Ness Bus Rapid Transit corridor to the ongoing Geary Boulevard improvement work. When project teams upload progress photos without deduplication protocols, libraries balloon. A single construction project can generate several hundred gigabytes of overlapping imagery within 18 months, according to general patterns documented in public sector IT audits from comparable cities.

The city's 311 Customer Service Center, which processes service requests and stores associated imagery — pothole photos, graffiti documentation, encampment reports — has seen its image database grow substantially since the platform expanded to accept mobile uploads in 2019. Duplicate submissions, where residents photograph the same issue minutes apart, are a known and documented phenomenon in 311-style systems nationally. San Francisco's version, run through the SF311 app, does not currently deploy automated hash-based deduplication at the point of upload, which means identical or near-identical JPEG files accumulate in the backend.
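To make the idea concrete, here is a minimal sketch of hash-based deduplication at the point of upload. It is an illustration only, not a description of how SF311 or any city system works: the `UploadStore` class and its in-memory dictionary are hypothetical stand-ins for a real image backend, and SHA-256 is simply one common choice of hash.

```python
import hashlib
from typing import Dict, Tuple


class UploadStore:
    """Hypothetical in-memory stand-in for an image backend (illustration only)."""

    def __init__(self) -> None:
        # Maps SHA-256 hex digest -> the stored file bytes.
        self._by_digest: Dict[str, bytes] = {}

    def save(self, data: bytes) -> Tuple[str, bool]:
        """Store the bytes unless an identical file already exists.

        Returns (digest, was_new). was_new is False for an exact duplicate.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            return digest, False
        self._by_digest[digest] = data
        return digest, True

    def __len__(self) -> int:
        return len(self._by_digest)


store = UploadStore()
first = store.save(b"pothole-photo-bytes")
repeat = store.save(b"pothole-photo-bytes")    # same bytes, rejected as duplicate
other = store.save(b"different-photo-bytes")   # different bytes, stored
print(len(store), first[1], repeat[1], other[1])
```

A check like this catches only byte-for-byte identical files. Two separate photos of the same pothole taken minutes apart would produce different digests, which is where near-match techniques come in.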

The Fix Is Algorithmic — and Already Proven Elsewhere

Deduplication at the database level is not experimental technology. Perceptual hashing — algorithms that generate a fingerprint for each image and flag near-matches — has been standard in enterprise content management since the mid-2010s. The San Francisco Public Library's digital collections team at the Civic Center branch piloted a modest version of this approach in 2023 when it digitized the San Francisco History Center's photograph archives, reducing a preliminary scan batch by roughly 18 percent through automated duplicate flagging before human review. The library is a small-scale example, but it proves the city's own workforce can implement the method.
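For readers curious what a perceptual fingerprint looks like, the sketch below implements a simplified "average hash," one of the most basic members of the family. It is not the library's method, which the article does not detail. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid (production tools use an imaging library for that step), and the threshold of 5 differing bits is an arbitrary illustrative value.

```python
def average_hash(pixels):
    """Fingerprint a small grayscale grid (2D list of ints, 0-255).

    Each bit is 1 if the pixel is brighter than the grid's mean, else 0.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=5):
    """Flag two grids as near-duplicates; max_distance is an assumed threshold."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Toy 8x8 "images": dark left half, bright right half.
base = [[20] * 4 + [220] * 4 for _ in range(8)]
# Same scene, slightly brighter exposure.
brighter = [[min(255, v + 10) for v in row] for row in base]
# A different scene: bright left half, dark right half.
different = [[220] * 4 + [20] * 4 for _ in range(8)]

print(is_near_duplicate(base, brighter))   # small exposure change still matches
print(is_near_duplicate(base, different))  # different scene does not
```

Because the fingerprint depends on relative brightness rather than exact bytes, a re-saved or slightly re-exposed copy lands close to the original, which is why flagged matches typically go to human review rather than automatic deletion.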

The Department of Technology is reportedly weighing whether to mandate deduplication standards as part of the next iteration of the citywide Digital Services Strategy, which was last updated in 2024. Adoption would require agencies to retrofit existing content management platforms, no trivial task for departments still running legacy systems, but the cost-benefit math, based on storage pricing and staff-hour audits, favors action. For an agency managing 10 terabytes of imagery with a 15 percent duplication rate, eliminating redundant files would free about 1.5 terabytes. At roughly $0.023 per gigabyte per month, that is about $400 a year in direct storage charges, and the figure climbs once backup copies, replicas, and the staff hours spent sorting through duplicates are counted. Multiplied across a dozen agencies, the savings become material.
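The storage side of that math is easy to check. The short calculation below is a sketch under stated assumptions: it uses the roughly $0.023 per gigabyte per month rate quoted earlier, treats a terabyte as 1,000 gigabytes, and counts a single stored copy only. Backups, replicas, retrieval fees, and staff time are not modeled, so it covers the direct storage line alone.

```python
PRICE_PER_GB_MONTH = 0.023  # standard object-storage rate cited in the article
GB_PER_TB = 1000            # assumption: decimal terabytes


def annual_duplicate_cost(total_tb, duplicate_rate, price=PRICE_PER_GB_MONTH):
    """Yearly cost of storing the duplicated share of an image library."""
    duplicate_gb = total_tb * GB_PER_TB * duplicate_rate
    return duplicate_gb * price * 12


# The article's example: 10 TB of imagery, 15 percent duplicated.
cost = annual_duplicate_cost(10, 0.15)
print(f"${cost:,.2f} per year")
```

Under those assumptions the duplicated 1.5 terabytes costs roughly $414 a year for one agency. Any larger total would have to come from additional stored copies or from the staff hours spent managing redundant files.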

For city residents, the practical upshot is straightforward: if you're filing a 311 request, submitting a single clear photo rather than multiple angles of the same pothole on 24th Street or a broken streetlight near Dolores Park saves the city's systems real processing overhead. For agency IT managers, the push now is to treat duplicate images the way the city has begun treating duplicate vendor payments: as a solvable, quantifiable leak worth plugging before the next budget cycle closes.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
