By the Numbers: How Duplicate Images Are Quietly Draining SF City Websites and Agency Databases
Redundant digital assets cost San Francisco municipal agencies thousands of staff hours and real storage dollars each year — and a reckoning is overdue.

San Francisco's city agencies collectively manage tens of thousands of digital image files across dozens of public-facing websites, internal databases, and archival systems, and a significant share of those files are exact or near-exact duplicates. The Department of Technology, which oversees the city's IT infrastructure from its offices on South Van Ness Avenue, has flagged duplicate image accumulation as one of the leading causes of bloated storage costs in its annual infrastructure reviews. The problem is neither glamorous nor headline-grabbing, but the numbers behind it are increasingly hard to ignore.
The timing matters. San Francisco is in the middle of a citywide digital modernization push, partly driven by a broader effort to cut administrative overhead as the city confronts a projected structural budget deficit that the Budget and Legislative Analyst's Office has placed in the hundreds of millions of dollars over the next two fiscal cycles. Cleaning up redundant digital assets (images uploaded twice, stock photography saved in multiple folders, archival photos duplicated across SharePoint and legacy content management systems) is one of the lower-profile line items in that effort. But IT officials have argued internally that its costs compound at scale.
Cloud storage pricing gives the problem a concrete shape. Standard object storage through major providers runs roughly $0.023 per gigabyte per month. That sounds trivial until you account for the scale at which city agencies operate. The San Francisco Municipal Transportation Agency alone manages photo libraries tied to dozens of active capital projects, from the Van Ness Bus Rapid Transit corridor to the ongoing Geary Boulevard improvement work. When project teams upload progress photos without deduplication protocols, libraries balloon. A single construction project can generate several hundred gigabytes of overlapping imagery within 18 months, according to patterns documented in public-sector IT audits from comparable cities.
The city's 311 Customer Service Center, which processes service requests and stores associated imagery — pothole photos, graffiti documentation, encampment reports — has seen its image database grow substantially since the platform expanded to accept mobile uploads in 2019. Duplicate submissions, where residents photograph the same issue minutes apart, are a known and documented phenomenon in 311-style systems nationally. San Francisco's version, run through the SF311 app, does not currently deploy automated hash-based deduplication at the point of upload, which means identical or near-identical JPEG files accumulate in the backend.
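Exact-match deduplication at the point of upload is simple to sketch. The Python below is a minimal illustration, not SF311's actual backend: the `UploadStore` class and its in-memory index are hypothetical stand-ins for a database table keyed on a SHA-256 digest of each uploaded file. It catches byte-identical resubmissions (the same file sent twice), but two photos of the same pothole taken minutes apart will almost never be byte-identical, so near-duplicates need a perceptual approach instead.

```python
import hashlib


class UploadStore:
    """Hypothetical in-memory store that skips byte-identical uploads.

    Illustrative sketch only: a real system would persist the digest index
    in a database rather than a dict.
    """

    def __init__(self) -> None:
        # Maps SHA-256 hex digest -> ID of the first request that uploaded it.
        self._index: dict[str, str] = {}

    def upload(self, request_id: str, image_bytes: bytes) -> tuple[bool, str]:
        """Store the image unless identical bytes were already uploaded.

        Returns (stored, id_of_request_holding_the_file).
        """
        digest = hashlib.sha256(image_bytes).hexdigest()
        existing = self._index.get(digest)
        if existing is not None:
            # Byte-identical file: point at the existing copy instead of storing it again.
            return False, existing
        self._index[digest] = request_id
        return True, request_id


store = UploadStore()
photo = b"\xff\xd8\xff\xe0 placeholder JPEG bytes for a pothole photo"
print(store.upload("req-001", photo))            # (True, 'req-001')
print(store.upload("req-002", photo))            # (False, 'req-001')
print(store.upload("req-003", photo + b"\x00"))  # (True, 'req-003'): one changed byte defeats exact hashing
```

Storing the digest alongside each request lets a duplicate submission link to the file already on disk rather than writing a second copy.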
Deduplication at the database level is not experimental technology. Perceptual hashing, which generates a compact fingerprint for each image so that visually similar files can be flagged as near-matches, has been standard in enterprise content management since the mid-2010s. The San Francisco Public Library's digital collections team at the Civic Center branch piloted a modest version of this approach in 2023 when it digitized the San Francisco History Center's photograph archives, reducing a preliminary scan batch by roughly 18 percent through automated duplicate flagging before human review. The library is a small-scale example, but it shows the city's own workforce can implement the method.
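The article does not say which perceptual hash the library used, so the sketch below swaps in average hashing (aHash), one of the simplest members of the family. It assumes the image has already been decoded to a grid of grayscale values (a real pipeline would use an imaging library for that step), shrinks the grid to 8x8 by block averaging, and sets one bit per cell depending on whether that cell is brighter than the mean. Two fingerprints are compared by Hamming distance; the 5-bit cutoff is a hypothetical value that would need tuning against real archive scans.

```python
def average_hash(pixels: list[list[float]], size: int = 8) -> int:
    """Compute a size*size-bit average hash from a grayscale pixel grid.

    Assumes values in 0-255 and a grid at least `size` pixels on each side.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            # Average every pixel that falls inside this block.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell, row by row: 1 if brighter than the overall mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


THRESHOLD = 5  # hypothetical cutoff; would need tuning on real scans

# Synthetic 32x32 "photo" whose brightness increases left to right.
original = [[x * 8 for x in range(32)] for _ in range(32)]
# Same scene, uniformly brightened, as a re-exposed copy might be.
brighter = [[min(255, v + 10) for v in row] for row in original]
# A different scene: the gradient runs the other way.
mirrored = [row[::-1] for row in original]

h1 = average_hash(original)
for name, img in (("brighter", brighter), ("mirrored", mirrored)):
    d = hamming(h1, average_hash(img))
    print(name, d, "near-duplicate" if d <= THRESHOLD else "distinct")
# brighter 0 near-duplicate
# mirrored 64 distinct
```

The brightened copy hashes identically, while the mirrored image differs in every bit. In a workflow like the library pilot, pairs under the threshold would be flagged for human review rather than deleted automatically.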
The Department of Technology is reportedly weighing whether to mandate deduplication standards as part of the next iteration of the citywide Digital Services Strategy, which was last updated in 2024. Adoption would require agencies to retrofit existing content management platforms, no trivial task for departments still running legacy systems, but the cost-benefit math, based on storage pricing and staff-hour audits, favors action. For an agency managing 10 terabytes of imagery with a 15 percent duplication rate, the redundant 1.5 terabytes cost roughly $414 a year in raw storage at the standard rate of $0.023 per gigabyte per month. Every backup copy multiplies that figure, and the staff hours spent sorting through redundant files come on top of it. Multiplied across a dozen agencies, the savings become material.
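The storage arithmetic in the 10-terabyte example can be made explicit with a few lines of Python. The sketch uses the $0.023 per gigabyte per month rate cited earlier, treats 1 TB as 1,000 GB for simplicity (using 1,024 shifts the result by about 2 percent), and adds a hypothetical `copies` parameter for agencies that keep backup replicas billed at the same rate. It covers storage fees only, not staff hours.

```python
PRICE_PER_GB_MONTH = 0.023  # standard object-storage rate cited in this article, USD


def annual_duplicate_cost(total_tb: float, dup_rate: float, copies: int = 1,
                          price: float = PRICE_PER_GB_MONTH) -> float:
    """Annual storage spend attributable to duplicate files.

    Assumptions: 1 TB = 1,000 GB, and every stored copy (primary plus any
    hypothetical backups) is billed at the same flat rate.
    """
    duplicate_gb = total_tb * 1000 * dup_rate
    return duplicate_gb * price * 12 * copies


one_agency = annual_duplicate_cost(10, 0.15)
print(f"One agency, one copy: ${one_agency:,.2f}/yr")            # $414.00/yr
print(f"Twelve agencies, one copy: ${12 * one_agency:,.2f}/yr")  # $4,968.00/yr
with_backups = 12 * annual_duplicate_cost(10, 0.15, copies=3)
print(f"Twelve agencies, primary + 2 backups: ${with_backups:,.2f}/yr")  # $14,904.00/yr
```

At the cited rate, a single copy of 1.5 terabytes of duplicates costs about $414 a year; the total scales linearly with the number of agencies and with every additional copy kept.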
For city residents, the practical upshot is straightforward: if you're filing a 311 request, submitting a single clear photo rather than multiple angles of the same pothole on 24th Street or a broken streetlight near Dolores Park saves the city's systems real storage and processing overhead. For agency IT managers, the push now is to treat duplicate images the way the city has begun treating duplicate vendor payments: as a solvable, quantifiable leak worth plugging before the next budget cycle closes.
About this article
Published by The Daily San Francisco