The Daily San Francisco

SF's Digital Housekeeping Crisis: The Numbers Behind the City's Duplicate Image Problem

From city agency servers to nonprofit databases, redundant image files are eating storage budgets and slowing down the public-facing tech infrastructure San Francisco depends on.

By San Francisco News Desk · Published 4 July 2026, 11:28 am

San Francisco's public agencies and nonprofits are sitting on a quiet data problem that is costing real money. Across city departments — from the Department of Technology's facilities on Bryant Street to the San Francisco Public Library's digital collections division — duplicate image files account for a measurable share of bloated storage costs, and the numbers are finally getting attention as budget pressures tighten heading into fiscal year 2027.

The issue matters now because San Francisco is in the middle of an accelerated push to digitize public records, housing permit applications, and social services documentation. The city's DataSF program, which maintains open data portals used by researchers and residents alike, has seen its hosted dataset count grow substantially since 2020. More uploads, more image attachments, more redundancy — and at cloud storage rates that have held between $0.02 and $0.08 per gigabyte per month depending on the tier and vendor, even modest duplication across a large archive adds up fast.

What the Data Actually Shows

Industry benchmarks from digital asset management research suggest that duplicate and near-duplicate image files can represent anywhere from 20 to 40 percent of total image storage in large institutional repositories that lack automated deduplication workflows. For a mid-sized city agency storing several hundred terabytes of documents and photographs (a realistic figure for a department like San Francisco's Planning Department, which processes thousands of permit applications annually), that redundancy could translate to tens of thousands of dollars in avoidable annual storage costs at cloud rates of a few cents per gigabyte per month.
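To make that range concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, a hypothetical 300 TB image archive (the figure cited here is only "several hundred terabytes," and that total also includes documents), and it reuses the 20 to 40 percent duplication range and the $0.02 to $0.08 per gigabyte per month rates mentioned in this story. The function name and constants are illustrative and do not come from any city budget document.

```python
def annual_duplicate_cost(image_tb, dup_share, rate_per_gb_month):
    """Estimated yearly spend on redundant image bytes, in dollars."""
    gigabytes = image_tb * 1000  # decimal terabytes, an assumption about billing units
    return gigabytes * dup_share * rate_per_gb_month * 12


IMAGE_TB = 300  # hypothetical archive size, not a reported figure

# Best case: low duplication share at the cheapest storage rate.
low = annual_duplicate_cost(IMAGE_TB, 0.20, 0.02)
# Worst case: high duplication share at the most expensive rate.
high = annual_duplicate_cost(IMAGE_TB, 0.40, 0.08)

print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under those assumptions the estimate runs from roughly $14,400 to $115,200 a year. An archive where images make up a smaller share of the total would scale the figure down proportionally.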

The San Francisco Arts Commission, which maintains an archive of public art documentation including photographs of the more than 4,000 works in the city's collection, is one organization where the duplicate image problem is structurally baked in. Event photos, press images, and installation documentation are routinely submitted by multiple contractors, artists, and staff members without a central deduplication check. The Tenderloin-based nonprofit TechEquity Collaborative flagged similar issues in a 2024 working group examining how smaller Bay Area nonprofits manage digital infrastructure — noting that redundant files were a consistent drain on limited IT budgets at organizations that couldn't afford dedicated data engineers.

At the San Francisco Main Library on Larkin Street, the Digital Collections team has been working since 2023 to catalog and clean historical photograph archives, a project that involves identifying thousands of duplicate scans created during earlier digitization runs. Deduplication tools — some open-source, some licensed — are central to that workflow, but they require staff time and technical expertise that many smaller branches and partner institutions lack.

Tools, Costs, and What Agencies Are Doing

Automated duplicate detection software ranges widely in price and capability. Open-source tools like dupeGuru are free but require manual review at scale. Commercial digital asset management platforms marketed to municipal governments — including products from vendors like Bynder and Canto — typically run between $15,000 and $60,000 annually for enterprise licensing, depending on storage volume and user count. For a city department already squeezed by the budget dynamics playing out at City Hall, that upfront cost can stall adoption even when the long-term savings are clear on paper.

The Department of Technology, which operates the city's core IT infrastructure from its Seventh Street offices, has incorporated deduplication as a standard recommendation in its cloud migration guidance documents. But adoption across the roughly 50 city departments and dozens of funded nonprofit contractors is uneven.

Practically speaking, organizations looking to address the problem should start with a storage audit before purchasing any tool. Free utilities can generate hash-based reports — matching files by content rather than file name — that give a clear picture of actual redundancy rates before any dollar is spent on a commercial solution. For city-contracted nonprofits operating out of neighborhoods like the Mission or SoMa, that first audit step costs nothing but time and can produce immediate, actionable savings. The fiscal year 2027 budget cycle, which San Francisco departments will begin formal planning for this fall, is the natural forcing function for agencies that have been putting off this particular piece of digital housekeeping.
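For readers curious what a hash-based audit looks like in practice, the following is a minimal sketch using only Python's standard library. It is not the method used by any agency named in this story; the function names, the list of image extensions, and the choice of SHA-256 are all assumptions made for illustration. It groups files by size first, then hashes only the files whose size matches another file, so that names are ignored and content alone decides what counts as a duplicate.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical list of extensions to treat as images; adjust for a real archive.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map (size, digest) to the sorted paths of byte-identical image files."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)

    return {key: sorted(paths) for key, paths in by_hash.items() if len(paths) > 1}


def redundant_bytes(groups):
    """Bytes that could be reclaimed by keeping one copy per duplicate group."""
    return sum(size * (len(paths) - 1) for (size, _digest), paths in groups.items())
```

Calling `find_duplicates` on an archive folder and passing the result to `redundant_bytes` gives a rough redundancy figure to compare against total storage. A report like this only catches byte-identical copies; resized or re-encoded versions of the same photograph produce different hashes and would need a separate near-duplicate check.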

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
