The Daily San Francisco

San Francisco's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

City agencies and nonprofits across San Francisco are spending tens of thousands of dollars annually managing redundant photo files, and new data shows the problem is getting worse as AI-era storage demands balloon.

By San Francisco News Desk · Published 4 July 2026, 11:28 am

An estimated 40 to 60 percent of the image files held in San Francisco public agencies' digital asset systems are redundant, according to a review of municipal technology audits and open-records filings from the past fiscal year. The duplication is more than a housekeeping nuisance. It is eating through storage budgets at a time when the city is already under fiscal pressure, and departments from the SF Planning Commission on Mission Street to the San Francisco Public Library's digital collections unit at Larkin and Fulton are wrestling with the same underlying data management failure.

The issue has sharpened over the past 18 months as AI-driven workflows — image tagging, accessibility compliance tools, automated content ingestion — have accelerated the rate at which digital assets pour into city servers. Every time a contractor submits a permit application with attached site photos, every time a Muni communications staffer uploads a press image for a new fleet announcement, the probability of redundant files landing in an unmanaged repository goes up. Without systematic deduplication protocols, storage costs compound quietly until a budget review forces the conversation.

What the Storage Bills Actually Show

Cloud storage is not cheap at institutional scale. Enterprise-tier object storage, the kind used by city agencies and large nonprofits, runs roughly $0.02 to $0.023 per gigabyte per month on major platforms. A department sitting on 50 terabytes of images, a quarter of which are exact duplicates or near-duplicates, is paying $3,000 to $3,500 a year in raw storage for files it does not need, before factoring in bandwidth, backup redundancy, and the staff hours required to manually sort through assets. Across a dozen city departments, raw storage alone comes to roughly $36,000 to $41,000 a year; with those additional costs included, the annual waste estimate climbs past $100,000.
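The arithmetic behind those figures can be checked with a short sketch. It assumes decimal units (1 terabyte = 1,000 gigabytes) and counts raw storage only; the function name and parameters are illustrative and do not come from any city audit.

```python
def annual_duplicate_cost(total_tb, duplicate_fraction, price_per_gb_month,
                          gb_per_tb=1000):
    """Yearly raw-storage cost of the duplicate share of a collection.

    Assumption: decimal units (1 TB = 1,000 GB). Bandwidth, backups, and
    staff time are deliberately left out.
    """
    duplicate_gb = total_tb * gb_per_tb * duplicate_fraction
    return duplicate_gb * price_per_gb_month * 12


# The article's example: 50 TB of images, one quarter of them duplicates.
low = annual_duplicate_cost(50, 0.25, 0.02)
high = annual_duplicate_cost(50, 0.25, 0.023)

print(f"One department: ${low:,.0f} to ${high:,.0f} per year")
print(f"Twelve departments: ${12 * low:,.0f} to ${12 * high:,.0f} per year")
```

Under these assumptions a single department lands at about $3,000 to $3,450 a year, which is why a citywide figure above $100,000 depends on the costs beyond raw storage.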

The San Francisco Arts Commission, which maintains a growing digital archive of public art installations across neighborhoods from the Excelsior to the Embarcadero, has been piloting a deduplication workflow since late 2025. The commission manages image records tied to more than 4,000 individual public artworks. When similar programs at comparable municipal arts agencies have run automated deduplication sweeps, they have found redundancy rates between 35 and 55 percent in legacy collections — numbers consistent with what technology officers at SF Digital Services have described in public budget presentations as a systemic challenge across city infrastructure.

The nonprofit sector in San Francisco faces the same arithmetic. Organizations like Tenderloin Housing Clinic, which documents housing conditions and legal casework across hundreds of SRO properties in the Tenderloin and South of Market, accumulate photographic evidence files at high volume. A single building inspection might generate 80 to 120 images, many of them near-identical frames shot seconds apart. Without automated duplicate detection, which now costs mid-sized organizations roughly $500 to $2,000 per year, those files pile up in shared drives indefinitely.

The Fix Exists. Adoption Has Lagged.

Deduplication technology is mature and relatively inexpensive. Hash-based detection, which compares unique file fingerprints, catches exact duplicates in seconds. Perceptual hashing algorithms can identify near-duplicates — the same photo in two different resolutions, or a cropped versus uncropped version — with accuracy rates above 90 percent in most commercial implementations. The City of San Francisco's Department of Technology published updated digital asset management guidelines in March 2026, recommending that all agencies integrate deduplication scanning into standard ingest workflows by the end of fiscal year 2026-27.
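For readers who want to see what the two techniques look like in practice, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the method any city agency or commercial product uses. Exact duplicates are grouped by SHA-256 digest. Near-duplicates are compared with an "average hash," one simple form of perceptual hashing; commercial tools may use different algorithms. All function names are hypothetical, and the perceptual hash assumes the image has already been decoded into a grid of grayscale values whose height and width are multiples of 8.

```python
import hashlib
from collections import defaultdict


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(paths):
    """Group file paths whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]


def average_hash(pixels, size=8):
    """Compute a 64-bit average hash from a 2D grid of grayscale values.

    Assumption: height and width are multiples of `size`, so the image
    can be shrunk to size x size by averaging equal blocks.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // size, width // size
    cells = []
    for row in range(size):
        for col in range(size):
            total = 0
            for y in range(row * block_h, (row + 1) * block_h):
                for x in range(col * block_w, (col + 1) * block_w):
                    total += pixels[y][x]
            cells.append(total / (block_h * block_w))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic demo: the same picture at two resolutions.
small = [[x * y for x in range(16)] for y in range(16)]
large = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]
print(hamming_distance(average_hash(small), average_hash(large)))
```

A small Hamming distance between two hashes suggests the images are near-duplicates; the cutoff is a tuning choice, and borderline matches are usually reviewed by a person before anything is deleted.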

For residents and organizations trying to get ahead of the problem now, several steps are immediately actionable. Google Photos, Adobe Lightroom, and open-source tools like dupeGuru offer free or low-cost deduplication scanning for collections under 100,000 files. Larger operations should evaluate enterprise digital asset management platforms — many of which bundle deduplication as a standard feature — before the city's fiscal year deadline arrives in June 2027. The SF Public Library's Technology and Society program at the main branch on Larkin Street periodically runs digital literacy workshops that cover basic file management, and the next session is scheduled for late July 2026. Attending one costs nothing. Ignoring the storage bill eventually does.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
