The Daily San Francisco


SF's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

City agencies, nonprofits, and tech-adjacent institutions across San Francisco are spending millions managing redundant image files, and new data is starting to quantify exactly how bad the problem has become.

By San Francisco News Desk · Published 4 July 2026, 11:36 am

Photo: Dall, William Healey, 1845-1927 / Public domain (Wikimedia Commons)

San Francisco's public agencies and cultural institutions collectively store tens of thousands of duplicate digital images across fragmented servers, costing the city and its nonprofit partners an estimated hundreds of thousands of dollars a year, combined, in redundant storage contracts. Specialists in digital asset management say the problem has quietly metastasized since 2020, when remote work scattered file-handling responsibilities across departments.

The issue matters acutely right now because the city is midway through a series of infrastructure modernization pushes, including digitization contracts tied to the San Francisco Public Library's ongoing archive expansion at the Main Branch on Larkin Street and the San Francisco Arts Commission's efforts to catalog public art holdings. Both programs depend on clean, deduplicated image databases. When those databases are bloated with redundant files, search times slow, staff hours stack up, and storage bills climb, all at a moment when city department budgets are under pressure.

The Scope of the Problem, by the Numbers

Digital storage is cheap in isolation. A single terabyte of cloud storage runs roughly $20 to $25 per month through major commercial providers. The problem is scale. A mid-size city agency managing a photographic archive can accumulate duplicate-image rates of 30 to 40 percent across its holdings when staff upload assets without a central deduplication protocol — meaning that for every 100 images stored, up to 40 are redundant copies of files already in the system. Multiply that across a dozen departments, each running separate contracts, and the waste compounds fast.
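To make that arithmetic concrete, here is a minimal sketch in Python. The per-terabyte price range and the 30 to 40 percent duplicate rates come from the figures above; the 50-terabyte archive size, the function name, and the assumption that duplicate files are about the same average size as unique ones are illustrative only and do not describe any specific agency.

```python
def annual_duplicate_storage_cost(total_tb, duplicate_rate, usd_per_tb_month):
    """Estimate yearly spend on redundant copies.

    Assumes duplicates have the same average size as unique files, so a
    duplicate rate measured by image count also applies to bytes stored.
    """
    if not 0.0 <= duplicate_rate <= 1.0:
        raise ValueError("duplicate_rate must be between 0 and 1")
    return total_tb * duplicate_rate * usd_per_tb_month * 12


# Hypothetical 50 TB archive; rates and prices are the article's ranges.
ARCHIVE_TB = 50
low = annual_duplicate_storage_cost(ARCHIVE_TB, 0.30, 20)
high = annual_duplicate_storage_cost(ARCHIVE_TB, 0.40, 25)
print(f"One department: ${low:,.0f} to ${high:,.0f} per year")
print(f"Twelve departments: ${12 * low:,.0f} to ${12 * high:,.0f} per year")
```

Under those assumptions, a single 50-terabyte archive would waste roughly $3,600 to $6,000 a year on storage alone, and a dozen similar departments roughly $43,200 to $72,000. Real totals depend on actual archive sizes and contract terms, which the agencies have not disclosed.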

The San Francisco Public Utilities Commission, which manages a substantial internal media library for infrastructure documentation, and the Office of Community Investment and Infrastructure, which photographs redevelopment sites across neighborhoods including Hunters Point and Mission Bay, both maintain large digital image repositories. Neither agency has publicly disclosed its precise storage expenditure, and requests for those figures were not returned by deadline. But comparable municipal operations in cities of similar size have documented duplicate-image overhead consuming 15 to 20 percent of total digital storage budgets, according to published case studies from the Coalition for Networked Information, a Washington-based nonprofit that tracks institutional data management.

The financial hit isn't only in storage costs. Staff time spent manually identifying and removing duplicate files, a process that still often happens by hand at institutions lacking automated deduplication tools, can run to dozens of hours per quarter per department. With fully loaded salary and benefits for mid-grade administrative positions in San Francisco averaging above $120,000 a year, according to the Controller's Office, that labor cost is not trivial.
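A rough way to size that labor cost is sketched below. The $120,000 compensation figure is the one cited above; the 24 hours per quarter (one reading of "dozens") and the 2,080-hour work year (40 hours times 52 weeks) are assumptions made for illustration, as is the function name.

```python
WORK_HOURS_PER_YEAR = 2080  # assumption: 40 hours x 52 weeks


def annual_cleanup_labor_cost(hours_per_quarter, annual_compensation_usd):
    """Yearly cost of manual duplicate cleanup for one department."""
    hourly_rate = annual_compensation_usd / WORK_HOURS_PER_YEAR
    return hours_per_quarter * 4 * hourly_rate


# Hypothetical: 24 staff hours per quarter at $120,000 fully loaded.
cost = annual_cleanup_labor_cost(24, 120_000)
print(f"Estimated manual cleanup cost: ${cost:,.0f} per department per year")
```

On those assumptions, manual cleanup comes to about $5,500 per department per year, before counting any time lost to slow searches.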

What Institutions Are Doing About It

The San Francisco Museum of Modern Art on Third Street undertook a database audit of its digital image holdings in 2024 and found significant redundancy in its rights-and-reproduction files, though the museum has not released specific figures publicly. The Internet Archive, headquartered on Funston Avenue in the Richmond District and one of the most consequential digital preservation organizations in the world, has publicly documented its use of hash-based deduplication as a core component of its petabyte-scale storage strategy. The method computes a compact numerical fingerprint, or hash, from each file's contents; files whose fingerprints match are treated as identical copies.
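For readers who want to see the idea in practice, the following is a minimal sketch of hash-based deduplication using Python's standard library. It is not the Internet Archive's or any institution's actual implementation; the function names, the choice of SHA-256, and the list of image file extensions are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust for a real archive.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group image files under root that share the same digest."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once indicate redundant copies.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_groups(Path("/path/to/archive"))` returns one list per set of identical files. This approach catches only byte-for-byte copies: an image that has been resized, re-encoded, or had its metadata edited produces a different fingerprint and would need a similarity-based tool to detect.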

Hash-based deduplication and AI-assisted image recognition tools are increasingly accessible, with enterprise-grade platforms now offering deduplication services starting around $500 per month for institutional users. Several San Francisco-based startups, operating out of offices in SoMa and the Mid-Market corridor, have built products targeting exactly this market, riding the broader AI infrastructure wave that has partially offset tech-sector layoffs in the city since 2024.

For city agencies and nonprofits looking to get ahead of the problem, the practical path starts with a storage audit — a full inventory of image assets by file type, upload date, and department of origin. The San Francisco Digital Services office, which coordinates technology standards across city departments from its offices at 1 Dr. Carlton B. Goodlett Place in Civic Center, has been developing data governance guidelines that could eventually include deduplication standards. Departments that wait for a mandated policy risk compounding costs further. The longer duplicate files accumulate, the more expensive the eventual cleanup becomes — both in staff hours and in the forensic work required to determine which version of a given image is the authoritative one worth keeping.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
