San Francisco's public agencies and cultural institutions collectively store tens of thousands of duplicate digital images across fragmented servers, costing the city and its nonprofit partners an estimated hundreds of thousands of dollars a year, combined, in redundant storage contracts. Specialists in digital asset management say the problem has quietly metastasized since 2020, when remote work scattered file-handling responsibilities across departments.
The issue matters acutely right now because the city is midway through a series of infrastructure modernization pushes, including digitization contracts tied to the San Francisco Public Library's ongoing archive expansion at the Main Branch on Larkin Street and the San Francisco Arts Commission's efforts to catalog public art holdings. Both programs depend on clean, deduplicated image databases. When those databases are bloated with redundant files, search times slow, staff hours stack up, and storage bills climb, all at a moment when city department budgets are under pressure.
The Scope of the Problem, by the Numbers
Digital storage is cheap in isolation. A single terabyte of cloud storage runs roughly $20 to $25 per month through major commercial providers. The problem is scale. A mid-size city agency managing a photographic archive can accumulate duplicate-image rates of 30 to 40 percent across its holdings when staff upload assets without a central deduplication protocol — meaning that for every 100 images stored, up to 40 are redundant copies of files already in the system. Multiply that across a dozen departments, each running separate contracts, and the waste compounds fast.
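To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The per-terabyte prices, the duplicate rates and the dozen departments come from the figures above; the 50-terabyte archive size per department is purely a hypothetical placeholder, and the function name is illustrative.

```python
def annual_duplicate_cost(total_tb, duplicate_pct, price_per_tb_month):
    """Yearly spend attributable to redundant copies, in dollars."""
    redundant_tb = total_tb * duplicate_pct / 100
    return redundant_tb * price_per_tb_month * 12


DEPARTMENTS = 12        # "a dozen departments", as in the text
ARCHIVE_TB = 50         # hypothetical archive size per department

# Low end: 30 percent duplicates at $20 per TB per month.
low = DEPARTMENTS * annual_duplicate_cost(ARCHIVE_TB, 30, 20)
# High end: 40 percent duplicates at $25 per TB per month.
high = DEPARTMENTS * annual_duplicate_cost(ARCHIVE_TB, 40, 25)

print(f"${low:,.0f} to ${high:,.0f} per year")  # $43,200 to $72,000 per year
```

Under those assumptions the waste lands in the tens of thousands of dollars a year for storage alone; larger archives or pricier contracts scale the figure proportionally.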
The San Francisco Public Utilities Commission, which manages a substantial internal media library for infrastructure documentation, and the Office of Community Investment and Infrastructure, which photographs redevelopment sites across neighborhoods including Hunters Point and Mission Bay, both maintain large digital image repositories. Neither agency has publicly disclosed its precise storage expenditure, and requests for those figures were not returned by deadline. But comparable municipal operations in cities of similar size have documented duplicate-image overhead consuming 15 to 20 percent of total digital storage budgets, according to published case studies from the Coalition for Networked Information, a Washington-based nonprofit that tracks institutional data management.
The financial hit isn't only in storage costs. At institutions that lack automated deduplication tools, identifying and removing duplicate files is still often done by hand, and that work can run to dozens of hours per quarter per department. The Controller's Office has reported that fully loaded salary and benefits for mid-grade administrative positions in San Francisco average above $120,000 a year, so the labor cost is not trivial.
What Institutions Are Doing About It
The San Francisco Museum of Modern Art on Third Street undertook a database audit of its digital image holdings in 2024 and found significant redundancy in its rights-and-reproduction files, though the museum has not released specific figures publicly. The Internet Archive, headquartered on Funston Avenue in the Richmond District and one of the most consequential digital preservation organizations in the world, has publicly documented its use of hash-based deduplication as a core component of its petabyte-scale storage strategy. The method computes a numerical fingerprint from each file's contents, so that byte-for-byte identical files produce the same value and can be flagged as copies.
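The general idea behind hash-based deduplication can be sketched in a few lines of Python. This is a minimal illustration of the technique, not the Internet Archive's or any institution's actual pipeline; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root):
    """Group files under root by digest; keep groups with more than one file."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

A sketch like this catches only exact copies. An image that has been resized, cropped or re-saved in another format has different bytes and therefore a different fingerprint, which is the gap that image recognition tools aim to fill.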
Hash-based deduplication and AI-assisted image recognition tools are increasingly accessible, with enterprise-grade platforms now offering deduplication services starting around $500 per month for institutional users. Several San Francisco-based startups, operating out of offices in SoMa and the Mid-Market corridor, have built products targeting exactly this market, riding the broader AI infrastructure wave that has partially offset tech-sector layoffs in the city since 2024.
For city agencies and nonprofits looking to get ahead of the problem, the practical path starts with a storage audit — a full inventory of image assets by file type, upload date, and department of origin. The San Francisco Digital Services office, which coordinates technology standards across city departments from its offices at 1 Dr. Carlton B. Goodlett Place in Civic Center, has been developing data governance guidelines that could eventually include deduplication standards. Departments that wait for a mandated policy risk compounding costs further. The longer duplicate files accumulate, the more expensive the eventual cleanup becomes — both in staff hours and in the forensic work required to determine which version of a given image is the authoritative one worth keeping.
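A first-pass storage audit of the kind described above could look something like the following sketch. It rests on several assumptions that would not hold everywhere: that each department's files sit in their own top-level folder, that a file's last-modified time is an acceptable stand-in for its upload date, and that the listed extensions cover the image formats in use. None of this reflects an actual city standard.

```python
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Assumed set of image formats; adjust to the archive being audited.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def audit_images(root):
    """Tally image count and bytes by (department, file type, year)."""
    root = Path(root)
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in root.rglob("*"):
        ext = path.suffix.lower()
        if not path.is_file() or ext not in IMAGE_EXTENSIONS:
            continue
        parts = path.relative_to(root).parts
        # Assumption: the top-level folder name identifies the department.
        department = parts[0] if len(parts) > 1 else "(unassigned)"
        info = path.stat()
        # Assumption: modification time approximates the upload date.
        year = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).year
        entry = summary[(department, ext, year)]
        entry["files"] += 1
        entry["bytes"] += info.st_size
    return dict(summary)
```

The resulting tally shows where image volume is concentrated, which is a reasonable place to point a deduplication pass first.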