The Daily San Francisco

San Francisco news, every day

SF's Digital Archives Are Full of Duplicate Images — Here's What Officials and Experts Are Saying About Fixing It

City agencies, archivists, and tech sector veterans are weighing in on a quiet but costly data problem plaguing San Francisco's public records and cultural institutions.

By San Francisco News Desk · Published 4 July 2026, 11:36 am

Photo: Mercantile Library Association (San Francisco, Calif.) Moore, Horace H / Public domain (Wikimedia Commons)

San Francisco's public digital repositories are carrying a hidden burden: thousands of duplicate images cluttering city databases, slowing retrieval times, and burning through storage budgets that are already under pressure. From the San Francisco Public Library's digital archive on Larkin Street to the Planning Department's permit photo systems on Stevenson Street, the redundancy problem has become impossible to ignore — and officials are starting to say so publicly.

The issue is more urgent than it sounds. The city's Department of Technology, which oversees shared infrastructure for dozens of municipal agencies, has been consolidating cloud storage contracts since early 2025 as part of a broader cost-reduction effort tied to the post-pandemic budget crunch. When storage bills reflect terabytes of repeated image files — the same permit photo uploaded four times, the same scanned historical document duplicated across three departments — the waste becomes a budget line item, not just a tidiness problem.

What the Experts Are Pointing To

Archivists at the San Francisco History Center, housed inside the Main Library at Civic Center, have been contending with duplicate digitisation for years. The problem accelerated during the pandemic, when remote workers uploaded files without the benefit of centralised quality control. Digital preservation specialists in the field generally point to the absence of a unified deduplication policy — a written standard requiring agencies to run hash-based comparison checks before any image is added to a shared repository — as the root cause. No such citywide policy currently exists in San Francisco's published technology standards, according to the Department of Technology's publicly available policy library as of this writing.

The tech sector has plenty to say. Veterans of companies like Dropbox, which was founded in San Francisco and still maintains offices in the SoMa district, have long evangelised deduplication as a baseline engineering practice. The principle is straightforward: compute a cryptographic fingerprint, or hash, from each image's contents and refuse to store a second copy if that fingerprint already exists. Because identical files always produce the same fingerprint, exact copies are caught no matter what they are named or who uploads them. Applied to municipal systems, the approach could theoretically reduce redundant storage by a substantial fraction. Independent studies of comparable municipal archives in cities like Chicago and New York have found duplicate rates ranging from 15 to 40 percent of total image holdings, depending on how long the archive has been growing without automated checks.
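
For readers who want to see the fingerprint check in concrete terms, here is a minimal Python sketch. It is an illustration only, not a description of any system the city or the companies named above actually run: the `DedupStore` class and its `add` method are hypothetical names, SHA-256 is an assumed choice of hash, and the "repository" is just an in-memory dictionary.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory repository keyed by content fingerprint."""

    def __init__(self) -> None:
        self._by_hash: dict[str, bytes] = {}

    def add(self, data: bytes) -> tuple[str, bool]:
        """Store data unless a byte-identical copy is already held.

        Returns (fingerprint, stored); stored is False for a duplicate.
        """
        digest = fingerprint(data)
        if digest in self._by_hash:
            return digest, False  # same fingerprint already present: skip
        self._by_hash[digest] = data
        return digest, True

    def __len__(self) -> int:
        return len(self._by_hash)


# Usage: the same (made-up) permit photo uploaded twice is stored once.
store = DedupStore()
photo = b"example permit photo bytes"
_, first_stored = store.add(photo)
_, second_stored = store.add(photo)
_, other_stored = store.add(b"a different image")
```

A check like this only catches byte-identical files. A photo that has been resized, re-compressed, or re-scanned produces a different fingerprint and would need a separate similarity check.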

At the local government level, the SF Controller's Office released a technology efficiency audit in March 2026 covering fiscal year 2024-25 that flagged cloud storage as one of three areas where inter-agency coordination failures were generating preventable costs. The audit did not quantify the specific cost of image duplication, but it recommended that the Department of Technology develop asset deduplication standards by the end of calendar year 2026. That deadline is now six months away.

Practical Pressure From the Ground Up

The pressure is also coming from institutions that handle San Francisco's visual cultural record. The San Francisco Museum of Modern Art, on Third Street in SoMa, runs its own digital asset management system separate from city infrastructure, but curators and registrars at institutions like SFMOMA have long dealt with the same problem in-house. Industry groups like the Museum Computer Network have published deduplication guidelines since at least 2019, but archivists at smaller organisations, including community archives in the Tenderloin and the Mission District, often lack the staff or software budget to implement them.

For city residents, the practical stakes show up in slow public records searches and, occasionally, in permit disputes where conflicting copies of the same inspection photograph create confusion about which version is official. The Planning Department processes tens of thousands of permit applications per year, many of them accompanied by multiple image uploads from contractors and homeowners.

If the Department of Technology acts on the recommendation in the Controller's March 2026 audit and finalises a deduplication standard before the end of the year, agencies would likely be given a 12-to-18-month implementation window. That puts any meaningful reduction in redundant storage, and the savings that come with it, sometime in 2027 or 2028. In the meantime, archivists and tech policy advocates say the most useful thing individual departments can do is audit their own holdings and stop uploading files without first checking whether the image already exists in the system.
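
As a rough illustration of what such a self-audit could look like, the Python sketch below walks a folder, groups files by the SHA-256 hash of their contents, and estimates how many bytes the extra copies occupy. The function names (`file_digest`, `find_duplicates`, `reclaimable_bytes`) are hypothetical, and the sketch assumes the holdings sit in an ordinary directory tree rather than in any particular city system.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each fingerprint held by two or more files to those files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Bytes used by redundant copies, keeping one file per group."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling `find_duplicates(Path("some/folder"))` returns only the groups that contain more than one file, which a department could review by hand before deleting anything. As with any hash-based check, it finds exact copies only.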


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
