The Daily San Francisco

San Francisco news, every day

SF's Digital Archive Reckoning: Key Decisions Ahead as Duplicate Image Crisis Forces City's Hand

San Francisco's public agencies and cultural institutions are staring down a costly, time-sensitive cleanup of redundant digital image libraries — and the choices made this summer will shape records management for years.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm

Photo: Brett Sayles on Pexels

San Francisco's city departments and public institutions are sitting on a sprawling, largely unaudited mess of duplicate digital images: redundant files spread across servers maintained by agencies ranging from the Department of Public Works to the San Francisco Public Library system. A growing consensus among records managers and civic technologists holds that the window to fix the problem cheaply is closing fast.

The issue has sharpened because of two converging pressures. City Hall's ongoing push to consolidate IT infrastructure under the Department of Technology's unified cloud migration program, which began rolling out in earnest after a 2024 budget resolution, has exposed just how fragmented municipal image storage has become. At the same time, the AI tools now being deployed to tag, index, and surface archival photographs produce unreliable results when trained on duplicate-laden datasets, turning what was once a housekeeping problem into an operational one.

The Scale of the Problem Across City Agencies

The San Francisco Public Library, which maintains the San Francisco History Center at the main branch on Larkin Street in Civic Center, has publicly acknowledged ongoing digitization work across its legacy photographic collections. The History Center holds tens of thousands of physical prints and negatives, many of which have been digitized in multiple passes over the past two decades — sometimes by different vendors using different resolution standards — resulting in duplicate or near-duplicate image files sitting in separate folders with inconsistent metadata. The library's digital collections portal, accessible through SFPL.org, currently flags some duplicate entries in its public-facing catalog.

The San Francisco Arts Commission, headquartered on Van Ness Avenue, maintains a separate digital registry of publicly commissioned murals and sculptures — a collection that overlaps with photography archives held by the Planning Department and the Office of Economic and Workforce Development. Sources familiar with city IT operations, speaking in general terms about public records consolidation, have described the duplication problem as systemic rather than isolated, though no official audit figure has been made public.

Private institutions face a similar crossroads. The Internet Archive, based on Funston Avenue in the Richmond District, has long grappled with deduplication at enormous scale across its Wayback Machine and media collections. Its open-source deduplication tooling, developed in-house and publicly available, has become a reference model for smaller Bay Area nonprofits and municipal archivists weighing their own cleanup strategies.

What the Next Six Months Will Decide

Three decisions are now on the table for San Francisco's public agencies, and each carries real fiscal consequences. First, whether to pursue automated deduplication using AI-assisted hashing tools — which can process large image libraries at relatively low per-unit cost but require upfront licensing or engineering time — or to contract out manual review, which is slower and more expensive but produces cleaner metadata. Second, which agency takes the lead. The Department of Technology has the infrastructure mandate but not the archival expertise; the Public Library and the Arts Commission have domain knowledge but limited server budgets. Third, whether the city adopts a single authoritative image repository before the next fiscal year begins July 1, 2027, or continues allowing each department to manage its own storage.
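Hashing is what makes automated deduplication cheap per file. The sketch below is a minimal illustration, not any tool the city has evaluated: it assumes image bytes are already loaded into memory and that thumbnails have already been shrunk to small grayscale grids, and its function names and sample paths are hypothetical. It pairs a SHA-256 digest, which finds exact byte-for-byte copies, with a simple average hash, a basic perceptual technique that can flag near-duplicates such as the same print scanned by two vendors at different resolutions. Neither step involves machine learning.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    Uses SHA-256 digests, so it only catches exact copies, not rescans.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # A digest shared by two or more files marks a duplicate group.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash (aHash) of a small grayscale grid, e.g. an 8x8 thumbnail.

    Each bit is 1 where a pixel is brighter than the grid's mean, so two scans
    of the same print, shrunk to the same grid, tend to get similar hashes.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes; small means similar."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical file paths and contents, for illustration only.
    files = {
        "vendor_a/market_st_1906.tif": b"scan-bytes-1",
        "vendor_b/market_st_1906_copy.tif": b"scan-bytes-1",
        "vendor_b/ferry_building.tif": b"scan-bytes-2",
    }
    print(exact_duplicate_groups(files))

    # Two hypothetical 2x2 thumbnails of the same print from different scans.
    scan_low_res = [[10, 200], [30, 220]]
    scan_high_res = [[12, 198], [35, 215]]
    d = hamming_distance(average_hash(scan_low_res), average_hash(scan_high_res))
    print(d)  # 0: both grids produce the same hash
```

In practice, pairs whose hashes differ by only a few bits would be routed to a human reviewer; where to set that cutoff is a tuning choice for each collection, which is one reason the automated and manual options are not fully separable.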

The fiscal math is not trivial. Cloud storage costs for municipal governments have risen sharply since 2023, and duplicate files directly inflate monthly bills. A department storing 10 terabytes of images when 3 terabytes would suffice after deduplication is paying a real, recurring premium — though the city has not publicly released per-agency storage spending figures.
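The premium in that example can be made explicit: 10 terabytes stored against 3 terabytes needed leaves 7 terabytes, or 70 percent of the storage, as redundant copies. The sketch below assumes a flat per-terabyte monthly rate, a simplification since cloud pricing is often tiered, and its function name is hypothetical. Because the city has not released per-agency figures, the rate is left as a parameter rather than filled in.

```python
def duplicate_storage_premium(stored_tb: float, deduplicated_tb: float,
                              price_per_tb_month: float) -> tuple[float, float, float]:
    """Return (excess_tb, excess_share, monthly_premium) for one department.

    excess_tb        terabytes that deduplication would free
    excess_share     fraction of current storage that is redundant
    monthly_premium  excess_tb times an assumed flat per-terabyte monthly rate
    """
    if stored_tb <= 0:
        raise ValueError("stored size must be positive")
    if deduplicated_tb > stored_tb:
        raise ValueError("deduplicated size cannot exceed stored size")
    excess_tb = stored_tb - deduplicated_tb
    excess_share = excess_tb / stored_tb
    return excess_tb, excess_share, excess_tb * price_per_tb_month
```

Called as `duplicate_storage_premium(10, 3, rate)`, it returns 7 terabytes, a 0.7 share, and a monthly premium of 7 times whatever rate the department actually pays, recurring every month until the duplicates are removed.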

For residents and researchers, the practical stakes run from the mundane to the genuinely significant. Journalists and historians pulling photographs from city archives through Sunshine Ordinance requests increasingly encounter inconsistent file quality and mislabeled duplicates. Community groups in the Mission District and Chinatown, which have partnered with the Public Library on neighborhood history digitization projects, have reported receiving duplicate image sets in response to data requests.

The Department of Technology is expected to release updated IT consolidation guidelines before the end of the third quarter of 2026. Whether those guidelines include binding deduplication standards for image libraries — rather than leaving the question to individual departments — may prove to be the most consequential single decision in this process.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
