The Daily San Francisco


SF's Digital Archive Reckoning: The Key Decisions Ahead on Duplicate Image Replacement

City agencies and cultural institutions are being forced to confront a backlog of redundant digital assets—and the choices they make this summer will shape how San Francisco manages its visual record for years.

By San Francisco News Desk · Published 4 July 2026, 11:57 am


San Francisco's public institutions are sitting on sprawling digital libraries riddled with duplicate images, and the window to fix the problem is narrowing. Several city departments, including the San Francisco Public Library's San Francisco History Center on Larkin Street and the Office of Digital Services at City Hall, face converging deadlines this fall around storage contracts, budget cycles, and upgraded content management platforms. Those deadlines will force a decision: clean the archives now, or keep paying to store the same photograph dozens of times over.

This is not a minor housekeeping problem. Cloud storage costs for municipal governments have risen sharply since 2023, and institutions that put off deduplication work are increasingly paying for that delay in real dollars. The San Francisco Arts Commission, which maintains a permanent collection database spanning thousands of public art installations across neighborhoods from the Tenderloin to the Excelsior, has been piloting an AI-assisted image deduplication tool since January 2026 as part of a broader digital infrastructure review. A final report on that pilot is expected before the commission's September budget presentation.

Why the Timing Matters

The pressure is partly technical and partly fiscal. San Francisco's Department of Technology signed a multi-year cloud services agreement in late 2024 that includes tiered pricing based on storage volume. Under that structure, departments that reduce their stored data footprint before the agreement's first annual review—scheduled for October 2026—can lock in lower per-gigabyte rates for the following contract year. That creates a direct financial incentive to act before summer ends.

At the same time, the San Francisco Public Library is midway through migrating its digital collections platform, moving roughly 1.2 million digitized items to a new Contentful-based system. Librarians and archivists at the Main Library on Larkin Street have flagged that migrating duplicate assets inflates both the timeline and the cost of the project. According to internal project documentation reviewed by this reporter, duplicate or near-duplicate images account for an estimated 15 to 22 percent of certain photographic collections, a figure that translates to thousands of hours of potential staff time spent manually reconciling records if automated tools are not deployed first.

The San Francisco Museum of Modern Art on Third Street, while a private institution, is navigating a related set of choices. SFMOMA's digital team has been evaluating perceptual hashing technology—a method that identifies visually similar images even when file names or metadata differ—as part of an upgrade to its collections management software. A switch to a new platform is scheduled for the first quarter of 2027, which means procurement decisions need to be finalized by November at the latest.
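
For readers curious how perceptual hashing works, the sketch below shows one of its simplest forms, an "average hash." It is an illustration only, not the tool SFMOMA or any city agency is evaluating, and every function name in it is hypothetical. Each image is assumed to be a grid of grayscale values already loaded into memory, with dimensions that divide evenly by eight.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (assumed input format)


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumes height and width are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat images as near-duplicates if their hashes differ in few bits.

    The threshold of 5 is an illustrative assumption, not a standard.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance
```

Because the hash depends on which regions are brighter than the image's own average, a uniformly brightened copy of a photograph produces the same hash as the original, while an unrelated image differs in many bits. File names and metadata play no part in the comparison.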

What Comes Next

The decisions ahead are less about whether to replace duplicate images and more about who owns the process and which tools get chosen. Three questions will define outcomes through the end of 2026.

First, will the Department of Technology issue centralized guidance for city agencies, or will each department procure deduplication solutions independently? A fragmented approach risks incompatibility between systems and higher aggregate costs. The department has been asked by the City Administrator's Office to deliver a recommendation by August 15.

Second, how will institutions handle cases where duplicates have accumulated conflicting metadata—different captions, dates, or rights information attached to what is functionally the same image? The San Francisco History Center at the Main Library on Larkin Street considers this its most labor-intensive challenge, because getting the metadata wrong could mean misdating or misattributing historically significant photographs of neighborhoods like the Fillmore or Chinatown.

Third, and most consequentially for the long term: what governance structure will prevent duplicates from piling up again? The Arts Commission's pilot included a draft policy requiring staff to run deduplication checks before any new batch upload, a procedural change that costs almost nothing but requires consistent enforcement.

None of these decisions are technically complex. All of them require someone to make a call before October. The budget clock is already running.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
