The Daily San Francisco

SF's Aging Digital Archive Faces a Crossroads: The Key Decisions Ahead on Duplicate Image Replacement

City agencies and cultural institutions are wrestling with how to overhaul sprawling photo libraries riddled with duplicate files — and the choices made this summer will shape public records access for years.

By San Francisco News Desk · Published 4 July 2026, 11:57 am

Photo: Stephen Leonardi on Pexels

San Francisco's public institutions are sitting on a problem that has quietly ballooned alongside the city's tech ambitions: thousands of duplicate images lodged inside municipal and cultural digital archives, degrading search results, inflating storage costs, and slowing the public's ability to access accurate records. The open questions now are who pays to fix it, which technology gets the contract, and how fast the cleanup can realistically happen.

The issue has sharpened this year because several city departments — including the San Francisco Public Library's San Francisco History Center at Civic Center and the San Francisco Arts Commission — are in the middle of multi-year digitization pushes. When duplicate images pile up inside those systems, cataloguers waste hours manually weeding out near-identical scans, and members of the public pulling research records through the library's online portal can end up with redundant results that bury the file they actually need.

Why the Timing Matters

The broader context is a city that spent heavily on digitization grants after 2020 and is now reckoning with the maintenance burden those investments created. The San Francisco Public Library received federal Library Services and Technology Act funding to accelerate its digitization program, and the Arts Commission has been cataloguing public art installations across neighborhoods from the Tenderloin to Dogpatch. Neither institution built robust duplicate-detection workflows into those original project scopes, according to public procurement documents reviewed for this article.

Commercial deduplication tools now exist that can scan collections of several hundred thousand images and flag likely duplicates using perceptual hashing, a technique that catches near-matches even when file names, formats, or resolutions differ. Vendors pitching to Bay Area cultural institutions typically quote per-image processing rates or annual licensing fees that, for a mid-size archive of roughly 200,000 files, can run anywhere from $15,000 to more than $80,000 (roughly 7.5 to 40 cents per image) depending on the depth of the audit and integration requirements. Those numbers are not trivial for departments already managing flat or reduced materials budgets in fiscal year 2025-26.
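
For readers curious about how perceptual hashing works in principle, the sketch below shows one simple variant, an "average hash," in plain Python. It is an illustration only and not the method of any vendor mentioned here. The function names, the 8x8 grid size, and the distance threshold of 5 are all assumptions chosen for the example, and images are represented as plain lists of grayscale values rather than real files.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255 (illustrative stand-in for an image)


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging blocks.

    Assumes the image width and height are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit hash: one bit per cell, set if the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Flag two images as likely duplicates if their hashes differ in few bits.

    The threshold is an assumption; a real archive would tune it.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# A 16x16 gradient, the same picture at double resolution, and an inverted copy.
base = [[x * 15 + y for x in range(16)] for y in range(16)]
upscaled = [[base[y // 2][x // 2] for x in range(32)] for y in range(32)]
inverted = [[255 - v for v in row] for row in base]
```

In this toy setup, `likely_duplicates(base, upscaled)` flags the higher-resolution copy as a match even though its dimensions differ, while the inverted image is not flagged. Production tools generally use more robust hashes and work on decoded image files.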

The decision isn't purely technical. Duplicate replacement raises preservation policy questions: when two near-identical scans of a 1906 earthquake photograph exist, which version gets designated the canonical record? The lower-resolution version might carry metadata the higher-resolution scan lacks. Delete the wrong file and that context is gone permanently. The San Francisco History Center, which holds one of the most-accessed municipal photograph collections on the West Coast, has flagged exactly this kind of metadata-loss risk in internal workflow reviews.
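
One way to reduce that metadata-loss risk, sketched below under stated assumptions, is to merge descriptive fields from every near-duplicate into the surviving record before anything is deleted. The `ScanRecord` type, the field names, and the "highest pixel count wins" rule are hypothetical choices made for illustration. They do not describe the History Center's actual policy, which the article notes is still a governance question for archivists.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScanRecord:
    """Hypothetical catalogue entry for one scanned image."""
    file_id: str
    width: int
    height: int
    metadata: Dict[str, str] = field(default_factory=dict)


def choose_canonical(scans: List[ScanRecord]) -> ScanRecord:
    """Pick the highest-resolution scan and fill its metadata gaps from the others.

    Assumption: resolution decides the canonical file. Fields already present
    (and non-empty) on the canonical scan are never overwritten.
    """
    if not scans:
        raise ValueError("need at least one scan")
    best = max(scans, key=lambda s: s.width * s.height)
    merged = dict(best.metadata)
    others = [s for s in scans if s is not best]
    for scan in others:
        for key, value in scan.metadata.items():
            if not merged.get(key):
                merged[key] = value
    if others:
        # Keep a trail of which files contributed, so the merge can be audited.
        merged["merged_from"] = ",".join(s.file_id for s in others)
    return ScanRecord(best.file_id, best.width, best.height, merged)


# Example: the low-resolution scan carries a caption the high-resolution scan lacks.
low = ScanRecord("scan-low", 800, 600, {"caption": "Market Street, 1906", "donor": "unknown"})
high = ScanRecord("scan-high", 4000, 3000, {"donor": "Family collection"})
canonical = choose_canonical([low, high])
```

Here the canonical record keeps the high-resolution file and its own donor field, but inherits the caption from the lower-resolution scan and notes where it came from.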

The Decisions Ahead

Three choices will define how this unfolds over the next six to twelve months. First, city procurement officials need to decide whether to run a unified RFP covering multiple departments or let each institution negotiate its own vendor contract. The latter, fragmented approach has historically produced incompatible systems and higher per-unit costs. Second, institutions must settle on a governance standard for what counts as a true duplicate versus an acceptable variant, a policy question that requires input from archivists, not just IT staff. Third, and most practically, project managers need to determine whether deduplication runs happen on live collections, risking brief access outages for the public, or on mirrored copies, which would require additional server capacity the city may need to lease.

The Arts Commission is expected to bring a recommendation to its full commission meeting in September 2026. The Public Library's digital initiatives team has a planning deadline tied to the renewal of its current content management system contract, which sources familiar with the procurement calendar say falls in early 2027. That window is tight. Running a competitive bid process, completing vendor evaluation, and then executing even a partial deduplication of a large archive typically takes six to nine months when done properly.
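
The arithmetic behind that tight window can be made concrete. The short sketch below assumes, purely for illustration, that a procurement process kicks off in July 2026, the month this article was published. The start date and the helper name `add_months` are assumptions, not dates from any agency's calendar.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the first day of the month that falls `months` after `start`."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


kickoff = date(2026, 7, 1)          # assumed start, for illustration only
earliest = add_months(kickoff, 6)   # six-month scenario
latest = add_months(kickoff, 9)     # nine-month scenario

print(earliest.strftime("%B %Y"), "to", latest.strftime("%B %Y"))
```

Under that assumption the work would finish between January and April 2027, which leaves little or no slack before a contract renewal in early 2027. A later start pushes the range out accordingly.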

For San Franciscans who use these archives — genealogists pulling Mission District building permits, journalists requesting historical photographs, students researching Chinatown's commercial history — the practical stakes are straightforward: better deduplication means cleaner search results and faster access. Getting the governance and procurement decisions right this summer is what makes that outcome possible.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
