The Daily San Francisco

San Francisco news, every day

SF's Digital Archive Push Reaches a Fork in the Road: What Happens Next With the City's Duplicate Image Problem

Years of overlapping digitization efforts have left San Francisco's public records repositories bloated with redundant files — and the decisions made this summer will shape how the city manages its visual history for decades.

By San Francisco News Desk · Published 4 July 2026, 12:12 pm

Photo: Karam Alani on Pexels

San Francisco's municipal digitization program has a problem hiding in plain sight. Thousands of duplicate images — redundant scans of the same photographs, planning documents, and historical records — are clogging city-managed storage systems, driving up costs and complicating public access to archives held by institutions from the San Francisco Public Library's History Center on Larkin Street to the Planning Department's Civic Center offices. Officials must now decide whether to pursue an automated deduplication overhaul, a manual curatorial review, or some hybrid approach — and the clock is running.

The timing matters for a specific reason. The city is approaching the end of a multi-year digitization contract cycle, with several technology service agreements tied to the broader San Francisco Digital Services initiative up for renewal or rebid before the end of fiscal year 2026-27. Those contract decisions will lock in the infrastructure — and the storage architecture — that either solves the duplicate image problem or embeds it deeper into the system. Letting the renewal window pass without addressing deduplication would mean paying to store, back up, and index the same files repeatedly for another contract term.

The problem is concentrated in a handful of departments. The San Francisco Recreation and Parks Department, which manages photographic records for over 220 parks and properties across the city, conducted an internal audit earlier this year and identified significant redundancy across its digital asset folders, according to city planning documents reviewed by The Daily San Francisco. The San Francisco Arts Commission's public art archive on McAllister Street faces a similar issue, with multiple scanning campaigns over the past decade producing near-identical image files with inconsistent metadata tags — making search and retrieval unreliable for researchers and city staff alike.

The Technical Fork: Automated Tools Versus Human Review

Two camps have emerged inside city government. One argues for deploying perceptual hashing software — tools that identify visually similar images even when file names and metadata differ — to flag duplicates automatically across the San Francisco Department of Technology's centralized storage environment. Vendors have pitched this approach as fast and cost-efficient; one proposal circulating internally suggests a system could process the city's backlog in under 90 days. The counterargument, pushed by archivists at the San Francisco Public Library and some staff at the California Historical Society on Jackson Street, is that automation risks flagging historically distinct images as duplicates simply because they look alike. Two photographs of, say, the Embarcadero taken one minute apart can tell very different stories.
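For readers curious how this kind of matching works, the following is a minimal sketch of one common perceptual hashing technique, an average hash. It is not the software any vendor has proposed to the city. The function names, the 8x8 grid, and the `max_distance` threshold of 5 are illustrative assumptions, and images are represented as plain grids of grayscale values so the example runs without an imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 brightness values (hypothetical input format)


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Downsample to size x size by averaging equal blocks.

    Assumes the image height and width are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    small = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) // len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Flag a pair for review; the threshold is an illustrative assumption."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# A 16x16 left-to-right gradient, a slightly brighter rescan of it,
# and a different image (a top-to-bottom gradient).
scan = [[x * 16 for x in range(16)] for _ in range(16)]
rescan = [[v + 10 for v in row] for row in scan]
other = [[y * 16] * 16 for y in range(16)]

print(likely_duplicates(scan, rescan))  # True: same picture, different exposure
print(likely_duplicates(scan, other))   # False: visibly different picture
```

The threshold is where the archivists' concern shows up in practice. A looser `max_distance` catches more rescans but also pairs up distinct photographs that merely look alike, which is why a sketch like this flags candidates for review rather than deleting anything.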

The cost difference between the approaches is not trivial. Cloud storage pricing for government entities has risen sharply since 2023, and maintaining duplicate files at scale carries a real budget line. Industry benchmarks from public-sector technology procurement reports suggest that unmanaged digital redundancy can inflate storage costs by 20 to 40 percent in large municipal archives. City budget analysts are reportedly using that range as a working estimate, though the city has not published a figure of its own.
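To put the 20 to 40 percent range in concrete terms, here is a rough arithmetic sketch. It assumes "inflate by" means the current bill is that much higher than a fully deduplicated baseline, which is one reading of the benchmark and not a confirmed city methodology. The $1.4 million annual bill is a made-up figure for illustration only; the city has released no such number.

```python
def redundancy_share(inflation: float) -> float:
    """Fraction of the current bill attributable to duplicates.

    If duplicates inflate a deduplicated baseline B by `inflation`,
    the bill is B * (1 + inflation), so the duplicate share of the
    bill is inflation / (1 + inflation).
    """
    return inflation / (1.0 + inflation)


def annual_overspend(current_bill: float, inflation: float) -> float:
    """Dollars per year spent on duplicates under the assumption above."""
    return current_bill * redundancy_share(inflation)


# HYPOTHETICAL annual storage bill, for illustration only.
bill = 1_400_000

for inflation in (0.20, 0.40):
    share = redundancy_share(inflation)
    print(f"{inflation:.0%} inflation -> {share:.1%} of the bill, "
          f"about ${annual_overspend(bill, inflation):,.0f} a year")
```

Under this reading, a 20 to 40 percent inflation corresponds to roughly 17 to 29 percent of what is actually being paid, since the percentage is measured against the smaller deduplicated baseline.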

Key Decisions Coming This Summer

Three decisions will define what happens next. First, the Department of Technology must finalize its vendor recommendation for the next-generation digital asset management platform by September 2026 — a deadline set in the current service agreement. Second, the Budget and Legislative Analyst's office is expected to publish a report this fall examining overall efficiency in city digitization spending, which will put numbers on the table that department heads cannot easily ignore. Third, the San Francisco Public Library Commission has a regularly scheduled policy review in October that is expected to include a discussion of standards for image deduplication across the library's digital collections.

Advocates for the city's archival community are watching carefully. The question isn't simply about storage efficiency; it's about what gets preserved, what gets deleted, and who makes that call. Getting the governance structure right, including clear sign-off protocols before any files are purged, will matter as much as the technology chosen. The decisions made between now and October will be difficult to unwind later.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.