The Daily San Francisco

San Francisco news, every day

SF's City Archive Has a Duplicate Image Problem. Now Comes the Hard Part.

Years of scanning backlogs have left the city's digital records system riddled with redundant files — and the decisions made this summer will shape how San Francisco manages public information for decades.

By San Francisco News Desk · Published 4 July 2026, 12:21 pm

Photo: Washington university eclipse party, Norman, Cal / Public domain (Wikimedia Commons)

San Francisco's Department of Technology has confirmed that the city's centralized digital asset repository contains tens of thousands of duplicate image files accumulated across multiple agency migration projects. Archivists, IT managers, and transparency advocates are now pressing to resolve the resulting records management headache before the next budget cycle locks in priorities for fiscal year 2027.

The problem didn't emerge overnight. Over the past four years, city departments ranging from the San Francisco Planning Department on Millington Street to the Department of Public Health's Civic Center offices have pushed legacy document scans into the shared system in waves, often without standardized file-naming conventions or deduplication protocols in place. The result is a bloated archive where the same permit image, inspection photo, or public records document can exist in three or four versions simultaneously — different file sizes, slightly different metadata, no clear indication of which is authoritative.

Why This Summer Is the Decision Point

The timing matters because the city's contract with its current cloud storage vendor comes up for renewal in September 2026. Technology staff have until late August to deliver a recommendation to the City Administrator's Office on whether to expand storage capacity as-is, invest in automated deduplication software, or undertake a manual audit — each option carrying a substantially different price tag and timeline. Automated deduplication tools currently licensed by comparable municipal systems run between $80,000 and $250,000 annually depending on data volume, according to published pricing from vendors active in the government sector.

San Francisco's open records advocates have been watching closely. The San Francisco chapter of the First Amendment Coalition has raised concerns in prior city IT working group sessions that duplicate records can complicate Public Records Act responses — when staff pull files to fulfill a request, inconsistent duplicates can result in incomplete or contradictory document sets being sent to requesters. The SF Public Press has documented at least two instances in the past 18 months where records released under Sunshine Ordinance requests included conflicting versions of the same planning document.

The San Francisco Public Library's San Francisco History Center on Larkin Street faces a related but distinct challenge. Its digitization program, which has processed more than 400,000 historical images since 2019, relies on a separate content management system, but staff have flagged interoperability concerns if the city's main repository undergoes a major restructuring without coordination. A gap between the two systems could interrupt joint digital access projects currently serving researchers at the Main Library and branch locations including the Mission Branch on 24th Street.

The Decisions Ahead

Three options are on the table, according to the Department of Technology's internal planning calendar obtained through a routine public records request. The first is a straight storage expansion with no deduplication work — estimated to delay the problem rather than solve it. The second is deployment of hash-based deduplication software, which identifies exact-copy files automatically but misses near-duplicate images with minor metadata differences. The third is a phased manual audit, department by department, starting with Planning and Building Inspection, which together account for roughly 60 percent of image volume in the system.
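To illustrate why the second option catches exact copies but not near-duplicates, here is a minimal sketch of hash-based deduplication in Python. It is not the city's or any vendor's software; the function name `group_exact_duplicates`, the file names, and the sample byte strings are all hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, content in files.items():
        # Identical bytes always produce the same digest.
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample data: two identical scans and one re-saved copy
# whose embedded metadata differs by a single character.
sample = {
    "planning/permit_0001.tif": b"IMAGEDATA" + b"meta:v1",
    "dbi/permit_0001_copy.tif": b"IMAGEDATA" + b"meta:v1",
    "dbi/permit_0001_resaved.tif": b"IMAGEDATA" + b"meta:v2",
}

print(group_exact_duplicates(sample))
```

In this sketch the two identical files are grouped together, while the re-saved copy is left out because a one-character metadata change yields a completely different digest. That is the gap a manual audit, or a separate near-duplicate detection step, would have to cover.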

Most city IT professionals who have dealt with similar projects in other jurisdictions — New York City undertook a comparable archive consolidation between 2021 and 2023 — say the manual audit produces the cleanest outcome but requires dedicated staff hours that San Francisco's Technology Department, which absorbed budget cuts in 2025, may not currently have available.

The City Administrator's Office is expected to convene a cross-departmental working group meeting before the end of July. What comes out of that meeting will determine whether San Francisco goes into its next vendor negotiation with a clear data governance framework or simply buys more space to store a growing pile of redundant files. For residents who rely on the city's online permit portal and public document request system, the stakes are practical: cleaner records mean faster responses and fewer disputed documents in appeals before bodies like the Board of Appeals on McAllister Street.

The August deadline is firm. The decisions are not.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
