The Daily San Francisco

How San Francisco's Digital Archives Ended Up Full of the Same Photo Twice — And Why It's Finally Getting Fixed

A sprawling, years-long problem with duplicate images clogging city agency databases and public-record systems has pushed planners, archivists, and tech contractors toward a reckoning.

By San Francisco News Desk · Published 4 July 2026, 12:23 pm

San Francisco's public agencies are sitting on a digital mess years in the making: thousands of duplicate images embedded in permit databases, planning department records, and public-facing portals. The redundant files eat storage, confuse staff, and slow down systems that residents depend on to track housing projects and city contracts. The problem, long dismissed as a low-priority nuisance, has grown expensive enough that the city's Department of Technology is now piloting an automated deduplication program targeting files stored across three major platforms: the Planning Department's online permit tracker, the city's open-data portal at data.sfgov.org, and the internal records management system used by the Department of Building Inspection.

The timing matters. San Francisco is in the middle of a state-mandated push to approve roughly 82,000 new housing units by 2031 under its Regional Housing Needs Allocation obligations. Every bottleneck in the permit pipeline adds friction to a process already under intense pressure from Sacramento, and that includes clogged databases where inspectors or planners upload the same site photographs several times because the systems fail to flag redundancies.

How the Problem Built Up Over a Decade

The roots of the duplication crisis trace back to a series of platform migrations that began around 2014, when the city moved from a legacy permit-tracking system to the current Accela-based platform. Images uploaded during the transition were often copied rather than transferred, and the problem compounded every time the Department of Building Inspection or the Planning Department — headquartered on Mission Street near Seventh — onboarded new staff who weren't trained in proper file protocols. Field inspectors working the Tenderloin, SoMa, and the outer Sunset routinely uploaded photos of the same property multiple times when network connections dropped mid-upload, not knowing the first attempt had partially succeeded.

By 2022, an internal audit — described in a Department of Technology budget presentation to the Board of Supervisors that year — found that image files accounted for a disproportionate share of storage costs across city systems, though the city has not publicly released the specific dollar figure tied to duplicate image storage alone. The San Francisco Controller's Office has noted in broader technology infrastructure reviews that redundant data management is among the recurring inefficiencies identified across departments.

The problem wasn't unique to permit systems. The San Francisco Public Library's digital collections portal, hosted through a vendor partnership finalized in 2019, accumulated duplicate scans of historical photographs from the San Francisco History Center at the main branch on Larkin Street. Librarians there have spent portions of their archival budget — funded partly through the Friends of the San Francisco Public Library — on manual cleanup work that technologists say could be largely automated.

What the Fix Actually Looks Like

The Department of Technology's current pilot runs deduplication algorithms against image libraries using perceptual hashing, a method that catches near-identical photos even when file names or metadata differ. The approach is well-established in commercial cloud platforms and has been deployed by peer cities including New York and Chicago in their open-data infrastructure overhauls. San Francisco's pilot, which launched in the first quarter of 2026, focuses initially on planning and building inspection records from 2018 onward — the years with the highest volume of uploads and the densest overlap.
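The city has not said which hashing algorithm its pilot uses. As a rough illustration only, the sketch below shows average hashing, one of the simplest forms of perceptual hashing, written in plain Python. It assumes each image has already been decoded into a grid of grayscale values from 0 to 255. The function names and the distance threshold of 5 bits are hypothetical choices made for this example, not details from the pilot.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downscale to size x size by averaging each block of source pixels."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # always cover at least one row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Hypothetical rule: treat images as duplicates if few hash bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test images: dark left half and bright right half.
original = [[0 if x < 8 else 200 for x in range(16)] for y in range(16)]
# Same picture, slightly brighter, as a re-exported copy might be.
brighter = [[min(255, v + 10) for v in row] for row in original]
# A different picture: dark top half and bright bottom half.
different = [[0 if y < 8 else 200 for x in range(16)] for y in range(16)]

print(is_near_duplicate(original, brighter))   # True
print(is_near_duplicate(original, different))  # False
```

Because the fingerprint depends on the overall pattern of light and dark rather than on exact bytes, a copy that has been renamed, resized, or lightly re-encoded can still land within a few bits of the original, while an ordinary checksum would treat it as an entirely different file. A production system would more likely rely on an established image library and a tuned threshold than on a hand-rolled version like this one.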

For residents, the practical payoff is a permit portal that loads faster and returns cleaner search results when they look up a property address. For city staff, it means fewer storage overruns and less manual triage. Housing advocates who regularly use the Planning Department's online tools to track development projects in the Mission or Chinatown have long complained that redundant images make property histories harder to parse.

The Department of Technology has indicated it plans to expand the deduplication effort to additional agency systems in the second half of 2026, with a full after-action report expected before the fiscal year closes in June 2027. For now, the work is unglamorous — a bureaucratic cleanup of a problem that accumulated quietly across thousands of ordinary transactions — but in a city trying to build its way out of a housing shortage, every system that works cleanly instead of sluggishly counts.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
