The Daily San Francisco

San Francisco news, every day

SF's Digital Archives Are Full of Duplicate Images — and Fixing That Problem Could Transform How the City Shares Its Own History

A quiet but consequential cleanup of redundant photo files across city databases is reshaping how San Franciscans access public records, neighborhood histories, and government services online.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Tom Fisk on Pexels

San Francisco's municipal digital infrastructure is carrying a weight most residents never think about: tens of thousands of duplicate images clogging government databases, slowing public-facing portals, and quietly degrading the accuracy of online records that everyday San Franciscans rely on for everything from permit searches to historical research. The city's Department of Technology has been working through a staged deduplication effort since early 2026, targeting redundant image files embedded across platforms including SF.gov, the Planning Department's online permit portal, and the city's open data repository at DataSF.

The timing matters. San Francisco has spent the past two years accelerating a push toward digital-first government services — a shift driven partly by pandemic-era necessity and partly by pressure from City Hall to cut costs as the city grappled with a projected budget shortfall exceeding $800 million for the fiscal year beginning July 1, 2025. Bloated file systems are not a trivial side effect of that push. They increase cloud storage costs, slow page-load times, and — most critically — create situations where different versions of the same image appear in different databases, some labeled differently, some outdated, leaving residents and city staff unsure which record is authoritative.

What Duplicate Images Actually Cost Communities

The problem shows up in concrete, locally felt ways. Residents searching property records through the San Francisco Assessor-Recorder's Office online system have long encountered mismatched parcel images — scanned documents uploaded multiple times under slightly different file names during system migrations. Community organizations in the Tenderloin and SoMa, where housing advocacy groups use public permit and inspection records to monitor building conditions, have flagged cases where duplicate inspection photos created confusion about whether a reported violation had been resolved or was still open.

The San Francisco Public Library's digital collections, managed through its San Francisco History Center at the main branch on Larkin Street, face a version of the same challenge. The History Center's digitization program has produced thousands of archival images of neighborhoods like the Fillmore, the Mission, and Chinatown — but duplicate uploads from multiple digitization rounds mean cataloguers spend significant time on cleanup rather than expanding the archive. The library system has not published a specific figure on staff hours lost, but the institutional pattern is common across public library systems that have gone through rapid digitization cycles.

For the city's homelessness response infrastructure, the issue is more than academic. The Department of Homelessness and Supportive Housing uses photo documentation inside its Coordinated Entry system to verify shelter bed availability and site conditions. When duplicate or mislabeled images appear in that system, case workers have reported uncertainty about whether a photo reflects current or past conditions at a given site, a practical problem at facilities like the Navigation Centers in the Mission District and on 13th Street.

What the Deduplication Push Means Going Forward

The Department of Technology's current effort uses automated hash-matching tools, software that computes a fingerprint from the contents of each image file and flags files with identical fingerprints for review, alongside manual audits for near-duplicate images that differ slightly due to compression or cropping. The city began piloting this approach in January 2026 on the Planning Department's online records system before expanding it to other platforms.
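For readers curious how hash-matching works in principle, here is a minimal sketch in Python. It is an illustration only, not the city's actual tooling: the choice of SHA-256, the function names `fingerprint` and `find_exact_duplicates`, and the sample file names and contents are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents hash to the same digest.

    Only groups with two or more members are returned, sorted so the
    output order is stable.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[fingerprint(data)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Hypothetical sample data: two byte-identical uploads and one distinct file.
sample = {
    "parcel_0412.jpg": b"\xff\xd8 scanned parcel map",
    "parcel_0412_copy.jpg": b"\xff\xd8 scanned parcel map",
    "inspection_0098.jpg": b"\xff\xd8 inspection photo",
}
duplicates = find_exact_duplicates(sample)
print(duplicates)  # [['parcel_0412.jpg', 'parcel_0412_copy.jpg']]
```

A content hash of this kind only catches byte-for-byte copies. A recompressed or cropped version of the same photo produces a different digest, which is consistent with the department pairing the automated pass with manual audits for near-duplicates.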

For residents, the practical payoff is faster, more reliable access to records they have a right to see. San Franciscans filing appeals with the Board of Appeals on Polk Street, checking permit histories for buildings in the Sunset or the Excelsior, or digging into neighborhood history through Digital SF collections should notice fewer dead links, fewer mismatched records, and faster search results as the cleanup proceeds through the second half of 2026.

City officials have indicated the deduplication effort will extend to the Recreation and Parks Department's internal photo databases and eventually to Muni's customer-facing service alert imagery. Residents who encounter broken or mismatched image records on any SF.gov platform can report them directly through the city's 311 service — online, by phone, or through the SF311 app — which routes complaints to the relevant department's digital services team.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
