The Daily San Francisco

San Francisco news, every day

SF's Digital Records Push Hits a Snag: Duplicate Images Are Clogging City Databases, Officials Say

From the Planning Department to the Public Library, San Francisco agencies are grappling with ballooning storage costs and data integrity problems caused by redundant image files — and experts say the fix is overdue.

By San Francisco News Desk · Published 4 July 2026, 11:57 am

Photo by Vlada Karpovich on Pexels

San Francisco's multi-year push to digitize public records has produced an unintended mess: thousands of duplicate image files sitting inside city databases, inflating storage costs and slowing down the very systems designed to make government more efficient. The problem, which officials across several departments have been quietly addressing since early 2026, is now drawing sharper scrutiny as the city looks to cut administrative overhead amid a budget deficit projected to exceed $800 million over the next two fiscal years.

The timing is pointed. Mayor Daniel Lurie, who took office in January 2025 after defeating London Breed, has made operational efficiency a centerpiece of his early agenda. His administration's Department of Technology, headquartered on Seventh Street in SoMa, is conducting a citywide audit of digital asset management practices, and the duplicate-image problem keeps surfacing as a recurring, fixable drain on resources.

What Officials and Experts Are Saying

The San Francisco Planning Department, which maintains tens of thousands of scanned permit documents, site photos, and environmental review images for properties across neighborhoods from the Sunset District to the Tenderloin, is among the agencies most affected. Staff there have described a workflow in which the same image — say, a facade photograph submitted with a building permit application — can be uploaded multiple times by different staff members or automated ingestion pipelines, with no deduplication check triggering at the back end. The department declined to provide a specific figure for how many duplicate files currently exist, but acknowledged the audit is ongoing.

Technology consultants who work with municipal governments say the problem is neither rare nor especially complex to solve, but that cities frequently delay action because the data sits in legacy systems that are expensive to migrate. A perennial challenge is that different departments — Planning, Public Works, the Recreation and Park Department — often use different document management platforms that don't communicate with one another, so a single image of, say, Dolores Park can exist in three separate silos simultaneously.

The San Francisco Public Library's digitization program, which has been scanning historical photographs of the city's neighborhoods through its San Francisco History Center on Larkin Street, has dealt with similar redundancy issues. Librarians there have used manual review processes and open-source hashing tools to flag likely duplicates, a labor-intensive approach that specialists say is adequate for archival work but doesn't scale to the volume of operational records flowing through Planning or the Department of Building Inspection.
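The article does not name the hashing tools the library uses. As a rough illustration of the general idea only, the sketch below groups files whose bytes are identical by comparing SHA-256 digests. The function name, file names, and sample contents are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose byte content is identical.

    `files` maps a name to raw bytes (a hypothetical in-memory stand-in
    for files read from disk or a document store).
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    # Keep only digests shared by two or more files.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical sample: two departments hold the same facade photo.
sample = {
    "planning/facade_001.jpg": b"\xff\xd8 facade bytes",
    "dbi/upload_7731.jpg": b"\xff\xd8 facade bytes",
    "parks/dolores.jpg": b"\xff\xd8 park bytes",
}
duplicates = group_exact_duplicates(sample)
print(duplicates)
```

A byte-level hash of this kind only catches exact copies. A re-scanned or re-compressed version of the same photo produces a different digest, which is one reason manual review remains part of the process.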

The Cost Question and What Comes Next

Cloud storage is cheap in absolute terms, roughly $0.02 per gigabyte per month on major platforms, but agencies dealing with hundreds of thousands of high-resolution image files can accumulate meaningful costs, particularly when those files are redundantly backed up across multiple tiers. Beyond storage, the bigger problem is search and retrieval: staff trying to locate the correct version of a document lose time sifting through duplicates, and in departments where permit timelines are already under political pressure, that friction has consequences for applicants waiting on approvals in neighborhoods like the Mission and the Western Addition.
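To make the storage arithmetic concrete, here is a back-of-the-envelope sketch. Only the $0.02 per gigabyte rate comes from the reporting above. The file count, average file size, and number of backup copies are hypothetical inputs chosen for illustration, not figures from any city agency.

```python
def monthly_storage_cost(file_count, avg_mb, copies, usd_per_gb_month=0.02):
    """Estimate monthly cloud storage cost in US dollars.

    All inputs except the default rate are hypothetical; 1 GB is
    treated as 1,000 MB for simplicity.
    """
    total_gb = file_count * avg_mb * copies / 1000
    return total_gb * usd_per_gb_month


# Hypothetical: 300,000 images averaging 8 MB, each stored in 3 tiers.
cost = monthly_storage_cost(file_count=300_000, avg_mb=8, copies=3)
print(f"${cost:,.2f} per month")
```

The estimate scales linearly with each input, so doubling the number of redundant copies doubles the bill.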

Digital records specialists contacted for background say the most effective interventions combine automated perceptual hashing — a technique that identifies visually identical or near-identical images even if their file names or metadata differ — with clear governance rules about who can upload what and where. Several California counties have adopted such systems in the past three years, and the City and County of San Francisco's own Department of Technology has the authority to mandate standards across agencies, though it has historically moved cautiously on cross-departmental mandates.
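For readers curious how perceptual hashing works, the following is a minimal sketch of one common variant, a difference hash. It is not the method used by any agency named here. It assumes each image has already been shrunk to a tiny grayscale grid, and the pixel values are made up for illustration.

```python
def dhash(pixels):
    """Difference hash of a grayscale grid (rows of 0-255 ints).

    Each bit records whether a pixel is brighter than its right-hand
    neighbour. Real tools first shrink the image to a tiny grid
    (commonly 9x8); here the grid is assumed to be shrunk already.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 4x3 grid standing in for a downscaled photo.
original = [
    [10, 40, 90, 200],
    [15, 45, 95, 210],
    [220, 120, 60, 20],
]
# Same picture, slightly brightened (as a re-scan or re-encode might be).
brighter = [[min(255, p + 8) for p in row] for row in original]
# A different picture.
other = [
    [200, 90, 40, 10],
    [210, 95, 45, 15],
    [20, 60, 120, 220],
]

same_distance = hamming(dhash(original), dhash(brighter))
other_distance = hamming(dhash(original), dhash(other))
print(same_distance, other_distance)
```

A small distance between two hashes suggests the same picture, while a large one suggests different pictures. In practice the cutoff is a tunable threshold, and borderline matches would still need human review.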

The Lurie administration's technology audit is expected to produce preliminary findings before the end of summer 2026. Agencies will then have a window to propose remediation plans. For residents and businesses waiting on permits, environmental reviews, or records requests, the practical upshot is that cleaner databases should eventually mean faster responses — though the timeline for any visible improvement depends entirely on how quickly individual departments act on whatever the audit recommends.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
