The Daily San Francisco

San Francisco news, every day

SF's Digital Records Push Sparks Debate Over Duplicate Image Cleanup: What Officials and Experts Are Saying

As city agencies race to digitize decades of paper files, a less glamorous but costly problem has surfaced — thousands of duplicate scanned images clogging government databases and inflating storage budgets.

By San Francisco News Desk · Published 4 July 2026, 11:57 am

Photo: Tom Fisk on Pexels

San Francisco's Department of Technology has quietly become ground zero for a citywide argument over how to handle duplicate images embedded in municipal digital records. Officials say the problem is costing the city real money and slowing public-facing services, from permit processing at the Planning Department on Mission Street to case management at the Department of Public Health.

The issue gained traction this spring after an internal audit flagged redundant image files across several city databases, including those maintained by the Assessor-Recorder's Office, which has been digitizing property documents dating back to the 1970s. When scanning equipment ingests old files in batches, identical or near-identical image files frequently get saved multiple times — sometimes dozens of times — without any automated filter catching the duplication before storage.

Why It Matters Right Now

The timing is not accidental. San Francisco City Hall committed in its fiscal year 2025–2026 budget to accelerating a digital-first records initiative, directing several departments to reduce physical file storage and migrate to cloud-based platforms. That migration has exposed just how bloated some of those digital archives already are. Cloud storage costs money — and duplicated images multiply that cost with no corresponding public benefit.

Experts in government technology say San Francisco is far from alone. Municipalities across the country have run into the same trap: digitization programs move fast, quality-control protocols move slowly. But the scale here is notable. The city operates more than 50 departments, and several of them, including the San Francisco Fire Department and the Office of the City Attorney, maintain their own independent records systems that were not designed to talk to each other, let alone flag redundant files.

The San Francisco Public Library's digital collections team at the Civic Center branch has dealt with this problem in a different context for years, using open-source deduplication tools to manage its historical photograph archive. Librarians there have described the process as labor-intensive even with automation, requiring human review to distinguish true duplicates from images that are similar but meaningfully different — a distinction that matters enormously in legal and property records.

What Officials and Technologists Are Recommending

Within City Hall, the conversation has split roughly into two camps. One group, centered around the Department of Technology's infrastructure division, is pushing for an enterprise-level deduplication layer that would sit across all city storage systems and run continuously. The other camp, which includes voices from the City Administrator's Office, argues for a slower, department-by-department audit before any automated system is deployed — citing the risk that aggressive deduplication algorithms could delete images that only appear identical but carry different metadata or are linked to different legal records.

Independent technologists familiar with public-sector digitization projects point to hash-based deduplication as the most reliable current method. The approach computes a cryptographic hash, effectively a digital fingerprint, from each image file's contents; if two files share the same fingerprint, they are for all practical purposes byte-for-byte identical, and the redundant copy can be removed without losing any image data. The method only catches exact copies, not near-duplicates such as a second scan of the same page. The catch is implementation cost and the need for staff training. Licensing and deployment of enterprise deduplication software for an organization the size of San Francisco city government can run well into six figures annually.
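For readers curious what hash-based deduplication looks like in practice, here is a minimal sketch in Python. It is an illustration only, not the city's software: the choice of SHA-256 and the function names `sha256_of_file` and `find_exact_duplicates` are assumptions made for this example. It groups files by fingerprint and reports the groups, without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large scans do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; keep groups of two or more."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    # Only fingerprints shared by at least two files indicate duplicates.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_exact_duplicates(Path("scans"))` on a hypothetical `scans` folder would return one entry per set of byte-identical files. Two scans of the same page that differ by even a single byte would not be grouped together.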

The Assessor-Recorder's Office, which processes roughly 80,000 real estate document recordings per year according to figures it has published on its public dashboard, has already begun a pilot using automated tools to flag suspected duplicates before routing them for human review. That pilot, launched in March 2026, is being watched by other departments as a potential model.
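A flag-then-review workflow of the kind described can be sketched as follows. This is a hypothetical illustration, not the Assessor-Recorder's actual pilot: the `ScannedImage` fields, the `build_review_queue` function, and the use of SHA-256 are all assumptions. The point it shows is that suspected duplicates are queued for a person to check, and that identical images attached to different records are marked rather than merged.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedImage:
    file_id: str    # hypothetical storage identifier
    record_id: str  # hypothetical ID of the record the image is attached to
    content: bytes  # raw image bytes


def build_review_queue(images):
    """Flag groups of byte-identical images for human review; delete nothing."""
    by_hash = defaultdict(list)
    for image in images:
        by_hash[hashlib.sha256(image.content).hexdigest()].append(image)

    queue = []
    for digest, group in by_hash.items():
        if len(group) < 2:
            continue  # a unique image needs no review
        record_ids = sorted({img.record_id for img in group})
        queue.append({
            "sha256": digest,
            "file_ids": [img.file_id for img in group],
            "record_ids": record_ids,
            # Identical bytes attached to different records may be legitimate,
            # so these groups are marked for closer inspection.
            "spans_multiple_records": len(record_ids) > 1,
        })
    return queue
```

A reviewer working from this queue could clear groups attached to a single record quickly, while groups with `spans_multiple_records` set would get a closer look before anything is removed.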

For residents and small business owners navigating city permitting — particularly those dealing with the Planning Department's backlog along the Embarcadero corridor and in the Tenderloin — the practical stakes are real. Duplicate records can cause systems to return conflicting file versions, slowing down permit reviews and triggering additional staff hours to reconcile discrepancies.

The Department of Technology has signaled it will present findings from a broader assessment to the Board of Supervisors' Government Audit and Oversight Committee before the end of the third quarter of 2026. Until a citywide policy is in place, individual departments are being advised to freeze large-scale batch uploads and conduct manual spot-checks on any archive added since January 2025.

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
