The Daily San Francisco

SF City Departments Are Drowning in Duplicate Digital Files. Here's What Officials and Experts Are Saying About the Fix.

A growing consensus among city technologists, archivists, and housing advocates says San Francisco's image duplication problem is wasting public resources and slowing critical services.

By San Francisco News Desk · Published 4 July 2026, 11:51 am

Photo: Panama-Pacific International Exposition (1915 : San Francisco, Calif.) / Public domain (Wikimedia Commons)

San Francisco's municipal technology offices are sitting on a problem that sounds mundane but carries real cost: thousands of duplicate digital images clogging city databases, slowing permit processing, and eating up server storage that taxpayers fund. Officials from the Department of Technology and the Planning Department have flagged the issue this year as agencies accelerate their push to digitize records from the Tenderloin to the Bayview.

The timing matters. San Francisco is under a court-enforced housing production mandate and has committed to processing building permits faster across neighborhoods including the Outer Sunset and the Mission District. When planning staff search for site photos or parcel images and retrieve dozens of identical files, the workflow slows. The delay is not hypothetical: city IT staff and planning officials have described duplicate image buildup as a direct bottleneck in the permit backlog, which has drawn scrutiny from the Board of Supervisors.

What Officials Are Saying

The San Francisco Department of Technology, which manages the city's central data infrastructure from its offices on Seventh Street, has been working with the Office of Digital Services on what it internally calls a data hygiene initiative. The effort targets redundant files across shared drives used by at least six city agencies, including the Department of Building Inspection and the Recreation and Parks Department. Officials have not yet released a public-facing report on the scope of duplication, but the initiative is listed on the department's fiscal year 2026 work plan, which is a public document.

Archivists at the San Francisco History Center, housed inside the Main Library on Larkin Street, have watched the problem from a different angle. Digital preservation specialists there have long argued that without a consistent deduplication protocol, agencies risk both bloat and data loss, two failure modes that can occur at once when staff manually delete files without checking whether another copy exists elsewhere. The History Center has used a checksum-based verification system for its own collections since 2019, and technologists in the broader civic tech community have pointed to that program as a model worth scaling.
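For readers curious what checksum-based verification looks like in practice, here is a minimal Python sketch. It is not the History Center's system, whose internals are not described here. The function names and the choice of SHA-256 are assumptions made for illustration. The idea is that two files with the same checksum have the same bytes, so copies can be grouped no matter what they are named.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A sketch like this only reports groups of identical files; it deletes nothing, which matches the archivists' point that removal should follow a check, not precede it. It also catches only exact copies, so a resized or re-saved photo would get a different checksum.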

On the private side, firms operating out of the SoMa district that contract with city agencies on document management systems have begun pitching AI-assisted deduplication tools. These tools use perceptual hashing, a technique that identifies visually identical or near-identical images even when file names or metadata differ, to flag duplicates for human review before deletion. Pricing for enterprise-grade systems of this kind typically runs between $40,000 and $120,000 annually for a mid-sized government client, according to publicly available vendor pricing sheets from companies in the civic software space.
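Perceptual hashing can be sketched in a few lines of Python. The example below uses "average hash," one simple variant; it is an illustration and not any vendor's product. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid of brightness values (commercial tools would do that step with an imaging library). The function names and the distance threshold of 5 are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a grayscale grid into an integer, one bit per pixel.

    A bit is 1 when the pixel is brighter than the grid's mean brightness.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(a: list[list[int]], b: list[list[int]],
                         max_distance: int = 5) -> bool:
    """Flag two grids for human review when their hashes nearly match.

    max_distance is an assumed threshold, not an industry standard.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# An 8x8 gradient, a slightly brightened copy, and an inverted version.
original = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter = [[value + 3 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]
```

Here `looks_like_duplicate(original, brighter)` is true because a uniform brightness change shifts the mean along with every pixel, leaving the hash unchanged, while the inverted image flips every bit and is not flagged. A byte-level checksum would treat all three as unrelated files.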

What the Data Shows — and What Comes Next

San Francisco's city government manages an estimated 47 terabytes of unstructured digital data across its core agencies, a figure cited in the Department of Technology's 2025 annual infrastructure report. Even a conservative estimate of 15 percent duplication — a figure consistent with benchmarks from comparable urban administrations — would represent more than seven terabytes of redundant storage, with associated costs in licensing, backup, and processing time.
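The arithmetic behind that estimate is easy to check. The short Python sketch below uses only the two figures cited above, 47 terabytes and 15 percent; the function name is made up for illustration, and the result says nothing about the city's actual duplication rate.

```python
TOTAL_TB = 47             # unstructured data cited in the infrastructure report
DUPLICATION_PERCENT = 15  # the conservative benchmark cited in the article


def redundant_storage_tb(total_tb: int, duplication_percent: int) -> float:
    """Return the terabytes of storage that would be duplicate copies."""
    return total_tb * duplication_percent / 100


redundant_tb = redundant_storage_tb(TOTAL_TB, DUPLICATION_PERCENT)
print(f"{redundant_tb:.2f} TB of redundant storage")  # 7.05 TB
```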

The Board of Supervisors' Government Audit and Oversight Committee is scheduled to hold a hearing in August on digital records management. Advocates from OpenSF, a civic transparency group that monitors city data practices from offices in the Mission, have submitted public comment urging the committee to include deduplication standards in any updated citywide records retention policy.

For city residents filing permits for an accessory dwelling unit in the Excelsior or submitting documentation to the Planning Department's online portal, the practical advice from civic tech advocates is straightforward: upload files once, label them clearly, and avoid resubmitting the same images under different file names. Resubmission, minor as it sounds, compounds the problem at the database level. The city, for its part, is unlikely to solve the underlying infrastructure issue before the August hearing. But officials say the conversation is at least now happening in rooms where decisions get made.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
