San Francisco's municipal technology offices are sitting on a problem that sounds mundane but carries real cost: thousands of duplicate digital images clogging city databases, slowing permit processing, and eating up server storage that taxpayers fund. Officials from the Department of Technology and the Planning Department have flagged the issue this year as agencies accelerate their push to digitize records from the Tenderloin to the Bayview.
The timing matters. San Francisco is under a court-enforced housing production mandate and has committed to processing building permits faster across neighborhoods including the Outer Sunset and the Mission District. When planning staff search for site photos or parcel images and retrieve dozens of identical files, that slows the workflow. It is not a hypothetical delay — city IT staff and planning officials have described duplicate image buildup as a direct bottleneck in the permit queue backlog that has drawn scrutiny from the Board of Supervisors.
What Officials Are Saying
The San Francisco Department of Technology, which manages the city's central data infrastructure from its offices on Seventh Street, has been working with the Office of Digital Services on what it internally calls a data hygiene initiative. The effort targets redundant files across shared drives used by at least six city agencies, including the Department of Building Inspection and the Recreation and Parks Department. Officials have not yet released a public-facing report on the scope of duplication, but the initiative is listed on the department's fiscal year 2026 work plan, which is a public document.
Archivists at the San Francisco History Center, housed inside the Main Library on Larkin Street, have watched the problem from a different angle. Digital preservation specialists there have long argued that without a consistent deduplication protocol, agencies risk bloat and data loss at the same time: redundant copies pile up, while staff who manually delete files without checking whether another copy exists elsewhere can erase the only one. The History Center has used a checksum-based verification system for its own collections since 2019, and technologists in the broader civic tech community have pointed to that program as a model worth scaling.
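For readers curious what checksum-based verification looks like in practice, the sketch below shows one common approach: compute a SHA-256 digest of each file's bytes and group files whose digests match. It is an illustration only, not a description of the History Center's system; the function names and the directory layout are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large scans do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root):
    """Map each digest to the files under `root` that share it.

    Only digests shared by two or more files are returned. The function
    reports duplicates and never deletes anything.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of_file(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Because the digest depends only on file contents, two byte-identical copies are caught even when their names differ. A copy that has been resized or re-saved produces a different digest, so this method finds exact duplicates only.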
On the private side, firms operating out of the SoMa district that contract with city agencies on document management systems have begun pitching AI-assisted deduplication tools. These tools use perceptual hashing, a technique that identifies visually identical or near-identical images even when file names or metadata differ, to flag duplicates for human review before deletion. Pricing for enterprise-grade systems of this kind typically runs between $40,000 and $120,000 annually for a mid-sized government client, according to publicly available vendor pricing sheets from companies in the civic software space.
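To give a sense of how perceptual hashing works, here is a minimal sketch of one simple variant, often called an average hash. It is not any vendor's method. It assumes each image has already been decoded, converted to grayscale, and downscaled to an 8x8 grid of brightness values (a step a real tool would hand to an image library); the function names and the distance threshold of 5 are illustrative assumptions.

```python
def average_hash(pixels):
    """Return a 64-bit average hash for an 8x8 grid of grayscale values (0-255).

    Each bit records whether a pixel is brighter than the grid's mean.
    """
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid of grayscale values")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_probable_duplicate(pixels_a, pixels_b, max_distance=5):
    """Flag a pair for human review when their hashes differ in few bits.

    The threshold is an assumption for this sketch; real systems tune it.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Two scans of the same photo that differ slightly in brightness or compression tend to produce hashes only a few bits apart, while unrelated images usually differ in many more. The result is a flag for review, not a decision to delete.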
What the Data Shows — and What Comes Next
San Francisco's city government manages an estimated 47 terabytes of unstructured digital data across its core agencies, a figure cited in the Department of Technology's 2025 annual infrastructure report. Even a conservative estimate of 15 percent duplication, a figure consistent with benchmarks from comparable urban administrations, would represent just over seven terabytes of redundant storage (15 percent of 47 is 7.05), with associated costs in licensing, backup, and processing time.
The Board of Supervisors' Government Audit and Oversight Committee is scheduled to hold a hearing in August on digital records management. Advocates from OpenSF, a civic transparency group that monitors city data practices from offices in the Mission, have submitted public comment urging the committee to include deduplication standards in any updated citywide records retention policy.
For city residents filing permits for an accessory dwelling unit in the Excelsior or submitting documentation to the Planning Department's online portal, the practical advice from civic tech advocates is straightforward: upload files once, label them clearly, and avoid resubmitting the same images under different file names. Resubmission, minor as it sounds, compounds the problem at the database level. The city, for its part, is unlikely to solve the underlying infrastructure issue before the August hearing. But officials say the conversation is at least now happening in rooms where decisions get made.