San Francisco's Department of Technology confirmed this week that a long-running project to consolidate the city's digital document archives has been disrupted by a significant duplicate image problem — one that officials say affects tens of thousands of scanned files stored across at least three separate municipal platforms.
The timing is awkward. The city has spent the better part of two years pushing agencies to migrate paper-era records into a centralized system, part of a broader open-government initiative that the Mayor's Office of Civic Innovation has championed since early 2024. The duplicate image issue (the same scanned permit, planning map, or public health form uploaded multiple times under different file names) is now slowing retrieval and inflating storage costs under the city's cloud infrastructure contracts.
What Happened This Week
Technicians working with the San Francisco Planning Department's digital records unit identified the core problem on Monday, July 1, when routine quality checks flagged an error rate that sources familiar with the project described as substantially higher than the acceptable threshold. The Planning Department's archive, which covers building permits and environmental impact documents for neighborhoods from the Tenderloin to the Excelsior, reportedly contains duplicate image files dating back to at least 2019, when the initial digitization push began under a federal grant program.
The San Francisco History Center, housed at the San Francisco Public Library on Larkin Street, maintains a separate but linked digital collection of civic records and has also been drawn into the review. Library staff were notified mid-week that a subset of images cross-uploaded to the shared portal would be temporarily unavailable while technicians run automated deduplication scripts. The outage affects a portion of the publicly searchable catalog but does not affect the physical archive itself.
The problem is not unique to San Francisco. Cities that accelerated digitization during and after the pandemic years — under pressure from state mandates and federal infrastructure grants — frequently encountered the same issue: multiple departments scanning the same documents independently, without a unified naming protocol or hash-verification system to catch redundant uploads before they compound. But for a city that has marketed itself as a global technology hub, the optics are uncomfortable, particularly as the tech sector here has pivoted hard toward AI-driven workflow automation in 2025 and 2026.
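For readers curious what a hash-verification check looks like in practice, the following is a minimal sketch, not the city's or its contractor's actual tooling. It assumes the scans sit in an ordinary directory tree, and the function names `file_digest` and `find_duplicates` are hypothetical. It groups files by the SHA-256 digest of their contents, so identical scans are caught even when their file names differ.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that multi-gigabyte scans are never loaded into memory at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content digest and return only
    the groups holding more than one file (byte-identical duplicates)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A check like this only catches byte-identical copies. Two departments that scanned the same paper document independently would produce different bytes, and matching those would require a different technique, such as perceptual image hashing.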
Storage Costs and the Fix
Cloud storage is not cheap at municipal scale. The city's current contract with its primary cloud infrastructure vendor, details of which are publicly available through the Controller's Office procurement portal, runs into the tens of millions of dollars annually once all departments are accounted for. Duplicate image files, especially large-format scans of architectural drawings and environmental maps, consume a disproportionate share of storage. A single set of high-resolution permit drawings for a mid-rise project on, say, Folsom Street in SoMa can run to several gigabytes; each duplicate copy adds that much again to the storage bill without adding any informational value.
The Department of Technology has engaged a contractor to run a phased deduplication process. Phase one, expected to wrap by July 18, targets the Planning Department's permit archive — the largest single repository affected. Phase two will address shared files between Planning and the Department of Building Inspection, whose offices at 49 South Van Ness Avenue maintain parallel digital records. A third phase covering the Public Library's cross-linked holdings has no confirmed start date as of Friday.
For members of the public who use the city's online portal to pull historical permits or planning records — a process heavily relied upon by contractors, real estate attorneys, and neighborhood advocacy groups in places like the Mission District and the Richmond — some searches may return incomplete results or temporary error messages through at least mid-July. The Department of Technology's service desk is advising users who need urgent documents to submit direct records requests by email, which staff are processing manually in the interim.
City officials have not publicly stated whether the deduplication work will require a budget amendment or draw on contingency funds. The Controller's Office did not respond to a request for comment before publication. A fuller accounting of the project's cost impact is expected in the department's next quarterly report, due in late August.