San Francisco city administrators moved this week to accelerate a long-delayed cleanup of duplicate images embedded in public-facing databases and internal records systems, with the Department of Technology flagging the issue as a direct drag on server costs and data retrieval speeds across multiple agencies.
The push matters now because the city has been consolidating legacy IT infrastructure under a broader digital modernization drive that began gaining momentum in early 2025. As agencies migrated older file systems onto shared cloud architecture, duplicate image files — some dating back to permit applications filed more than a decade ago at 1 Dr. Carlton B. Goodlett Place — began multiplying across shared directories, inflating storage bills and slowing query times for staff handling everything from building permits to public health records.
Where the Problem Has Been Worst
Planning Department staff identified the duplication issue as particularly acute in the city's online permit portal, which serves contractors and property owners filing applications for projects across neighborhoods from the Tenderloin to the Outer Sunset. Images uploaded to support permit applications — site photos, engineering diagrams, elevation drawings — were being saved in multiple formats to redundant folders rather than being linked to a single master record. The result was thousands of near-identical files occupying server space without any corresponding benefit to end users.
The San Francisco Public Library's digital collections division, headquartered at the Larkin Street main branch, reported a parallel problem in its archival digitization program. Photographs scanned from the San Francisco History Center's physical collection were in some cases stored three or four times over in different resolution variants, with no automated deduplication process in place to flag and consolidate them. Librarians working on the project said the backlog had grown steadily since a digitization grant period ended in late 2024.
At the Department of Public Health, which manages image assets tied to clinical documentation and public communications campaigns, internal reviews this spring flagged more than 40,000 image files as potential duplicates in a single shared drive environment. The exact cost impact is still being assessed, but pricing under the city's primary cloud storage vendor contract, renewed in January 2026, is structured so that redundant data directly increases the monthly bill.
What This Week's Effort Involves
The Department of Technology deployed automated deduplication scripts across three pilot agency environments beginning July 1, using hash-based comparison tools that match files by content rather than file name. The approach catches duplicates even when images have been renamed or reformatted, a common problem when staff download and re-upload assets from personal devices or email attachments.
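The city has not published its scripts, so the following is only a minimal sketch of how content-based matching of this kind can work. It assumes a SHA-256 digest over each file's raw bytes; the function names, the list of image suffixes and the directory layout are hypothetical and chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; a real audit would adjust this list.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group image files under root whose bytes are identical.

    File names are ignored, so a renamed copy lands in the same
    group as the original.
    """
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_groups(Path("shared_drive"))` on a hypothetical folder returns one list per set of byte-identical images. A byte-level digest like this one matches renamed copies only. An image saved again in another format or resolution produces different bytes, so matching those would need a perceptual hash computed on the decoded pixels, which this sketch does not attempt.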
The Civic Innovation Office at City Hall is coordinating with vendors to establish a city-wide digital asset management policy, something administrators say has been discussed since at least 2022 but never formally adopted. A draft policy circulated this spring would require all departments to route image uploads through a centralized repository with deduplication enabled by default before files are saved to any shared drive.
For San Francisco residents, the practical effect is likely to be faster load times on the city's public web portals, including the SF311 service request system and the Planning Department's permit tracker on the city's main sfgov.org domain. Both have drawn complaints about sluggish performance during peak usage hours.
The deduplication pilot is scheduled to run through August 31, after which the Department of Technology will report findings to the Board of Supervisors' Government Audit and Oversight Committee. If the pilot meets its benchmarks, including a targeted reduction in redundant storage consumption, the rollout is expected to expand to all 53 city departments before the end of the 2026 fiscal year. Agencies that have not yet audited their image libraries are being asked to complete a self-assessment survey by July 18.