San Francisco's city government is sitting on tens of thousands of duplicate image files spread across at least a dozen municipal departments, a problem that has compounded quietly for years and is now forcing an expensive, time-consuming reckoning. The Department of Technology — headquartered at 1 South Van Ness Avenue — confirmed earlier this year that a formal audit of shared city servers found significant redundancy in stored visual assets, a category that includes everything from planning photographs and public works documentation to promotional imagery used by agencies like SF Travel and the Recreation and Parks Department.
The duplication problem did not arrive overnight. It is the accumulated residue of decisions made, and not made, across two decades of city IT governance.
How the Files Multiplied
The story begins in the early 2000s, when individual departments began digitising paper records independently, often with no coordination from a central authority. The Mayor's Office of Housing and Community Development, for instance, maintained its own image directories separate from the Planning Department's, even when both agencies were photographing the same Mission District development sites for overlapping compliance purposes. By the time the city's unified digital asset management initiative was proposed under the DataSF program in 2019, individual agencies had already built siloed storage habits that proved difficult to unwind.
The tech sector's boom-and-bust rhythm in San Francisco made things worse. When venture-backed startups flooded SoMa and the Financial District through the mid-2010s, city departments scrambled to hire contractors with varying IT standards. Files were migrated, duplicated in the process, and left on old servers that were never fully decommissioned. Then came the COVID-19 pandemic. The shift to remote work in March 2020 pushed staff to email and share files through personal cloud accounts and consumer platforms, creating a parallel universe of stored imagery that IT administrators are still attempting to map.
DataSF, the city's open data office, has flagged data hygiene as a priority since at least 2021, but image deduplication, which is technically distinct from structured database cleanup, received less attention than higher-profile projects involving homelessness service data and permit tracking. Meanwhile, the extra storage was not free. Cloud storage contracted through state-negotiated agreements still costs the city a measurable amount per terabyte each year, and inflated storage footprints lengthen backup times, widen the scope of security audits, and add to the workload of archivists at the San Francisco History Center, which operates out of the Main Library on Larkin Street.
The Pressure to Act Now
Two forces are pushing the issue to the surface in 2026. First, the AI boom has made image libraries newly valuable. City departments exploring machine learning tools for planning review, infrastructure inspection, and public safety analysis need clean, deduplicated training datasets. Feeding a computer vision model redundant imagery degrades its performance and inflates the compute costs the city pays to cloud vendors. Second, Mayor Daniel Lurie's administration has signalled a tighter approach to municipal IT spending as part of broader efforts to identify budget efficiencies — a priority that has accelerated review of storage contracts that sailed through unexamined for years.
The Department of Technology has begun piloting deduplication software on server clusters maintained for the Public Works department, focusing initially on roughly 15 years of construction-site documentation. The Recreation and Parks Department, which manages imagery from more than 220 parks across the city, is expected to join a second phase of the project later this year.
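The city has not said which software or method the pilot uses. As a rough illustration of the simplest form of deduplication, the sketch below groups byte-for-byte identical image files by hashing their contents. The function names, the list of file extensions, and the choice of SHA-256 are all assumptions made for this example, not details from the Department of Technology.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical list of extensions to treat as images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Group image files under `root` whose contents are byte-for-byte identical.

    Returns a list of groups; each group is a list of two or more paths.
    """
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A check like this only catches exact copies. A photograph that has been resized, recompressed, or re-exported during a migration produces a different digest and would go unnoticed.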
For city staff, the practical advice is straightforward: departments should stop creating new shared drives outside the citywide SharePoint and approved cloud environments, and any image uploads to public-facing portals — including the city's planning map tools — should route through a single credentialing system that flags near-duplicates before saving. The Department of Technology is expected to release updated file management guidelines before the end of the third quarter of 2026.
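How an upload system might flag near-duplicates has not been described publicly. One common approach is a perceptual hash, which summarises what an image looks like instead of its exact bytes. The sketch below implements a basic "average hash" and assumes the image has already been reduced to a small grid of greyscale values (for example 8 by 8); the function names and the distance threshold of 5 are hypothetical choices for illustration only.

```python
def average_hash(pixels):
    """Compute an average hash from a 2D list of greyscale values (0-255).

    Each pixel contributes one bit: 1 if brighter than the grid's mean, else 0.
    Assumes the image was already downscaled to a small grid such as 8x8.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(new_hash, known_hashes, max_distance=5):
    """Return True if `new_hash` is within `max_distance` bits of any known hash.

    The threshold is an assumed value; a real system would need to tune it.
    """
    return any(hamming_distance(new_hash, known) <= max_distance for known in known_hashes)
```

In this scheme, an upload portal would compute the hash of each incoming image and compare it against hashes of files already stored, prompting the user before saving a likely copy. The linear scan over known hashes is kept for clarity and would be slow against a large archive.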
The audit, the new guidelines, and the deduplication pilot together represent the most coordinated effort the city has made to address a problem that built up slowly, invisibly, and expensively across administrations dating back to the Newsom era. Getting out from under it will take longer than anyone in City Hall is likely to advertise.