The Daily San Francisco

How San Francisco's Digital Archives Ended Up Buried in Duplicate Images — and What It's Costing the City

Decades of poor file management, rapid agency growth, and a patchwork of IT systems have left city departments drowning in redundant visual data, with no easy way out.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

San Francisco's city government is sitting on tens of thousands of duplicate image files spread across at least a dozen municipal departments, a problem that has compounded quietly for years and is now forcing an expensive, time-consuming reckoning. The Department of Technology — headquartered at 1 South Van Ness Avenue — confirmed earlier this year that a formal audit of shared city servers found significant redundancy in stored visual assets, a category that includes everything from planning photographs and public works documentation to promotional imagery used by agencies like SF Travel and the Recreation and Parks Department.

The duplication problem did not arrive overnight. It is the accumulated residue of decisions made, and not made, across two decades of city IT governance.

How the Files Multiplied

The story begins in the early 2000s, when individual departments began digitising paper records independently, often with no coordination from a central authority. The Mayor's Office of Housing and Community Development, for instance, maintained its own image directories separate from the Planning Department — even when both agencies were photographing the same Mission District development sites for overlapping compliance purposes. By the time the city's unified digital asset management initiative was proposed under the DataSF program in 2019, individual agencies had already built siloed storage habits that proved difficult to unwind.

The tech sector's boom-and-bust rhythm in San Francisco made things worse. When venture-backed startups flooded SoMa and the Financial District through the mid-2010s, city departments scrambled to hire contractors with varying IT standards. Files were migrated, duplicated in the process, and left on old servers that were never fully decommissioned. Then came the COVID-19 pandemic. The shift to remote work in March 2020 pushed staff to email and share files through personal cloud accounts and consumer platforms, creating a parallel universe of stored imagery that IT administrators are still attempting to map.

DataSF, the city's open data office, has flagged data hygiene as a priority since at least 2021, but image deduplication, which is technically distinct from structured database cleanup, received less attention than higher-profile projects involving homelessness service data and permit tracking. Storage, meanwhile, was never free. Cloud storage contracted through state-negotiated agreements still costs the city a measurable amount per terabyte each year, and inflated storage footprints lengthen backup times, widen the scope of security audits, and add to the workload of archivists at the San Francisco History Center, which operates out of the Main Library on Larkin Street.

The Pressure to Act Now

Two forces are pushing the issue to the surface in 2026. First, the AI boom has made image libraries newly valuable. City departments exploring machine learning tools for planning review, infrastructure inspection, and public safety analysis need clean, deduplicated training datasets. Feeding a computer vision model redundant imagery degrades its performance and inflates the compute costs the city pays to cloud vendors. Second, Mayor Daniel Lurie's administration has signalled a tighter approach to municipal IT spending as part of broader efforts to identify budget efficiencies — a priority that has accelerated review of storage contracts that sailed through unexamined for years.

The Department of Technology has begun piloting deduplication software on server clusters maintained for the Public Works department, focusing initially on roughly 15 years of construction-site documentation. The Recreation and Parks Department, which manages imagery from more than 220 parks across the city, is expected to join a second phase of the project later this year.
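For readers curious what a first deduplication pass can look like in practice, here is a minimal sketch in Python. It is not the software the Department of Technology is piloting, which the city has not described in detail. It assumes the simplest case only: byte-for-byte identical copies of the same file saved under different names or folders. The function names and the list of image extensions are illustrative choices, not anything drawn from the city's systems.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Illustrative list of extensions; a real archive may hold other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group image files under `root` whose contents are byte-for-byte identical."""
    # Step 1: bucket by file size. Files of different sizes cannot be identical,
    # so this avoids hashing most of the archive.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only the files that share a size with at least one other file.
    by_digest = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_digest[file_digest(path)].append(path)

    # Step 3: keep only the groups that contain more than one file.
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

A pass like this only catches exact copies. A photograph that has been resized, recompressed or cropped produces a different hash, so finding near-duplicates requires a different technique, such as perceptual hashing, which is outside this sketch.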

For city staff, the practical advice is straightforward: departments should stop creating new shared drives outside the citywide SharePoint and approved cloud environments, and any image uploads to public-facing portals — including the city's planning map tools — should route through a single credentialing system that flags near-duplicates before saving. The Department of Technology is expected to release updated file management guidelines before the end of the third quarter of 2026.

The audit, the new guidelines, and the deduplication pilot together represent the most coordinated effort the city has made to address a problem that built up slowly, invisibly, and expensively across administrations dating back to the Newsom era. Getting out from under it will take longer than anyone in City Hall is likely to advertise.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
