San Francisco's Department of Technology has been working since January 2026 to purge tens of thousands of duplicate digital images clogging the city's centralized asset management system — a problem that, by the department's internal estimates, inflates cloud storage costs and slows permit processing times across at least a dozen city agencies. The cleanup effort, quietly folded into a broader $4.2 million data modernization contract awarded to a Mission District firm called Civic Data Solutions, marks one of the more ambitious municipal deduplication projects attempted on the West Coast.
The timing matters. San Francisco's Planning Department has been under pressure since Mayor Daniel Lurie took office to accelerate housing permit approvals, and staff have flagged that redundant image files — duplicate property photos, repeated scans of building inspection reports — were creating bottlenecks in the city's online permitting portal. When the same document image exists in a system four or five times under different filenames, staff searching records waste time verifying which version is authoritative. In a city trying to build its way out of a housing shortage, that friction carries a real price.
The San Francisco Public Library's digital archive team on Larkin Street ran into the same problem first. Librarians attempting to digitize the San Francisco History Center's photographic collection discovered that volunteer scanning sessions over several years had produced an estimated 18,000 duplicate image files across the archive's shared drives. The library partnered with the Internet Archive, whose offices sit on Funston Avenue in the Richmond District, to develop a hashing-based detection tool that could flag near-identical images even when file names and metadata differed. That tool is now being adapted for citywide use.
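The library has not published the details of its tool, so the following is only a rough sketch of how hashing-based near-duplicate detection can work in general. It assumes an "average hash" computed over a small grayscale pixel grid; the function names, the sample grids, and the distance threshold are all hypothetical and are not drawn from the library's or the Internet Archive's code.

```python
# Illustrative sketch only: an "average hash" over a tiny grayscale grid.
# A real tool would first decode and downscale each image; here the grid
# is supplied directly as nested lists of brightness values (0-255).

def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)

def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))

def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Treat two grids as near-duplicates if their hashes differ in few bits.

    max_distance is an assumed threshold chosen for this example.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance

# Hypothetical sample data: a scan, a slightly brighter rescan, and an
# unrelated image (here simply the inverted scan).
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 28, 225],
    [11, 198, 33, 219],
]
rescan = [[value + 5 for value in row] for row in original]
unrelated = [[255 - value for value in row] for row in original]

print(is_near_duplicate(original, rescan))     # same pattern, brighter exposure
print(is_near_duplicate(original, unrelated))  # opposite pattern
```

Because the hash depends only on the pattern of light and dark regions, neither the filename nor the embedded metadata affects the comparison, which is the property the library's tool is described as having.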
How San Francisco Compares
Other cities have grappled with the same issue, with mixed results. Amsterdam's municipal digital archive office completed a deduplication sweep of its urban planning image library in 2024, reducing stored files by roughly 34 percent over eight months, according to a case study published by the European Municipal Digital Alliance. Seoul's city government, managing records for a city of nearly 10 million residents, deployed an AI-assisted deduplication pipeline in 2023 that the city said cut storage overhead by $1.1 million annually. London's Government Digital Service has published open guidance on image deduplication for borough councils since 2022, though uptake has been uneven across the 32 boroughs.
San Francisco's approach differs in one notable way: rather than treating deduplication as a one-time purge, Civic Data Solutions is building a continuous detection layer directly into the city's content management workflows. Any image uploaded to the Planning Department's portal or SFMTA's maintenance database will be automatically checked against a hash index before being stored. The goal is prevention, not just remediation. SFMTA's photo database, which documents everything from Muni overhead wire inspections to Van Ness Bus Rapid Transit corridor maintenance, had accumulated more than 60,000 files by late 2025, with internal audits suggesting a duplication rate above 20 percent.
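The check-before-store step described above can be pictured with a minimal sketch. It assumes an exact-content SHA-256 digest and an in-memory index; the contractor's actual design has not been made public, and the class name, method name, and filenames here are hypothetical.

```python
import hashlib

class HashIndex:
    """Toy in-memory index mapping content digests to the first stored filename."""

    def __init__(self):
        self._by_digest = {}

    def check_and_store(self, filename, data):
        """Store the upload unless identical bytes are already indexed.

        Returns the filename of the existing copy if one is found,
        or None if the upload was new and has been recorded.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing  # duplicate: caller can reject or link to the original
        self._by_digest[digest] = filename
        return None

# Hypothetical uploads: the same bytes arriving under two different filenames.
index = HashIndex()
scan_bytes = b"example image bytes"

first_result = index.check_and_store("inspection_report.jpg", scan_bytes)
second_result = index.check_and_store("IMG_0001.jpg", scan_bytes)

print(first_result)   # None: nothing matched, so the file was recorded
print(second_result)  # inspection_report.jpg: the earlier copy was found
```

An exact digest like this catches only byte-for-byte copies. Catching rescans or resized versions of the same image would require a perceptual hash and a distance threshold instead, at the cost of occasional false matches.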
What Comes Next
The deduplication contract runs through December 2026, with a progress review scheduled for September at City Hall. If the pilot phases covering the Planning Department and SFMTA meet their benchmarks, the Department of Technology intends to extend the system to the Department of Building Inspection and the Recreation and Parks Department, whose Golden Gate Park operations office maintains its own sprawling photo archive of facility maintenance records.
For residents, the practical upside may eventually show up in faster permit turnaround times on the city's SF311 portal and in smoother responses to public records requests, which currently require staff to manually sort through redundant files. For city budget officials watching every line item, the storage savings alone — projected at roughly $280,000 annually once the system is fully deployed — are not trivial at a moment when San Francisco is closing a structural deficit that has strained departmental budgets for two consecutive fiscal years. Other cities, particularly those with similarly complex legacy database architectures, are paying close attention to whether San Francisco's continuous-detection model proves more durable than the periodic-purge strategies most municipal governments still rely on.