San Francisco's Department of Technology has been running a low-profile but technically ambitious program since early 2025 to identify and replace duplicate images embedded in public-facing city databases — property records, permitting portals, and the open data repository at DataSF. The effort, which the department has been piloting with machine-learning tools procured through a contract reviewed by the Board of Supervisors last October, is now being studied by civic technology offices in London and Singapore as a potential model.
The timing is not accidental. Cities worldwide are drowning in redundant digital assets accumulated over two decades of digitisation pushes. Duplicate imagery bloats storage costs, slows query times on public portals, and — in the context of property records and planning applications — can create genuine legal ambiguities when two slightly different versions of a site photograph exist in the same file. For a city like San Francisco, where a single Mission District building permit can carry dozens of attached images spanning multiple revision cycles, the problem compounds fast.
What San Francisco Is Actually Doing
The city's current approach centres on a deduplication pipeline built on top of its existing Salesforce-based permitting infrastructure and connected to the Planning Department's Accela system. Duplicates are identified by perceptual hash comparison rather than pixel-perfect matching, so that two copies of the same image still match when they differ only by minor compression artefacts. Flagged images are not deleted outright. Instead, they are archived to cold storage at the city's Pier 39-adjacent data facility and replaced with a canonical version tagged with a provenance timestamp. The Department of Technology declined to provide an on-record spokesperson for this story, but the program's framework was outlined in a memo published to the Board of Supervisors' Land Use and Transportation Committee docket in November 2025.
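For readers who want to see what perceptual hashing looks like in practice, here is a minimal sketch of one common variant, the average hash, in plain Python. It is an illustration only: the memo does not say which hashing algorithm or matching threshold the city's pipeline uses, and the function names, the assumption that each image has already been shrunk to an 8x8 grayscale grid, and the 5-bit distance threshold are all hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale values 0-255, assumed already shrunk to 8x8


def average_hash(pixels: Grid) -> int:
    """Return a 64-bit hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid (64 values)")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as duplicates when their hashes differ in few bits (threshold is an assumption)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic test images: left half dark, right half bright.
original = [[40] * 4 + [200] * 4 for _ in range(8)]
# The same picture with small per-pixel changes, standing in for recompression noise.
recompressed = [[v + ((r + c) % 3) - 1 for c, v in enumerate(row)]
                for r, row in enumerate(original)]
# A genuinely different picture: top half dark, bottom half bright.
different = [[40] * 8 if r < 4 else [200] * 8 for r in range(8)]

print(original == recompressed)                        # pixel-perfect comparison
print(is_probable_duplicate(original, recompressed))   # perceptual comparison
print(is_probable_duplicate(original, different))
```

The pixel-by-pixel comparison treats the recompressed copy as a different image, while the hash comparison treats it as a match and still separates the genuinely different picture. A production system would also need a resizing step and a threshold tuned against real records.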
The San Francisco Public Library's digital collections team at the Civic Center branch has run a parallel, smaller-scale version of this work since 2023, manually reviewing roughly 4,200 images in the San Francisco Historical Photograph Collection for duplicates introduced during a 2019 scanning project. Librarians there have replaced or consolidated about 900 of those files, according to figures presented at a March 2026 California Library Association regional meeting.
Costs are real. Cloud cold storage for archived duplicates currently runs the city approximately $0.004 per gigabyte per month under its Google Cloud agreement, a small per-unit figure that nonetheless recurs every month against a Planning Department archive the November 2025 memo estimated at 14 terabytes and growing at roughly 800 gigabytes per quarter.
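The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough upper-bound estimate under stated assumptions, not the city's own budgeting model: it treats the entire 14-terabyte archive as if it were billed at the cold-storage rate, takes a terabyte as 1,000 gigabytes, assumes growth stays linear and the price stays fixed, and covers the storage line item only. The function and constant names are hypothetical.

```python
PRICE_PER_GB_MONTH = 0.004    # USD per gigabyte per month, as reported
ARCHIVE_GB = 14_000           # 14 TB, assuming 1 TB = 1,000 GB
GROWTH_GB_PER_QUARTER = 800   # the memo's growth estimate


def monthly_cost(quarters_from_now: int) -> float:
    """Estimated monthly storage bill after a given number of quarters of linear growth."""
    size_gb = ARCHIVE_GB + GROWTH_GB_PER_QUARTER * quarters_from_now
    return size_gb * PRICE_PER_GB_MONTH


for quarters in (0, 4, 20):
    print(f"after {quarters:>2} quarters: ${monthly_cost(quarters):,.2f} per month")
```

On those assumptions the storage bill comes to about $56 a month today, $68.80 after a year, and $120 after five years.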
How This Compares Globally
The UK's Government Digital Service, which is based in London, began a similar deduplication effort across the UK's national planning portal in late 2024, though that program operates at a scale that makes direct comparison difficult: the UK portal processes applications from more than 300 local authorities. Singapore's Urban Redevelopment Authority has gone furthest, integrating duplicate-image detection directly into its CorpPass submission gateway so that duplicates are rejected at the point of upload rather than cleaned up after the fact. That upstream approach, which Singapore implemented in phases between 2023 and 2025, has reportedly cut post-submission image review time significantly, though the URA has not published precise figures.
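The difference between rejecting at upload and cleaning up afterwards can be shown with a small Python sketch. It is not a description of Singapore's gateway, whose internals have not been published: the `UploadGate` class, its `submit` method, the 5-bit threshold, and the assumption that each incoming image already carries a 64-bit perceptual hash are all hypothetical.

```python
from typing import List


class UploadGate:
    """Hypothetical upload-time check that rejects near-duplicates of images already accepted."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance   # assumed threshold, in differing bits
        self.accepted: List[int] = []      # hashes of images already in the file

    def submit(self, image_hash: int) -> bool:
        """Return True and record the hash if accepted; return False if it is a near-duplicate."""
        for known in self.accepted:
            if bin(known ^ image_hash).count("1") <= self.max_distance:
                return False
        self.accepted.append(image_hash)
        return True


gate = UploadGate()
first = 0x0F0F0F0F0F0F0F0F
near_copy = first ^ 0b11            # differs from the first hash in 2 bits
unrelated = 0x00000000FFFFFFFF      # differs from the first hash in 32 bits

results = [gate.submit(first), gate.submit(near_copy), gate.submit(unrelated)]
print(results)
```

Because the near copy is turned away before it is stored, nothing has to be archived or replaced later. The linear scan over accepted hashes is fine for a single application file but would need an index at portal scale.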
Tokyo's approach is more fragmented. Individual ward offices manage their own digitised records, and there is no citywide deduplication standard, which civic technology researchers at Keio University have flagged as an emerging governance gap in papers circulating at international urban data conferences this year.
Amsterdam's city archive, the Stadsarchief, completed a full deduplication sweep of its public photograph holdings in 2024 using open-source tooling, and has shared its methodology documentation under a Creative Commons licence — documentation that DataSF staff have reportedly referenced, though again no city official made that claim on the record for this story.
For residents and businesses in San Francisco, the practical near-term effect is modest but concrete: the Planning Department's public portal at sfplanning.org is scheduled to move to its rebuilt image-management backend by the fourth quarter of 2026, which the department projects will reduce average document load times on property search pages. Anyone filing permits through the Permit Center at 49 South Van Ness Avenue can expect the new canonical-image tagging to appear in their application file confirmations starting with that rollout. The city's open data team at DataSF has said it plans to publish a methodology report before the end of the year — which would give other cities, and the public, a clearer look at exactly how the sausage gets made.