San Francisco's Department of Technology logged its highest single-week volume of duplicate image removals since the city's digital asset consolidation program launched in March 2025, according to internal project tracking reviewed this week. The effort, running under the city's broader DataSF modernization initiative, targeted redundant photographs, scanned permit documents, and planning maps stored across at least four separate municipal content management systems.
The timing matters. City Hall has spent the past 18 months pushing departments to migrate legacy records onto a unified cloud platform ahead of a December 2026 state compliance deadline tied to California's Government Code section 6253 public records requirements. Duplicate images have slowed search response times and inflated vendor storage bills that residents ultimately pay for through the city's general fund. In some cases, the same document was scanned and uploaded three or four times across different agency portals.
Where the Backlog Built Up
The most acute duplication problems surfaced in two places: the San Francisco Planning Department's permit image library, which covers properties from the Sunset District to SoMa, and the Department of Public Works archive of street infrastructure records for corridors such as Van Ness Avenue and for the Central Freeway replacement projects. Planning staff identified one permit file for a Haight-Ashbury mixed-use building that had been scanned and uploaded on seven separate occasions between 2019 and 2024, creating storage overhead and returning duplicate hits to residents using the city's online permit search tool.
The San Francisco Public Library's digitization unit at the main branch on Larkin Street is also involved. The library has been cross-referencing its Historic Photograph Collection — roughly 200,000 items covering San Francisco from the Gold Rush through the 1980s — against the city's central DataSF repository to flag images that exist in both systems at identical resolution. Librarians began that audit in early June.
Separately, the Office of Digital Services, based in the Civic Center complex, deployed an automated perceptual hashing tool this spring that compares pixel-level similarity across image batches rather than relying solely on file-name matching. The file-name-based deduplication the city previously used missed images that had been renamed or converted between formats, a common problem when scanned TIFFs were later saved as JPEGs for web publishing.
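To illustrate why a perceptual hash can catch what file-name or byte-level matching misses, here is a minimal sketch of one simple variant, an "average hash." The city has not said which algorithm its tool uses, so this is an assumption chosen for clarity. The function names and the synthetic 8x8 grayscale grids are hypothetical, and a real pipeline would first shrink each image to a small grayscale grid with an image library.

```python
import hashlib

def average_hash(pixels):
    """Return a 64-bit fingerprint for an 8x8 grid of grayscale values (0-255).

    Each bit records whether a pixel is brighter than the grid's mean, so small
    changes from re-encoding usually leave the fingerprint unchanged.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def exact_digest(pixels):
    """Byte-level digest: any change to the data yields a different value."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()

# Hypothetical stand-ins for a scan, a lossy re-save of it, and an unrelated image.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
reencoded = [
    [min(255, max(0, original[r][c] + ((r + c) % 3) - 1)) for c in range(8)]
    for r in range(8)
]
different = [[252 - original[r][c] for c in range(8)] for r in range(8)]

same_bytes = exact_digest(original) == exact_digest(reencoded)
near_distance = hamming(average_hash(original), average_hash(reencoded))
far_distance = hamming(average_hash(original), average_hash(different))

print("byte-identical:", same_bytes)
print("distance to re-encoded copy:", near_distance)
print("distance to unrelated image:", far_distance)
```

In this sketch the re-encoded copy no longer matches byte for byte, yet its fingerprint stays within a few bits of the original, while the unrelated image lands far away. Where to set the distance cutoff for "duplicate" is a tuning decision the sketch does not settle.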
What the Numbers Show — and What Comes Next
DataSF's project dashboard, which is publicly accessible, showed the city's consolidated digital asset library held approximately 4.2 million image files as of June 30. Early-stage audits across three departments suggested that somewhere between 12 and 18 percent of stored images may qualify as duplicates, though that range has not been validated city-wide and the final figure will shift as more departments complete their reviews. Storage costs for the city's primary cloud vendor contract run on a per-gigabyte basis, and Planning Department staff noted in a March briefing document that image deduplication in their division alone was projected to reduce their annual cloud storage bill.
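For a sense of scale, the early 12-to-18-percent estimate can be applied to the dashboard's 4.2 million file count. The short sketch below does only that arithmetic. The function name is hypothetical, and because the percentage range has not been validated city-wide, the result is a rough bracket rather than a forecast.

```python
TOTAL_IMAGES = 4_200_000       # approximate file count from the dashboard, as of June 30
LOW_PCT, HIGH_PCT = 12, 18     # early-stage audit range, not validated city-wide

def duplicate_range(total, low_pct, high_pct):
    """Return the (low, high) count of files implied by a percentage range."""
    return total * low_pct // 100, total * high_pct // 100

low, high = duplicate_range(TOTAL_IMAGES, LOW_PCT, HIGH_PCT)
print(f"{low:,} to {high:,} possible duplicates")
# prints "504,000 to 756,000 possible duplicates"
```

Translating that count into dollars would require average file sizes and the contract's per-gigabyte rate, neither of which appears in the figures cited here.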
The work is not purely technical. Archivists at the main library branch have flagged that some apparent duplicates are actually distinct versions. Two photographs of Market Street taken minutes apart during the 1906 earthquake aftermath, for instance, may look nearly identical to an automated tool, but each carries independent historical value. Human review protocols are built into the workflow for items flagged as culturally significant, a category the library defines using its own collection policy rather than an algorithm.
For residents and businesses waiting on permit records or trying to access historical images through SF OpenData, the practical payoff should be faster search results. The Planning Department expects its public-facing permit portal — which logged more than 40,000 unique searches in May alone, according to its published analytics — to return cleaner, deduplicated results by September. The Department of Technology has scheduled a public progress report for the DataSF advisory committee meeting on July 22 at City Hall's Room 400, where department leads are expected to present the first cross-agency deduplication metrics. Members of the public can attend or submit written comments through the DataSF website before that date.