San Francisco's Department of Technology moved this week to accelerate a long-stalled cleanup of duplicate digital images embedded across city agency databases, a project that officials say has grown more urgent as storage costs have climbed and public-records requests have backed up at the City Hall Clerk's office on Dr. Carlton B. Goodlett Place. The push, which involves at least four departments, marks the most concentrated effort to address the problem since a 2023 audit flagged redundant files as a drain on the city's centralized data infrastructure.
The timing is not accidental. San Francisco's IT modernization program has been under pressure since the Board of Supervisors began scrutinizing the city's technology budget earlier this year, and duplicate image files represent a category of waste that is both measurable and fixable. Storage for city systems is provisioned through the Department of Technology's Civic Bridge platform, and redundant image assets have been identified as one of the top contributors to unnecessary cloud expenditure across departments including Planning, the Assessor-Recorder, and the Municipal Transportation Agency.
Why Duplicate Images Became a Bureaucratic Headache
The problem is partly a legacy of rapid digitization. When the city's Planning Department on Stevenson Street began scanning paper permit files in bulk after 2015, the workflow software frequently saved multiple versions of the same document image without flagging conflicts. The Assessor-Recorder's Office, which manages property records for all 47 square miles of San Francisco, encountered a similar issue when it migrated to a new records management system in 2021. By the time internal reviews caught up, thousands of property and permit photographs had been saved two, three, or more times under different file names and record numbers.
That redundancy carries a price. City IT procurement documents reviewed by public-records researchers show that cloud storage contracts for government data have roughly doubled in per-gigabyte cost since 2019, a trend driven partly by vendor consolidation and partly by the sheer volume of image files generated by permit, inspection, and infrastructure systems. Duplicate files that serve no archival purpose consume the same storage budget as unique records with genuine public value.
The San Francisco Public Library's digitization program at the main branch on Larkin Street has also been quietly dealing with the issue. The library's digital collections team spent much of the past 18 months deduplicating historical photograph archives, using automated hashing tools to identify identical or near-identical image pairs before flagging them for human review. That process, librarians told colleagues at a regional digital archives meeting in May, eliminated roughly 14,000 redundant image files from one collection alone.
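The library has not said which hashing tools it used. As a rough illustration of the first step such tools typically perform, the Python sketch below groups files whose bytes are identical by their SHA-256 digest and returns each group for a person to review. The file names are hypothetical. A cryptographic hash like this one catches only exact copies, so near-identical scans would need a perceptual technique instead.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes. Files that share a SHA-256
    digest are returned together so a person can review each group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(name)
    # Only groups with two or more names are candidate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    sample = {
        "permit_0412.jpg": b"\xff\xd8 scan A",
        "permit_0412_copy.jpg": b"\xff\xd8 scan A",
        "permit_0413.jpg": b"\xff\xd8 scan B",
    }
    print(find_exact_duplicates(sample))
    # → [['permit_0412.jpg', 'permit_0412_copy.jpg']]
```

Flagged groups are only candidates. Deciding which copy is the authoritative record remains a human call.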
What the Cleanup Looks Like in Practice
The current citywide effort combines perceptual hashing, a technique that identifies visually similar images even when their file names, formats, or compression differ, with metadata cross-referencing to surface likely duplicates for review. Staff at the Department of Technology are coordinating with counterparts at the Planning Department and the MTA's transit data division, which maintains a large archive of infrastructure inspection photographs from Muni Metro stations and bus depots across the city.
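The city has not published details of its tooling. The Python sketch below shows one common perceptual-hashing method, the difference hash (dHash), paired with a simple metadata check. It assumes images arrive as grayscale pixel grids (rows of 0 to 255 values). Each image is shrunk to 9 by 8 pixels, and the sketch records whether each pixel is brighter than its right-hand neighbor, producing a 64-bit fingerprint. Two images are flagged as likely duplicates when their fingerprints differ in only a few bits and they share a record identifier. The `parcel_id` field and the five-bit threshold are hypothetical stand-ins for whatever metadata and tolerance the departments actually use.

```python
def downscale(pixels: list[list[int]], width: int, height: int) -> list[list[float]]:
    """Shrink a grayscale image (rows of 0-255 values) by averaging pixel blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)  # at least one source row
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)  # at least one source column
            block = [pixels[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(records, max_distance: int = 5):
    """Pair records that share a parcel_id and have nearly equal hashes.

    Each record is (file_name, parcel_id, pixels). parcel_id is a
    hypothetical metadata field standing in for the cross-reference step;
    max_distance is an assumed tolerance, not a city setting.
    """
    hashed = [(name, parcel, dhash(px)) for name, parcel, px in records]
    pairs = []
    for i in range(len(hashed)):
        for j in range(i + 1, len(hashed)):
            name_a, parcel_a, hash_a = hashed[i]
            name_b, parcel_b, hash_b = hashed[j]
            if parcel_a == parcel_b and hamming(hash_a, hash_b) <= max_distance:
                pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # Synthetic 18-wide, 16-tall grayscale "scans": a left-to-right gradient,
    # a uniformly brightened copy, and a mirrored version.
    base = [[c * 10 + r for c in range(18)] for r in range(16)]
    brighter = [[v + 10 for v in row] for row in base]
    mirrored = [[(17 - c) * 10 + r for c in range(18)] for r in range(16)]
    records = [
        ("scan_a.jpg", "P-001", base),
        ("scan_b.jpg", "P-001", brighter),
        ("scan_c.jpg", "P-001", mirrored),
        ("scan_d.jpg", "P-002", brighter),
    ]
    print(likely_duplicates(records))  # → [('scan_a.jpg', 'scan_b.jpg')]
```

A uniform brightness change preserves every left-versus-right comparison, so the brightened copy hashes identically to the original. The mirrored image differs in all 64 bits and is not flagged. The brightened copy filed under a different parcel is also left alone, which is the role the metadata check plays.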
Residents and businesses that have filed public-records requests recently may notice that responses referencing permit or inspection images take slightly longer than usual while agency staff reconcile which image version is authoritative. The City Clerk's office has said requesters should still receive an initial response to California Public Records Act requests within the standard 10-day acknowledgment period, though fulfilling complex requests involving document images may take longer during the cleanup phase.
For small businesses navigating permit renewals in neighborhoods like the Mission District and SoMa, the practical upside of a cleaned database is faster retrieval when inspectors or planners pull historical records. Fewer duplicate entries also reduce the risk that a records search returns conflicting photographs of the same property taken at different points in time, a problem that has complicated several variance applications in recent years.
The Department of Technology has set an internal target to complete the initial deduplication sweep across priority databases by the end of September 2026. Departments are expected to adopt updated image-management protocols by the first quarter of 2027 to prevent the backlog from rebuilding. Anyone with questions about how specific records may be affected can contact the City Clerk's office directly or submit an inquiry through the SF311 service portal.