San Francisco's city government is sitting on a sprawling mess of duplicate digital images: scanned permits, planning documents, infrastructure photographs, and public records. The clutter has accumulated across multiple departments over nearly two decades, driving up the city's storage costs and slowing some routine administrative functions.
The problem is neither new nor glamorous, but it has reached a tipping point. The Department of Technology's fiscal year 2025–2026 budget, approved by the Board of Supervisors last year, earmarked funds specifically for data deduplication initiatives across shared municipal servers — an acknowledgment, however quiet, that the city's digital housekeeping has fallen dangerously behind. Storage costs for city agencies have climbed steadily since the early 2010s, when San Francisco accelerated its push to digitize paper records held in the Civic Center complex and satellite offices from the Mission District to the Bayview.
A Problem That Built Slowly, Then All at Once
The roots of the duplicate image crisis trace back to a structural decision made around 2008, when individual departments were given broad autonomy to manage their own document scanning and archiving. The Planning Department on Mission Street, the Department of Building Inspection on Duboce, and the Office of the Assessor-Recorder at City Hall each developed parallel workflows. There was no central deduplication protocol, no unified naming convention, and no mechanism to flag when the same physical document had been scanned multiple times by different offices handling overlapping cases.
By the time the city's current enterprise content management contract came up for review in 2023, auditors examining storage utilization found that some shared network directories held three or more copies of identical image files. The problem worsened as departments migrated between platforms, from older Documentum installations to newer cloud-adjacent systems, because each migration created fresh opportunities for files to be copied rather than consolidated.
The San Francisco Public Library's digital archive team at the Main Branch on Larkin Street ran into a version of the same problem when it tried to integrate historical photograph collections into the shared municipal system. The team discovered that city departments had independently scanned some of the same 20th-century civic records the library already held in high resolution. Staff spent hundreds of hours manually identifying and removing those duplicates, according to internal department communications reviewed as part of the city's open data reporting requirements.
What the Fix Looks Like — and What It Will Take
The Department of Technology began piloting an automated image-fingerprinting tool in early 2026, starting with the Planning Department's archive of roughly 4 million scanned documents. The tool computes a digital fingerprint, or hash, for each image file and flags exact and near-exact duplicates for human review before anything is deleted. Early results from the pilot, covering approximately 800,000 files, showed a duplication rate that city staff described in a March 2026 progress memo as higher than initially projected, though the city has not publicly released the precise figure.
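The city has not described how its fingerprinting tool works internally, so what follows is only a minimal Python sketch of how hash-based deduplication generally operates, not the city's method. It assumes two common, swapped-in techniques: a SHA-256 digest of each file's bytes, which catches byte-identical copies, and a simple "average hash" of a downscaled grayscale image, which can catch near-duplicates such as rescans of the same page when two fingerprints differ in only a few bits. The function names, file names and distance threshold are hypothetical.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose raw bytes share the same SHA-256 digest."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only digests shared by two or more files are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Hypothetical perceptual fingerprint of a small grayscale grid.

    Each pixel contributes one bit: 1 if brighter than the grid's mean, else 0.
    Real tools usually shrink the image to a fixed size first (assumption).
    """
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("image grid is empty")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def near_duplicate_pairs(hashes: dict[str, int], max_distance: int = 2) -> list[tuple[str, str]]:
    """Flag pairs of images whose fingerprints differ by at most max_distance bits."""
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.append((a, b))
    return pairs
```

In this sketch, two files with identical bytes land in the same exact-duplicate group, while two slightly different scans of the same page can still produce matching or nearly matching average hashes. Flagged pairs would go to a person for review rather than being deleted automatically. The pairwise comparison is simple but grows quadratically with the number of files, so an archive of millions of images would likely need an index over the fingerprints instead.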
Storage costs matter here because San Francisco pays for cloud and on-premises capacity at commercial rates. Municipal IT contracts in cities of comparable scale typically run between $2 million and $6 million annually for enterprise storage alone. If costs scale roughly with the volume of data stored, each percentage point of redundant data at that range would amount to about $20,000 to $60,000 a year, a real budget exposure at a time when the city faces a structural deficit that has already forced cuts in other departments.
The practical stakes extend to the public records process. When a resident in the Tenderloin files a Sunshine Ordinance request for building inspection photographs related to a code enforcement action, staff must search through systems that sometimes return multiple copies of the same image with different file names and timestamps. That slows response times and increases the risk of incomplete disclosures.
The Department of Technology has indicated it plans to expand the deduplication pilot to the Department of Building Inspection's archive by the end of calendar year 2026. Residents and attorneys who regularly file public records requests may see the process become somewhat faster once the cleanup reaches full scale, though the city has not publicly committed in writing to a timeline for completing the citywide effort.