San Francisco's public agencies collectively store millions of digital photographs, scanned documents and satellite images across dozens of separate servers — and a growing share of that data is exact or near-exact duplicates. The problem, which IT administrators at the Department of Technology have flagged internally for at least two budget cycles, is no longer just a bureaucratic nuisance. It is draining storage budgets, slowing public-records responses and, in at least one documented case, delaying emergency housing inspections in the Tenderloin.
The timing matters because the city is midway through a $1.2 billion capital technology plan, approved by the Board of Supervisors in March 2025, that was supposed to modernize how departments share and retrieve records. Duplicate imagery (think of the same drone photograph of a Mission District encampment stored separately by the Department of Homelessness and Supportive Housing, the Department of Public Works and SFPD) eats into the storage allocations those new systems depend on and multiplies the cost of every cloud migration the city attempts.
Why Duplication Hits Neighborhoods Hard
The clearest community impact shows up in two places: permit processing and emergency response. The San Francisco Planning Department, which handles roughly 40,000 permit applications a year, relies on property image databases to verify site conditions before approvals. When duplicate or mislabeled images populate those databases, staff must manually reconcile records. According to the city's own 2025 Controller's Office efficiency audit, that reconciliation added an average of 11 business days to certain complex permit reviews last fiscal year. For a renter on Folsom Street trying to get a habitability repair approved, or a small contractor in Bayview waiting on a commercial renovation sign-off, that lag is money out of pocket.
The San Francisco Public Library's digital archive program, which has spent three years digitizing historical photographs of neighborhoods like the Fillmore, Chinatown and the Western Addition, hit a practical wall last autumn when its Drupal-based content management system flagged more than 80,000 near-duplicate image files. Librarians working out of the Main Branch on Larkin Street spent an estimated 600 staff hours manually reviewing and tagging files before the archive could go live to the public in February. At roughly $47 per hour under existing union contracts, that works out to about $28,200 in labor.
The nonprofit mapping and advocacy group SFpark Data Collaborative, which operates out of a shared workspace on Market Street near Van Ness, ran into the same wall while building a visual database of sidewalk accessibility barriers for the Mayor's Office on Disability. Duplicate drone captures of the same blocks in the Castro and Noe Valley inflated its dataset by nearly 30 percent, the group reported in a project summary published this spring, requiring an additional six weeks of processing before the data could be submitted to the city.
What a Fix Actually Looks Like — and What It Costs
Deduplication software has existed for years, but deploying it across a fragmented municipal IT ecosystem is not simple. Commercial tools from vendors like Cloudinary or ImageKit can automate the process for web-based archives using perceptual hashing, a technique that fingerprints what an image looks like rather than its raw bytes, so visually identical or near-identical images can be matched even if file names, formats or resolutions differ. Enterprise licensing for a city the size of San Francisco typically runs between $80,000 and $250,000 annually depending on storage volume, according to published vendor pricing sheets.
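For readers curious how perceptual hashing works in practice, the sketch below shows one simple variant, often called an average hash. It is an illustration only, not the method any named vendor or city department uses. The function names, the 8-by-8 grid and the 5-bit match threshold are all assumptions chosen for the example, and images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

# Assumption: an image is a grid (list of rows) of grayscale values, 0 to 255.
Gray = List[List[float]]


def shrink(pixels: Gray, size: int = 8) -> Gray:
    """Downsample an image to size x size by averaging rectangular blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        top = r * height // size
        bottom = max((r + 1) * height // size, top + 1)
        row = []
        for c in range(size):
            left = c * width // size
            right = max((c + 1) * width // size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [value for row in shrink(pixels, size) for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Hypothetical rule: treat images as duplicates if at most `threshold` bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


if __name__ == "__main__":
    # Synthetic test image: a 32x32 diagonal gradient.
    original = [[2 * x + 5 * y for x in range(32)] for y in range(32)]
    # The same picture saved at twice the resolution.
    upscaled = [[original[y // 2][x // 2] for x in range(64)] for y in range(64)]
    print(is_near_duplicate(original, upscaled))
```

Because the fingerprint depends only on which regions are brighter than the image's own average, a copy that has been resized or uniformly brightened produces the same or nearly the same bits, while a genuinely different picture does not. Production tools typically use more robust variants and tune the threshold against real data.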
The Department of Technology is expected to issue a request for proposals on an image-asset management platform before the end of the current fiscal year, which closes June 30, 2027. Community advocates, including several housing nonprofits that rely on city permit records for tenant protection cases, are pushing for the RFP to include open-data export requirements so that deduplicated image archives become publicly accessible through DataSF, the city's open data portal, rather than locked inside proprietary systems.
For residents dealing with planning delays right now, the most direct step is to submit permit applications with freshly taken, clearly dated photographs rather than relying on images already in the city's system. The Planning Department's public counter at 49 South Van Ness Avenue can confirm which database records are currently attached to any given parcel — a check that takes about ten minutes and can surface outdated or duplicated files before they cause a hold-up down the line.