San Francisco's Department of Technology quietly expanded a pilot program this week aimed at purging duplicate image files from the city's sprawling network of public-facing digital archives — a technical housekeeping problem that has ballooned into a measurable drain on server capacity and staff time across at least a dozen municipal agencies.
The issue is less glamorous than the housing crisis or the fentanyl surge on Civic Center Plaza, but it carries real costs. City IT staff have flagged that redundant image files (photos uploaded multiple times across permit portals, planning records, and public health databases) now account for a disproportionate share of storage overhead in systems that agencies rely on daily. With the city's contract for expanded cloud storage up for renegotiation in September 2026, the pressure to act has become impossible to ignore.
What Changed This Week
The Department of Technology rolled out updated deduplication software to three pilot agencies starting July 1: the San Francisco Planning Department at 49 South Van Ness Avenue, the Office of the Assessor-Recorder at City Hall, and the San Francisco Public Library's digital collections team, based at the Main Library on Larkin Street at Civic Center. The software flags identical or near-identical image files before they are ingested into the city's central content management system, allowing staff to approve or reject each flagged upload before it is saved.
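The city has not published how its software works, so the following is only a minimal sketch of the general approach described above: exact copies are caught by hashing the file's bytes, and near-identical images by comparing a coarse "average hash" of a downscaled grayscale version. Every name here (IngestFilter, check, accept, the max_distance threshold) is hypothetical, and the pixel grids stand in for whatever image decoding a real system would do.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """SHA-256 of the raw file bytes; equal digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Coarse fingerprint of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if brighter than the grid's mean, else 0.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class IngestFilter:
    """Hypothetical pre-ingest check: flags uploads, leaves the decision to staff."""

    def __init__(self, max_distance: int = 2):
        self.max_distance = max_distance      # assumed near-duplicate threshold
        self.seen_digests: dict[str, str] = {}  # digest -> stored file name
        self.seen_hashes: list[tuple[int, str]] = []

    def check(self, data: bytes, pixels: list[list[int]]):
        """Return ("exact" | "near", matching file name) or (None, None)."""
        digest = content_digest(data)
        if digest in self.seen_digests:
            return ("exact", self.seen_digests[digest])
        fingerprint = average_hash(pixels)
        for stored, name in self.seen_hashes:
            if hamming(stored, fingerprint) <= self.max_distance:
                return ("near", name)
        return (None, None)

    def accept(self, name: str, data: bytes, pixels: list[list[int]]) -> None:
        """Record a file once staff approve it for storage."""
        self.seen_digests[content_digest(data)] = name
        self.seen_hashes.append((average_hash(pixels), name))
```

In this sketch, check only reports a match; nothing is stored until accept is called, which mirrors the approve-or-reject step the pilot gives to staff. A production system would likely use a larger fingerprint and an indexed lookup rather than a linear scan.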
The pilot follows months of internal review. City records reviewed by The Daily San Francisco show that the Planning Department alone logged more than 340,000 image uploads in the 12 months ending March 2026, a figure that department IT coordinators said included a significant proportion of resubmitted files from permit applicants who were unaware their documents had already been received. The Assessor-Recorder's office faced a similar problem with property inspection photos, many of which were submitted by contractors through multiple portals.
The San Francisco Public Library's digital archive program, which has been digitizing historical photographs from neighborhood collections including the Western Neighborhoods Project's materials on the Sunset and Richmond Districts, ran into its own duplication headache as volunteer contributors uploaded the same scanned images through different submission channels. Library staff spent an estimated several hundred hours over the past year manually reconciling duplicate entries, according to internal documents obtained by this publication.
Why the Timing Matters
The city's current cloud storage agreement, managed through a vendor contract that covers multiple agencies, is set for renegotiation in September 2026. Reducing the volume of redundant files before that date gives the city's negotiators a stronger hand to argue for lower-tier storage pricing. San Francisco's Office of Contract Administration, which oversees the procurement process, had flagged storage cost containment as a budget priority in its spring 2026 review.
The move also intersects with a broader shift in how city agencies are handling AI-assisted document review. Several departments have begun feeding permit and inspection records into machine-learning tools to speed processing times — tools that perform worse, and sometimes fail outright, when trained on datasets contaminated by duplicate images. Cleaning up those archives now is partly about storage costs and partly about making the underlying data usable for the next generation of city software.
For residents who interact with city permit systems, particularly contractors and architects who regularly submit documentation to the Planning Department at 49 South Van Ness Avenue through its online portal, the practical upshot may be fewer error messages and faster confirmation that files have been received. The department has said it plans to add clearer on-screen alerts when a duplicate is detected at the point of upload, rather than leaving applicants to discover the problem days later.
The three-agency pilot is scheduled to run through the end of August, with a report to the city's Committee on Information Technology expected in early September. If the pilot clears that review, the Department of Technology intends to push the deduplication tools citywide before the fiscal year ends in June 2027. Agencies not yet in the pilot — including the Department of Public Health and the Municipal Transportation Agency — have been told to begin auditing their own image storage practices in the meantime.