San Francisco's Department of Technology has identified more than 340,000 duplicate image files spread across the city's shared digital infrastructure, a backlog that has ballooned over roughly six years and now complicates everything from public records requests to the archiving of city planning documents. The discovery, surfaced during a broader audit of municipal data storage systems this spring, puts a specific number on a problem that IT managers at City Hall had flagged informally for years.
The timing matters. The city is in the middle of a push to modernize its public-facing digital services, and duplicate image clutter is one of the less glamorous obstacles standing in the way. When a planning document for a Mission District housing project pulls from three different versions of the same zoning map, or when a homelessness outreach report contains mismatched photographs because a field worker uploaded the same images twice from two different devices, the downstream consequences range from confusion to legal exposure during litigation.
How the Problem Built Up
The roots go back to 2019, when the city accelerated its push to digitize physical records held at the San Francisco Public Library's History Center on Larkin Street and at the Planning Department's offices on Mission Street. Contractors hired to scan thousands of documents frequently lacked a unified naming convention, and images were saved to multiple shared drives simultaneously — a practice that was sloppy but fast.
Then the pandemic hit in March 2020. Remote work scattered staff across home offices in the Sunset, the Excelsior, and East Bay bedroom communities. Employees working from personal laptops created local copies of files that were later re-uploaded to city servers without deduplication checks. The SF Digital Services team, which operates out of 1 Dr. Carlton B. Goodlett Place, was stretched thin managing the rollout of emergency benefit portals and had little bandwidth to enforce file hygiene protocols.
The AI hiring boom that followed the tech sector's 2023-2024 contraction brought a new wrinkle. City agencies began contracting with firms in SoMa and Mission Bay to run machine-learning tools on their archives — tools that require clean, well-labeled training data. Those vendors quickly discovered that feeding duplicate images into a model produces degraded outputs, and several contracts were quietly renegotiated when the data quality fell short of what had been promised.
The San Francisco Controller's Office noted in its Fiscal Year 2025 performance report that city departments spent an estimated $2.1 million on unplanned data remediation work in the twelve months ending June 30, 2025, a figure that includes storage costs, contractor hours, and staff overtime. That number is expected to rise for FY2026 unless a systemic fix is in place before the fiscal year closes on June 30, 2026.
What Happens Next
The Department of Technology has issued a request for proposals, posted in May 2026, seeking vendors who can run automated deduplication across the city's Microsoft Azure and legacy on-premises servers. The contract ceiling listed in the RFP is $875,000. Shortlisted vendors are expected to be announced by September, with remediation work targeted to begin in the first quarter of 2027.
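The RFP does not describe how the deduplication would work, so the following is only an illustrative sketch of one common approach: grouping files by a hash of their contents. It assumes the files sit on a locally mounted directory, and the function names (`file_digest`, `find_duplicates`) are hypothetical, not drawn from any city or vendor system.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash; keep groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A sketch like this only catches byte-for-byte copies. The same photograph re-saved at a different resolution or in a different format would hash differently, so finding such near-duplicates would likely require a different technique, such as perceptual hashing.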
Meanwhile, the SF Digital Services office has circulated new internal guidelines requiring all departments to adopt a standardized file-naming protocol — a 16-character alphanumeric string tied to department code, date, and a sequential identifier — before uploading any image to shared drives. Compliance training is scheduled at the Civic Center campus in August.
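The guidelines as described specify only the total length and the three components, not how the 16 characters are divided. As a rough illustration, the sketch below assumes one possible layout: a three-letter department code, an eight-digit date (YYYYMMDD), and a five-digit sequence number. That split, the example code `PLN`, and the helper names `build_name` and `is_valid_name` are all assumptions made for this example, not the city's actual format.

```python
import re
from datetime import date

# Assumed layout (hypothetical): 3 letters + 8 date digits + 5 sequence digits = 16.
NAME_PATTERN = re.compile(r"[A-Z]{3}\d{8}\d{5}")


def build_name(dept: str, day: date, seq: int) -> str:
    """Compose a 16-character name from department code, date, and sequence."""
    if not re.fullmatch(r"[A-Z]{3}", dept):
        raise ValueError("department code must be three uppercase letters")
    if not 0 <= seq <= 99999:
        raise ValueError("sequence number must fit in five digits")
    return f"{dept}{day:%Y%m%d}{seq:05d}"


def is_valid_name(name: str) -> bool:
    """Check the overall shape, then confirm the date portion is a real date."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    try:
        date(int(name[3:7]), int(name[7:9]), int(name[9:11]))
    except ValueError:
        return False
    return True
```

Under these assumptions, `build_name("PLN", date(2026, 8, 3), 42)` yields `PLN2026080300042`, and a check like `is_valid_name` could run before an upload is accepted.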
For residents and businesses that interact with city permitting systems or submit images as part of applications — say, a restaurateur on Valencia Street filing a sidewalk-seating permit with photos attached — the near-term experience is unlikely to change. But IT officials say the cleanup is foundational to the city's ambition of linking Planning, Public Works, and the Department of Building Inspection into a single permitting interface by late 2027. Get the images wrong, and that integration stalls. The duplicate problem, unglamorous as it is, turns out to be load-bearing.