San Francisco's municipal agencies are sitting on a digital records problem years in the making. Across departments from the Planning Commission on Kearny Street to the Department of Public Works yards in SoMa, duplicate image files have accumulated by the tens of thousands, the result of fragmented software systems, pandemic-era remote workflows, and a decades-long failure to enforce consistent digital asset management standards. The cost, in both cloud storage fees and staff hours spent hunting for the right version of a file, has grown significant enough that the city's Department of Technology opened a formal review earlier this year.
The timing matters. San Francisco is midway through a sweeping push to digitize its permitting and public records infrastructure, a project anchored partly by the city's OpenData portal at data.sfgov.org. Getting that initiative right depends on clean, deduplicated data. If the underlying image libraries remain cluttered with redundant files (the same aerial photograph of the Tenderloin saved eleven times under eleven different filenames, for instance), search results degrade, storage costs climb, and the promise of transparent public access turns hollow.
How the Problem Accumulated
The roots go back at least to 2012, when the city began migrating from on-premises servers toward cloud storage without a unified naming convention or deduplication protocol. Each department essentially built its own digital filing system. The San Francisco Municipal Transportation Agency, which manages Muni's sprawling image archive of infrastructure, signage, and incident documentation, operated a separate content management system from the one used by the Recreation and Parks Department. When files were shared between agencies, as happened during joint projects at Civic Center or along the Van Ness Bus Rapid Transit corridor, duplicates propagated with every email attachment and shared-drive upload.
The COVID-19 pandemic made everything worse. Remote work beginning in March 2020 pushed employees onto personal cloud accounts and consumer-grade file-sharing tools. When staff returned to city offices, those files often got re-uploaded rather than reconciled. Estimates from the Department of Technology's internal review, circulated this spring, suggested the problem had grown to encompass tens of thousands of redundant image files across major departments, though the department has not released a precise public count. Storage costs for municipal cloud contracts have risen sharply in recent years alongside commercial rates industry-wide.
The issue is not unique to San Francisco. Cities that digitized quickly without governance frameworks tend to arrive at the same reckoning: New York's 311 image database and Chicago's building inspection photo archives have both faced similar audits. But San Francisco's particular combination of legacy systems from the dot-com era and aggressive digital ambition in the 2010s made the accumulation especially dense.
What Comes Next for City Systems
The Department of Technology has begun piloting automated deduplication software on a subset of the Planning Department's image library, which includes decades of site photographs tied to permit applications across neighborhoods from the Outer Sunset to the Bayview. The pilot, which started in the second quarter of 2026, uses hash-matching algorithms to flag identical files regardless of their filename or folder location. Flagged files go to a human reviewer before deletion — a safeguard insisted on after an early trial run briefly removed photographs that turned out to be legally required attachments to active permit files.
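For readers curious how hash-matching finds identical files under different names, the idea can be sketched in a few lines of Python. This is an illustration only, not the city's software: the pilot's actual tooling and hash function have not been described publicly, so SHA-256, the list of image extensions, and the function names `file_digest` and `flag_duplicates` are all assumptions. The sketch catches only byte-for-byte copies (a resized or re-encoded photo would not match), and it flags files for review without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; a real archive may hold other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def flag_duplicates(root: Path) -> list[list[Path]]:
    """Group image files under `root` whose contents are byte-identical.

    Filenames and folder locations are ignored; only contents are hashed.
    Nothing is deleted: each returned group is a candidate for human review.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `flag_duplicates(Path("some/archive"))` on a hypothetical folder returns lists of paths that share the same contents. Handing those lists to a reviewer, rather than removing files automatically, mirrors the safeguard the pilot adopted after its early trial run.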
For San Francisco residents trying to access public records, the practical advice is straightforward: use the city's official OpenData portal and the Planning Department's online permit search rather than requesting files directly from department staff. Those channels are first in line to receive cleaned-up, deduplicated archives as the project moves forward. Requests routed through departmental email often pull from the older, messier backend storage.
The broader lesson, technology administrators have argued internally, is that digital asset governance needs the same mandatory standards that govern paper records. A filing clerk in 1985 did not get to invent her own indexing system. A city employee uploading site photographs in 2026 probably should not either.