The Daily San Francisco

How San Francisco's Digital Archives Got Buried in Duplicate Images — and What's Being Done About It

A slow-building data management crisis in the city's public records systems has finally forced agencies to confront years of duplicated, untagged, and redundant image files clogging storage and slowing public access.

By San Francisco News Desk · Published 4 July 2026, 11:45 am

San Francisco's municipal agencies are sitting on a digital records problem years in the making. Across departments from the Planning Commission on Kearny Street to the Department of Public Works yards in SoMa, duplicate image files have piled up by the tens of thousands, according to an internal review. The causes are fragmented software systems, pandemic-era remote workflows, and a decades-long failure to enforce consistent digital asset management standards. The cost, in both cloud storage fees and staff hours spent hunting for the right version of a file, has grown significant enough that the city's Department of Technology opened a formal review earlier this year.

The timing matters. San Francisco is midway through a sweeping push to digitize its permitting and public records infrastructure, a project anchored partly by the city's OpenData portal at data.sfgov.org. Getting that initiative right depends on clean, deduplicated data. If the underlying image libraries remain cluttered with redundant files (the same aerial photograph of the Tenderloin saved eleven times under eleven different filenames, for instance), search results degrade, storage costs climb, and the promise of transparent public access turns hollow.

How the Problem Accumulated

The roots go back at least to 2012, when the city began migrating from on-premises servers toward cloud storage without a unified naming convention or deduplication protocol. Each department essentially built its own digital filing system. The San Francisco Municipal Transportation Agency, which manages Muni's sprawling image archive of infrastructure, signage, and incident documentation, ran a content management system separate from the one used by the Recreation and Parks Department. When files were shared between agencies, as during joint projects at Civic Center or along the Van Ness Bus Rapid Transit corridor, duplicates propagated with every email attachment and shared-drive upload.

The COVID-19 pandemic made everything worse. Remote work beginning in March 2020 pushed employees onto personal cloud accounts and consumer-grade file-sharing tools. When staff returned to city offices, those files often got re-uploaded rather than reconciled. Estimates from the Department of Technology's internal review, circulated this spring, suggested the problem had grown to encompass tens of thousands of redundant image files across major departments, though the department has not released a precise public count. Storage costs for municipal cloud contracts have risen sharply in recent years alongside commercial rates industry-wide.

The issue is not unique to San Francisco. Cities that underwent fast digitization without governance frameworks — New York's 311 image database and Chicago's building inspection photo archives have both faced similar audits — tend to arrive at the same reckoning. But San Francisco's particular combination of legacy systems from the dot-com era and aggressive digital ambition in the 2010s made the accumulation especially dense.

What Comes Next for City Systems

The Department of Technology has begun piloting automated deduplication software on a subset of the Planning Department's image library, which includes decades of site photographs tied to permit applications across neighborhoods from the Outer Sunset to the Bayview. The pilot, which started in the second quarter of 2026, uses hash-matching algorithms to flag identical files regardless of their filename or folder location. Flagged files go to a human reviewer before deletion — a safeguard insisted on after an early trial run briefly removed photographs that turned out to be legally required attachments to active permit files.
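For readers curious how hash matching works in principle, the idea can be sketched in a few lines of Python. This is an illustration only, not the city's software: the function names `file_digest` and `flag_duplicates` are hypothetical, and the choice of SHA-256 is an assumption, since the article's sources do not name the algorithm the pilot uses. The sketch fingerprints each file's bytes and groups files with matching fingerprints, ignoring filename and folder. It only flags candidates and deletes nothing, mirroring the human-review step described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def flag_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    Hypothetical helper: it flags groups for review and deletes nothing.
    """
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # A digest shared by more than one path marks a set of identical files.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `flag_duplicates(Path("some/archive"))` would return lists of paths that share identical contents, ready to hand to a reviewer. One limit of this approach is that it catches only exact copies: a photograph that has been resized, cropped, or re-saved in another format produces a different fingerprint and would not be flagged.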

For San Francisco residents trying to access public records, the practical advice is straightforward: use the city's official OpenData portal and the Planning Department's online permit search rather than requesting files directly from department staff. Those channels are first in line to receive cleaned-up, deduplicated archives as the project moves forward. Requests routed through departmental email often pull from the older, messier backend storage.

The broader lesson, technology administrators have argued internally, is that digital asset governance needs the same mandatory standards that govern paper records. A filing clerk in 1985 did not get to invent her own indexing system. A city employee uploading site photographs in 2026 probably should not either.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
