The Daily San Francisco

How San Francisco's City Agencies Got Buried in Duplicate Images — and What They're Doing About It

Years of siloed digital storage, emergency-era document dumps, and rapid platform migrations left municipal databases bloated with redundant files; now a reckoning is underway.

By San Francisco News Desk · Published 4 July 2026, 12:00 pm

Photo: Malcolm Hill on Pexels

San Francisco's Department of Technology has been quietly wrestling with a problem that sounds mundane until you see the numbers: city agencies collectively stored an estimated three to five copies of the same digital image for every one they actually needed, according to an internal audit framework the department began piloting across Mission Street offices in late 2024. The duplication problem did not appear overnight. It accumulated across two decades of overlapping software contracts, pandemic-driven digitization rushes, and a culture in which deleting anything felt riskier than keeping everything.

The timing matters. With the city facing a projected budget shortfall that Mayor Daniel Lurie's administration is still working to close heading into fiscal year 2027, every gigabyte of redundant cloud storage is a line item someone has to defend. Enterprise cloud contracts — the kind the city holds with major vendors to house planning documents, permit photos, and public-health imagery — can run into the hundreds of thousands of dollars annually at municipal scale. Trimming duplicate data is no longer just a housekeeping question; it is a budget conversation.

How the Backlog Built Up

The roots go back to the mid-2000s, when departments began scanning paper records independently rather than through any centralized protocol. The San Francisco Planning Department, housed on Seventh Street, digitized decades of parcel photographs and permit files. The Department of Building Inspection, operating out of 49 South Van Ness, ran parallel digitization efforts. Neither operation talked much to the other, and both fed into separate content-management systems that rarely communicated.

Then came COVID-19. Starting in March 2020, city staff working remotely uploaded documents to whatever platform their department already subscribed to — Google Drive, SharePoint, legacy FTP servers — and nobody coordinated deduplication. The San Francisco Department of Public Health alone onboarded at least four separate cloud environments during the pandemic period, according to city technology planning documents released under a Sunshine Ordinance request in 2025. Each environment collected its own image libraries: testing-site photographs, facility inspection shots, public-communications graphics. By 2022, the overlap was structural.

The SF Digital Services team, which operates under the umbrella of the Department of Technology at City Hall, began mapping the problem formally in early 2023 as part of a broader data-governance initiative tied to the city's Strategic Plan for Technology. That mapping exercise found duplicate imagery concentrated in three clusters: planning and zoning records, public-health communications assets, and the image libraries attached to the city's open-data portal on DataSF. Fixing any one cluster required coordinating across department heads who had different priorities and different vendor relationships.

The Cleanup Effort and What Comes Next

Progress has been real but uneven. DataSF, the city's public data portal, completed a deduplication pass on its hosted image sets in the first quarter of 2026, retiring roughly 40,000 redundant files, according to figures the Department of Technology shared at a March Board of Supervisors committee hearing. The Planning Department is midway through the process, working through a backlog of parcel photos that stretches back to 1998. Building Inspection has not yet begun a formal deduplication effort, city records show.

The practical obstacle is human as much as technical. Automated deduplication tools — several open-source options and at least two commercial platforms the city evaluated in 2025 — can identify pixel-identical images quickly. The harder cases are near-duplicates: the same building photographed six days apart, or a public-health graphic resized to three different dimensions for three different platforms. Those require human review, and city staff hours are finite.
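Why exact copies are easy and near-duplicates are hard can be illustrated with a short sketch. It is not the city's tooling, and every function and file name in it is hypothetical. It assumes exact copies are found by hashing each file's bytes with SHA-256, which catches byte-identical files (a slightly stricter test than "pixel-identical", since re-saving the same pixels with new metadata changes the bytes). For near-duplicates it uses a toy "average hash": one bit per pixel, set when that pixel is brighter than the image's mean. It assumes the images have already been shrunk to the same small grid, as real perceptual-hashing tools typically do first, so a uniformly brightened copy yields the same fingerprint even though its bytes differ.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        # Identical bytes always produce the same SHA-256 digest.
        buckets[hashlib.sha256(data).hexdigest()].append(name)
    # Only buckets holding more than one file contain duplicates.
    return [sorted(names) for names in buckets.values() if len(names) > 1]


def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash over a small grayscale grid (assumed pre-shrunk)."""
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("empty image")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest near-duplicates."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    files = {
        "parcel_0412.jpg": b"\xff\xd8 same bytes",
        "parcel_0412_copy.jpg": b"\xff\xd8 same bytes",
        "inspection_77.jpg": b"\xff\xd8 different bytes",
    }
    print(group_exact_duplicates(files))
    # -> [['parcel_0412.jpg', 'parcel_0412_copy.jpg']]

    original = [[10, 200], [30, 220]]
    brightened = [[20, 210], [40, 230]]  # same scene, uniformly brighter
    print(hamming(average_hash(original), average_hash(brightened)))
    # -> 0
```

The byte hash answers "is this the same file?" with certainty, while a perceptual fingerprint only answers "does this look similar?". Deciding whether two similar photos are truly redundant, or whether a resized graphic is still needed on another platform, is the judgment call that falls back to human reviewers.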

For residents and small-business owners who visit city permitting offices along the Van Ness Corridor or file documents through the online SF Planning portal, the duplication problem has occasionally meant slower load times, as well as misfiled attachments when staff pull the wrong version of a photo from an overcrowded file tree. Fixing it should make those interactions faster and less error-prone.

The Department of Technology has set an internal target of completing agency-wide deduplication protocols by the end of calendar year 2026. Whether departmental budget pressures accelerate or delay that timeline will depend heavily on how the city closes its broader budget gap in the months ahead.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.