The Daily San Francisco

San Francisco news, every day

How San Francisco's City Agencies Got Buried in Duplicate Images — and What They're Doing About It

Years of siloed digital workflows and emergency-era content rushes left municipal departments sitting on thousands of redundant files, slowing public records, draining storage budgets, and muddying the historical archive.

By San Francisco News Desk · Published 4 July 2026, 12:23 pm

San Francisco's Department of Technology has spent the better part of 2026 quietly untangling one of the more unglamorous problems created by a decade of rapid digitization: tens of thousands of duplicate images cluttering the city's shared content management systems, internal portals, and public-facing websites. The cleanup, now in its second phase after a pilot that wrapped in March, covers assets held by at least eight municipal departments, from the Planning Department's Civic Center offices to the San Francisco Public Library's digital collections on Larkin Street.

The problem did not appear overnight. It accumulated slowly, accelerated by crisis, and is now costing the city real money to fix.

A Long Road to This Moment

The roots trace back to the early 2010s, when departments across City Hall began migrating paper records and physical photo archives to digital storage with little central coordination. Each agency essentially built its own workflow. The San Francisco Municipal Transportation Agency uploaded documentation photos to one system. The Department of Public Health ran its own image library. The Office of Economic and Workforce Development maintained a separate set of promotional assets for programs like Invest in Neighborhoods, the small-business support initiative that operated in dozens of commercial corridors from the Excelsior to the Tenderloin.

When the pandemic hit in March 2020, the duplication problem metastasized. Remote workers uploading images from home laptops, press offices pushing rapid-turnaround graphics for daily briefings, and emergency operations generating new visual documentation all fed into systems that had no automatic deduplication logic. A single aerial photograph of the Ferry Building might exist in six different folders, resized slightly each time, renamed by a different staffer, and tagged with inconsistent metadata — or none at all.

The situation was compounded by the tech layoffs that swept through San Francisco's private sector in 2022 and 2023. Several city contractors who had managed content systems were caught up in broader industry cuts, and institutional knowledge about folder structures and naming conventions walked out the door with them. By late 2024, the Department of Technology's own internal audit, referenced in budget documents submitted to the Board of Supervisors, estimated the city held more than 1.2 million image files across shared drives, with duplication rates in some department repositories running above 40 percent.

The Cleanup, and Why It Matters Beyond IT

That figure matters for reasons beyond storage costs, though storage is not trivial: cloud hosting fees for municipal systems have risen sharply, and redundant assets inflate every bill. More consequentially, duplicate images slow public records responses. When a journalist or attorney submits a California Public Records Act request for documentation photos of, say, a permitted construction site in the Mission District, staffers must manually sort through conflicting versions of the same file to determine which is the authoritative copy. That process adds hours, sometimes days, to responses that must meet statutory deadlines.

The San Francisco Public Library's Digital Collections team flagged a related concern in early 2025: duplicate entries in its historical photo archive were surfacing in public search results, confusing researchers and degrading the integrity of the catalogue. The library, which holds records dating to the Gold Rush era, began its own deduplication effort in cooperation with the Internet Archive, a nonprofit headquartered on Funston Avenue in the Richmond District that has long partnered with city institutions on digital preservation.

The Department of Technology's current phase-two effort relies on automated hash-matching software to flag identical or near-identical files before human reviewers make final calls on which version to keep. The city issued a contract for the software platform in January 2026. Departments are expected to complete their reviews on a rolling basis through the end of the fiscal year, which closes June 30, 2027.
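
For readers curious how hash-matching works in principle, here is a minimal Python sketch. It is an illustration only and does not represent the city's software, which the article's sources do not describe in detail. The file names, the sample pixel grids, and the function names are all hypothetical. The first function finds byte-for-byte copies with SHA-256. The second uses a simple "average hash", one common way to catch near-identical images, and assumes each image has already been reduced to a small grayscale grid.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose byte content is identical.

    `files` maps a (hypothetical) file name to its raw bytes.
    """
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the grid mean.

    `pixels` is a small grayscale grid (list of rows), assumed to be an
    already downsampled version of the image.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical sample data: two identical uploads and one unrelated file.
files = {
    "ferry_building.jpg": b"same image bytes",
    "copy_of_ferry.jpg": b"same image bytes",
    "city_hall.jpg": b"different image bytes",
}
exact_groups = group_exact_duplicates(files)

# Hypothetical 2x2 grids: an original, a slightly brightened copy,
# and an unrelated image.
original = [[10, 200], [220, 30]]
brightened = [[12, 205], [225, 33]]
unrelated = [[200, 10], [30, 220]]

near_distance = hamming_distance(average_hash(original), average_hash(brightened))
far_distance = hamming_distance(average_hash(original), average_hash(unrelated))
```

In a setup like this, files in the same exact-match group are safe candidates for removal, while pairs with a small Hamming distance would only be flagged for a human reviewer, in line with the two-step process the department describes. Any distance threshold would be a tuning choice, not something the article specifies.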

For residents and city workers, the practical upshot is straightforward: public portals should become faster and easier to search, records requests should take less time to fulfill, and the historical archive should grow more reliable. Anyone who has tried to locate a specific permit photo through SF Planning's online portal — and found three versions of the same blurry JPEG — will understand why that outcome is worth the effort it took to get here.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
