The Daily San Francisco

How San Francisco's Digital Archives Ended Up Drowning in Duplicate Images — and What It's Costing the City

Years of siloed city departments, rushed digitization projects, and vendor contracts with no deduplication standards have left San Francisco's public image databases bloated, redundant, and expensive to maintain.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Ayman Bardi on Pexels

San Francisco's municipal digital infrastructure has a clutter problem. Across at least a dozen city departments — from the Planning Department on Kearny Street to the Department of Public Works and the Office of Economic and Workforce Development — public-facing image databases contain tens of thousands of duplicate files, the product of more than a decade of uncoordinated digitization efforts and contracts awarded without common technical standards. The city's Department of Technology has been quietly working since early 2025 to quantify the scope of the problem, and what it found wasn't pretty.

The duplication issue isn't merely aesthetic. Storage costs money. Redundant image libraries slow down permit portals, neighborhood planning tools, and public-records request systems that San Franciscans use every day. The Planning Department's online map portal — which residents in neighborhoods from the Sunset to Bayview use to check development proposals — has faced recurring load issues tied partly to unoptimized media libraries, according to city technology documentation reviewed as part of the Department of Technology's 2025 infrastructure audit.

How San Francisco Got Here

The roots of the problem go back to 2012 and 2013, when several departments raced to digitize physical records under pressure from then-Mayor Edwin Lee's open-data initiative. Each department hired its own vendors. The Planning Department went with one content management system. The Recreation and Park Department, which manages more than 220 parks including Golden Gate Park and Dolores Park, used another. The San Francisco Public Library digitized its photo archive using a third platform. None of these systems talked to each other, and none required deduplication as a contract deliverable.

By 2019, the city's centralized Digital Services team — housed within the Department of Technology at 1 South Van Ness Avenue — had identified cross-departmental duplication as a problem worth addressing. A pilot cleanup of the San Francisco Municipal Transportation Agency's internal media library that year found that roughly 34 percent of stored image files were duplicates or near-duplicates, according to a summary of that pilot included in the Department of Technology's fiscal year 2020 annual report. The SFMTA alone was paying storage and licensing fees on thousands of redundant files.

Then came the pandemic. The digitization push accelerated sharply between 2020 and 2022, as departments scrambled to put services online with little time for technical coordination. The Controller's Office approved emergency procurement waivers that bypassed the standard vendor review process, meaning more platforms entered the city's ecosystem with no unified media-handling requirements. By 2023, the Department of Technology estimated the city was managing images across more than 40 separate content repositories.

The Cost, and What Comes Next

Cloud storage isn't free. San Francisco's city government spends roughly $18 million annually on cloud infrastructure contracts across departments, a figure the Budget and Legislative Analyst's Office flagged in its March 2025 technology spending review. Duplicate image files represent a measurable fraction of that overhead, though the Department of Technology has not yet released a precise figure for how much deduplication could save.

The current remediation effort — which the Department of Technology is calling the Digital Asset Consolidation Initiative — targets six departments in its first phase, including Planning, Public Works, and the Office of Civic Innovation. The initiative uses automated hash-matching software to flag identical or near-identical files before human reviewers make final deletion calls. Phase one is scheduled to conclude by December 2026.
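To make the hash-matching step concrete, here is a minimal sketch of how exact duplicates can be flagged for human review. It is an illustration only, not the city's actual tooling: the choice of SHA-256 and the function names `sha256_of_file` and `flag_exact_duplicates` are assumptions made for this example. It catches only byte-for-byte identical files. Near-identical images, such as resized or recompressed copies, would need a different technique like perceptual hashing, which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash (hypothetical helper).

    Returns only the groups with two or more files. Nothing is deleted;
    the result is a review list for a human to act on.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `flag_exact_duplicates(Path("media_library"))` on a hypothetical folder returns a mapping from each content hash to the files that share it. Keeping deletion out of the function mirrors the workflow described above, where software flags candidates and people make the final call.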

For residents and journalists who rely on city portals to pull permit records, environmental review documents, or neighborhood planning maps, the practical payoff should be faster load times and cleaner search results. The Planning Department's SFPlanning.org portal in particular has been promised a performance overhaul once its media library is cleared of redundant files.

City technologists working on the project say the longer-term fix requires procurement reform: specifically, mandatory deduplication and interoperability standards written into any new vendor contract touching city media storage. Whether that policy lands before the next wave of digitization contracts is awarded will determine whether San Francisco ends up back in the same place five years from now.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.