The Daily San Francisco

San Francisco Agencies Push to Replace Duplicate Images in Digital Archives This Week

City departments and local tech nonprofits accelerated a quiet but costly effort to clean up redundant visual content clogging public-facing platforms.

By San Francisco News Desk · Published 4 July 2026, 11:45 am

San Francisco's Department of Technology flagged more than 14,000 duplicate image files across city-managed web properties this week, triggering an accelerated cleanup effort that touches everything from the Municipal Transportation Agency's rider-facing app to the Planning Department's public permit portal on Kearny Street. The redundancy problem, long acknowledged but rarely prioritized, has now moved to the top of the city's digital infrastructure agenda for the fiscal year that began July 1.

The timing matters. San Francisco has spent the past 18 months consolidating digital services under a unified content management framework after a 2024 audit found that siloed departmental websites were costing the city an estimated $2.3 million annually in excess storage, broken links, and failed accessibility compliance checks. Duplicate images — the same photograph or graphic uploaded multiple times under different file names — were identified as a leading driver of that waste. Fixing them isn't glamorous, but it has direct consequences for loading times on platforms that residents use daily to check bus arrivals, file permit applications, or access social services.

Who Is Doing the Work — and Where

The hands-on work this week involved staff at the city's Digital Services office on Van Ness Avenue and contractors coordinating through the nonprofit Code for San Francisco, which operates out of co-working space in the Mid-Market corridor. Code for San Francisco's volunteer brigade has been developing an open-source duplicate-detection script since March 2026, using perceptual hashing — a technique that identifies visually identical or near-identical images even when file names and metadata differ. By Thursday, the brigade had processed roughly 60 percent of the MTA's image library, surfacing more than 800 confirmed duplicates in the agency's real-time rider alerts database alone.
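For readers curious how perceptual hashing works in principle, the sketch below shows one simple variant, the "average hash". It is an illustration only, not the brigade's script, whose exact algorithm is not described here. It assumes each image has already been decoded and shrunk to a small grayscale grid of 0-255 values (real tools would use an image library for that step); the function names and the `max_distance` threshold are hypothetical.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # small grayscale image, values 0-255


def average_hash(pixels: Grid) -> int:
    """Build a fingerprint with one bit per pixel: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as duplicates if their fingerprints differ in few bits.

    max_distance is an illustrative threshold, not a recommended value.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 4x4 test images: an original, a uniformly brightened copy,
# and an inverted (visually different) version.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [200, 20, 220, 30],
    [210, 10, 230, 15],
]
brighter = [[value + 10 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

print(is_near_duplicate(original, brighter))  # brightened copy still matches
print(is_near_duplicate(original, inverted))  # inverted image does not
```

Because the fingerprint depends on each pixel's brightness relative to the image's own average, a copy that was re-saved or uniformly brightened tends to produce the same bits, while file names and metadata play no part at all.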

The San Francisco Public Library's Digital Collections team, based at the main branch on Larkin Street in the Civic Center, is running a parallel effort focused on historical photographs. Librarians there have been working since May to de-duplicate the roughly 190,000 images in the San Francisco Historical Photograph Collection before migrating the archive to a new cloud-hosted platform this fall. Duplicates in that collection are often the result of multiple digitization passes over decades, each generating slightly different file versions of the same print.

The problem is hardly unique to San Francisco — cities from Chicago to Seoul have wrestled with image redundancy as digital archives grew faster than the governance structures meant to manage them. But San Francisco's particular combination of legacy infrastructure, high-volume open-data commitments, and a tech workforce that frequently rotates between private sector AI companies and city consulting roles has made the issue both more visible and more tractable here than elsewhere.

Why This Week's Push Could Have Lasting Impact

A key development on Wednesday was the release of version 2.1 of the city's draft Digital Asset Management Policy, circulated internally by the Department of Technology. The updated draft includes, for the first time, a mandatory de-duplication checkpoint before any image file can be uploaded to a city-managed content management system. The policy is expected to go before the city's Committee on Information Technology for review in August 2026. If adopted, it would apply to all 53 city departments that maintain public-facing digital properties.
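The draft policy's wording on how the checkpoint would work is not public, so the following is only a minimal sketch of what such a gate could look like. It assumes the simplest case, rejecting byte-for-byte identical files by comparing SHA-256 digests; the class name `UploadCheckpoint` and its `check` method are hypothetical, and near-identical images would need a perceptual comparison on top of this.

```python
import hashlib
from typing import Dict, Optional, Tuple


class UploadCheckpoint:
    """Hypothetical pre-upload gate that rejects byte-identical files."""

    def __init__(self) -> None:
        # Maps a content digest to the name of the first file that had it.
        self._seen: Dict[str, str] = {}

    def check(self, name: str, data: bytes) -> Tuple[bool, Optional[str]]:
        """Return (accepted, name_of_existing_copy_if_rejected)."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return False, existing
        self._seen[digest] = name
        return True, None


# Illustrative file names and contents, not real city assets.
checkpoint = UploadCheckpoint()
print(checkpoint.check("bus_stop.jpg", b"same image bytes"))
print(checkpoint.check("IMG_0042.jpg", b"same image bytes"))
print(checkpoint.check("permit_map.png", b"different bytes"))
```

In this sketch the second upload is turned away and pointed at `bus_stop.jpg`, because the check looks at file contents rather than file names.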

The practical stakes are real. Storage on the city's primary cloud contract costs approximately $0.023 per gigabyte per month, a figure that sounds trivial until it is multiplied across petabytes of redundant files accumulated over a decade. Beyond cost, duplicate images create genuine compliance headaches: when the same photograph exists in eight versions, every copy must carry the correct alt-text for screen readers, and each extra copy is another chance to miss one, putting the city at risk under Section 508 of the federal Rehabilitation Act.
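To put the per-gigabyte rate in perspective, the short calculation below scales it up to a single petabyte. The one-petabyte volume is a round number chosen for illustration, not a figure reported by the city, and the conversion assumes decimal units (1 PB = 1,000,000 GB).

```python
RATE_PER_GB_MONTH = 0.023   # dollars, the rate cited for the city's cloud contract
GB_PER_PB = 1_000_000       # assumption: decimal units


def monthly_cost(gigabytes: float, rate: float = RATE_PER_GB_MONTH) -> float:
    """Monthly storage cost in dollars for a given volume."""
    return gigabytes * rate


def annual_cost(gigabytes: float, rate: float = RATE_PER_GB_MONTH) -> float:
    """Twelve months of storage at a flat rate."""
    return monthly_cost(gigabytes, rate) * 12


one_petabyte = 1 * GB_PER_PB  # hypothetical volume of redundant files
print(f"Monthly: ${monthly_cost(one_petabyte):,.2f}")
print(f"Annual:  ${annual_cost(one_petabyte):,.2f}")
```

At that rate, each petabyte of redundant files would cost about $23,000 a month, or roughly $276,000 a year.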

Residents and local developers who want to track the effort can follow Code for San Francisco's public GitHub repository, where the de-duplication script is posted under an open-source license. The brigade meets every Wednesday evening at its Mid-Market space. The Department of Technology has said it plans to publish a summary of findings from the current sweep before the end of July — the first public accounting of the city's image redundancy problem in at least three years.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
