San Francisco Agencies Push to Replace Duplicate Images in Digital Archives This Week
City departments and local tech nonprofits accelerated a quiet but costly effort to clean up redundant visual content clogging public-facing platforms.
San Francisco's Department of Technology flagged more than 14,000 duplicate image files across city-managed web properties this week, triggering an accelerated cleanup effort that touches everything from the Municipal Transportation Agency's rider-facing app to the Planning Department's public permit portal on Kearny Street. The redundancy problem, long acknowledged but rarely prioritized, has now moved to the top of the city's digital infrastructure agenda at the start of the fiscal year that began July 1.
The timing matters. San Francisco has spent the past 18 months consolidating digital services under a unified content management framework after a 2024 audit found that siloed departmental websites were costing the city an estimated $2.3 million annually in excess storage, broken links, and failed accessibility compliance checks. Duplicate images — the same photograph or graphic uploaded multiple times under different file names — were identified as a leading driver of that waste. Fixing them isn't glamorous, but it has direct consequences for loading times on platforms that residents use daily to check bus arrivals, file permit applications, or access social services.
The hands-on work this week involved staff at the city's Digital Services office on Van Ness Avenue and contractors coordinating through the nonprofit Code for San Francisco, which operates out of co-working space in the Mid-Market corridor. Code for San Francisco's volunteer brigade has been developing an open-source duplicate-detection script since March 2026, using perceptual hashing — a technique that identifies visually identical or near-identical images even when file names and metadata differ. By Thursday, the brigade had processed roughly 60 percent of the MTA's image library, surfacing more than 800 confirmed duplicates in the agency's real-time rider alerts database alone.
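For readers curious how perceptual hashing can match images whose file names and metadata differ, here is a minimal sketch of one common variant, the "average hash." It is an illustration only, not the brigade's actual script: the function names, the 8x8 hash size, and the distance threshold of 5 are all assumptions chosen for this example, and the image is assumed to be already decoded into a grid of grayscale values.

```python
from typing import List

# A decoded grayscale image: rows of pixel values from 0 to 255.
Grid = List[List[int]]


def average_hash(pixels: Grid, hash_size: int = 8) -> int:
    """Return a hash_size * hash_size bit fingerprint of the image.

    Assumes the image is at least hash_size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to a small grid by averaging each block of pixels.
    for row in range(hash_size):
        y0 = row * height // hash_size
        y1 = (row + 1) * height // hash_size
        for col in range(hash_size):
            x0 = col * width // hash_size
            x1 = (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag a pair as probable duplicates (threshold is a hypothetical choice)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold
```

Because the fingerprint depends only on coarse brightness patterns, a re-saved or slightly brightened copy of a photograph produces the same or a nearly identical hash, while an unrelated image lands many bits away. A real pipeline would also need to decode image files and have a person confirm flagged pairs before anything is deleted.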
The San Francisco Public Library's Digital Collections team, based at the main branch on Larkin Street in the Civic Center, is running a parallel effort focused on historical photographs. Librarians there have been working since May to de-duplicate the roughly 190,000 images in the San Francisco Historical Photograph Collection before migrating the archive to a new cloud-hosted platform this fall. Duplicates in that collection are often the result of multiple digitization passes over decades, each generating slightly different file versions of the same print.
The problem is hardly unique to San Francisco — cities from Chicago to Seoul have wrestled with image redundancy as digital archives grew faster than the governance structures meant to manage them. But San Francisco's particular combination of legacy infrastructure, high-volume open-data commitments, and a tech workforce that frequently rotates between private sector AI companies and city consulting roles has made the issue both more visible and more tractable here than elsewhere.
A key development on Wednesday was the release of version 2.1 of the city's draft Digital Asset Management Policy, circulated internally by the Department of Technology. The updated draft includes, for the first time, a mandatory de-duplication checkpoint before any image file can be uploaded to a city-managed content management system. The policy is expected to go before the city's Committee on Information Technology for review in August 2026. If adopted, it would apply to all 53 city departments that maintain public-facing digital properties.
The practical stakes are real. Storage costs on the city's primary cloud contract run approximately $0.023 per gigabyte per month, a figure that sounds trivial until multiplied across petabytes of redundant files accumulated over a decade. Beyond cost, duplicate images create genuine compliance headaches: when the same photograph exists in eight versions, ensuring that every copy carries the correct alt-text for screen readers becomes considerably harder, putting the city at risk under Section 508 of the federal Rehabilitation Act.
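To see how the per-gigabyte rate adds up, the short calculation below applies the $0.023 figure to a hypothetical volume of redundant data. The one-petabyte volume is an assumption for illustration, not a number reported by the city, and the sketch uses decimal units (1 petabyte = 1,000,000 gigabytes) and ignores any tiered pricing or transfer fees.

```python
# Rate quoted for the city's primary cloud contract, in dollars.
PRICE_PER_GB_MONTH = 0.023


def monthly_storage_cost(redundant_gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly cost of storing the given volume of redundant data."""
    return redundant_gb * price


def annual_storage_cost(redundant_gb: float, price: float = PRICE_PER_GB_MONTH) -> float:
    """Twelve months of storage at a flat rate."""
    return monthly_storage_cost(redundant_gb, price) * 12


# Hypothetical volume: one petabyte of duplicates, in decimal gigabytes.
ONE_PETABYTE_GB = 1_000_000

if __name__ == "__main__":
    print(f"Monthly: ${monthly_storage_cost(ONE_PETABYTE_GB):,.0f}")
    print(f"Annual:  ${annual_storage_cost(ONE_PETABYTE_GB):,.0f}")
```

At that assumed volume, the flat rate works out to about $23,000 a month, or roughly $276,000 a year, for files that add nothing to what residents can see.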
Residents and local developers who want to track the effort can follow Code for San Francisco's public GitHub repository, where the de-duplication script is posted under an open-source license. The brigade meets every Wednesday evening at its Mid-Market space. The Department of Technology has said it plans to publish a summary of findings from the current sweep before the end of July — the first public accounting of the city's image redundancy problem in at least three years.
Published by The Daily San Francisco