San Francisco's municipal digital infrastructure is carrying tens of thousands of duplicate image files across its public-facing platforms — a data hygiene problem that technology auditors and city IT staff have quietly flagged as a growing cost driver heading into fiscal year 2027. The issue spans everything from the Department of Public Health's patient portal to the Planning Department's online permit tracker on Civic Center Drive, where redundant photo uploads have compounded over years of decentralized content management.
The timing matters. San Francisco's city budget office is under pressure to trim operational spending after a projected general fund shortfall that the Controller's Office has been tracking through the spring. Cloud storage is not free, and duplicate image files — defined as identical or near-identical visual assets stored separately rather than referenced from a single source — represent one of the most straightforward line items to cut. The problem is that nobody, until recently, was counting.
What the Audit Numbers Actually Show
A technology review conducted earlier this year by the San Francisco Department of Technology found that across seven major city web properties, image duplication rates ranged from 18 percent to as high as 41 percent of total stored assets on certain legacy content management systems. The SF.gov portal alone, which consolidated dozens of department sub-sites starting in 2021, inherited image libraries from predecessor systems that were never deduplicated before migration. That inheritance has inflated storage overhead on the city's contracted cloud environment.
Industry benchmarks offer useful context. According to data published by Cloudinary, a media management firm, organizations that migrate websites without a deduplication pass typically see 25 to 40 percent of their image storage consumed by redundant files within three years of launch. San Francisco's numbers fall squarely inside that range. The Department of Technology has estimated — in internal planning documents, not yet released publicly — that a structured deduplication pass across the SF.gov ecosystem could reduce image storage volume by roughly 30 percent, translating to meaningful reductions in monthly cloud hosting fees billed to individual city departments.
Muni's digital infrastructure presents a separate but related case. The Municipal Transportation Agency manages dynamic image content across more than 300 digital signs at stations including Embarcadero, Civic Center, and Powell Street. Operational staff have confirmed that sign management software on older hardware nodes sometimes stores cached image assets redundantly rather than pulling from a central repository — a known limitation of the legacy Daktronics display system the agency is in the process of phasing out. The MTA's capital program for display infrastructure replacement is budgeted through 2028.
The Fix Is Simpler Than the Problem Sounds
Deduplication is not exotic technology. Hash-based matching, in which software computes a fingerprint from each image file's contents and flags files whose fingerprints match, can process thousands of files in minutes on standard server hardware. Exact matching of this kind catches byte-for-byte copies; near-identical images, such as resized or recompressed versions of the same photo, call for perceptual hashing instead. Several San Francisco nonprofits working in the civic tech space, including Code for San Francisco, which runs volunteer brigades out of its weekly meetups in SoMa, have proposed open-source tooling that city departments could adopt without major procurement overhead.
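For readers curious what hash-based matching looks like in practice, the sketch below is one minimal way to do it in Python. It is an illustration only, not the city's tooling or any proposal from Code for San Francisco. The function names, the list of image extensions, and the choice of SHA-256 are all assumptions made for the example. It catches only byte-identical files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of extensions to treat as images; adjust for a real library.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group image files under root by content digest.

    Only digests shared by two or more files are returned, so every
    value in the result is a list of byte-identical copies.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Bytes freed if each group were reduced to a single stored copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Pointing `find_duplicates` at an upload directory returns the groups of identical files, and `reclaimable_bytes` gives a rough estimate of the storage that consolidating each group would free. The sketch only reports duplicates; deciding which copy to keep and updating the pages that reference the others is the governance question, and the script does not attempt it.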
The practical obstacle is governance, not software. City departments have historically managed their own content with limited coordination through the Department of Technology, which means no single office has had the authority — or the mandate — to enforce a unified image asset library. That is changing. A new Digital Services policy framework, which the City Administrator's Office has been developing since late 2025, is expected to address centralized asset management as part of its scope.
For residents and businesses that interact with city portals daily — filing permits through the Planning Department's Accela system, checking Muni real-time data, or accessing DPH services — the downstream benefit of fixing this problem is faster page load times and more reliable content delivery. Page weight matters: studies by Google's web performance team have consistently shown that each additional second of load time reduces user engagement by measurable margins. In a city where broadband access remains uneven across neighborhoods including the Tenderloin and Visitacion Valley, that lag is not abstract.
The Department of Technology has indicated that a formal deduplication initiative will be included in its FY2027 work plan, with implementation targeting the SF.gov core platform before the end of the calendar year.