San Francisco's public-facing digital infrastructure has a clutter problem. Across city department websites, the San Francisco Public Library's online catalog, and nonprofit grant portals, duplicate images — the same photograph or graphic stored multiple times under different file names — now account for an estimated 30 to 40 percent of total image storage in some municipal content management systems, according to IT project documentation circulating among city contractors this spring.
The issue landed on the radar of the Controller's Office after a 2025 audit of the Department of Technology's cloud storage contracts revealed that the city was paying for redundant data at scale. Cloud storage costs for municipal departments climbed past $4.2 million annually by the end of fiscal year 2025, with a significant portion attributed to unmanaged media libraries that had never been systematically deduplicated.
How San Francisco Got Here
The roots go back to the early 2010s, when city agencies began migrating from physical servers to cloud-based content management platforms with little coordination. The SF Digital Services team, based at 1 Dr. Carlton B. Goodlett Place in Civic Center, has spent the past two years trying to consolidate department websites onto a unified platform — but the image duplication problem predates that effort by years and has compounded with every site migration.
The San Francisco Arts Commission, which maintains an extensive online portfolio of public art installations across the city, flagged the problem internally in late 2024. Its digital archive contained more than 14,000 image files, with staff estimating that roughly 3,800 of those were duplicates or near-duplicates — same photograph, different resolution or file name — uploaded during successive website overhauls since 2014. Storage overhead for the Commission's media library was running about $18,000 per year, a figure that auditors noted could drop by a third with a one-time deduplication pass.
The San Francisco Public Library's digital collections present a similar picture. The library's online portal, which serves branches from the Main Library on Larkin Street to the Chinatown branch on Sacramento Street, manages digitized historical photographs, event imagery, and program graphics. Library technology staff identified more than 22,000 duplicate image records in a sample audit of the portal's back end completed in March 2026. Because the library's system indexes images individually rather than by content hash, identical files uploaded at different times register as separate assets, inflating both storage costs and search results.
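To illustrate the difference between indexing by upload and indexing by content hash, here is a minimal sketch. The `AssetIndex` class and its method names are hypothetical and do not describe the library's actual system. It assumes SHA-256 over the raw file bytes as the content hash, so it only recognizes byte-for-byte identical files.

```python
import hashlib


class AssetIndex:
    """Toy media index keyed by content hash (hypothetical illustration)."""

    def __init__(self):
        self._by_digest = {}  # SHA-256 hex digest -> asset id
        self._names = {}      # asset id -> file names uploaded with that content

    def register(self, filename, data):
        """Record an upload and return (asset_id, is_new).

        Identical bytes map to the same asset regardless of file name
        or upload time, so a re-upload does not create a second asset.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            asset_id = self._by_digest[digest]
            self._names[asset_id].append(filename)
            return asset_id, False
        asset_id = len(self._by_digest) + 1
        self._by_digest[digest] = asset_id
        self._names[asset_id] = [filename]
        return asset_id, True

    def names_for(self, asset_id):
        """Return every file name that has been uploaded for this asset."""
        return list(self._names[asset_id])
```

Under this scheme, registering the same photograph twice under different names returns the same asset ID both times. A resized or re-encoded copy would still register as new, because its bytes differ.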
What Deduplication Actually Costs — and Saves
Fixing the problem is neither simple nor cheap upfront. Perceptual hashing, a technique that fingerprints images by visual content rather than by file name or exact bytes so that resized or re-encoded copies still match, requires either proprietary software licenses or engineering time to implement open-source tools. Vendors pitching the city this year have quoted project costs ranging from $80,000 to $220,000, depending on archive size and the level of human review built into the workflow.
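For readers curious what perceptual hashing looks like in practice, the sketch below implements one simple variant, the average hash. It is an illustration only, not the method any city vendor has proposed. To stay dependency-free it assumes each image has already been decoded into a grayscale grid (a list of rows of 0-255 values) at least 8 pixels on each side. The function names are hypothetical.

```python
def downsample(pixels, size=8):
    """Shrink a grayscale grid to size x size by averaging blocks.

    Assumes the grid is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        row = []
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """Return an integer whose bits mark cells brighter than the mean."""
    flat = [v for row in downsample(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Count differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")
```

Two copies of the same picture at different resolutions should produce hashes that are a small Hamming distance apart, while unrelated images should differ in many bits. The cutoff for calling two images a match is a tuning decision, which is one reason the human review mentioned in vendor quotes can matter.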
The argument for moving forward rests on long-term savings. If the Controller's Office projections hold, eliminating duplicate image files across six major department systems could reduce annual cloud storage expenditure by roughly $600,000 by fiscal year 2028. That math has attracted attention from the Mayor's Office of Civic Innovation, which has been looking for technology efficiency wins to offset budget pressures created by the city's ongoing homelessness response programs and the residual costs of fentanyl crisis interventions concentrated in the Tenderloin and SoMa corridors.
SF Digital Services has scheduled a vendor demonstration day at City Hall for August 12, 2026, where three shortlisted firms will present deduplication toolsets to a panel of department IT leads. The Department of Technology is expected to issue a formal RFP by September, with a contract award targeted before the end of calendar year 2026.
For anyone managing a digital content library — whether at a city agency, a Mission District nonprofit, or a small business running its own website — the practical takeaway is straightforward: auditing your image library now, before a platform migration forces the issue, is almost always cheaper than cleaning up after the fact. Free tools like digiKam and open-source perceptual hashing libraries can give even small organizations a baseline count of duplicates before they commit to a paid solution.
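As a rough starting point for that kind of audit, the following sketch walks a folder and counts byte-identical image files. It assumes the library lives on a local or mounted file system; the function names and the extension list are illustrative choices. It finds exact copies only, so resized or re-encoded near-duplicates would need a perceptual hashing tool on top.

```python
import hashlib
import os
from collections import defaultdict

# Illustrative list; adjust to the formats your library actually holds.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tif", ".tiff"}


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_duplicates(root):
    """Return a baseline count of exact duplicate images under root."""
    groups = defaultdict(list)  # digest -> paths with identical bytes
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                groups[file_digest(path)].append(path)

    duplicate_files = 0
    reclaimable_bytes = 0
    for paths in groups.values():
        extras = len(paths) - 1  # keep one copy, count the rest
        if extras > 0:
            duplicate_files += extras
            reclaimable_bytes += extras * os.path.getsize(paths[0])

    return {
        "total_files": sum(len(p) for p in groups.values()),
        "duplicate_files": duplicate_files,
        "reclaimable_bytes": reclaimable_bytes,
    }
```

Calling `audit_duplicates("/path/to/media")` returns a small dictionary with the total image count, the number of redundant copies, and the bytes that removing them would free. The script only reads files and deletes nothing.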