San Francisco's public agencies and nonprofits are sitting on a quiet data crisis. An internal review process now underway across several city departments has found that duplicate images — identical or near-identical files stored multiple times across different servers and cloud platforms — account for a measurable share of bloated digital storage costs, drawing renewed attention from budget-conscious administrators at City Hall on Dr. Carlton B. Goodlett Place.
The timing matters. San Francisco's fiscal year 2025-26 budget allocated roughly $14 million toward citywide technology infrastructure, a figure that includes cloud storage contracts across departments ranging from the Department of Public Health to the San Francisco Municipal Transportation Agency (SFMTA). When a significant portion of that storage is occupied by redundant files, the waste compounds annually as contract renewals scale costs upward.
How the Duplication Problem Accumulates
The mechanics are straightforward. Staff at agencies like the San Francisco Public Library system, which operates 28 locations from the Main Library on Larkin Street to the Excelsior Branch on Mission Street, routinely upload photographs, scanned documents, and promotional graphics to shared drives. Without automated deduplication tools in place, the same image often lives in three, four, or five separate folders. Multiply that across departments and the storage overhead becomes substantial.
Digital asset management specialists describe a common pattern: organizations that had not adopted a centralized digital asset management platform by 2024 typically find that between 20 and 40 percent of their stored image files are duplicates or near-duplicates. Cloud storage pricing from major providers currently runs anywhere from $0.02 to $0.08 per gigabyte per month depending on the service tier and volume commitments. For a mid-size city agency running a 10-terabyte storage environment on a commercial cloud platform, those ranges work out to roughly $480 to $3,840 a year in avoidable spending for that one environment, costs that accrue invisibly until an audit surfaces them.
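The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: it assumes decimal terabytes (10 TB = 10,000 GB), flat list pricing with no volume discounts, and a hypothetical helper name, `duplicate_storage_cost`, that does not come from any city tool.

```python
def duplicate_storage_cost(total_gb, duplicate_share, price_per_gb_month):
    """Annual cost, in dollars, of storing the duplicate share of a library."""
    return total_gb * duplicate_share * price_per_gb_month * 12


# Assumption: 10 TB counted as 10,000 GB (decimal units).
TOTAL_GB = 10_000

# Best case: 20 percent duplicates at $0.02 per GB per month.
low = duplicate_storage_cost(TOTAL_GB, 0.20, 0.02)

# Worst case: 40 percent duplicates at $0.08 per GB per month.
high = duplicate_storage_cost(TOTAL_GB, 0.40, 0.08)

print(f"Avoidable spend: ${low:,.0f} to ${high:,.0f} per year")
```

Under those assumptions a single 10-terabyte environment wastes somewhere between about $480 and $3,840 a year. The figure grows with every additional repository and every copy kept in backups.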
The San Francisco Arts Commission, which maintains a large archive of public art documentation for the more than 4,000 works in the Civic Art Collection, has been among the city bodies exploring image deduplication tools as part of a broader collections management overhaul. Similarly, the Mayor's Office of Housing and Community Development has been digitizing project records tied to affordable housing developments across the Tenderloin and SoMa neighborhoods, generating substantial new image libraries without a standardized filing protocol in place.
The Cost of Doing Nothing
The practical damage runs beyond the storage bill. Staff time spent searching through redundant files — pulling the wrong version of an image, downloading an outdated graphic, or uploading a file that already exists — is a productivity drain that rarely gets quantified. Research from information management organizations suggests that knowledge workers spend an average of 1.8 hours per day searching for information, a figure that worsens when digital libraries are disorganized.
For San Francisco specifically, the issue intersects with the city's push to digitize services following pandemic-era disruptions. The Digital Services team within the City Administrator's Office has championed several modernization efforts since 2021, but deduplication — unglamorous by nature — has lagged behind higher-profile initiatives like the 311 app refresh and the SFMTA real-time transit data portal upgrades.
Agencies looking to address the problem have several practical paths. Open-source tools including Python-based image hashing libraries can identify exact and near-exact duplicates across local file systems at no licensing cost. Commercial platforms like Bynder or Canto offer enterprise-grade digital asset management with built-in deduplication, typically priced on annual subscription models starting around $15,000 per year for a mid-size team. A phased approach — auditing the largest storage repositories first, then establishing a single source-of-truth folder structure before migrating to a managed platform — is the route most municipal IT departments take when resources are constrained.
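As a rough illustration of the open-source route, the sketch below finds exact duplicates using only Python's standard library. It is a minimal example, not any agency's actual tooling: the function names and the extension list are assumptions made for illustration, and it catches only byte-for-byte copies. Near-duplicates such as resized or re-compressed images require perceptual hashing, which third-party packages such as ImageHash provide and which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: these extensions cover the image types worth auditing.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp", ".webp"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group image files under root that have identical contents."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in by_digest.values() if len(paths) > 1]


def reclaimable_bytes(groups):
    """Bytes freed if every group keeps one copy and drops the rest."""
    return sum(paths[0].stat().st_size * (len(paths) - 1) for paths in groups)
```

Calling `find_exact_duplicates` on a shared-drive folder returns one list of paths per set of identical files, and `reclaimable_bytes` turns that result into a storage figure that can go into an audit report. The sketch only reports duplicates. It deletes nothing, which leaves the decision about which copy becomes the single source of truth to staff.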
City departments that have not yet conducted a storage audit should treat the start of the new fiscal year, July 1, 2026, as a natural trigger point. Catching duplicate image bloat before the next cloud contract renewal cycle, typically negotiated in the fall, is the single most direct way to recover budget that can be redirected toward services San Francisco residents actually see.