San Francisco's public-facing digital infrastructure has a clutter problem. Across city agency websites, the Department of Technology's shared content management systems, and the civic data portals maintained under DataSF — the city's open data program headquartered at City Hall — thousands of duplicate images have accumulated over years of decentralized uploading, staff turnover, and emergency-era content sprawl. The question now is who cleans it up, who pays for it, and what standards govern the work going forward.
The issue gained sharper attention this spring when the city's Digital Services team, which operates under the office of the Chief Data Officer on Dr. Carlton B. Goodlett Place, began an audit of assets stored across roughly 40 departmental subsites. The audit found that storage redundancy was driving up annual licensing costs for the city's cloud infrastructure contracts and slowing load times on high-traffic pages, including the SF311 service portal that residents use to report street conditions and request city services. Duplicate images — photographs of the same pothole filed twice, banner graphics uploaded in three slightly different sizes, event photos duplicated across department calendars — were among the most common offenders.
Why the Stakes Are High Right Now
Timing matters here. The city is in the middle of a broader push to modernize its technology stack, with the Mayor's Office of Housing and Community Development and the Planning Department both migrating legacy content to new platforms ahead of a 2027 deadline tied to state housing production mandates. Getting image libraries cleaned and standardized before those migrations is considerably cheaper than doing remediation afterward. A 2024 report from the Government Accountability Office found that federal agencies that performed data deduplication before platform migrations reduced post-migration remediation costs by roughly 30 percent — a figure city technology officials have cited internally as a benchmark, according to documents posted on the city's public budget dashboard.
At the San Francisco Public Library's main branch on Larkin Street, digital archivists have been grappling with a parallel version of the problem. The library's digitized historical photograph collection, which includes tens of thousands of images from the San Francisco History Center, has an estimated duplication rate of 12 percent, according to a 2025 internal review. Librarians have been manually flagging redundant files, but the process is slow without automated tooling. The library submitted a grant request to the California State Library earlier this year for funding to license deduplication software, and a decision on that request is expected by September.
Meanwhile, the nonprofit Code for San Francisco — a civic tech volunteer brigade that meets weekly at its Tenderloin-adjacent workspace on Sixth Street — has been prototyping an open-source image deduplication tool designed specifically for municipal use cases. The group demonstrated an early version at a city Digital Services working group meeting in May. The tool uses perceptual hashing to identify near-identical images even when file names or metadata differ, which is particularly useful for catching the kind of organic duplication that builds up when multiple departments photograph the same ribbon-cutting or public hearing.
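For readers curious how perceptual hashing can match images whose file names and metadata differ, here is a minimal sketch in Python. It is not the Code for San Francisco tool, whose internals are not described here. It uses a difference hash (dHash), one common perceptual hashing variant, chosen only for illustration. The function names, the synthetic test images, and the match threshold of 5 bits are all assumptions, and a real tool would decode actual image files with an imaging library rather than use hand-built pixel grids.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values (0-255)


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Shrink a grayscale grid by nearest-neighbor sampling (illustrative only)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per adjacent pixel pair in a tiny thumbnail."""
    small = resize_nearest(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # The bit is 1 when brightness falls from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 5) -> bool:
    """Assumed rule of thumb: a few differing bits still counts as a match."""
    return hamming(dhash(a), dhash(b)) <= threshold


def synthetic_photo(width: int = 72, height: int = 64, brightness: int = 0) -> Gray:
    """Hypothetical stand-in for a decoded photo: brightest in the middle."""
    mid = width // 2
    return [[100 - 2 * abs(x - mid) + brightness for x in range(width)]
            for _ in range(height)]


def unrelated_photo(width: int = 72, height: int = 64) -> Gray:
    """Hypothetical different picture: darkest in the middle."""
    mid = width // 2
    return [[50 + 2 * abs(x - mid) for x in range(width)] for _ in range(height)]


original = synthetic_photo()
brighter_copy = synthetic_photo(brightness=20)      # same scene, re-exposed
smaller_copy = synthetic_photo(width=36, height=32)  # same scene, half size

print(is_near_duplicate(original, brighter_copy))
print(is_near_duplicate(original, smaller_copy))
print(is_near_duplicate(original, unrelated_photo()))
```

Because the hash records only whether brightness rises or falls between neighboring regions, a resized or uniformly brightened copy produces the same bits, while a byte-level checksum of the two files would differ. The threshold trades missed duplicates against false matches and would need tuning against a real image library.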
Decisions Coming This Fall
Three choices will largely define how this plays out. First, the Department of Technology must decide by October whether to mandate a citywide image taxonomy — a standardized naming and tagging protocol — as part of the broader content governance policy currently under public comment through July 31. Second, the Digital Services team must weigh whether to contract with a commercial vendor or integrate Code for San Francisco's open-source prototype; each path carries different cost structures and long-term dependencies. Third, individual departments will need to assign staff time to actual remediation work, which doesn't happen automatically once a tool is chosen.
For residents and community organizations that depend on city portals — from the Tenderloin to the Outer Sunset — the practical payoff is faster-loading pages and more reliable search results when navigating city services. For the city's budget office, it means lower cloud storage bills at a moment when the FY2026–27 budget is already stretched across competing priorities. The audit results are expected to be made public through DataSF by August. What city leadership does with them will signal how seriously this administration treats the unglamorous but consequential work of digital housekeeping.