The Daily San Francisco

SF City Agencies Are Drowning in Duplicate Digital Images — and the Cleanup Bill Is Adding Up

A growing backlog of redundant image files across San Francisco's municipal databases is costing real money and slowing the systems residents depend on every day.

By San Francisco News Desk · Published 4 July 2026, 11:48 am

Photo: Mary Muñoz on Pexels

San Francisco's municipal tech infrastructure is carrying tens of thousands of duplicate digital images across city databases. Independent audits of similar mid-sized city governments suggest the problem can inflate storage costs by 30 to 40 percent annually, and the city's Department of Technology has been quietly prioritizing a deduplication push since the start of fiscal year 2026.

The timing matters. The city is finalizing a $14.6 billion budget under Mayor Daniel Lurie, who took office in January 2025 promising leaner operations after years of fiscal drift under his predecessor. Every department has been asked to find savings. For the Department of Technology, headquartered at 1 South Van Ness Avenue, redundant image data sitting across permitting portals, public health records systems, and planning department archives represents low-hanging fruit, if anyone can quantify exactly how much is there.

The Numbers Behind the Problem

Duplicate image files accumulate faster than most people expect. When contractors upload building permit photos to the San Francisco Planning Department's online portal, the same JPEG can get saved multiple times — once on submission, once on staff review, once when routed to a supervisor. Multiply that across the roughly 30,000 permit applications the Planning Department processes in a typical year, and the redundancy compounds quickly.

City IT specialists working on the deduplication project — a joint effort between the Department of Technology and the Controller's Office — have not yet released full figures publicly. But federal benchmarking data from the General Services Administration, published in 2024, found that unmanaged duplicate files across government digital systems typically represent between 25 and 45 percent of total stored data. For a city like San Francisco, which the Department of Technology's own budget documents show spends approximately $380 million annually on technology infrastructure and services, even conservative savings from deduplication could run into the millions of dollars per year.

Storage is only part of the cost. Duplicate images slow down the retrieval systems that staff at agencies like the Department of Building Inspection, based at 49 South Van Ness, use daily. When a building inspector needs to pull up photos from a 2019 structural assessment, a bloated image library means longer load times, more manual searching, and higher error rates. Those delays have operational consequences in a city where the housing production emergency has pushed permit processing volumes to near-record levels.

What Deduplication Actually Looks Like on the Ground

The technical fix — running automated hash-matching algorithms to identify pixel-identical or near-identical files, then archiving or deleting the redundant copies — is well-established in the private sector. Companies like Dropbox, which maintains a major engineering office on Brannan Street in SoMa, built their early competitive advantage partly on client-side deduplication. The same logic applies to government servers, with the added complication of public records retention rules.
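As a rough illustration of the hash-matching step, the Python sketch below groups files whose contents produce the same SHA-256 digest. It is a minimal example and not the city's actual tooling: the function names `file_digest` and `find_duplicate_groups` are made up for this illustration, and the approach only catches byte-for-byte identical copies. Near-identical images, such as a resized or re-compressed JPEG, would need a different technique, for example perceptual hashing.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group files under root by digest and keep groups with 2+ members."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of candidate duplicates. Nothing is deleted at this stage, which leaves room for the retention checks that public records rules require.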

California's Government Code requires agencies to retain certain categories of public records for a minimum of two years, and some permitting and health records must be held far longer. That means city IT staff cannot simply run a delete script. Each duplicate image has to be mapped against its retention schedule before removal, a process that slows the cleanup considerably. The Controller's Office has been developing a retention-aware deduplication protocol since at least early 2025, according to the office's published project roadmap.
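To make the retention check concrete, here is a hedged Python sketch of how a redundant copy might be tested against a retention schedule before it is flagged for removal. The city's protocol has not been published, so everything here is an assumption for illustration: the `ImageRecord` fields, the category names, and the ten-year figure for permits are hypothetical. Only the two-year floor comes from the rule described above.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical schedule: years each record category must be retained.
# Only the two-year minimum reflects the rule cited in the article.
RETENTION_YEARS = {"general": 2, "permit": 10}
MINIMUM_YEARS = 2


@dataclass(frozen=True)
class ImageRecord:
    path: str
    category: str
    created: date


def retention_end(record: ImageRecord) -> date:
    """Return the date on which the record's retention period ends."""
    years = max(RETENTION_YEARS.get(record.category, MINIMUM_YEARS), MINIMUM_YEARS)
    try:
        return record.created.replace(year=record.created.year + years)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to 28 February.
        return record.created.replace(year=record.created.year + years, day=28)


def removable_copies(group: list[ImageRecord], today: date) -> list[ImageRecord]:
    """From one group of identical images, keep the oldest copy and return
    the remaining copies whose retention period has already ended."""
    ordered = sorted(group, key=lambda r: (r.created, r.path))
    redundant = ordered[1:]
    return [r for r in redundant if retention_end(r) <= today]
```

In this sketch the oldest copy in each group is always kept, and a redundant copy still inside its retention window is left alone, which is one reason a retention-aware cleanup recovers storage more slowly than a plain delete script.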

The SF Public Library, which digitized more than 120,000 historical photographs through its San Francisco History Center at the main branch on Larkin Street, completed its own deduplication review in 2024 and reported reducing its digital archive footprint by roughly 18 percent — a figure that gives city tech planners a useful local baseline for what systematic cleanup can achieve.

For residents and businesses waiting on permits or accessing city services online, the practical upside of a successful deduplication effort would be faster-loading portals and fewer system outages during peak filing periods. The Department of Technology has indicated it plans to publish preliminary metrics from the current cleanup effort before the end of calendar year 2026. Those numbers, when they arrive, will be the clearest test yet of whether the city's internal IT reform push is producing results that show up somewhere beyond a PowerPoint slide.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
