The Daily San Francisco

San Francisco news, every day


SF City Agencies Are Drowning in Duplicate Images — and the Numbers Show Why It's Costing Taxpayers

A deep dive into the data behind San Francisco's municipal digital asset problem reveals millions of redundant files, ballooning storage costs, and a cleanup effort that's only just begun.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm


Photo: Various / Public domain (Wikimedia Commons)

San Francisco's city government is sitting on a digital hoarding problem of staggering scale. Municipal servers in departments ranging from the Department of Public Works to the Office of Digital Services hold an estimated 4.3 million image files, and internal audits conducted in spring 2026 flagged more than 40 percent of them, upward of 1.7 million files, as probable or confirmed duplicates: identical or near-identical photographs stored multiple times under different filenames, in different folders, on different systems.
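The article does not say how the city's tool decides that two files match. For the simplest case, byte-identical copies saved under different names, a common technique is to hash each file's contents and group files that share a hash. The sketch below assumes exactly that; the function names are hypothetical, and near-identical images (resized or recompressed versions) would need a perceptual-hashing approach that this sketch does not attempt.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` (hypothetical helper) whose bytes are identical.

    Filenames and folders are ignored; only contents are hashed, so copies
    stored under different names in different folders land in one group.
    This catches exact copies only, not near-duplicates.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only hashes shared by two or more files.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


def demo() -> list[list[str]]:
    """Two identical files under different names and folders, plus one distinct file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "permits").mkdir()
        (root / "archive").mkdir()
        (root / "permits" / "site_photo.jpg").write_bytes(b"same bytes")
        (root / "archive" / "IMG_0042.jpg").write_bytes(b"same bytes")
        (root / "archive" / "unrelated.jpg").write_bytes(b"different bytes")
        return [[p.name for p in group] for group in find_exact_duplicates(root)]


if __name__ == "__main__":
    print(demo())  # [['IMG_0042.jpg', 'site_photo.jpg']]
```

Hashing in chunks keeps memory use flat even for very large image files, and because only the contents are compared, renaming or moving a copy does not hide it.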

That number matters because storage isn't free. The city's enterprise cloud contract, managed through the Department of Technology at its Civic Center offices on Polk Street, runs at roughly $0.023 per gigabyte per month under current pricing tiers. When redundant image libraries swell into the terabytes — as they have across the Human Services Agency, the Planning Department, and several public health systems — the monthly bill climbs fast. City technology staff have internally estimated the duplicate image problem adds somewhere between $180,000 and $260,000 per year in avoidable storage expenditure, according to budget documents reviewed this spring.
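To put the dollar range in perspective, the quoted rate can be run backward to show how much billed storage the estimate implies. This is a rough sketch that assumes, as a simplification, that the whole estimate is billed at the single $0.023 per GB-month rate with no tiering, replication, or other charges, and that 1 TB equals 1,000 GB; the real contract may differ, and the function name is illustrative.

```python
# Rough arithmetic only. Assumes a flat $0.023 per GB-month (the rate quoted
# in the article), no tiering, replication, or egress charges, and 1 TB = 1,000 GB.
RATE_PER_GB_MONTH = 0.023  # USD


def implied_terabytes(annual_cost_usd: float, rate: float = RATE_PER_GB_MONTH) -> float:
    """Storage volume (TB) that would cost `annual_cost_usd` per year at `rate`."""
    monthly_cost = annual_cost_usd / 12
    gigabytes = monthly_cost / rate
    return gigabytes / 1_000


if __name__ == "__main__":
    for annual in (180_000, 260_000):
        print(f"${annual:,}/year -> about {implied_terabytes(annual):,.0f} TB")
```

On those assumptions, $180,000 to $260,000 a year corresponds to roughly 650 to 940 terabytes of billed storage, a useful order-of-magnitude check rather than a measured figure.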

How the Pile-Up Happened

The problem didn't appear overnight. It traces back to at least 2019, when the city accelerated its shift away from on-premises servers toward cloud-based infrastructure. That migration, managed in phases through the Department of Technology's DataSF initiative on Dr. Carlton B. Goodlett Place, was never paired with a systematic deduplication protocol, and each department migrated its own file structures independently. The Planning Department, which processes thousands of permit-application images each year for projects from the Embarcadero corridor to neighborhoods like the Sunset and SoMa, ended up with multiple archive instances of the same project photographs, sometimes four or five copies per job.

The problem worsened during the COVID-19 pandemic, when remote work scattered file management across personal drives and shared cloud folders at the same time. By 2023, when the city's IT teams began reconciling those systems, the duplicate count had already crossed the two-million mark in the Planning Department alone, according to internal reconciliation logs.

The SF Digital Services team, based at 1 Dr. Carlton B. Goodlett Place, began piloting an automated deduplication tool in three departments in January 2026. Early results, covering roughly 600,000 files, show that within the first 90 days the tool identified and flagged 58 percent of the duplicates for review. Human reviewers then confirmed and deleted just under half of the flagged files. That is a cautious pace, given that some near-duplicate images carry distinct legal or archival significance for permit disputes and infrastructure records.

What Deduplication Actually Costs — and Saves

Running the automated tool is not without its own price tag. Licensing for the deduplication software used in the pilot cost the city $47,000 for a six-month contract, and human review added roughly 1,400 hours of analyst labor across the three pilot departments. At average city analyst pay rates, licensing and labor together put the total pilot cost near $120,000.
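Working backward from the reported figures shows what the total implies about labor. This is arithmetic on the article's numbers, not a figure from city documents, and the variable names are illustrative; because the total is given only as "near $120,000", the implied hourly rate is approximate.

```python
# Back-of-envelope split of the pilot's reported cost (figures from the article).
LICENSE_COST = 47_000       # six-month software contract, USD
TOTAL_PILOT_COST = 120_000  # approximate reported total, USD
REVIEW_HOURS = 1_400        # analyst hours across the three pilot departments

# Whatever is not licensing is attributed to analyst review time.
labor_cost = TOTAL_PILOT_COST - LICENSE_COST
implied_hourly_rate = labor_cost / REVIEW_HOURS

if __name__ == "__main__":
    print(f"Labor share: ${labor_cost:,}")
    print(f"Implied analyst cost: about ${implied_hourly_rate:.2f} per hour")
```

That works out to about $73,000 of staff time, or roughly $52 per analyst hour. The article does not say whether the city's calculation includes benefits or overhead.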

Weighed against projected citywide savings of up to $260,000 a year in storage fees, plus a one-time recovery of roughly 18 terabytes of server capacity across the pilot group, city technology planners argue the math favors expansion. A full citywide rollout, now under review by Department of Technology leadership, would cover an estimated 22 departments and take 18 to 24 months to complete.
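The pilot's 18 terabytes can be priced at the same quoted rate. This sketch assumes a flat $0.023 per GB-month, decimal terabytes (1 TB = 1,000 GB), and that the freed capacity would otherwise have stayed billed for a full year; the function name is illustrative, and actual contract terms may differ.

```python
# Values recovered capacity at a flat per-GB-month storage rate.
# Assumptions: $0.023 per GB-month (quoted in the article), 1 TB = 1,000 GB,
# and the capacity would otherwise have been billed for twelve months.
RATE_PER_GB_MONTH = 0.023  # USD
RECOVERED_TB = 18          # one-time recovery across the pilot group


def annual_storage_cost(terabytes: float, rate: float = RATE_PER_GB_MONTH) -> float:
    """Yearly cost, rounded to cents, of keeping `terabytes` stored at `rate`."""
    return round(terabytes * 1_000 * rate * 12, 2)


if __name__ == "__main__":
    yearly = annual_storage_cost(RECOVERED_TB)
    print(f"{RECOVERED_TB} TB at the quoted rate: about ${yearly / 12:,.0f}/month, "
          f"${yearly:,.0f}/year")
```

At the quoted rate, 18 TB comes to roughly $414 a month, or about $4,968 a year. The $180,000 to $260,000 range cited earlier is the citywide estimate, so the two figures measure different scopes.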

The Assessor-Recorder's office on Van Ness Avenue and the Recreation and Parks Department, which maintains extensive photographic archives of Golden Gate Park infrastructure and renovation projects, are both on the list for phase two. Neither department has been officially notified of a start date as of this week.

For residents and watchdogs tracking city spending, the practical upshot is straightforward: push your supervisor's office to ask the Department of Technology for a public progress report before the end of the third quarter. The pilot data is real, the savings potential is documented, and the city's fiscal year 2027 budget process, which begins in earnest in September, is the right moment to lock in funding for a full rollout before another year of redundant files adds as much as another quarter-million dollars in preventable costs.



About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
