The Daily San Francisco


San Francisco's Digital Archives Are Riddled With Duplicate Images — And the City Is Finally Counting Them

A growing data-cleaning push across city departments reveals just how much storage, money, and staff time redundant image files are quietly consuming.

By San Francisco News Desk · Published 4 July 2026, 11:44 am


San Francisco's municipal digital infrastructure is carrying dead weight: a large volume of duplicate image files spread across city servers, cloud buckets, and aging network drives that cost taxpayers measurable sums every fiscal year. A coordinated audit underway since January 2026 across five city departments is putting hard numbers to a problem that IT managers have long flagged but rarely quantified.

The timing matters. The city is under pressure to cut operating costs after the Mayor's Office of Public Policy and Finance projected a $790 million general fund shortfall through fiscal year 2027-28. Every data center dollar is suddenly a political question, and duplicate image storage — long treated as a low-priority housekeeping issue — has climbed onto the agenda of department heads who once ignored it.

What the Numbers Actually Show

The Department of Technology, headquartered at 1 South Van Ness Avenue, shared preliminary findings from its internal audit at a June 18 budget hearing. Across the five departments it audited — including the Planning Department and the Office of the Assessor-Recorder — an estimated 34 percent of all stored image assets were flagged as exact or near-exact duplicates by automated deduplication software deployed in the first quarter of 2026. That figure, drawn from an inventory of roughly 11 terabytes of scanned permit records, property photographs, and GIS map exports, does not include the San Francisco Police Department or the Department of Public Health, which maintain separate storage contracts.

Storage costs in San Francisco's cloud environment are not trivial. The city's master services agreement with its primary cloud vendor prices hot-tier storage, the kind needed for frequently accessed permit and inspection images, at approximately $0.023 per gigabyte per month. At that rate, carrying one redundant terabyte costs the city roughly $276 annually. Across the eleven terabytes inventoried, storage runs just over $3,000 a year, and with 34 percent of those files flagged as duplicates, roughly $1,000 of that is waste. The sum is modest on its face, but city IT staff say the audit covered less than 15 percent of total municipal image storage. Extrapolated across all departments at the same duplicate rate, the waste could reach roughly $7,000 or more per year in direct storage fees alone, on a storage bill exceeding $20,000, before accounting for staff time spent cataloging, retrieving, and backing up files that shouldn't exist.
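The storage arithmetic in this section can be checked with a short sketch. It assumes decimal units (1 terabyte = 1,000 gigabytes) and treats the 15 percent audit coverage as an upper bound; the constant and function names are illustrative and do not come from any city system. It evaluates both the full 11-terabyte inventory and the 34 percent share flagged as duplicates, since the two give different totals.

```python
RATE_PER_GB_MONTH = 0.023   # hot-tier price quoted in the story, USD
GB_PER_TB = 1000            # assumption: decimal terabytes
INVENTORY_TB = 11           # audited image inventory
DUPLICATE_SHARE = 0.34      # share flagged as exact or near-exact duplicates
AUDIT_COVERAGE = 0.15       # audit covered less than this share of all storage


def annual_cost(terabytes: float) -> float:
    """Yearly storage cost in USD for the given volume, rounded to cents."""
    return round(terabytes * GB_PER_TB * RATE_PER_GB_MONTH * 12, 2)


def extrapolate(audited_cost: float, coverage: float = AUDIT_COVERAGE) -> float:
    """Scale an audited figure up to all storage, assuming the same mix citywide."""
    return round(audited_cost / coverage, 2)


per_terabyte = annual_cost(1)                                   # about $276
inventory_cost = annual_cost(INVENTORY_TB)                      # about $3,036
duplicate_cost = annual_cost(INVENTORY_TB * DUPLICATE_SHARE)    # about $1,032

citywide_storage = extrapolate(inventory_cost)     # about $20,240
citywide_duplicates = extrapolate(duplicate_cost)  # about $6,882
```

Because the audit covered less than 15 percent of storage, the two citywide figures are lower bounds under these assumptions.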

The Planning Department on Stevenson Street has been particularly affected. Planners routinely scan building permit applications, and the department's workflow historically created automatic backup copies at three separate stages of document processing. A single TIFF-format floor plan, which can run 8 to 12 megabytes, could therefore appear on city servers three or four times with no deduplication check in place. The department processed more than 22,000 permit applications in 2024 alone, according to figures published in its annual report.
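A deduplication check of the kind this workflow lacked can be as simple as comparing content hashes. The sketch below is illustrative only: it assumes the files sit under a single directory tree, uses SHA-256 from Python's standard library, and the function names are hypothetical rather than part of the city's or any vendor's tooling. It finds byte-identical copies only, not rescans or recompressed versions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large TIFFs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under root by content hash; keep groups with 2+ files."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Each returned group lists every path holding the same bytes, which leaves the decision about which copy to keep to a human reviewer.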

The Human Cost Inside City Hall

Beyond storage fees, the real drain is labor. Staff at the Office of the Assessor-Recorder at City Hall have spent an estimated 400 staff-hours since February manually reviewing flagged duplicates before deletion, a safeguard required under the city's record-retention policy, which prohibits automated deletion of files without human sign-off for certain document categories. At an average fully loaded cost of roughly $85 an hour for a mid-grade city employee, that review work alone represents about $34,000 in personnel costs for a single department over five months.

The deduplication software now being tested, sourced through a contract with a vendor already on the city's preapproved technology roster, uses perceptual hashing to identify near-duplicate images, meaning it catches files that have been slightly resized, recompressed, or watermarked as well as exact byte-for-byte copies. That capability matters for municipal archives, where documents are frequently re-scanned at different resolutions.
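This story does not specify which algorithm the vendor's software uses. As a generic illustration, the sketch below implements average hash, one of the simplest perceptual hashes: shrink the image to an 8x8 grid, set one bit per cell depending on whether it is brighter than the mean, and compare two hashes by counting differing bits. It assumes grayscale images at least 8 pixels on each side, represented as plain lists of rows; all names are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> List[float]:
    """Reduce an image to size*size cells by averaging each block of pixels."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Build a bit string: 1 where a cell is brighter than the overall mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Synthetic images: a 32x32 gradient, a half-resolution copy, and its negative.
original = [[4 * x + 3 * y for x in range(32)] for y in range(32)]
downsized = [row[::2] for row in original[::2]]
negative = [[255 - v for v in row] for row in original]

print(hamming(average_hash(original), average_hash(downsized)))  # 0
print(hamming(average_hash(original), average_hash(negative)))   # 64
```

The resized copy hashes identically to the original even though its bytes differ, while the negative differs in every bit. A real tool would treat hashes within some small distance as near-duplicates; where to set that threshold is a tuning decision not covered here.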

Department of Technology staff told the budget committee that a phased rollout of deduplication tooling to all 26 city departments could be completed by the end of fiscal year 2026-27, provided the program receives continued funding in the supplemental budget ordinance expected before the Board of Supervisors in September. Departments that want to participate earlier can request prioritization by filing a service request through the citywide IT portal — a step several departments along the Civic Center corridor have already taken.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
