The Daily San Francisco

San Francisco's Digital Archives Are Riddled With Duplicate Images — And the Numbers Are Staggering

A quiet data crisis in the city's public records systems is costing agencies time and storage dollars, and the scale of the problem is only now becoming clear.

By San Francisco News Desk · Published 4 July 2026, 11:48 am

San Francisco's municipal agencies are sitting on a growing backlog of duplicate digital images embedded in public-facing databases, internal records portals, and government websites — and early audits suggest the problem runs deeper than anyone publicly acknowledged until this year. Across departments from the Planning Commission to the Department of Public Works, redundant image files are consuming server capacity, slowing records retrieval, and inflating the city's cloud storage bills at a moment when the budget is already stretched thin.

The timing matters. San Francisco is midway through a $1.2 billion technology modernization contract cycle that was supposed to digitize legacy paper records and streamline public access to city data. Instead, departments that rushed to scan and upload physical documents, particularly from 2020 to 2022, when remote operations forced a crash digitization effort, ended up ingesting the same images multiple times. The Department of Technology has identified duplicate image rates as high as 34 percent in some departmental archives, according to internal review documentation circulated to the city's IT Steering Committee earlier this year.

The Numbers Behind the Storage Bills

Cloud storage isn't free, and San Francisco's bill reflects the redundancy problem. The city's enterprise data infrastructure, managed partly through agreements with vendors servicing the Civic Center complex on Dr. Carlton B. Goodlett Place, is running over capacity estimates set in 2023. Duplicate image files — meaning identical or near-identical image assets stored under different filenames or in separate folder hierarchies — account for a disproportionate share of that overrun. Industry benchmarks suggest that in unmanaged government archives, duplicate and near-duplicate files typically represent between 20 and 40 percent of total image storage volume.
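To make that definition concrete: identical files stored under different filenames or folders can be found by comparing a hash of each file's contents instead of its name. The Python sketch below is illustrative only. The helper names `file_digest` and `find_exact_duplicates` are hypothetical, and nothing here describes a tool the city actually uses. It catches byte-for-byte copies only; near-identical images, such as a rescanned or recompressed page, would need perceptual comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    Hypothetical helper: filenames and folder locations are ignored,
    only file contents are compared.
    """
    paths_by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            paths_by_digest[file_digest(path)].append(path)
    # Keep only digests that more than one file shares.
    return [paths for paths in paths_by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("archive"))` on a folder of scans would return one list per set of identical files, which a records team could then review before deleting anything.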

The San Francisco Public Library's digital collections portal, which serves researchers accessing records through the main branch on Larkin Street, flagged the issue internally after a 2025 migration of its historical photograph archive to a new content management system. Librarians discovered that roughly 18,000 image records had been uploaded at least twice, and in some cases four or five times, during successive data transfers. Cleaning that archive alone took six weeks of staff time.

The Department of Building Inspection, which maintains permit-related photograph records tied to properties across neighborhoods from the Sunset District to SoMa, faces a similar challenge at a much larger scale. Permit photos uploaded through the city's Accela permitting platform are sometimes re-submitted by contractors who receive error messages during upload, generating duplicate files without any automated deduplication check on the back end.
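A back-end deduplication check of the kind described as missing could, in principle, be as simple as hashing each upload and returning the existing record when the same bytes arrive again. The sketch below is a minimal in-memory illustration under that assumption. The `UploadStore` class is hypothetical and is not part of Accela or any city system.

```python
import hashlib


class UploadStore:
    """In-memory stand-in for a permit-photo store (hypothetical)."""

    def __init__(self) -> None:
        self._id_by_digest: dict[str, int] = {}
        self._blobs: dict[int, bytes] = {}

    def upload(self, data: bytes) -> tuple[int, bool]:
        """Store data once and return (record_id, created).

        A re-submitted file with identical bytes gets the original
        record ID back, with created=False, instead of a new record.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing_id = self._id_by_digest.get(digest)
        if existing_id is not None:
            return existing_id, False
        record_id = len(self._blobs) + 1
        self._blobs[record_id] = data
        self._id_by_digest[digest] = record_id
        return record_id, True

    def __len__(self) -> int:
        """Number of distinct files actually stored."""
        return len(self._blobs)
```

With a check like this, a contractor who retries after an error message would be handed the existing record rather than creating a second copy of the same photo.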

What Deduplication Actually Costs — and What It Saves

Fixing this is neither free nor simple. Deduplication software licenses for enterprise-scale archives run between $40,000 and $120,000 annually depending on the volume of data processed, according to published pricing from vendors such as Cloudian and Veritas. Manual review by data staff is cheaper per hour but slower, and San Francisco's Department of Technology is already operating with a hiring freeze on several classifications as of March 2026.

There's a practical upside, though. Agencies that have completed deduplication projects — including the San Francisco Municipal Transportation Agency, which cleaned its internal media asset library in late 2024 — report storage cost reductions of 20 to 28 percent in the affected archives. For SFMTA, which manages image records tied to everything from Muni line documentation to traffic camera stills, that translated to a measurable reduction in annual cloud storage expenditure.
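A rough way to weigh those figures against each other is a break-even calculation: at what annual storage bill do the savings cover the license? The Python sketch below uses only the ranges quoted in this article. The function names are hypothetical, and it assumes the savings rate applies to the whole bill being considered, which overstates the benefit if only some archives are affected.

```python
def net_annual_savings(storage_bill: float, savings_rate: float,
                       license_cost: float) -> float:
    """Annual storage savings minus the annual license cost."""
    return storage_bill * savings_rate - license_cost


def break_even_storage_bill(license_cost: float, savings_rate: float) -> float:
    """Annual storage bill at which savings exactly cover the license."""
    if not 0 < savings_rate <= 1:
        raise ValueError("savings_rate must be in the range (0, 1]")
    return license_cost / savings_rate


# Worst case from the quoted ranges: top license price, lowest savings rate.
worst_case = break_even_storage_bill(120_000, 0.20)
# Best case: lowest license price, highest savings rate.
best_case = break_even_storage_bill(40_000, 0.28)
```

Under these assumptions, the break-even annual storage bill falls between roughly $143,000 (best case) and $600,000 (worst case). An archive costing less than that to store each year would not recoup a license on storage savings alone.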

The Board of Supervisors' Budget and Finance Committee is expected to take up the broader technology modernization review in September 2026. Advocates for open government, including groups that regularly submit California Public Records Act requests through the City Attorney's portal on Polk Street, say the duplicate problem has real-world consequences: search results return redundant records, slowing down responses that are already subject to 10-day statutory deadlines under state law.

For city residents trying to pull permit histories on a Mission District property or access digitized maps from the early 20th century, the practical workaround is to file requests that specify file dates or reference numbers rather than relying on keyword image searches, at least until the city's deduplication backlog is worked through. That could take the better part of 2027, based on the Department of Technology's current project timeline.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
