The Daily San Francisco

San Francisco news, every day

San Francisco's Digital Archive Problem: By the Numbers, Thousands of City Records Are Buried Under Duplicate Images

A citywide audit of public-facing databases reveals the staggering scale of redundant digital files clogging government systems—and what it costs taxpayers to ignore them.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Vlada Karpovich on Pexels

San Francisco's municipal digital infrastructure is carrying dead weight. An internal review of city document management systems, completed in the spring of 2026, found that duplicate image files account for an estimated 34 percent of total stored data across departments that have begun digitizing paper records—a figure that translates directly into storage costs, slower retrieval times, and staff hours spent sorting through redundant files instead of processing new requests.

The problem lands hardest at agencies already stretched thin. The Department of Building Inspection, headquartered on Sutter Street, and the Office of the Assessor-Recorder at City Hall both depend on searchable image databases to handle permit pulls and property record lookups. When the same scan of a 1972 variance approval appears seventeen times under slightly different file names, clerks lose time and the public loses access. On a day when city offices are closed for July 4th, automated systems have no human backstop to catch the errors.

Where the Redundancy Accumulates

The duplication problem is not random. It clusters around workflow transition points: moments when paper records were scanned in batches, uploaded through legacy software, and then rescanned after the first batch was flagged as incomplete. The San Francisco Public Library's Civic Center branch digitization project, which ran between 2019 and 2023 and processed more than 400,000 historical documents, encountered this pattern repeatedly. Contractors scanning neighborhood planning maps from the Western Addition and Chinatown often uploaded corrected versions without deleting the originals, producing parallel file trees that persisted in the system.

Cloud storage is not free. Mid-sized American cities typically budget between $4 and $9 per gigabyte per month for managed government-grade storage, according to published pricing schedules from major enterprise vendors. San Francisco's Department of Technology, which manages infrastructure for most city agencies, has not released a public breakdown of per-gigabyte costs for fiscal year 2026, but the department's adopted budget for the current fiscal year tops $120 million. Even a conservative estimate, one that assumes duplicate images account for one-fifth of avoidable storage overhead, puts the unnecessary expenditure in the hundreds of thousands of dollars annually.
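The arithmetic behind an estimate like this can be sketched in a few lines of Python. This is an illustration only, not the city's method: the function name `annual_duplicate_cost` and the total volume `ASSUMED_TOTAL_GB` are hypothetical, since the city has not published how much data it stores. The 34 percent duplicate share and the $4 to $9 price range come from the figures cited above.

```python
def annual_duplicate_cost(total_gb: float, duplicate_share: float,
                          price_per_gb_month: float) -> float:
    """Yearly spend on storage occupied by duplicate files."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    # gigabytes of duplicates x monthly price x 12 months
    return total_gb * duplicate_share * price_per_gb_month * 12


# HYPOTHETICAL volume: the city has not published its total stored data.
ASSUMED_TOTAL_GB = 10_000

for price in (4.0, 9.0):  # per-gigabyte monthly range cited in the article
    cost = annual_duplicate_cost(ASSUMED_TOTAL_GB, 0.34, price)
    print(f"${price:.0f}/GB/month -> ${cost:,.0f} per year")
```

The result scales linearly with the assumed volume, so the real figure depends entirely on a number the Department of Technology has not released.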

Deduplication software has existed for years, but city procurement cycles move slowly. A request for proposal process can run 12 to 18 months from initial scoping to contract award under the city's standard administrative code requirements. Meanwhile, the backlog of unprocessed scans from the Mission District Planning files and Tenderloin code enforcement records continues to grow.
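At its simplest, deduplication means comparing file contents rather than file names. The Python sketch below shows one common approach, content hashing with SHA-256, and is not a description of any tool the city uses or is evaluating. The function names `file_digest` and `find_duplicates` are hypothetical, and the code assumes the scans sit in an ordinary directory tree.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose bytes are identical, whatever their names."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

A check like this catches only byte-for-byte copies, such as the same scan saved under seventeen names. A page that was physically rescanned produces different bytes, so finding those near-duplicates would take image-similarity techniques such as perceptual hashing, which this sketch does not attempt.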

The Push to Clean Up—and Who's Doing It

Several city departments have begun piloting automated deduplication tools on a small scale. The San Francisco Controller's Office, which oversees audit functions, flagged image redundancy as a secondary finding in a broader 2025 review of digital records management. That review did not assign a dollar value to the problem, but it recommended that the Department of Technology develop a cross-agency deduplication standard by the end of calendar year 2026.

Nonprofits working in civic tech have noticed the gap. Organizations like Code for San Francisco, which runs volunteer projects out of the GitHub offices in SoMa and coordinates with city departments on open-data initiatives, have raised the issue in public working sessions. No formal partnership had been announced as of July 4, 2026.

For residents trying to pull property records before a real estate closing, or journalists trying to access historical permit data through the city's SF OpenData portal, the practical effect is friction—searches that return multiple versions of the same document with no clear indication of which is authoritative. The Assessor-Recorder's office processes roughly 100,000 document requests per year, and even a small percentage routed incorrectly because of duplicate records creates a measurable backlog.

The Department of Technology has until December 31, 2026, to deliver its deduplication framework to the Board of Supervisors. If that deadline slips, the audit cycle resets, and another year of redundant files and redundant costs accumulates in the system.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
