The Daily San Francisco

San Francisco news, every day

San Francisco Tackles Duplicate Digital Assets Choking City Databases

City agencies and tech nonprofits are wrestling with redundant digital assets clogging public databases, but the fix is proving far harder than anyone expected.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm

San Francisco's municipal digital infrastructure is drowning in duplicate images. Across city departments — from the Department of Public Works to the Planning Department's property records system — the same photographs, maps, and scanned documents are stored multiple times, consuming server space, slowing database queries, and costing taxpayers money that officials have not yet publicly quantified. The problem has moved from a bureaucratic footnote to an operational headache as agencies push to digitize more services before the end of fiscal year 2026.

The timing matters. San Francisco, like most major American cities, accelerated its digitization push during the pandemic years and never really slowed down. The Planning Department alone added tens of thousands of scanned permit documents between 2020 and 2024. When databases grow fast without a unified deduplication standard, copies multiply. The city's Department of Technology has been working since early 2025 on a citywide data governance framework, but that framework has not yet been fully adopted across all departments, according to the department's publicly posted project roadmap.

What San Francisco Is Actually Doing

The most concrete local effort is happening inside the city's DataSF program, which sits within the City Administrator's Office at 1 Dr. Carlton B. Goodlett Place. DataSF has been pushing individual departments to audit their open data portals for redundant assets since January 2026, as part of its broader Open Data Ordinance compliance work. The San Francisco Public Library's digital collections team, based at the main branch on Larkin Street in the Civic Center neighborhood, has separately implemented a hash-based deduplication tool for its digitized photograph archive — a collection that, as of last year, exceeded 200,000 items.

The nonprofit Code for San Francisco, which meets regularly at GitHub's former SoMa offices and coordinates volunteer civic technologists, has flagged duplicate image storage as one of three priority issues for its 2026 project cohort. Volunteers there have been auditing publicly accessible city image datasets and documenting where the same file appears under multiple identifiers — a problem that sounds trivial until you realize it affects the accuracy of property search tools that residents and real estate professionals use daily.
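For readers curious what hash-based deduplication looks like in practice, here is a minimal Python sketch. It is an illustration only, not the library's tool or the volunteers' audit script, neither of which is described in detail. The function names and the sample record identifiers are hypothetical. The idea is to compute a SHA-256 digest of each file's bytes and group the identifiers that share a digest.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(records: dict[str, bytes]) -> list[list[str]]:
    """Group record identifiers whose file contents are byte-for-byte identical.

    `records` maps a (hypothetical) record identifier to the file's bytes.
    Only groups containing more than one identifier are returned, sorted.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for record_id, data in records.items():
        by_digest[content_digest(data)].append(record_id)
    return sorted(sorted(ids) for ids in by_digest.values() if len(ids) > 1)


# Hypothetical sample: the same photo filed under two permit numbers.
sample = {
    "permit-A/photo.jpg": b"same image bytes",
    "permit-B/photo.jpg": b"same image bytes",
    "permit-C/site-plan.png": b"different bytes",
}
duplicates = find_duplicates(sample)
```

An exact hash like this only catches byte-identical copies. A photo that has been resized or re-saved in another format produces a different digest, so catching those would need a different technique, such as perceptual hashing.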

How London and Tokyo Are Handling the Same Problem

Other global cities are further along. Transport for London began a formal deduplication program for its CCTV and asset-management image libraries in 2023, working with a third-party vendor to reduce redundant storage across more than 600 stations and road monitoring points. The Greater London Authority has published case studies showing measurable reductions in cloud storage costs after the first year, though the specific savings figures have not been made public. The key difference is governance: London assigned a single cross-agency data steward with authority over all participating departments, a structure San Francisco has discussed but not yet implemented.

Tokyo's approach is more decentralized but arguably more rigorous at the agency level. The Tokyo Metropolitan Government mandated in April 2024 that all new digitization projects include a deduplication audit as a procurement requirement. That means vendors bidding on scanning contracts must demonstrate that their workflow prevents duplicate file creation before the job starts, rather than cleaning up afterward. San Francisco has no equivalent procurement requirement on the books as of July 2026.

The stakes for San Francisco residents are concrete. Duplicated images in the Planning Department's permit portal have contributed to confusing search results, where the same building photograph surfaces under different permit numbers, making it harder for neighbors to track project histories in dense neighborhoods like the Mission District and the Richmond. Fixing that requires not just a technical tool but an agreement among departments about who owns the canonical image record, and that governance fight is the real bottleneck.

DataSF is expected to publish an updated data quality report later this summer, which will include a section on image deduplication progress across participating departments. Residents and developers who rely on city databases daily should watch that report closely. If San Francisco can adopt even a modified version of Tokyo's procurement-side requirement before the next major digitization contract goes out to bid — likely in the fall budget cycle — it would prevent the problem from getting worse, even if it doesn't immediately solve what already exists.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
