The Daily San Francisco

San Francisco news, every day

SF's Digital Records Push Hits a Snag: Thousands of Duplicate Images Are Cluttering City Databases

Officials, archivists and technology advisers are debating how to fix a sprawling duplicate-image problem that's slowing San Francisco's ambitious municipal digitization effort.

By San Francisco News Desk · Published 4 July 2026, 11:28 am

Photo: Ward, Herbert, 1863-1919 / Public domain (Wikimedia Commons)

San Francisco's push to digitize decades of municipal records has run into a stubborn, unglamorous obstacle: duplicate images. City technology staff and outside advisers say redundant photo files, scanned documents and map tiles are consuming significant server capacity across multiple departments, slowing retrieval times and complicating the archival work that underpins everything from housing permit reviews to Muni infrastructure planning.

The problem surfaced this spring when the Department of Technology flagged storage inefficiencies during a broader audit of the DataSF platform, the citywide open-data portal managed out of City Hall. Archivists at the San Francisco Public Library's San Francisco History Center on Larkin Street, which has been digitizing its photo collection since 2019, and staff at the Planning Department's offices on Mission Street have both reported backlogs tied partly to unfinished deduplication work in shared cloud storage environments.

Why This Matters Right Now

The timing is not accidental. San Francisco's Controller's Office has been pressing departments to consolidate cloud contracts ahead of a fiscal year 2026-27 budget cycle that, under Mayor Daniel Lurie's administration, is expected to demand cuts across non-essential operational spending. Bloated storage bills are a visible, fixable target. City IT procurement records show that municipal cloud storage costs have risen sharply over the past three years as departments independently uploaded legacy materials without coordinating deduplication protocols.

Technology policy advisers working with the city's Committee on Information Technology — known as COIT — have argued publicly at recent board meetings that the absence of a unified image-management standard is the root cause. Without a single deduplication tool or workflow applied consistently, the same scanned permit document can exist in four or five separate department repositories simultaneously, each copy accruing storage costs and making keyword search less reliable for the planners and inspectors who rely on those systems daily.

The San Francisco Public Utilities Commission, which manages its own extensive infrastructure image library covering everything from Hetch Hetchy reservoir survey photos to sewer-line inspection footage, has reportedly been piloting an automated deduplication script since March. That pilot, run through the SFPUC's information systems division at its Embarcadero headquarters, is the closest thing the city currently has to a working model, though it has not been formally adopted citywide.

What Experts and Officials Are Recommending

Experts in municipal records management say San Francisco's challenge is common among large American cities that digitized aggressively during the pandemic without standardizing file-naming conventions or metadata schemas. The practical fix, they say, involves three steps: auditing existing repositories to establish a full duplicate count, selecting a deduplication engine compatible with the city's existing Microsoft Azure and Google Cloud environments, and enforcing upload protocols going forward through departmental IT liaisons.
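The first of those steps, establishing a duplicate count, can be illustrated with a short script. What follows is a minimal sketch and not the city's or the SFPUC's actual tooling: it assumes the repositories are reachable as ordinary file-system folders, and the function names (file_digest, audit_duplicates, redundant_count) are hypothetical. It groups files by the SHA-256 digest of their contents, so it finds only byte-identical copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_duplicates(roots):
    """Group files under the given folders by content digest.

    Returns a dict mapping digest -> list of paths, keeping only
    digests shared by more than one file.
    """
    groups = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                groups[file_digest(path)].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


def redundant_count(dupes) -> int:
    """Count the copies that could be removed while keeping one per group."""
    return sum(len(paths) - 1 for paths in dupes.values())
```

Run against each department's folder, the result gives both the groups of identical files and the number of redundant copies, which is the figure an audit would need before any cleanup is attempted.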

The San Francisco chapter of the Society of American Archivists has flagged the issue in communications to the city's Office of the City Administrator, arguing that the risks of unchecked duplication are not only financial but archival. When multiple slightly different versions of the same image exist, determining which is authoritative becomes difficult, particularly for historical documents that may have been scanned at different resolutions across different decades.

COIT is scheduled to take up a formal deduplication policy proposal at its next quarterly meeting, expected in September 2026. If approved, the policy would require all city departments to run new uploads through a hash-based duplicate checker before files are committed to shared storage — a standard practice in private-sector data management that has been slow to reach municipal government.
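A hash-based duplicate checker of the kind the proposal describes can be sketched in a few lines. This is an illustration under stated assumptions, not the policy's specified design: the UploadGate class and its submit method are hypothetical names, the index of known files is held in memory rather than in a shared database, and SHA-256 is one reasonable choice of hash among several.

```python
import hashlib


class UploadGate:
    """Hypothetical pre-upload check: reject files whose bytes are already stored."""

    def __init__(self):
        # Maps content digest -> name of the first file stored with that content.
        self._seen = {}

    def submit(self, name: str, data: bytes):
        """Return (True, name) if the file is new, else (False, existing_name)."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return (False, existing)
        self._seen[digest] = name
        return (True, name)


gate = UploadGate()
print(gate.submit("planning/site_plan.pdf", b"scanned bytes"))
print(gate.submit("dbi/site_plan_copy.pdf", b"scanned bytes"))
```

The second call is refused and points back to the first file, because the two uploads are byte-for-byte identical. A check like this would not catch the same document scanned at a different resolution, since any change in the bytes produces a different hash.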

For San Francisco residents, the practical stakes are clearest at the neighborhood permit level. Homeowners in districts like the Outer Sunset and the Mission who have waited weeks for Planning Department decisions on renovation applications have sometimes been told delays stem partly from staff struggling to locate the correct version of a scanned site plan amid cluttered digital folders. A cleaner, deduplicated repository would not solve every permit delay, but technology staff say it would materially reduce one recurring source of friction — and potentially shave days off routine approvals in a city where housing production is already running well behind targets.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
