The Daily San Francisco

San Francisco news, every day


How San Francisco's Digital Archives Became a Maze of Duplicate Images — and Who's Paying to Fix It

Years of rushed digitization projects, siloed city departments, and pandemic-era remote work created a sprawling mess of redundant files that now costs the city real money and real time to untangle.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm


Photo: Hannibal Photography on Pexels

San Francisco's municipal digital infrastructure is carrying a hidden weight: tens of thousands of duplicate image files scattered across city department servers, cloud storage accounts, and legacy databases — the accumulated byproduct of more than a decade of poorly coordinated digitization efforts. The Department of Technology, which oversees the city's enterprise IT systems from its offices on Seventh Street, has been working since late 2025 to audit and systematically remove redundant files as part of a broader data governance push. But cleaning up the mess requires understanding how it got there in the first place.

The problem matters now because San Francisco is midway through several major digital transformation projects, including a long-overdue overhaul of the Planning Department's permit portal and updated records systems at the San Francisco History Center in the San Francisco Public Library's Main Branch on Larkin Street. Duplicate image files slow retrieval times, inflate cloud storage costs, and introduce errors when automated systems pull the wrong version of a scanned document. For a city already under fiscal pressure, the inefficiency carries a real dollar cost.

A Problem Built Over Years of Siloed Decisions

The roots go back to roughly 2012, when multiple city departments began independent digitization drives with little central coordination. The San Francisco Municipal Transportation Agency scanned maintenance logs and route maps. The Department of Building Inspection converted paper permit records. The Public Library began archiving historical photographs. Each department procured its own storage contracts and image management software, and none of them used a unified naming convention or metadata standard. The same document, scanned on different occasions, frequently ended up stored under different filenames in different systems — sometimes three or four copies deep.

The pandemic accelerated the chaos. When roughly 35,000 city employees shifted to remote work beginning in March 2020, departments leaned on consumer-grade tools (Google Drive folders, Dropbox shares, personal external drives) to keep operations moving. Files migrated informally between those tools, then were duplicated again when workers uploaded local copies back to official servers as offices reopened. The Department of Technology's own 2024 internal audit, cited in budget documents submitted to the Board of Supervisors that fall, flagged unstructured data redundancy as one of the top five drivers of unnecessary cloud expenditure across city systems.

The San Francisco City Attorney's Office and the Controller's Office each maintained a separate image repository, for scanned legal filings and financial records respectively. Cross-departmental projects, such as the documentation requirements of the Tenderloin Emergency Intervention Program, generated files that multiple agencies stored simultaneously with no reconciliation process in place.

What a Fix Actually Looks Like

Resolving the problem is less a technology challenge than an organizational one. Deduplication software can identify bit-identical files automatically, but many duplicates in San Francisco's case are near-identical rather than exact — the same document scanned at different resolutions, or slightly different crops of the same archival photograph. Those require human review or more sophisticated perceptual hashing tools that carry a licensing cost.
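To illustrate the difference between exact and near-identical duplicates, here is a minimal Python sketch. It is not the city's tooling, which the reporting does not describe. The function names, the 8x8 grid size, and the match threshold are all assumptions made for illustration, and small grids of grayscale values stand in for decoded scans, since a real tool would first read image files with an imaging library. The sketch compares a byte-level SHA-256 digest with a simple "average hash," one common form of perceptual hashing.

```python
import hashlib

def exact_digest(pixels):
    """Byte-level fingerprint: matches only bit-identical data."""
    flat = bytes(v for row in pixels for v in row)
    return hashlib.sha256(flat).hexdigest()

def average_hash(pixels, size=8):
    """Hypothetical helper: shrink a grayscale grid to size x size by block
    averaging, then record one bit per cell (1 if brighter than the mean).
    Assumes the grid's height and width are multiples of `size`."""
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def upscale(pixels, factor):
    """Repeat every pixel `factor` times in both directions."""
    return [[v for v in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]

# Stand-in data: the "same document" at two resolutions, plus a different one.
low_res = [[x * 16 for x in range(16)] for _ in range(16)]
high_res = upscale(low_res, 2)
other = [[y * 16 for _ in range(16)] for y in range(16)]

THRESHOLD = 10  # assumed cut-off; a real review would tune this value

same_bytes = exact_digest(low_res) == exact_digest(high_res)
near_match = hamming(average_hash(low_res), average_hash(high_res)) <= THRESHOLD
false_match = hamming(average_hash(low_res), average_hash(other)) <= THRESHOLD
print(same_bytes, near_match, false_match)
```

In this toy case the byte-level digests differ because the two scans have different dimensions, while the average hashes agree. Real scans also differ in noise, skew, and cropping, so pairs near the threshold would still need the human review the article describes.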

The Department of Technology began piloting a centralized Digital Asset Management platform in January 2026, initially covering the Planning Department and the Office of the City Administrator. Full city-wide rollout is projected for mid-2027. Until then, individual departments are being asked to designate a records liaison responsible for quarterly duplicate reviews — an unfunded mandate that several smaller departments have quietly pushed back on, according to budget hearings held at City Hall in April 2026.

For San Francisco residents and businesses who interact with city digital systems (pulling a building permit from the Department of Building Inspection portal, accessing a historical photograph through the Public Library's online catalog, or tracking a Planning Commission case file), the practical advice is straightforward: if a document download fails or returns an outdated version, request a fresh copy directly from the issuing department rather than relying on cached results. The cleanup is underway, but the archive is still noisy, and the city has said publicly that it expects the full reconciliation process to run well into 2028.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
