The Daily San Francisco

How San Francisco's Digital Archives Ended Up Full of the Same Image Twice — and What It's Costing the City

A slow accumulation of redundant files across city departments has quietly ballooned storage costs and tangled public records systems for years.

By San Francisco News Desk · Published 4 July 2026, 12:00 pm

San Francisco's municipal digital infrastructure is carrying a hidden weight: thousands of duplicate images buried inside city databases, department servers, and public-facing portals. They are the product of more than a decade of siloed record-keeping, rushed digitization projects, and the absence of any citywide standard for how photographs and scanned documents get filed, named, or verified before upload. Several departments are now auditing their holdings and, in some cases, contracting outside vendors to run deduplication sweeps.

The timing matters. The city's Department of Technology has been pushing since early 2025 to consolidate municipal cloud storage contracts under a single framework, partly in response to ballooning annual licensing costs. When agencies independently store redundant assets (the same permit photo appearing five times across three systems, or a property scan duplicated between the Planning Department on South Van Ness Avenue and the Assessor-Recorder's office at City Hall), those bytes add up to real dollars. Storage waste is not an abstraction; it shows up in renewal invoices.

How the Duplication Built Up Over Time

The roots go back to the early 2010s, when San Francisco made a major push to digitize paper records held in physical archives across the city. That effort was decentralized by design — each department managed its own scanning contracts, its own vendor relationships, its own naming conventions. The Recreation and Parks Department, the Public Works bureau, and the Human Services Agency each built their own document management workflows with minimal cross-talk. Files migrated between platforms when departments upgraded software, and each migration created opportunities for copies to propagate without anyone noticing or caring enough to clean house.

The city's shift toward cloud storage accelerated the problem rather than solving it. When departments moved onto platforms like Microsoft Azure and Google Cloud between 2018 and 2022, legacy files were often bulk-uploaded rather than catalogued. A 2023 internal review by the Controller's Office — whose findings were summarized in a publicly available budget report — flagged redundant digital asset storage as a contributor to unanticipated IT cost overruns across at least four major departments. The Controller's Office did not specify a dollar figure for image duplication specifically, but the broader storage inefficiency finding prompted the Department of Technology to begin scoping a remediation program.

The San Francisco Public Library's digitization program, which covers historical photographs held at the main branch on Larkin Street in the Civic Center, encountered its own version of the issue. Librarians discovered that batches of photographs from the Western Neighborhoods Project and other community archiving efforts had been uploaded multiple times as volunteers and staff worked in parallel without a shared asset registry. The library has been working since 2024 to reconcile those holdings, according to publicly posted project documentation on its digital collections page.

What Deduplication Actually Involves — and What Comes Next

Replacing or removing a duplicate image is not as simple as hitting delete. In government systems, images are often attached to records — a building permit, an incident report, a park maintenance log — meaning a duplicate file may be the only copy linked to a particular entry in a database. Delete the wrong instance and you break the record chain. The correct approach requires first identifying which copy is the canonical version, then updating all database references to point to that single file before the redundant copies are safely purged. That process, applied at scale, requires either significant staff time or a third-party tool capable of running hash-matching algorithms across disparate storage environments.
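To make that sequence concrete, here is a minimal sketch in Python of the three steps described above: group files by content hash, pick one canonical copy per group, and repoint record references before anything is marked for deletion. It is an illustration under stated assumptions, not a description of any tool the city or its vendors use. The function name `plan_deduplication`, the sample paths and record IDs, and the rule for choosing the canonical copy (the alphabetically first path) are all hypothetical. The sketch also only catches byte-for-byte identical files; a photo that was resized or re-saved would hash differently and would need a perceptual-matching approach instead.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_deduplication(
    files: Dict[str, bytes],
    references: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Plan a dedup pass without deleting anything.

    files:      storage path -> file contents (hypothetical in-memory stand-in)
    references: record ID -> storage path the record currently points to
    Returns (updated references, paths that are safe to purge afterwards).
    """
    # Step 1: group paths whose contents are byte-for-byte identical.
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)

    # Step 2: choose one canonical copy per group.
    # Assumption: the alphabetically first path wins; a real policy might
    # prefer the oldest file or the owning department's copy.
    canonical_for: Dict[str, str] = {}
    for paths in groups.values():
        canonical = min(paths)
        for path in paths:
            canonical_for[path] = canonical

    # Step 3: repoint every record at the canonical copy. References to
    # paths we never scanned are left untouched.
    updated = {
        record: canonical_for.get(path, path)
        for record, path in references.items()
    }

    # Only non-canonical copies are purge candidates, and only after the
    # updated references have been written back to the database.
    purge = sorted(p for p, c in canonical_for.items() if p != c)
    return updated, purge


# Hypothetical sample data: one scan stored by two departments.
files = {
    "planning/permit_001.jpg": b"scan-A",
    "assessor/permit_001_copy.jpg": b"scan-A",
    "planning/site_photo.jpg": b"scan-B",
}
references = {
    "permit-1": "planning/permit_001.jpg",
    "parcel-9": "assessor/permit_001_copy.jpg",
    "permit-2": "planning/site_photo.jpg",
}
updated, purge = plan_deduplication(files, references)
```

Returning a plan rather than deleting files directly mirrors the ordering in the paragraph above: references are rewritten first, and the purge list is acted on only once no record still points at a redundant copy.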

Several city departments are expected to include deduplication line items in their fiscal year 2026-27 budget requests, which go before the Board of Supervisors this fall. For residents and businesses that interact with city permitting and records systems — particularly in high-volume neighborhoods like SoMa, the Mission, and the Tenderloin, where development and social services generate large document loads — a cleaner backend should eventually mean faster response times on records requests filed under the California Public Records Act. The practical payoff will take time, but the audits now underway represent the first systematic attempt to count the cost of years of uncoordinated growth.

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
