The Daily San Francisco


How San Francisco's Digital Archives Ended Up Drowning in Duplicate Images — and What the City Is Doing About It

Decades of siloed city departments, rushed digitization drives, and no shared standards left San Francisco's public image libraries bloated, redundant, and increasingly unusable.

By San Francisco News Desk · Published 4 July 2026, 11:58 am



San Francisco's municipal digital archives contain hundreds of thousands of photographs, renderings, and scanned documents — and a significant portion of them are exact or near-exact duplicates. The problem, which city technology staff have been quietly wrestling with since at least 2022, reached a formal tipping point earlier this year when the Department of Technology's Digital Services division flagged it as a barrier to the city's broader open-data modernization push.

This matters right now because San Francisco is mid-rebuild. The city is overhauling its public-facing data infrastructure under a multi-year digital equity initiative tied to the Mayor's Office of Housing and Community Development, which manages visual records for hundreds of affordable housing projects stretching from the Tenderloin to Bayview-Hunters Point. When those records are cluttered with duplicates — the same construction photo uploaded four times under different filenames across four different departmental servers — staff waste hours on every project cycle, and the public portals that residents rely on serve up broken or redundant results.

How the Duplication Problem Grew

The roots go back to the early 2000s, when individual departments digitized their paper records independently, with no centralized file-naming convention and no interoperability requirement. The San Francisco Public Library's San Francisco History Center on Larkin Street, the Planning Department on Mission Street, and the Recreation and Parks Department all ran separate scanning programs, often contracting different vendors who delivered files in incompatible formats. A single photograph of, say, Dolores Park, pulled from city files during a 2008 master plan review, might exist as a TIFF on one server, a JPEG on another, and a compressed PNG on a third, with each copy treated as a distinct record.

The problem compounded during the pandemic. Between 2020 and 2022, remote-work mandates pushed dozens of city employees to download, re-upload, and reshare image files through personal drives, Microsoft SharePoint folders, and departmental Dropbox accounts that were later partially migrated back into official systems. The Department of Technology's own audit, completed in early 2023, found that storage costs for the city's unmanaged image repositories had grown substantially, with redundant files identified across at least 14 separate departmental systems. That audit is a public record available through the city's Sunshine Ordinance process, though the specific dollar figures remain under administrative review.

DataSF, which runs the city's open data portal from City Hall, has been trying to impose order. Since 2019, DataSF has published metadata standards for city datasets, but image files, as opposed to spreadsheets and databases, fell largely outside those early frameworks. The office began piloting a duplicate-detection workflow in late 2024 using hash-matching software, which computes a cryptographic hash (a fixed-length digest of a file's bytes) for each file and flags files with matching hashes as identical copies regardless of filename. The pilot covered the Planning Department's environmental review image library first, with the eventual goal of expanding to all city-controlled repositories.
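For readers curious how hash matching works in practice, here is a minimal sketch in Python. It is illustrative only: the city has not named the software or hash algorithm used in the pilot, so SHA-256 and the function names `file_digest` and `find_exact_duplicates` are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()  # assumed algorithm; the pilot's choice is not public
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose bytes are identical, ignoring filenames."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by two or more files indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("some_folder"))` would return one list per set of byte-identical files. An approach like this catches only exact copies: the same photograph saved as a TIFF, a JPEG, and a PNG produces three different digests, so format variants would need a separate near-duplicate technique.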

Where Things Stand Today

Progress is real but uneven. The Planning Department's library has been partially deduplicated, freeing up server space and making the EIR — Environmental Impact Report — image databases more searchable for neighborhood groups in areas like the Mission and SoMa that track development closely. The San Francisco Public Utilities Commission, which maintains thousands of infrastructure photographs from projects along the Hetch Hetchy system and local treatment plants, has not yet been brought into the unified deduplication workflow.

For residents and journalists who rely on public records, the practical advice is this: if you request image files from any San Francisco city department right now, ask specifically whether the materials have been through the DataSF deduplication review. If not, the file set you receive may include significant redundancy, which means larger downloads and slower searches. Either way, check that distinct versions of an image (photos taken on different dates that look nearly identical) have not been collapsed into one, which would lose the timestamp history.

The city's Digital Services team has indicated that a citywide image-governance policy is expected before the end of fiscal year 2026-27. Until that framework is adopted, each department remains its own island — a situation that has cost the city both money and institutional memory since long before anyone thought to count the duplicates.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
