The Daily San Francisco

San Francisco news, every day

SF's Digital Archives Are Riddled With Duplicate Images. Here's What Officials and Experts Are Saying About Fixing It.

City departments, archivists, and technology specialists are wrestling with a sprawling problem inside San Francisco's public records systems — and the pressure to act is growing.

By San Francisco News Desk · Published 4 July 2026, 1:06 pm

Photo: Mount, Christiana Stagg, b. 1870 / Public domain (Wikimedia Commons)

San Francisco's public-facing digital infrastructure holds tens of thousands of duplicate images — redundant photographs, scanned permits, and duplicated planning documents that clog storage servers, inflate costs, and slow down records requests. The city's Department of Technology and the San Francisco Public Library's digital preservation unit have both flagged the problem in internal workflow reviews, and specialists who work with municipal data say the issue is more widespread than most residents realize.

The timing matters. The city is in the middle of a broad push to digitize neighborhood planning records — particularly in the Tenderloin and SoMa corridors, where decades of paper files from legacy zoning cases are being uploaded to the city's online public portal. That digitization sprint, accelerated after the Planning Department moved to a hybrid-remote model in 2023, has produced a wave of new uploads with inconsistent file-naming conventions and no automated deduplication layer sitting in front of the storage system.

What the City and Its Partners Are Saying

Officials at the Department of Technology have acknowledged the scope of the issue in public budget presentations to the Board of Supervisors this spring, describing duplicate image files as a contributing factor to rising cloud storage costs. The department has not published a precise dollar figure tied specifically to duplicate images, but city IT staff have pointed to overall cloud expenditure growth as a motivating factor behind a proposed data governance overhaul slated for fiscal year 2026–27.

The San Francisco Public Library, which manages the San Francisco History Center on Larkin Street, has been running a separate effort. Librarians there have been working with the Internet Archive — headquartered on Funston Avenue in the Richmond District — to clean up duplicate scans within the city's historical photograph collections. Some collections had three or four versions of the same image uploaded during different digitization campaigns going back to 2014. Archivists there describe the core challenge as one of institutional memory: each digitization project used different software, different resolution settings, and different metadata standards, making automated matching difficult.

Technology experts who consult with Bay Area municipal governments say San Francisco is not an outlier. But they argue the city's fragmented departmental structure — where the Planning Department, the Department of Building Inspection, and the Public Library each maintain separate digital asset systems — makes deduplication harder than it would be in cities with centralized document management. The lack of a single shared taxonomy across those systems means a photograph of, say, a Victorian on Ashbury Street might exist in three different databases under three different file names with no automated flag to catch the redundancy.

Pressure From the Housing Emergency

The housing production emergency declaration passed by the Board of Supervisors in late 2024 added urgency. Permit processors at 49 South Van Ness — the city's main permit center — have reported that duplicate document uploads by applicants slow down the review queue, because staff sometimes have to manually verify whether two nearly identical files represent one submission or two separate applications. The Department of Building Inspection's online portal does not currently reject duplicate uploads at the point of entry.

One proposal circulating among city technology staff calls for adopting perceptual hashing, a technique that computes a compact fingerprint from each image's visual content so that near-identical files can be flagged before they are written to the server. County governments in Los Angeles and Santa Clara have piloted versions of this approach. San Francisco's Department of Technology has not committed to a vendor or a timeline, but the proposal is expected to appear in budget discussions scheduled for August.
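For readers curious what perceptual hashing looks like in practice, here is a minimal sketch of one simple variant, the average hash. It is an illustration only: the city's proposal does not name an algorithm, and the function names, the 8x8 grid, and the distance threshold of 5 are all assumptions made for this example. Images are represented as plain grids of grayscale values so the sketch runs without any image library.

```python
def average_hash(pixels, hash_size=8):
    """Return a 64-bit fingerprint for a grayscale image.

    `pixels` is a 2D list of brightness values (0-255) that is at least
    hash_size x hash_size. This input format is an assumption for the sketch.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Shrink the image to a hash_size x hash_size grid by averaging blocks.
    for row in range(hash_size):
        y0 = row * height // hash_size
        y1 = (row + 1) * height // hash_size
        for col in range(hash_size):
            x0 = col * width // hash_size
            x1 = (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    # Each bit records whether a cell is brighter than the overall average.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(pixels_a, pixels_b, max_distance=5):
    """Flag two images as likely duplicates (threshold is hypothetical)."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance
```

Because the fingerprint depends on the broad pattern of light and dark rather than on exact bytes, a rescan at a slightly different brightness can still land within a few bits of the original, which an exact checksum would miss. A production system would likely use a more robust hash and tune the threshold against real records.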

For residents who rely on the city's online records — whether tracking a permit on their block in the Excelsior or pulling historical documents from the History Center — the practical advice from archivists is straightforward: if you submit documents to any city portal, use a consistent file name that includes the address, date, and document type. It will not guarantee the system catches duplicates, but it gives human reviewers a fighting chance.
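As a rough illustration of that advice, the short sketch below builds a file name from an address, a date, and a document type. The exact pattern (lowercase, hyphens within each part, underscores between parts) is an assumption made for this example, not a format any city portal requires, and `portal_file_name` is a hypothetical helper.

```python
import re
from datetime import date


def portal_file_name(address, doc_date, doc_type, extension="pdf"):
    """Build a consistent file name: address_date_document-type.ext.

    The naming pattern is hypothetical; no city portal mandates it.
    """
    def slug(text):
        # Lowercase, then collapse anything that is not a letter or digit
        # into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(address)}_{doc_date.isoformat()}_{slug(doc_type)}.{extension}"


# Example with a made-up address:
print(portal_file_name("123 Example St", date(2026, 7, 4), "Site Permit"))
# prints "123-example-st_2026-07-04_site-permit.pdf"
```

Putting the date in year-month-day order means files for the same address sort chronologically in any file listing.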


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
