The Daily San Francisco

San Francisco news, every day

SF City Hall's Digital Archive Push Hits a Snag: Thousands of Duplicate Images Clog the System

A citywide effort to digitize public records and planning documents has exposed a messy legacy of redundant files, forcing departments to rethink how they store and surface visual data.

By San Francisco News Desk · Published 4 July 2026, 12:00 pm

Photo: Josh Hild on Pexels

San Francisco's Department of Technology flagged a significant backlog this week in the city's effort to clean up its centralized digital asset library. The problem stems from years of siloed file management: agencies uploaded the same images, scans, and permit photographs multiple times with no unified deduplication protocol in place.

The issue surfaced publicly during a July 2 presentation to the city's Committee on Information Technology, where staffers described a content management system carrying an estimated 40 percent redundancy rate across shared image repositories used by the Planning Department, the Department of Building Inspection, and the Office of Economic and Workforce Development. That figure, drawn from an internal audit completed in late June 2026, prompted an immediate call for a remediation sprint before the new fiscal year is fully under way.

Why does it matter now? San Francisco is mid-stride through a housing production emergency that has put enormous pressure on the Planning Department's permit processing pipeline. Duplicate images — the same building facade photograph filed three times under different case numbers, for instance — slow down the document retrieval tools that planners at 49 South Van Ness Avenue rely on daily. With the city aiming to approve thousands of new units under state-mandated housing element targets, any drag on back-office efficiency carries real costs.

The Local Paper Trail

The problem is not abstract. At the Mission Bay campus of the city's DataSF program, staff have been manually flagging duplicate entries in the open data portal since January 2026, focusing initially on permit inspection photos tied to construction sites in the Dogpatch and Central SoMa neighborhoods — two of the most active building corridors in the city right now. The deduplication work there is separate from, but directly related to, the broader city system audit.

The San Francisco Public Library's digital collections team, based at the main branch on Larkin Street in the Civic Center, ran into an earlier version of the same problem in 2023 when it migrated historic photograph collections to a cloud-based platform. Librarians found that roughly one in five images had been ingested more than once, sometimes with conflicting metadata tags that made searches return the wrong results. That project took eight months to fully resolve and cost the library system an additional $215,000 in contractor hours beyond the original migration budget, according to budget documents filed with the Board of Supervisors at the time.

City technology staff say the current interdepartmental problem is larger in scale. The shared image repository in question held more than 1.2 million files as of the June audit. The remediation team, six staff members pulled from the Department of Technology's applications division, expects to work through the backlog in phases through the end of September 2026, starting with Planning Department assets.

What Comes Next for Agencies and the Public

The city is evaluating two commercial deduplication tools already used in comparable municipal systems — one favored by the Department of Building Inspection's IT lead and another championed by staff at the Office of Civic Innovation on Golden Gate Avenue. A contract award is expected before August 1, with a price ceiling of $180,000 set in the current departmental budget. Whichever platform wins the bid will need to integrate with Accela, the permit-tracking software San Francisco has used for building and planning workflows since 2017.

For residents navigating the city's planning portal to track a project on their block, the practical effect of the cleanup should be faster load times and fewer dead-end image links when pulling up permit case files. The Planning Department has said it intends to publish a status update on the deduplication effort in its public newsletter in August.

The broader lesson city technologists are drawing from this week's review is straightforward: without a shared metadata standard enforced at the point of upload, the redundancy problem will rebuild itself. A draft policy requiring hash-based duplicate checking before any image is accepted into the centralized repository is currently in comment review, with a target adoption date of October 1, 2026.
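
For readers curious what hash-based duplicate checking at the point of upload might look like, here is a minimal sketch in Python. It is illustrative only and does not reflect the city's draft policy or any vendor's product: the `ImageRepository` class, the `upload` method, and the case numbers are hypothetical names, and SHA-256 is an assumed choice of hash. The idea is that a file's bytes are hashed before acceptance, and a file whose hash is already on record is linked to the existing copy rather than stored again.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ImageRepository:
    """Hypothetical in-memory stand-in for a centralized image store."""

    # Maps content hash -> case numbers that reference the stored file.
    by_hash: dict[str, list[str]] = field(default_factory=dict)

    def upload(self, case_number: str, data: bytes) -> bool:
        """Store the image if it is new; otherwise link the case to the copy on record.

        Returns True when a new file was stored, False for a duplicate.
        """
        digest = content_hash(data)
        if digest in self.by_hash:
            # Same bytes already stored: record the reference, do not store again.
            self.by_hash[digest].append(case_number)
            return False
        self.by_hash[digest] = [case_number]
        return True


# The same facade photograph filed under three (made-up) case numbers.
repo = ImageRepository()
facade = b"bytes of a facade photograph"  # placeholder content
results = [repo.upload(case, facade) for case in ("CASE-A", "CASE-B", "CASE-C")]
stored_files = len(repo.by_hash)
```

In this sketch only the first upload stores a file, and the later two are recorded as references to it. An exact hash of this kind catches only byte-identical copies; a rescanned, resized, or re-encoded version of the same photograph would produce a different hash and would need a different technique to detect.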

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.