The Daily San Francisco

San Francisco news, every day

SF's Duplicate Image Problem: What Happens Next and the Key Decisions Ahead

City agencies and nonprofits racing to digitize San Francisco's archives face a critical fork in the road over how to handle thousands of redundant image files clogging aging databases.

By San Francisco News Desk · Published 4 July 2026, 12:16 pm

Photo: Vision plug on Pexels

San Francisco's push to modernize its public records infrastructure has hit a concrete obstacle: duplicate images. Across city departments, from the Planning Department's permit files on South Van Ness Avenue to the Department of Public Health's community clinic records, redundant digital files have multiplied inside storage systems at a rate that administrators say is no longer manageable without a formal policy decision. The questions now are who pays to fix it, who decides what gets deleted, and whether the city's open-data commitments survive the cleanup intact.

The timing matters. San Francisco is midway through a broader digital transformation effort tied to the Mayor's Office of Civic Innovation, which has been pushing agencies since 2024 to migrate legacy paper records into cloud-based systems. That migration exposed the duplicate problem in acute form. When contractors scanned historical building permits and environmental health inspections, automated ingestion tools frequently created multiple copies of the same image with different file names. Over months, those redundancies compounded. Storage costs climbed. Search results became unreliable. Archivists found themselves spending hours confirming whether two files were genuinely identical or subtly different versions of the same document.
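For readers curious how software can tell that two differently named files are the same scan, the following is a minimal sketch of content hashing in Python. It is an illustration only, not the method used by any city department or vendor named in this story; the function names `file_digest` and `find_exact_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical.

    File names are ignored; only the bytes inside each file are compared.
    """
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A check like this catches only exact copies. A rescan of the same document, or a copy saved in a different format, produces different bytes and a different digest, which is why the "subtly different versions" still end up in front of a human reviewer.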

Where the Bottlenecks Are Forming

Two institutions are particularly exposed. The San Francisco Public Library's San Francisco History Center, housed in the Civic Center main branch on Larkin Street, manages tens of thousands of digitized photographs and ephemera. Staff there have been piloting deduplication software from a vendor used by several other municipal libraries on the West Coast, but a formal procurement decision has stalled inside the City Administrator's Office since early 2026. Meanwhile, the SF Planning Department, which processes thousands of permit applications annually at its offices at 49 South Van Ness, has accumulated image duplicates across both its legacy Accela system and a newer cloud environment brought online in late 2024. Merging those two databases without losing audit trails requires legal sign-off that has not yet materialized.

The financial stakes are not trivial. City budget documents presented to the Board of Supervisors' budget committee in June 2026 identified uncontrolled file growth as a contributing factor to IT cost overruns. Deduplication tools capable of handling large image libraries, with built-in hash-matching and metadata preservation, typically run between $40,000 and $120,000 for an enterprise municipal license, according to pricing tiers published by vendors in the govtech procurement space. Doing nothing is also expensive: analysts familiar with municipal IT work estimate that unchecked storage growth at current rates adds meaningful cost to annual contracts.

Decisions That Can't Wait Much Longer

Three choices are coming to a head before the end of summer. First, the City Administrator's Office must decide whether to issue a citywide duplicate-image policy or leave each department to solve the problem independently — a fragmented approach that would likely produce inconsistent results and complicate future data-sharing between agencies like the Department of Building Inspection and the Fire Department. Second, the Planning Department needs to determine whether its deduplication run will be audited by a third party before any files are permanently removed, given that permit images can carry legal weight in enforcement proceedings in neighborhoods like the Mission and the Tenderloin. Third, the Public Library's History Center faces a funding gap: its current fiscal-year allocation does not cover a full vendor license, meaning it may need to apply for a Digital Equity grant through the California State Library by the September 2026 deadline or defer the project another year.

Community groups that rely on the city's open-data portal — including civic tech organizations based in SoMa and Dogpatch that use Planning Department data for housing research — have a direct stake in how these decisions land. Poorly executed deduplication can quietly break dataset links that outside developers depend on, effectively severing access to public records without any public notice. Advocates are urging the Mayor's Office of Civic Innovation to publish a decision timeline before the Board of Supervisors returns from recess in August. The window to get this right, before the next wave of document digitization contracts launches in the fall, is closing fast.
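One common safeguard against broken links is to record, for every file removed, which surviving copy replaces it. The sketch below shows the idea in Python under stated assumptions: the paths are invented, the rule for choosing which copy to keep (first in alphabetical order) is arbitrary, and the names `build_redirects` and `resolve` are hypothetical rather than part of any city system.

```python
def build_redirects(groups: list[list[str]]) -> dict[str, str]:
    """Map every path slated for removal to the copy that is kept."""
    redirects = {}
    for group in groups:
        # Assumption: keep the alphabetically first path in each duplicate group.
        keep, *drop = sorted(group)
        for old in drop:
            redirects[old] = keep
    return redirects


def resolve(path: str, redirects: dict[str, str]) -> str:
    """Return the surviving path for a link, whether or not it was deduplicated."""
    return redirects.get(path, path)
```

Published alongside a cleanup, a table like this would let outside developers update their references, or let a portal forward old addresses to the retained file, instead of discovering dead links after the fact.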

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
