The Daily San Francisco

San Francisco news, every day


SF City Agencies Push to Fix a Hidden Digital Problem: Duplicate Images Flooding Public Records Systems

A quiet but costly data-management headache is forcing San Francisco departments to overhaul how they store, tag, and retrieve visual files — and this week brought new pressure to act.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm


Photo: Stephen Leonardi on Pexels

San Francisco's Department of Technology confirmed this week that a citywide audit of municipal digital asset libraries found thousands of duplicate image files spread across at least six departmental servers. The redundant copies are complicating records requests, slowing public-facing web portals, and adding unnecessary cost to cloud contracts the city renewed earlier this year. The audit, completed in late June and now circulating among department heads at City Hall on Van Ness Avenue, found the problem particularly acute within the Planning Department and the Department of Public Works.

The timing matters. San Francisco has spent the better part of 2025 and early 2026 consolidating its technology infrastructure under a broader digital modernization push, partly in response to persistent criticism over how slowly city agencies respond to California Public Records Act requests. Duplicate files — the same photograph of a Mission District sidewalk repair or a Tenderloin building permit site stored under four different file names across three different servers — make keyword searches unreliable and force staff to manually verify which version of an image is current and authoritative.

What Happened This Week

The most concrete development came Tuesday, when the city's DataSF program, housed within the Department of Technology at 1 Dr. Carlton B. Goodlett Place, issued internal guidance recommending that agencies adopt perceptual hashing. The technique computes a compact fingerprint for each image in such a way that visually similar images produce similar fingerprints, so near-identical duplicates can be flagged automatically even when they are stored under different file names. The guidance stops short of a mandate but sets a 90-day voluntary compliance window, with a formal policy review scheduled for October 2026.
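For readers curious how perceptual hashing flags near-duplicates, here is a minimal sketch of one simple variant, the average hash (aHash), in Python. The DataSF guidance as reported does not name a specific algorithm, so the choice of aHash, the 8x8 grid, the function names and the 5-bit match threshold are all illustrative assumptions, not the city's method. Images are modelled as plain grids of grayscale values so the example runs without an imaging library.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 values.
Image = List[List[float]]


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash (aHash): shrink the image to size x size cells by
    block-averaging, then emit one bit per cell that is brighter than
    the mean of all cells. Returns a size*size-bit integer."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Image, b: Image, max_distance: int = 5) -> bool:
    """Flag two images as near-duplicates when their 64-bit hashes differ
    in at most max_distance bits (the threshold is a hypothetical choice)."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Usage: a synthetic 32x32 gradient, a uniformly brightened copy,
# and an inverted (genuinely different) version.
original = [[float(x * 4 + y) for x in range(32)] for y in range(32)]
brighter = [[min(255.0, p + 10) for p in row] for row in original]
inverted = [[255.0 - p for p in row] for row in original]

print(likely_duplicates(original, brighter))  # True
print(likely_duplicates(original, inverted))  # False
```

Because the hash records only whether each region is lighter or darker than the image's overall average, a uniformly brightened copy yields the same fingerprint, while the inverted image differs in all 64 bits. Real deduplication tools decode actual image files first and often use more robust variants, such as the DCT-based pHash.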

Separately, the San Francisco Public Library's digital collections team at the main branch on Larkin Street confirmed it had completed its own deduplication pass on roughly 140,000 historical photographs in the San Francisco History Center archive. Librarians ran the collection through open-source deduplication software over five weeks ending June 28. The process cut the number of catalogued image records by an estimated 11 percent, or roughly 15,000 records, freeing storage space and making the public search interface (used by genealogists, journalists, and architects researching historic structures) meaningfully faster.

The library result is being cited internally as a proof of concept that even legacy collections, many of them digitized decades ago from physical prints with inconsistent metadata, can be cleaned up without a massive capital outlay. The library's digital services unit spent approximately $18,000 on staff overtime and software licensing for the project, according to a budget line item in materials prepared for the Library Commission's June 24 meeting.

Why This Reaches Beyond Back-Office IT

Housing advocates and neighborhood groups have a stake in this too. The Planning Department's online permit portal, which covers everything from accessory dwelling units in the Sunset District to major mixed-use towers near Transbay, relies on image attachments submitted by applicants. When duplicate site photos clog the system, case managers spend time reconciling files rather than processing applications — a drag on a pipeline that city officials have repeatedly said must accelerate to meet state housing production targets.

The city is not alone in wrestling with this. Large municipalities from Chicago to New York have documented similar problems as they migrated legacy document systems to cloud infrastructure. San Francisco's cloud storage costs for municipal agencies rose noticeably after a 2023 contract consolidation, making deduplication not just an organizational nicety but a line-item concern during every budget cycle.

For residents and businesses dealing with the city digitally, the practical upshot is straightforward: records requests that involve images — construction photos, code enforcement documentation, street-condition surveys — should become more reliable and faster to process once deduplication tools are in wider use. The Department of Technology's October policy review will determine whether the 90-day voluntary window converts into something with actual enforcement teeth. Department heads who want to get ahead of that deadline have been told to contact DataSF's technical assistance team directly. The clock started this week.


About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
