The Daily San Francisco

San Francisco news, every day

SF City Agencies Push to Purge Duplicate Images From Public Records Databases This Week

A city-wide audit of redundant digital assets is forcing departments from the Planning Commission to the Public Library to rethink how they store and serve visual records.

By San Francisco News Desk · Published 4 July 2026, 12:25 pm

Photo: Malcolm Hill on Pexels

San Francisco city officials moved this week to accelerate a long-delayed cleanup of duplicate images clogging public-facing databases, with at least three departments reporting deduplication efforts underway as of July 3. The push affects everything from permit photographs stored at the Department of Building Inspection's Civic Center offices to archival images held by the San Francisco Public Library's History Center on Larkin Street.

The timing matters. The city's digital infrastructure office has been under pressure since early 2026 to cut storage costs as part of a broader budget consolidation. Redundant image files — the same photograph stored under multiple file names or in multiple systems — quietly consume server capacity and slow retrieval times for staff and the public alike. With AI-assisted cataloguing tools now available at lower price points, several departments decided this was the week to act rather than wait for a formal city-wide mandate.

What the Cleanup Looks Like on the Ground

At the San Francisco Planning Department, staff have been running automated scripts against the Parcel Information database, which contains hundreds of thousands of site photographs taken since the early 2000s. The problem is structural: when the city migrated legacy systems to a cloud platform in 2023, files were often copied rather than moved, creating near-identical duplicates that now account for a meaningful share of total stored data. The department has not released official figures yet, but similar migrations in comparable municipal systems have produced duplication rates of 20 to 35 percent of total image libraries, according to industry benchmarks published by the Urban Institute in March 2025.
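The department has not published its scripts, so the following is only a minimal sketch of one common way to find files that were copied rather than moved: hash each file's contents and group files whose hashes match. The function names and the idea of scanning a single directory tree are assumptions made for illustration, not details of the city's tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_files(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    Hypothetical helper: file names and folders are ignored, so the same
    photograph saved under two names lands in the same group.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_files(Path("some/image/folder"))` returns one list per set of identical files. A content hash only catches exact copies. A rescanned or re-encoded version of the same photograph would need a different technique, such as perceptual hashing, which this sketch does not attempt.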

The San Francisco Public Library's digitisation program, headquartered at the Main Branch on Larkin Street, faces a different version of the same problem. The History Center has been digitising photographs from the James R. Tait collection and other 20th-century San Francisco holdings. Volunteers and contractors scanning physical prints have sometimes produced multiple scans of identical images, and different batches of scans were uploaded to separate folders without cross-referencing. Librarians began a manual-and-automated review process on June 30, targeting completion before the end of July.

Meanwhile, the SF Department of Elections, which stores ballot-related imagery and scanned documentation at its City Hall suite, began its own deduplication review on July 2. The department confirmed the review is routine ahead of the November 2026 election cycle, during which document imaging volumes are expected to spike significantly.

Why Duplicate Images Are a Bigger Problem Than They Sound

Storage costs are one issue. But duplicate images in public records systems also create legal and transparency headaches. When a journalist or attorney files a public records request, duplicate files can appear as separate responsive documents, inflating production costs and creating confusion about what the canonical record actually is. The City Attorney's Office has flagged this as an area of risk in at least two internal guidance memos circulated in 2025, though those memos have not been made public.

The practical cost is real. Commercial cloud storage for municipal governments typically runs between $0.02 and $0.05 per gigabyte per month, and large city image repositories can run into tens of terabytes. If redundant files made up 25 percent of a 50-terabyte archive, removing them would free about 12.5 terabytes, or 12,500 gigabytes. Rough math puts the savings at $250 to $625 per month, or $3,000 to $7,500 per year, for storage alone, not counting staff time saved in search and retrieval.
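For readers who want to check the arithmetic, it can be reproduced in a few lines. This is a rough sketch using only the figures quoted above. The function name is hypothetical, and the conversion of 1,000 gigabytes per terabyte is an assumption, since some providers bill on 1,024.

```python
def annual_storage_savings(
    archive_tb: float,
    redundant_fraction: float,
    price_per_gb_month: float,
    gb_per_tb: int = 1000,  # assumption: decimal terabytes
) -> float:
    """Estimate yearly storage savings from deleting redundant data."""
    redundant_gb = archive_tb * gb_per_tb * redundant_fraction
    return redundant_gb * price_per_gb_month * 12


# Figures from the article: 50 TB archive, 25 percent redundant,
# $0.02 to $0.05 per gigabyte per month.
low = annual_storage_savings(50, 0.25, 0.02)
high = annual_storage_savings(50, 0.25, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Changing the archive size or the redundant share shows how quickly the estimate scales. Staff time is not included in the estimate.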

The Tenderloin-based nonprofit Gray Area Foundation for the Arts, which has partnered with city cultural programs on digital preservation projects in the past, has publicly advocated for better metadata standards that would prevent duplicates from forming in the first place — a prophylactic approach rather than a recurring cleanup.

For San Francisco residents who use public portals to access permit records, property photographs, or archival imagery, the practical advice right now is straightforward: if you encounter broken image links or unexpected gaps in a city database over the next two to three weeks, it may reflect the deduplication process in progress. The Planning Department's online permit portal and the library's digital collections portal on SFPL.org are both expected to experience brief intermittent slowdowns through mid-July. Bookmark your searches and check back after July 18 for a cleaner, faster experience.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
