The Daily San Francisco

San Francisco news, every day

SF City Agencies Push to Root Out Duplicate Images From Public Records as AI Tools Flood Digital Archives

Officials, archivists and tech experts are calling for a coordinated city response to a growing problem that's cluttering government databases and threatening the integrity of public documents.

By San Francisco News Desk · Published 4 July 2026, 11:44 am

San Francisco's city government is sitting on a digital mess. Across multiple departments — from the Planning Department's permit archive on Drucker Street to the Digital Services office housed at City Hall — thousands of duplicate images have accumulated in public-facing databases, the result of years of rushed scanning projects, overlapping vendor contracts and, most recently, AI-assisted document ingestion tools that copy files without flagging repeats. Officials and records specialists are now pressing for a unified deduplication strategy before the problem compounds further.

The timing matters. San Francisco is in the middle of a $14 million modernization push for its digital infrastructure. The program, approved by the Board of Supervisors in spring 2026, is migrating legacy city records into a centralized cloud platform managed through the Department of Technology. That migration has exposed how redundant and disorganized the underlying image libraries have become. Experts warn that if duplicates aren't cleared before the transition completes, the city will simply be paying to store garbage at scale.

What Officials and Experts Are Saying

Records managers at the San Francisco Public Library's History Center on Larkin Street — which maintains millions of digitized photographs, permits and maps — have flagged the issue internally for more than a year. The concern is not merely cosmetic. When duplicate images carry different metadata tags, search results return conflicting versions of the same document, which can cause errors in title searches, zoning reviews and public records requests. For a city where housing production delays already cost developers and residents months of time, an unreliable permit archive adds another layer of friction.

Tech specialists at Civic Bridge, a program run through the City Administrator's Office that pairs private-sector technologists with city departments, have been consulting on the problem since February 2026. The working group has looked at open-source image-hashing tools as a low-cost first step. Such software computes a compact fingerprint for each file, so that files with matching or nearly matching fingerprints can be flagged as exact or near-exact copies. Perceptual hashing, a technique widely used by platforms like Facebook and Google to detect near-duplicate images even when file names or formats differ, has emerged as the leading candidate for a pilot program.
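For readers curious how such fingerprints work, the following is a minimal Python sketch, not the working group's actual tooling. It contrasts an exact fingerprint (a SHA-256 digest of the file bytes) with a simple perceptual one, the "average hash". All function names and the distance threshold are hypothetical, and the sketch assumes each image has already been decoded and shrunk to a small grayscale grid, a step a real pipeline would hand to an imaging library.

```python
import hashlib


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 digest of the raw bytes: identical files give identical digests."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Perceptual 'average hash' of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the grid's
    mean brightness, otherwise 0. Small brightness shifts rarely flip bits.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: list[list[int]], b: list[list[int]],
                      max_distance: int = 2) -> bool:
    """Treat two grids as near-duplicates if their hashes differ in few bits.

    max_distance is an illustrative threshold, not a recommended value.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 2x2 grids: a scan, a slightly brighter rescan, an unrelated image.
scan = [[10, 200], [220, 30]]
rescan = [[12, 198], [225, 28]]
other = [[200, 10], [30, 220]]

print(is_near_duplicate(scan, rescan))  # True
print(is_near_duplicate(scan, other))   # False
```

The exact fingerprint only matches byte-for-byte copies, so a rescan of the same permit would slip past it. The perceptual hash is designed to tolerate that kind of small variation, at the cost of needing a threshold that someone has to tune.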

At SF Digital Services, staff have pointed to the city's Salesforce-based 311 portal as a case study in what happens when deduplication is ignored. Residents uploading photos of the same pothole or encampment through the app regularly generate four or five duplicate image attachments per incident report, inflating storage costs and slowing case-worker review times. The department has not released a public estimate of the storage cost attributable to duplicates, but industry benchmarks for mid-sized municipal governments suggest redundant files can account for 20 to 30 percent of total digital storage consumption.
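To make the storage arithmetic concrete, here is a rough Python sketch of how a redundant share could be measured. It is an illustration under stated assumptions, not the department's method: the function name, the case identifiers and the sample uploads are all hypothetical, and only byte-identical repeats within the same incident are counted.

```python
import hashlib


def redundant_share(attachments: list[tuple[str, bytes]]) -> float:
    """Fraction of total bytes taken up by repeat copies within an incident.

    attachments is a list of (incident_id, file_bytes) pairs. The first copy
    of a file in an incident counts as needed; later identical copies in the
    same incident count as redundant.
    """
    seen: set[tuple[str, str]] = set()
    total = 0
    redundant = 0
    for incident_id, data in attachments:
        key = (incident_id, hashlib.sha256(data).hexdigest())
        total += len(data)
        if key in seen:
            redundant += len(data)
        else:
            seen.add(key)
    return redundant / total if total else 0.0


# Hypothetical uploads: one photo is attached twice to the same report.
uploads = [
    ("case-1", b"A" * 100),
    ("case-1", b"A" * 100),  # repeat of the first attachment
    ("case-1", b"B" * 100),
    ("case-2", b"A" * 100),  # same bytes, different incident: not counted
]

print(redundant_share(uploads))  # 0.25
```

In this made-up sample, 100 of 400 bytes are repeats, a 25 percent share. Separate photos of the same pothole would not be byte-identical, so a count like this would understate the duplication a perceptual hash could find.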

The Path Forward

Planning Commission staff have proposed a phased approach: a 90-day audit of the permit image archive beginning in September 2026, followed by automated deduplication using hashing tools, and then a policy requiring all future scanning vendors to run deduplication checks before delivery. The proposal still needs sign-off from the City Administrator's Office and a budget allocation — estimated internally at roughly $400,000 for the first year, covering software licensing and two additional full-time archivists.

The BART system, which manages its own separate digital asset library for engineering drawings and station photographs independent of City Hall, has already completed a smaller-scale deduplication of its infrastructure image archive. That project, finished in March 2026, reportedly cleared 18 percent of the agency's stored image files as redundant — a figure that, if replicated across San Francisco's city departments, would represent significant savings in both storage spend and staff time.

For residents and advocates tracking the city's broader transparency efforts, the practical takeaway is straightforward: public records requests that rely on image files, whether for zoning decisions, building permits or historical documents, may return incomplete or conflicting results until the cleanup is done. Anyone filing a records request through the city's NextRequest portal at 1 South Van Ness Avenue should flag any apparent duplicate documents they receive, as that feedback is now being channeled directly to the Department of Technology's migration team.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
