The Daily San Francisco

San Francisco news, every day

SF's Digital Records Mess: What Officials and Experts Are Saying About the City's Duplicate Image Problem

From Planning Department archives to SFMTA permit databases, redundant and mislabeled digital images are quietly gumming up city workflows — and the people who manage those systems want action.

By San Francisco News Desk · Published 4 July 2026, 11:44 am

San Francisco's municipal agencies are sitting on millions of duplicate digital images — redundant files that slow database queries, inflate storage costs, and increasingly trip up the AI-assisted tools the city has been deploying since late 2024. The problem isn't new, but pressure to fix it is mounting as departments push deeper into automated permitting, housing inspection tracking, and transit monitoring.

The timing matters. Mayor Daniel Lurie's administration has made housing production a top priority, and the Planning Department's permit-processing pipeline depends heavily on document management systems that store site photographs, architectural drawings, and inspection records. When the same image appears under four different file names — a common occurrence after years of ad hoc scanning practices — automated workflows flag conflicts, slow review times, and, in some cases, force staff to intervene manually.

Where the Bottlenecks Show Up

Two agencies come up repeatedly in conversations with city technology staff: the San Francisco Municipal Transportation Agency, which manages tens of thousands of images tied to parking enforcement, street-condition reports, and construction-zone permits; and the Department of Building Inspection, headquartered on Fell Street, whose Accela permitting platform has accumulated years of duplicated photo attachments linked to projects in neighborhoods from the Tenderloin to the Outer Sunset.

The San Francisco Digital Services office, a unit within the City Administrator's office that was restructured in 2023, has been piloting deduplication software on a subset of Planning Department records held at 49 South Van Ness Avenue. The pilot, which began running on roughly 400,000 image files in the spring of 2025, is evaluating whether perceptual hashing tools — algorithms that compare images by visual content rather than file name — can flag redundant records without human review of each one.

Archivists and records managers watching the pilot say the core challenge is that city departments built their digital storage systems independently, using different naming conventions, different scanner settings, and different metadata standards. An image of a Mission District façade might exist as a TIFF in one system and a compressed JPEG in another, and a simple file-name match won't catch that duplication. Perceptual hashing can, but it generates its own false positives — flagging similar-but-distinct images of, say, two storefronts on Valencia Street as duplicates when they are not.
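To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash," written in plain Python. It is an illustration only, not the software used in the city pilot, which the article does not name. The function names, the tiny 4x4 sample grids, and the `max_distance` threshold are all assumptions made for the example; real tools typically shrink each image to a small grayscale grid (often 8x8) before hashing.

```python
# Illustrative sketch only: the pilot's actual software is not named in the article.
# Images are represented as small grayscale grids (lists of rows, values 0-255).

def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]

def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))

def is_probable_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag two images as likely duplicates when their hashes are close.

    max_distance is a hypothetical threshold. Raising it catches more
    re-encoded copies but also produces more false positives.
    """
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance

# Hypothetical sample data: the same picture stored twice, plus a different one.
original = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
# Same picture after lossy recompression: pixel values drift slightly.
recompressed = [
    [198, 203, 52, 47],
    [201, 197, 55, 49],
    [48, 53, 199, 204],
    [51, 46, 202, 198],
]
# A visually different picture with the same overall brightness.
different = [
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]

print(is_probable_duplicate(original, recompressed))  # near-identical content
print(is_probable_duplicate(original, different))     # distinct content
```

The threshold is where the false-positive trade-off described above lives: two similar storefront photos can land within a few bits of each other, so a looser threshold flags more true duplicates and more look-alikes at the same time.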

The cost argument for cleaning up the archives is straightforward. Cloud storage for large unstructured file collections is not cheap, and city contracts for enterprise document management have grown. According to the City Controller's Office FY2025 budget summary, the city's information technology expenditures across all departments exceeded $300 million, a figure that includes storage infrastructure. Records managers argue that eliminating verified duplicates could reduce storage overhead meaningfully — though without completed audits, precise savings projections remain speculative.

What Comes Next for Departments

Digital Services staff have indicated the pilot results are expected to inform a broader policy recommendation sometime before the end of calendar year 2026. If the pilot performs well, the recommendation would likely call for a citywide metadata standard for image attachments — essentially a common set of rules for how departments name, tag, and store photographs when they enter any city system.

For residents and contractors dealing with the city's permitting systems, the practical effect of a successful deduplication push would be faster document retrieval. Builders applying for permits in high-density corridors like Geary Boulevard or in the Dogpatch redevelopment zone have complained for years about delays traced partly to database lookup times inflated by redundant records.

Technology consultants familiar with municipal records work caution that software alone won't solve the problem. Agencies need updated intake protocols so that new duplicates don't accumulate as fast as old ones are removed. Training for the staff who scan and upload documents — often entry-level administrative workers across dozens of departments — is a necessary complement to any automated deduplication tool.
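One form an intake check could take is sketched below, under stated assumptions: it fingerprints each uploaded file's bytes with SHA-256 and refuses to register a second copy under a new name. The `register_upload` function and the in-memory `registry` dictionary are hypothetical and do not describe any city system. An exact hash like this only catches byte-identical files, so it would complement rather than replace a perceptual comparison across formats.

```python
import hashlib

def content_digest(data: bytes) -> str:
    """Fingerprint a file by its bytes, ignoring its name."""
    return hashlib.sha256(data).hexdigest()

def register_upload(registry: dict, filename: str, data: bytes):
    """Record a new upload, or report the earlier file with identical bytes.

    registry is a hypothetical in-memory map of digest -> first filename seen.
    Returns None when the upload is new, else the name of the existing copy.
    """
    digest = content_digest(data)
    existing = registry.get(digest)
    if existing is not None:
        return existing  # duplicate: same bytes already stored under this name
    registry[digest] = filename
    return None

# Hypothetical usage: the same scan uploaded under two different names.
registry = {}
scan = b"example image bytes"
first = register_upload(registry, "facade_scan_001.tif", scan)
second = register_upload(registry, "IMG_4471_copy.tif", scan)
print(first)   # None, because the first upload is new
print(second)  # name of the earlier file holding the same bytes
```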

The Fourth of July holiday gives city offices a brief pause. When they reopen after the holiday weekend, the Digital Services pilot will still be running, the Accela backlog will still be there, and the debate over who owns the fix, individual department IT units or a centralized city authority, will pick up exactly where it left off.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
