The Daily San Francisco

SF City Agencies Push to Purge Duplicate Images From Digital Archives This Week

A coordinated effort across several San Francisco departments to clean up redundant visual records is forcing a reckoning with years of bloated, poorly managed digital storage.

By San Francisco News Desk · Published 4 July 2026, 12:06 pm

Photo: Robert So on Pexels

San Francisco's Department of Technology began a citywide audit this week targeting duplicate images clogging the digital archives of at least six municipal agencies, a housekeeping push that has quietly grown into one of the more consequential data-management projects the city has undertaken since it migrated most records to cloud infrastructure in 2022. The effort, which started in earnest Monday, involves coordinated deduplication work across the Planning Department, the San Francisco Municipal Transportation Agency, and the Department of Public Works, among others.

The timing is not accidental. The city's contract with its primary cloud storage vendor comes up for renewal in the fall, and the Department of Technology is under pressure from the Controller's Office to justify ballooning storage costs before new terms are negotiated. Redundant image files — duplicate photos of street conditions, planning permit documentation, transit infrastructure inspections — have been identified as a primary driver of unnecessary data overhead. Cutting that overhead before contract talks begin could give the city meaningful leverage on pricing.

Where the Redundancy Problem Is Worst

The SFMTA's image library, used to document everything from Muni rail inspections along the Twin Peaks Tunnel to parking enforcement records in the Tenderloin, has accumulated redundant files estimated to represent a significant share of the agency's total storage footprint, according to documents reviewed by The Daily San Francisco. The Planning Department's permit photo archive, which covers building inspection records from neighborhoods including the Mission District and Chinatown, has similarly accumulated years of unmanaged uploads where field staff photographed the same structures multiple times without any automated deduplication in place.

The Department of Public Works maintains a separate image catalog tied to its 311 service request system, the database that logs complaints about broken sidewalks, graffiti, and illegal dumping across the city. When multiple residents photograph the same pothole on, say, Cesar Chavez Street and submit through the SF311 app, each image is currently stored as a discrete file even when the images are visually identical. The new protocol, being piloted this week, would flag those images for review before they reach permanent storage.

San Francisco is not the first city to confront this. New York City's Department of Information Technology and Telecommunications undertook a comparable deduplication initiative in 2023 and reported reducing its municipal image archive by roughly 18 percent over 12 months. SF officials have pointed to that effort as a model, though San Francisco's own legacy systems present additional complications: several departments still operate on siloed database architectures that do not communicate with one another automatically.

Software, Staff, and What Comes Next

The city is deploying a combination of open-source perceptual hashing tools and a commercial deduplication platform to identify near-identical images: not just exact byte-for-byte copies but visually similar photographs taken seconds apart or from slightly different angles. That distinction matters because exact hash matching, which catches only byte-identical files, would miss a large category of the problem. The software flags images as potential duplicates when their similarity score exceeds a set threshold, then routes them to a human reviewer rather than deleting them automatically. That human review step was a non-negotiable condition set by the City Attorney's Office, which flagged legal risk around permanently deleting images that might be relevant to pending litigation or public records requests.
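
For readers curious how threshold-based matching of this kind can work, here is a minimal Python sketch. It is an illustration only: the city has not named the specific tools it is using, and the average-hash method, the 0.9 threshold, the function names, and the sample file names below are all assumptions made for the example. Images are represented as small grids of grayscale values, as if already shrunk to 8x8 pixels.

```python
import hashlib
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Grayscale pixels (0-255); assumed to be already downscaled to 8x8.
Grid = Sequence[Sequence[int]]


def exact_digest(pixels: Grid) -> str:
    """SHA-256 of the raw pixel bytes; matches only byte-identical images."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


def average_hash(pixels: Grid) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def similarity(a: Grid, b: Grid) -> float:
    """Share of hash bits that agree (0.0 to 1.0); assumes equal-sized grids."""
    n_bits = sum(len(row) for row in a)
    differing = bin(average_hash(a) ^ average_hash(b)).count("1")
    return 1 - differing / n_bits


def flag_for_review(
    images: Dict[str, Grid], threshold: float = 0.9  # hypothetical threshold
) -> List[Tuple[str, str, float]]:
    """Return image pairs at or above the threshold. Nothing is deleted here."""
    queue = []
    for (name_a, a), (name_b, b) in combinations(sorted(images.items()), 2):
        score = similarity(a, b)
        if score >= threshold:
            queue.append((name_a, name_b, score))
    return queue


# Made-up sample data: a gradient, a near-copy with one altered pixel,
# and an unrelated checkerboard pattern.
original = [[r * 8 + c for c in range(8)] for r in range(8)]
retake = [row[:] for row in original]
retake[0][0] = 200
unrelated = [[255 if (r + c) % 2 else 0 for c in range(8)] for r in range(8)]

review_queue = flag_for_review(
    {"a.jpg": original, "b.jpg": retake, "c.jpg": unrelated}
)
print(review_queue)
```

In this toy example the near-copy produces a different SHA-256 digest from the original, so exact matching would treat the two as unrelated, while the perceptual comparison pairs them for a human to review. Where the threshold sits is a judgment call: set it lower and reviewers see more false matches, set it higher and more genuine duplicates slip through.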

The Civic Bridge program at the San Francisco Office of Civic Innovation has embedded two technology fellows with the Department of Technology through August to help manage the rollout. That program, which pairs private-sector technologists with city agencies on defined projects, has previously worked on data transparency and service delivery tools.

For city residents, the immediate practical effect is minimal — no public-facing services are changing. The longer-term payoff, if the audit proceeds on schedule, is a leaner, faster records system and, city officials hope, a smaller storage bill when contract renewal talks open in September. The deduplication pilot is scheduled to run through the end of July, with a full report to the Controller's Office expected by August 15.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
