San Francisco's Department of Technology began a citywide audit this week targeting duplicate images clogging the digital archives of at least six municipal agencies, a housekeeping push that has quietly grown into one of the more consequential data-management projects the city has undertaken since it migrated most records to cloud infrastructure in 2022. The effort, which started in earnest Monday, involves coordinated deduplication work across the Planning Department, the San Francisco Municipal Transportation Agency, and the Department of Public Works, among others.
The timing is not accidental. The city's contract with its primary cloud storage vendor comes up for renewal in the fall, and the Department of Technology is under pressure from the Controller's Office to justify ballooning storage costs before new terms are negotiated. Redundant image files — duplicate photos of street conditions, planning permit documentation, transit infrastructure inspections — have been identified as a primary driver of unnecessary data overhead. Cutting that overhead before contract talks begin could give the city meaningful leverage on pricing.
Where the Redundancy Problem Is Worst
The SFMTA's image library, used to document everything from Muni rail inspections along the Twin Peaks Tunnel to parking enforcement records in the Tenderloin, has accumulated redundant files estimated to represent a significant share of the agency's total storage footprint, according to documents reviewed by The Daily San Francisco. The Planning Department's permit photo archive, which covers building inspection records from neighborhoods including the Mission District and Chinatown, has similarly accumulated years of unmanaged uploads where field staff photographed the same structures multiple times without any automated deduplication in place.
The Department of Public Works maintains a separate image catalog tied to its 311 service request system, the database that logs complaints about broken sidewalks, graffiti, and illegal dumping across the city. When multiple residents photograph the same pothole on, say, Cesar Chavez Street and submit through the SF311 app, each image is currently stored as a separate file even when the photos are visually identical. The new protocol, being piloted this week, would flag those images for review before they hit permanent storage.
San Francisco is not the first city to confront this. New York City's Department of Information Technology and Telecommunications undertook a comparable deduplication initiative in 2023, and New York reported reducing its municipal image archive size by roughly 18 percent over 12 months. San Francisco officials have pointed to that effort as a model, though San Francisco's own legacy systems present additional complications because several departments still operate on distinct, siloed database architectures that do not communicate with one another automatically.
Software, Staff, and What Comes Next
The city is deploying a combination of open-source perceptual hashing tools and a commercial deduplication platform to identify near-identical images: not just exact byte-for-byte copies but visually similar photographs taken seconds apart or from slightly different angles. That distinction matters because exact hash-matching, which catches only files that are identical down to the byte, would miss a large category of the problem. The software flags images as potential duplicates when their similarity score exceeds a set threshold, then routes them to a human reviewer rather than deleting them automatically. That human review step was a non-negotiable condition set by the City Attorney's Office, which flagged legal risk around permanently deleting images that might be relevant to pending litigation or public records requests.
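For readers curious about the mechanics, the difference between exact matching and perceptual matching can be illustrated with a short Python sketch. This is not the city's software, whose tools and settings have not been disclosed. It assumes a simple "average hash," one common perceptual hashing technique, and every name in it (average_hash, route, the max_distance value of 5, the synthetic test images) is hypothetical. Here a small bit difference between two fingerprints stands in for a high similarity score.

```python
import hashlib

def exact_digest(pixels):
    """Byte-for-byte fingerprint: any changed pixel yields a different digest."""
    raw = bytes(v for row in pixels for v in row)
    return hashlib.sha256(raw).hexdigest()

def average_hash(pixels, size=8):
    """Perceptual fingerprint (average hash) of a grayscale image.

    pixels is a 2D list of 0-255 values whose height and width are
    multiples of size. The image is shrunk to size x size cells by block
    averaging; each cell becomes one bit: 1 if brighter than the mean cell.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")

def route(hash_a, hash_b, max_distance=5):
    """Send likely duplicates to a person; never delete automatically.

    max_distance is an illustrative threshold, not a published setting.
    """
    if hamming(hash_a, hash_b) <= max_distance:
        return "human_review"
    return "keep_both"

# Synthetic 16x16 test images: a gradient, a slightly brighter copy of it
# (as if re-shot a moment later), and an inverted version.
base = [[8 * (x + y) for x in range(16)] for y in range(16)]
brighter = [[v + 3 for v in row] for row in base]
inverted = [[255 - v for v in row] for row in base]

print(exact_digest(base) == exact_digest(brighter))               # False
print(route(average_hash(base), average_hash(brighter)))          # human_review
print(route(average_hash(base), average_hash(inverted)))          # keep_both
```

The brighter copy fails the exact comparison because its bytes differ, yet its perceptual fingerprint matches the original, so it is routed to a reviewer. The inverted image differs in most bits and is left alone. Production tools typically use more robust hashes and tuned thresholds, but the routing logic follows the same shape.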
The Civic Bridge program at the San Francisco Office of Civic Innovation has embedded two technology fellows with the Department of Technology through August to help manage the rollout. That program, which pairs private-sector technologists with city agencies on defined projects, has previously worked on data transparency and service delivery tools.
For city residents, the immediate practical effect is minimal — no public-facing services are changing. The longer-term payoff, if the audit proceeds on schedule, is a leaner, faster records system and, city officials hope, a smaller storage bill when contract renewal talks open in September. The deduplication pilot is scheduled to run through the end of July, with a full report to the Controller's Office expected by August 15.