San Francisco's Department of Technology launched a targeted cleanup of duplicate images embedded in the city's public-facing data portals this week, after an internal audit found thousands of redundant photograph files slowing down three separate systems used by departments ranging from Public Works to the Planning Department. The effort, centered on the DataSF platform managed out of City Hall, is the most focused push the city has made since a broader open-data initiative began in 2019 to bring municipal records into a unified digital framework.
The timing matters. Housing production in San Francisco remains under a state-mandated deadline, and permit applications processed through the Planning Department's e-permit portal rely on image attachments, including site photographs, survey maps, and architectural renderings, that planners review before approvals. The audit identified duplicated files as a bottleneck that slows retrieval times and, in some documented cases, has caused version-control confusion in which reviewers were uncertain which photograph reflected current site conditions. With the city under pressure to accelerate housing approvals, even incremental slowdowns carry real costs.
What the Cleanup Actually Involves
The technical work is being done in phases. In the first phase, which wrapped up by July 2, engineers ran hash-matching software against image libraries held on city servers at 1 South Van Ness Avenue — the main municipal technology hub — to identify files where identical pixel data had been uploaded more than once under different file names or case-reference numbers. The Planning Department alone had an estimated 40,000 flagged image files in its permit archive queued for review, according to public documentation posted to the DataSF changelog on July 1.
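As a rough illustration of how hash matching of this kind can work (this is not the city's actual tool, which is not described here), the Python sketch below groups files whose contents are byte-for-byte identical regardless of file name. The function names `file_digest` and `find_duplicate_groups` and the choice of SHA-256 are assumptions made for the example. It compares raw file bytes, so two copies of the same photograph saved with different metadata or compression would not match; catching those would require hashing the decoded pixel data instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large images are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests shared by more than one file indicate duplicates.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_groups(Path("some/image/library"))` on a hypothetical directory returns one list per set of identical files, which a reviewer could then inspect before deciding which copy to keep.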
The second phase, running through July 18, extends the same process to the SF Department of Building Inspection's Permit and Project Tracking System, known internally as PPTS, and to the Recreation and Parks Department's digital asset library, which holds images of sites from McLaren Park in the Excelsior to the Ferry Building waterfront. Recreation and Parks manages more than 220 parks and open spaces, and its image database has grown substantially since the department digitized historical photographs beginning in 2021.
Community groups that regularly submit public records requests say the duplicate problem has been a recurring frustration. The Tenderloin Housing Clinic, which monitors city permit decisions affecting low-income residents in the Tenderloin and SoMa, noted in a February 2026 public comment to the Planning Commission that records requests sometimes returned duplicate attachments that made document packets unnecessarily large and harder to parse. The clinic did not characterize it as a systemic failure, but flagged it as a recurring inconvenience in formal written comments that are publicly available in the Planning Commission's archive.
Broader Implications for City Tech Reform
The cleanup sits inside a larger conversation about San Francisco's technology infrastructure at a moment when the city is also trying to integrate AI tools into government workflows. The Mayor's Office of Civic Innovation has been piloting automated document-processing tools in partnership with at least two Civic Center departments since early 2026, and duplicate image data is exactly the kind of noise that degrades machine-learning performance when AI is used to sort or classify permit documents.
DataSF, which the city first launched in 2009 and which now hosts more than 600 public datasets, has not undergone a comprehensive image-quality audit of this scale before, according to its public documentation. The current effort was partly prompted by a storage-cost review: city cloud storage contracts, renegotiated in late 2025, now charge on a per-gigabyte basis rather than a flat annual fee, making redundant data directly measurable as a budget line.
For residents who use city portals to track neighborhood projects or pull permit histories, a common practice among Mission District community organizations monitoring commercial conversions, the practical payoff should be faster load times and leaner document packets in response to records requests. The Department of Technology says it expects the full deduplication to be complete before the end of July. Anyone filing image-heavy permit applications through the e-permit portal on sf.gov between now and July 18 may encounter brief processing delays while the second-phase audit runs in parallel with live system use.