San Francisco's Department of Technology this week quietly accelerated a months-long cleanup of duplicate images embedded across the city's public records infrastructure, reviving a project that had stalled in early spring. The effort, which touches databases used by the Planning Department on Gough Street and the Department of Homelessness and Supportive Housing, aims to strip redundant files that have been slowing document retrieval and inflating storage costs across multiple city systems.
The timing matters. The city is midway through a $42 million digital modernization contract that the Board of Supervisors approved in late 2025, and program managers are under pressure to show measurable progress before the fiscal year closes on June 30. Duplicate image files (the same photograph of a property, a permit applicant, or a shelter bed logged two, three, or sometimes a dozen times across different systems) have been identified as one of the primary culprits behind degraded search performance in the city's Accela permitting platform and the Salesforce-based case management tool used by HSH field workers.
What Changed This Week
On Tuesday, the Department of Technology's enterprise data team deployed an updated deduplication script across the Planning Department's digital archive, which holds more than 1.2 million permit-related image files going back to 2009. The sweep identified roughly 340,000 candidate duplicate files in the first 48 hours, according to a project status update posted to the city's DataSF portal Wednesday morning. Not all flagged files will be deleted: staff must manually review a sample set to ensure the algorithm isn't misidentifying distinct images taken on different dates as duplicates. Even so, project leads estimate the final purge could free up several terabytes of storage on the city's data center infrastructure, which is hosted under a contract with Equinix in the South of Market district.
The Tenderloin-based nonprofit TechSF, which trains city workers on data management, has been providing supplemental guidance to municipal staff handling the manual review. The organization ran a half-day workshop at City Hall's Room 201 on Monday specifically covering image metadata verification — how to read EXIF data, check file hash values, and flag genuinely unique images that a deduplication algorithm might incorrectly bundle together.
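For readers curious what checking file hash values involves, the following is a minimal sketch and not the city's actual script, which has not been published. It assumes duplicates are defined as byte-for-byte identical files and uses SHA-256 from Python's standard library; the function names `sha256_of` and `find_duplicate_groups` are hypothetical. It does not read EXIF data, so two distinct photos are never grouped together, but a re-saved or resized copy of the same photo would not be caught either.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash.

    Only hashes shared by two or more files are returned; each group is a
    candidate set for manual review, not an automatic deletion list.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicate_groups(Path("some/archive"))` on a folder returns one entry per set of identical files. A reviewer could then compare capture dates or other metadata within each set before anything is removed.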
Why Housing and Homelessness Records Are Central
The stakes are highest for the Department of Homelessness and Supportive Housing. HSH caseworkers upload intake photos, ID document scans, and housing placement records into a system shared across more than 60 Navigation Center and supportive housing sites citywide, including the large Multi-Service Center South facility on Fifth Street and the new Embarcadero Navigation Center. When the same client image is logged multiple times — a common error when intake forms are resubmitted after a system timeout — it can generate false counts in aggregate reports and complicate audits conducted by the Controller's Office.
City records show the HSH database holds entries for more than 9,700 individuals active in its shelter and housing programs, as of the most recent quarterly count published in May 2026. At that scale, even a small percentage of duplicate image records can distort the client-level data that program managers use to allocate beds and track outcomes. The deduplication work is designed to get ahead of a state audit of San Francisco's homelessness spending scheduled for the fourth quarter of 2026.
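A toy sketch can show how resubmitted uploads inflate an aggregate count. It is illustrative only: the `ImageRecord` fields and the sample values are hypothetical and do not reflect the actual HSH schema or data. It assumes that a duplicate is a record with the same client identifier and the same image content hash as an earlier record.

```python
from typing import NamedTuple


class ImageRecord(NamedTuple):
    client_id: str   # hypothetical field: identifier for the client
    image_hash: str  # hypothetical field: content hash of the uploaded file


def count_uploads(records: list[ImageRecord]) -> tuple[int, int]:
    """Return (raw count, count after collapsing identical uploads)."""
    raw = len(records)
    unique = len({(r.client_id, r.image_hash) for r in records})
    return raw, unique


# Made-up example: client "A" had the same intake photo logged twice,
# as could happen when a form is resubmitted after a timeout.
sample = [
    ImageRecord("A", "hash-1"),
    ImageRecord("A", "hash-1"),
    ImageRecord("A", "hash-2"),
    ImageRecord("B", "hash-3"),
]
raw_count, unique_count = count_uploads(sample)
print(raw_count, unique_count)  # prints: 4 3
```

The gap between the two numbers is the overcount a report would carry if it tallied raw records instead of distinct uploads.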
For San Francisco residents who interact with city services — filing a 311 complaint with an attached photo, submitting a building permit application online, or uploading documents to the Assessor-Recorder's portal — the practical effect of the cleanup should eventually be faster page loads and fewer error messages when retrieving documents. The Planning Department has said it expects improved response times in its public-facing permit search tool by late August.
The Department of Technology has not yet released a final completion date for the full deduplication sweep, but the DataSF project page indicates a checkpoint review is scheduled for July 18. Residents or businesses with pending permit applications who notice discrepancies in uploaded documents are being directed to contact the Planning Department's intake counter at 49 South Van Ness Avenue directly rather than waiting for the automated review to resolve flagged files.