San Francisco's Department of Technology has been quietly wrestling with a problem that sounds mundane until you see the numbers: city agencies collectively stored an estimated three to five copies of the same digital image for every one they actually needed, according to an internal audit framework the department began piloting across Mission Street offices in late 2024. The duplication problem did not appear overnight. It accumulated across two decades of overlapping software contracts, pandemic-driven digitization rushes, and a culture in which deleting anything felt riskier than keeping everything.
The timing matters. With the city facing a projected budget shortfall that Mayor Daniel Lurie's administration is still working to close heading into fiscal year 2027, every gigabyte of redundant cloud storage is a line item someone has to defend. Enterprise cloud contracts of the kind the city holds with major vendors to house planning documents, permit photos, and public-health imagery can run into the hundreds of thousands of dollars annually at municipal scale. Trimming duplicate data is no longer just a housekeeping question; it is a budget conversation.
How the Backlog Built Up
The roots go back to the mid-2000s, when departments began scanning paper records independently rather than through any centralized protocol. The San Francisco Planning Department, housed on Seventh Street, digitized decades of parcel photographs and permit files. The Department of Building Inspection, operating out of 49 South Van Ness, ran parallel digitization efforts. The two operations rarely coordinated, and each fed into its own content-management system.
Then came COVID-19. Starting in March 2020, city staff working remotely uploaded documents to whatever platform their department already subscribed to, whether Google Drive, SharePoint, or legacy FTP servers, and nobody coordinated deduplication. The San Francisco Department of Public Health alone onboarded at least four separate cloud environments during the pandemic period, according to city technology planning documents released under a Sunshine Ordinance request in 2025. Each environment collected its own image libraries: testing-site photographs, facility inspection shots, public-communications graphics. By 2022, the overlap was structural.
The SF Digital Services team, which operates under the umbrella of the Department of Technology at City Hall, began mapping the problem formally in early 2023 as part of a broader data-governance initiative tied to the city's Strategic Plan for Technology. That mapping exercise found duplicate imagery concentrated in three clusters: planning and zoning records, public-health communications assets, and the image libraries attached to the city's DataSF open-data portal. Fixing any one cluster required coordinating across department heads who had different priorities and different vendor relationships.
The Cleanup Effort and What Comes Next
Progress has been real but uneven. DataSF, the city's public data portal, completed a deduplication pass on its hosted image sets in the first quarter of 2026, retiring roughly 40,000 redundant files according to figures the Department of Technology shared at a March Board of Supervisors committee hearing. The Planning Department is mid-process, working through a backlog of parcel photos that stretch back to 1998. Building Inspection has not yet begun a formal deduplication effort, city records show.
The practical obstacle is human as much as technical. Automated deduplication tools, including several open-source options and at least two commercial platforms the city evaluated in 2025, can quickly identify pixel-identical images. The harder cases are near-duplicates: the same building photographed six days apart, or a public-health graphic resized to three different dimensions for three different platforms. Those require human review, and city staff hours are finite.
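To illustrate why the easy cases are easy, here is a minimal sketch of one common approach to exact-duplicate detection: group files by size, then by a SHA-256 hash of their contents. This is not the city's tool or any vendor's method; the function names and the directory being scanned are hypothetical. It also assumes "identical" means byte-identical, which is stricter than pixel-identical: the same image re-saved in a different format or with different metadata will not match, and resized or re-photographed near-duplicates need perceptual hashing or the human review described above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group every file under `root` by content hash; return groups with 2+ members.

    Hypothetical helper for illustration: it only finds byte-identical copies.
    """
    # Cheap first pass: files of different sizes cannot be byte-identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only files that share a size with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[sha256_of_file(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


if __name__ == "__main__":
    import sys

    target = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for group in find_exact_duplicates(target):
        keep, *redundant = group
        print(f"keep {keep}; review: {', '.join(map(str, redundant))}")
```

The size check before hashing is a design choice: it skips reading most files entirely, which matters when a library holds tens of thousands of images. The script only lists candidates rather than deleting them, leaving the final call to staff.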
For residents and small-business owners who interact with city permitting systems along the Van Ness Corridor or file documents through the online SF Planning portal, the duplication problem has occasionally meant slower load times, as well as misfiled attachments when staff pull the wrong version of a photo from an overcrowded file tree. Fixing it should make those interactions faster and less error-prone.
The Department of Technology has set an internal target of completing agency-wide deduplication protocols by the end of calendar year 2026. Whether departmental budget pressures accelerate or delay that timeline will depend heavily on how the city resolves its broader fiscal picture in the months ahead.