San Francisco's city government is sitting on a digital hoarding problem. An internal review of municipal document management systems, completed this spring, found that duplicate image files (scanned permits, inspection photos, archival maps, planning documents) account for roughly 34 percent of total storage consumed across the Department of Building Inspection and the San Francisco Planning Department combined. That single figure is driving an emergency consolidation effort that city IT officials say could save up to $2.1 million annually in cloud storage and infrastructure contracts.
The timing matters. The city is midway through a five-year digital infrastructure overhaul launched in fiscal year 2024-25, and the Mayor's Office of Housing and Community Development is pushing agencies to digitize paper records at Mission Street headquarters faster than at any point in the last decade. As that digitization pace accelerates (the Planning Department processed more than 47,000 permit applications in calendar year 2025 alone), duplicate images multiply quickly. Every scanned floor plan uploaded twice, every inspection photo filed in three folders, compounds into storage costs that the city's General Fund absorbs.
Where the Redundancy Lives
The worst offenders are legacy systems at two locations: the Civic Center campus at Van Ness Avenue and the Department of Public Works annex on Bryant Street. According to the internal review, those two sites together hold an estimated 1.8 petabytes of image data, of which deduplication software identified approximately 612 terabytes, or about 34 percent, as exact or near-exact copies. To put that in consumer terms, 612 terabytes is roughly equivalent to 122 million smartphone photos at about 5 megabytes each.
San Francisco Public Works began piloting a deduplication protocol in January 2026 using software licensed through the city's existing contract with its enterprise cloud vendor. The first 90-day phase removed 41 terabytes of confirmed duplicates from active storage without a single verified data loss, according to figures circulated inside the department. The San Francisco Digital Services office, which sits under the City Administrator, is now coordinating with the Controller's Office to standardize the protocol across 14 city departments before the end of fiscal year 2026-27.
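The city has not described how its licensed software works. As a rough illustration of the simplest case, exact copies, the Python sketch below groups files whose contents are byte-for-byte identical by comparing SHA-256 digests. The function names are hypothetical, near-exact copies (a rescanned page, a resized photo) would need a different technique and are not covered, and the sketch only reports duplicate groups without deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file. Nothing is deleted here:
    # the groups are a report for human review.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Reporting rather than deleting fits the constraints the city faces: a flagged group can go to a reviewer or an auditor before any copy is purged.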
The problem is not unique to government. In the private sector, the Mission District's tech corridor has watched the issue reshape internal IT budgets for years. AI startups operating out of office space along South Van Ness Avenue and mid-Market Street routinely cite image dataset redundancy as one of the top three causes of runaway cloud bills during model training cycles. For city government, the calculus is different: legal retention requirements, public records obligations, and chain-of-custody rules make agencies far more cautious about automated deletion. But the core math is the same: duplicate files are expensive dead weight.
What Deduplication Actually Costs to Fix
The remediation is not free. San Francisco Digital Services has budgeted $340,000 in the current fiscal year for the deduplication initiative, covering software licensing, staff overtime for data validation, and a third-party audit of any files flagged for removal before they are purged. That figure does not include the roughly $180,000 in consultant time the Controller's Office allocated separately to map data lineage across legacy systems — some of which date to a 1998 records digitization project that predates current metadata standards entirely.
The practical math still favors action. If the $2.1 million annual savings projection holds (city officials have cautioned that the figure is a ceiling estimate, not a guarantee), the program recovers its roughly $520,000 in combined costs and breaks even before the end of its second full year of operation. The San Francisco Budget and Legislative Analyst's office is expected to publish an independent assessment of those projections before the Board of Supervisors takes up the Digital Services budget amendment in September 2026.
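The break-even claim can be checked with back-of-the-envelope arithmetic. The Python sketch below adds the two cost figures reported above and asks how many months of savings would cover them. It rests on assumptions the city has not stated: both costs are treated as one-time, savings are assumed to accrue evenly from the first month, and the 50 percent and 25 percent scenarios are illustrative, not official projections.

```python
import math

BUDGETED_COST = 340_000      # Digital Services deduplication budget (reported)
CONSULTANT_COST = 180_000    # Controller's Office consultant time (reported)
CEILING_SAVINGS = 2_100_000  # projected annual savings, a ceiling estimate


def months_to_break_even(total_cost: float, annual_savings: float) -> int:
    """Whole months until cumulative savings cover a one-time cost."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return math.ceil(total_cost / (annual_savings / 12))


total_cost = BUDGETED_COST + CONSULTANT_COST  # 520,000

# Illustrative scenarios: the full ceiling, then half and a quarter of it.
for share in (1.0, 0.5, 0.25):
    months = months_to_break_even(total_cost, CEILING_SAVINGS * share)
    print(f"{share:.0%} of ceiling: break-even in about {months} months")
```

Under those assumptions the costs are covered in about 3 months at the full projection, about 6 months at half, and about 12 months at a quarter. Any recurring licensing or audit costs would lengthen those timelines.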
For residents, the downstream effect is less about money than about speed. Duplicate images clog the search indexes that planning staff use to pull permit histories for properties across the city. Clearing that redundancy, Digital Services has told agency heads, should reduce average document retrieval times from roughly 4.2 minutes per query to under 90 seconds, a cut of nearly two-thirds. The change would be felt most at the Permit Center on Edmondson Avenue, where counter staff field hundreds of public inquiries each week. That is the number residents who have waited in line will care about most.