San Francisco's public agencies are sitting on a digital hoarding problem. An internal review circulated this spring among city department heads found that duplicate image files account for roughly 34 percent of all stored digital assets across municipal databases — a figure that translates, in raw storage terms, to an estimated 18 petabytes of redundant data costing the city several million dollars annually in cloud hosting fees alone.
The timing matters. The city's Department of Technology is midway through a $47 million infrastructure modernization push that includes migrating legacy records to a unified cloud platform. That migration, expected to complete by March 2027, has exposed how poorly catalogued San Francisco's digital image libraries have become, particularly in departments that digitized paper records during the pandemic years without consistent metadata standards.
Where the Problem Concentrates
The worst backlogs sit in the Planning Department's parcel-photo archive on Seventh Street and in the San Francisco Public Library's digital collections hub, which manages the city's historical photograph repository out of the main branch on Larkin Street in Civic Center. Librarians there have flagged that some individual photographs — particularly images of the 1906 earthquake and the Fillmore District jazz era — exist in as many as 40 separate file versions across different digitization projects, each with slightly different file names and cropping but identical underlying content.
The SF Digital Services office, the Mayor's Office unit tasked with improving online public tools, has been working since January 2026 with a contractor called Starling Labs — a Stanford-affiliated digital verification nonprofit with offices on Mission Street — to pilot an automated deduplication protocol across three city departments. The pilot covers Recreation and Parks, the City Attorney's office, and the Office of the Assessor-Recorder. Early results from the Assessor-Recorder phase, completed in April, identified 2.3 million duplicate image files out of 6.8 million total — a 34 percent redundancy rate that matched the citywide estimate almost exactly.
The financial drag is real. Cloud storage prices for municipal contracts typically run between $0.02 and $0.05 per gigabyte per month depending on tier and vendor. At 18 petabytes of purely redundant storage — a conservative estimate that several city IT managers have described as likely understated — the annual cost sits somewhere between $4.3 million and $10.8 million, depending on which tier the data occupies. That range, while wide, represents money that budget analysts say could otherwise fund roughly 30 to 70 additional Muni operator positions.
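The cost range above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 petabyte = 1,000,000 gigabytes) and a flat per-gigabyte rate with no volume discounts or egress fees, neither of which the city's review spells out. The function name is made up for this example.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units (1 PB = 1,000,000 GB)


def annual_storage_cost(petabytes: float, usd_per_gb_month: float) -> float:
    """Yearly cost in dollars at a flat per-GB monthly rate (hypothetical helper)."""
    return petabytes * GB_PER_PB * usd_per_gb_month * 12


# Figures taken from the article: 18 PB at $0.02 to $0.05 per GB per month.
low = annual_storage_cost(18, 0.02)
high = annual_storage_cost(18, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under those assumptions the low tier works out to about $4.3 million a year and the high tier to about $10.8 million, matching the range cited by city IT managers.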
What Deduplication Actually Fixes
Beyond cost, duplicate image bloat slows the public-facing systems San Franciscans actually use. The city's open data portal, DataSF, serves roughly 280,000 page views per month according to figures the platform published for Q1 2026. Database queries that touch image-heavy datasets — zoning maps, building permit photos, neighborhood planning documents — run measurably slower when indexes are cluttered with redundant files pointing to functionally identical content.
Nonprofits aren't immune either. The Internet Archive, headquartered on Funston Avenue in the Inner Richmond, has grappled with the same structural problem at a global scale and has developed hash-based deduplication tools that SF Digital Services is now evaluating for potential city adoption. The approach computes a cryptographic fingerprint from each image's contents at the moment of upload; any subsequent file that generates an identical fingerprint gets flagged automatically rather than stored as a new asset.
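For readers curious how hash-based flagging works in practice, here is a minimal sketch. It is not the Internet Archive's or the city's actual tooling: the choice of SHA-256, the `DedupStore` class, and the file names are all assumptions made for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical in-memory store that flags uploads whose contents were seen before."""

    def __init__(self) -> None:
        self._by_hash: dict[str, str] = {}        # fingerprint -> first file name stored
        self.flagged: list[tuple[str, str]] = []  # (duplicate name, original name)

    def upload(self, name: str, data: bytes) -> bool:
        """Store a new asset and return True, or flag a duplicate and return False."""
        digest = fingerprint(data)
        original = self._by_hash.get(digest)
        if original is not None:
            self.flagged.append((name, original))
            return False
        self._by_hash[digest] = name
        return True


# Illustrative uploads: two files with identical bytes, one with different bytes.
store = DedupStore()
store.upload("quake_1906_scan_a.tif", b"identical image bytes")
store.upload("quake_1906_scan_b.tif", b"identical image bytes")
store.upload("fillmore_jazz.tif", b"different image bytes")
print(store.flagged)
```

One caveat worth noting: an exact fingerprint of this kind only matches files that are byte-for-byte identical. Versions of a photograph that were re-cropped or re-encoded would produce different fingerprints, so catching them would likely require an additional similarity-based technique.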
For San Francisco residents and businesses that rely on public records for permitting, property research, or historical documentation, the practical upshot is straightforward: expect faster load times and more accurate search results on city platforms as the deduplication project scales beyond its current three-department pilot. SF Digital Services has indicated it plans to expand the program to the Planning Department and the Department of Building Inspection by the end of 2026. The full citywide rollout, if it holds to the current March 2027 timeline, would mark the first comprehensive image-library audit San Francisco has conducted since 2014.