San Francisco's Department of Technology stepped up work this week to identify and replace duplicate images embedded across dozens of municipal websites and public-facing databases. City IT staff say the duplicates have been quietly clogging storage systems and slowing page load times across SF.gov and related portals. The work accelerated ahead of the June 30 fiscal year-end deadline, and staff are now carrying the momentum into the holiday week.
The timing matters because the city is in the middle of a broader digital infrastructure overhaul. The Mayor's Office of Housing and Community Development, which maintains online dashboards tracking affordable unit production across neighborhoods from the Tenderloin to Bayview-Hunters Point, has been one of the primary consumers of image-heavy content. Duplicate permit photos, redundant project renderings, and stacked asset files have accumulated over years of rushed uploads during the pandemic-era remote work transition. Getting that cleaned up is now part of a larger push to make public data portals faster and more reliable.
What's Actually Happening Across City Systems
The Department of Technology's Digital Services team, based at 1 South Van Ness Avenue, has been running automated deduplication scripts across shared cloud storage since late June. According to city IT documentation reviewed this week, the project targets image assets stored in the city's Amazon Web Services environment, to which San Francisco migrated under a contract originally signed in 2019. Staff are not simply deleting files. They are replacing confirmed duplicates with canonical versions tagged with standardized metadata, so that the same photo of, say, a Caltrain station or a Mission District affordable housing project doesn't appear under a dozen different filenames.
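The city has not published its scripts, so the following is only a minimal sketch of how duplicate-to-canonical replacement of this kind is commonly done: files are grouped by a hash of their contents, and every extra copy is mapped to one surviving path. The function name `build_canonical_map`, the sample paths, and the rule for picking the canonical file (the alphabetically first path) are all assumptions made for illustration, not details from city documentation.

```python
import hashlib
from collections import defaultdict


def build_canonical_map(assets: dict[str, bytes]) -> dict[str, str]:
    """Map each duplicate asset path to the canonical path with identical bytes.

    `assets` maps a storage path to the raw file contents. Paths whose
    contents are unique do not appear in the result.
    """
    # Group paths by the SHA-256 digest of their contents, so files with
    # identical bytes land in the same bucket regardless of filename.
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, content in assets.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(path)

    canonical_map: dict[str, str] = {}
    for paths in by_digest.values():
        # Assumed rule for this sketch: the alphabetically first path is
        # kept as canonical and all other copies point to it.
        canonical, *duplicates = sorted(paths)
        for duplicate in duplicates:
            canonical_map[duplicate] = canonical
    return canonical_map


if __name__ == "__main__":
    # Hypothetical sample data: two byte-identical photos and one unique file.
    sample = {
        "housing/station_final.jpg": b"same-bytes",
        "transit/station.jpg": b"same-bytes",
        "housing/rendering.png": b"other-bytes",
    }
    for duplicate, canonical in build_canonical_map(sample).items():
        print(f"{duplicate} -> {canonical}")
```

A byte-level hash only catches exact copies. Images that were resized or re-compressed would need a perceptual comparison, which this sketch does not attempt.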
SF Digital Services, the civic tech unit that handles the public-facing layer of SF.gov, has separately flagged that duplicate imagery has contributed to slower load times on mobile devices — a problem that disproportionately affects residents in lower-income neighborhoods where broadband access is limited and mobile browsing is the primary way people interact with city services. The team estimates that image deduplication alone could reduce asset storage overhead by roughly 18 to 22 percent across the main SF.gov content management system, according to internal benchmarking shared in a June project update.
The effort connects to work already underway at the San Francisco Public Library, which has been digitizing its collection through the Civic Art Collection program and the San Francisco History Center at the Main Library on Larkin Street. Librarians there have dealt with duplicate scans appearing in the Online Archive of California, a federated database used by researchers nationally. This week, library staff confirmed they are coordinating with the Department of Technology to align deduplication protocols so that images removed from one system don't create broken links in another.
Why This Week, and What Comes Next
The Fourth of July holiday gave city IT teams a rare window. With most departments running skeleton crews and web traffic to SF.gov typically dropping by about 40 percent on federal holidays, the Digital Services team scheduled the most intensive batch-processing runs for the July 3–5 window to minimize the risk of disrupting residents trying to access services. The approach mirrors what the city used during the 2023 Salesforce Transit Center network migration, when engineers deliberately timed major changes to low-traffic periods.
Residents and nonprofit organizations that regularly pull images from city open data portals — including housing advocacy groups working out of offices in SoMa and the Richmond District — will want to audit any direct image links they've hardcoded into their own websites or reports. The Department of Technology is maintaining a redirect registry to catch broken URLs, but direct links to deprecated duplicate files may start returning errors as early as next week.
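For groups that want to run such an audit, one possible approach is to scan their own pages for image links and compare them against whatever list of deprecated paths the city provides. The sketch below assumes the redirect registry can be loaded as a simple mapping from old URL to canonical URL. That format, the function name `find_stale_links`, and the example.org addresses are hypothetical, since the registry's actual format has not been described.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and also routes self-closing
        # tags such as <img ... /> through this method.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_stale_links(html: str, registry: dict[str, str]) -> dict[str, str]:
    """Return {old_url: canonical_url} for image links listed in the registry.

    `registry` is an assumed format: deprecated URL -> canonical URL.
    """
    collector = ImageSrcCollector()
    collector.feed(html)
    return {src: registry[src] for src in collector.sources if src in registry}


if __name__ == "__main__":
    # Hypothetical page and registry entries, for illustration only.
    page = (
        '<p><img src="https://example.org/old/unit-photo.jpg">'
        '<img src="https://example.org/logo.png"/></p>'
    )
    registry = {
        "https://example.org/old/unit-photo.jpg":
            "https://example.org/canonical/unit-photo.jpg",
    }
    for old, new in find_stale_links(page, registry).items():
        print(f"replace {old} with {new}")
```

This checks only `<img>` tags in HTML. Links embedded in stylesheets, PDFs, or spreadsheets would need to be checked separately.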
For anyone building tools on top of San Francisco's open data infrastructure, the city's DataSF team at datasf.org has posted guidance on using the updated canonical asset paths. The full deduplication sweep is expected to complete by July 18, after which the Department of Technology plans to publish a public-facing summary of how much storage was recovered and what performance gains were measured across the city's digital services.