At least a dozen community organizations across San Francisco say they have lost hundreds of photographs in the past eighteen months to automated duplicate-image removal systems, tools embedded in cloud storage platforms that flag visually similar files and delete whatever the algorithm deems redundant. For the archivists, small-business owners, and community historians affected, the damage is anything but redundant.
The issue has sharpened this spring as more nonprofits, small vendors, and neighborhood groups have migrated their storage to cloud-based systems, many of them pushed by cost pressures after the tech-sector layoffs of 2023 and 2024 thinned out in-house IT support citywide. With fewer technical staff watching over digital libraries, automated housekeeping tools have more room to run unchecked.
From the Excelsior to the Tenderloin, Archives Under Threat
The Filipino Community Center on Mission Street says it discovered earlier this year that a batch of photographs documenting its annual Pistahan festival, some dating to the late 1990s, had been partially purged after a storage migration flagged low-resolution duplicates and removed originals alongside them. The center declined to confirm the exact number of files lost while an internal review is ongoing, but staff described the losses as significant.
A few miles north, the Tenderloin Museum on Eddy Street, which holds one of the more detailed photographic collections of the neighborhood's mid-twentieth-century history, told The Daily San Francisco it audited its own digital holdings after learning of similar incidents at peer institutions. Museum staff said they have found no losses to date but have begun manually tagging critical files to protect them from automated processes.
Small-business owners have not been so lucky. A family-run florist on Irving Street in the Inner Sunset said it lost product photographs accumulated over seven years when a cloud storage plan it had upgraded to last October began running a space-optimization feature by default. The owner said the tool removed what it classified as near-identical images — variations in lighting and angle that the business used to show seasonal inventory. Replacing the shots, even roughly, would require a professional photography session the owner estimated would cost upward of $800.
Community technology advocates say the pattern points to a structural problem: opt-out consent for aggressive storage features is buried in terms-of-service documents that most users never read. A 2025 survey by the Electronic Frontier Foundation, which is headquartered in San Francisco on Eddy Street, found that fewer than 12 percent of small-business respondents said they fully understood the default data-management settings on their primary cloud storage provider.
What Comes Next — and What You Can Do Now
The San Francisco Public Library's Digital Futures program, which runs workshops at branch locations including the Chinatown branch on Powell Street and the Noe Valley branch on Jersey Street, has added a session specifically on local backup strategies and deduplication risks. The next session is scheduled for late July 2026. Librarians there are also assembling a printed guide on how to audit cloud storage settings, a low-tech resource aimed at seniors and small nonprofits without dedicated IT support.
For organizations that have already suffered losses, data recovery specialists in SoMa and the Mid-Market corridor say the chance of recovering deleted files depends almost entirely on how long the cloud provider's retention policy keeps deleted items in a recoverable state. That window is typically 30 days, though it varies by plan tier and provider.
City Hall has not yet taken a formal position on the issue, and no legislation addressing cloud storage defaults for nonprofits or small businesses is currently moving through the San Francisco Board of Supervisors. Advocates say they plan to bring the matter to the Small Business Commission, which holds public hearings at City Hall on Van Ness Avenue, before the end of the summer. For now, the most reliable protection remains the oldest one: a second backup, kept somewhere the algorithm cannot reach.