San Francisco's public records infrastructure has a clutter problem. Across city departments, the San Francisco Public Library's San Francisco History Center, and the digital collections managed by the San Francisco Arts Commission, archivists and IT managers are sitting on backlogs of duplicate digital images—scanned photographs, planning documents, permit records, and cultural assets that have been uploaded multiple times across incompatible systems. The question now is who decides what stays, what gets deleted, and who bears the cost of getting it right.
The timing is not coincidental. The city's broader push to modernize municipal technology, accelerated after the COVID-19 pandemic exposed fragile legacy systems, has funneled new attention toward data hygiene. At the same time, the AI boom reshaping South of Market and Mission Bay has created both a new market for clean, deduplicated image datasets and a new class of vendors pitching automated solutions to government clients. City procurement offices are being lobbied. Contracts are being drafted. The decisions made before the budget cycle closes on July 31 will determine which approach San Francisco locks into.
What's Actually at Stake on Larkin Street and Brannan Street
The San Francisco Public Library's main branch on Larkin Street in the Civic Center holds one of the most significant photographic collections on the West Coast—tens of thousands of images documenting the city from the 1850s onward. Librarians there have been working since 2023 to migrate holdings into a unified digital asset management platform, a process that has surfaced extensive duplication: the same photograph sometimes exists in three or four versions at different resolutions, with conflicting metadata, across systems that were never designed to talk to each other.
Meanwhile, the San Francisco Planning Department's permit image archive, stored partly on servers managed from the city's data center on Brannan Street, faces a more bureaucratic version of the same headache. Permit photographs submitted by contractors and property owners accumulate duplicates whenever applicants resubmit documentation, which happens routinely. Planning staff have flagged the issue in internal workflow reviews, though no public remediation timeline has been announced.
The San Francisco Digital Services office, which sits inside the Department of Technology and oversees citywide tech standards, is the entity most likely to set policy here. Its decisions will ripple outward to at least a dozen city departments that maintain their own image repositories.
Three Decisions That Will Define the Outcome
Archivists and municipal technology professionals generally agree the coming weeks hinge on three distinct choices.
First: automated versus manual deduplication. AI-powered tools can process large image libraries quickly, but they make errors, flagging distinct photographs as duplicates because the images look alike rather than because the files are identical. For irreplaceable historical materials in collections like those at the History Center, a false deletion is permanent. Manual review is slower and far more expensive, running roughly $40 to $80 per labor hour for qualified digital archivists, according to published rate schedules from professional archival services firms.
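The gap between exact duplication and visual similarity can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not a description of any tool the city or its vendors use: the file names, the tiny 2x2 pixel grids, the `max_distance` threshold, and the function names are all hypothetical. Byte-identical files are grouped by SHA-256 digest, while a simple "average hash" stands in for the kind of perceptual matching that can flag two different photographs as the same image.

```python
import hashlib
from collections import defaultdict


def exact_duplicates(files):
    """Group file names whose bytes are identical (same SHA-256 digest)."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def triage(pixels_a, pixels_b, max_distance=1):
    """Label a pair 'review' if the hashes are close, else 'distinct'.

    Nothing is deleted here: a visual match only queues the pair
    for a human archivist to inspect.
    """
    distance = hamming(average_hash(pixels_a), average_hash(pixels_b))
    return "review" if distance <= max_distance else "distinct"


# Hypothetical inputs for illustration only.
files = {
    "scan_a.tif": b"same bytes",
    "scan_b.tif": b"same bytes",
    "scan_c.tif": b"different bytes",
}
street_view_1 = [[10, 200], [220, 30]]   # grayscale values, 0-255
street_view_2 = [[12, 190], [210, 35]]   # a different exposure or a different photo
unrelated = [[200, 10], [30, 220]]

print(exact_duplicates(files))
print(triage(street_view_1, street_view_2))
print(triage(street_view_1, unrelated))
```

In this sketch only the byte-identical pair is treated as a certain duplicate. The two similar grids hash to the same bits even though their pixel values differ, which is exactly the case where an automated deletion could destroy a distinct photograph, so the sketch routes it to review instead.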
Second: who retains the authoritative copy. When two city systems each hold a version of the same image, someone has to designate one as the canonical record. That determination has legal implications for public records requests filed under the California Public Records Act, which requires government agencies to provide access to official documents.
Third: long-term storage contracts. The city is weighing cloud-based storage agreements that would consolidate holdings off-premises, a move with both cost and data-sovereignty implications. San Francisco spent approximately $14.6 million on cloud services contracts across multiple departments in fiscal year 2024-25, according to figures published by the San Francisco Controller's Office.
The next formal checkpoint comes at the San Francisco Digital Services advisory meeting scheduled for late July, where departmental representatives are expected to present competing proposals. Advocacy groups focused on government transparency, including those that have pressed for open data standards at City Hall on Dr. Carlton B. Goodlett Place, have indicated they plan to attend and push for public comment periods before any deduplication policy is finalized. The Fourth of July holiday gives city staff one more quiet week before the political calendar fills back up, and before the budget cycle closes the window on this summer's choices.