The Daily San Francisco

The Numbers Behind SF's Duplicate Image Problem: How Bad Data Is Costing the City Millions

A deep dive into the statistics reveals that San Francisco's government databases and public-facing platforms are riddled with duplicate and mismatched images — and the cleanup bill keeps climbing.

By San Francisco News Desk · Published 4 July 2026, 11:44 am

San Francisco's municipal digital infrastructure is carrying a hidden weight: tens of thousands of duplicate and mismatched images spread across city agency websites, property databases, and transit information portals. Auditors and IT procurement officers have flagged the problem repeatedly since at least 2023, and it is quietly draining departmental budgets at a time when every discretionary dollar is contested.

The issue surfaced most visibly this spring when the Department of Building Inspection, which manages records for properties across neighborhoods from the Tenderloin to the Sunset, reported that its public-facing permit portal contained roughly 34,000 redundant image files, many of them duplicate photographs submitted by contractors for the same project. Storage and bandwidth costs tied to those redundancies ran to an estimated $280,000 annually, according to figures presented to the city's Committee on Information Technology during a March 2026 budget session.

A City Database Problem With Real Dollar Costs

The problem is not unique to one department. The San Francisco Municipal Transportation Agency, which runs the Muni Metro lines that connect with BART as well as the surface bus network, maintains a fleet imagery database used for insurance claims, maintenance logs, and public accountability reports. Staff inside the agency's Potrero Division maintenance facility on Cesar Chavez Street have flagged that, as of a February 2026 internal review, roughly one in every six files in the database (about 17 percent) was a duplicate vehicle image. IT vendors say that ratio is well above the industry benchmark of under three percent for managed municipal systems.

The SF Digital Services office, housed at 1 Dr. Carlton B. Goodlett Place, has been working through a broader data hygiene initiative since late 2024. That program, called DataSF Refresh, set a target of reducing overall duplicate records across six core city databases by forty percent before the end of fiscal year 2026. Images — not text records, not spreadsheets — turned out to be the hardest category to clean. Automated deduplication tools flagged roughly 210,000 suspect image pairs across city systems in the first pass, but human reviewers were needed to adjudicate about sixty percent of those flags because the files were near-duplicates rather than exact copies, differing only in resolution, crop, or timestamp metadata.
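The distinction between exact copies and near-duplicates can be sketched in a few lines of Python. This is an illustration only, not the tooling the city or its vendor uses: the function names, the `threshold` value, and the choice of a simple "average hash" are all assumptions, and images are represented as plain grids of grayscale values (at least 8 by 8) to keep the example self-contained.

```python
import hashlib


def exact_digest(data: bytes) -> str:
    """SHA-256 of the raw file bytes; equal digests mean byte-identical files."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels, size=8):
    """Perceptual 'average hash' of a grayscale image (rows of 0-255 values).

    The image is shrunk to size x size cells by block averaging, then each
    cell becomes 1 if it is brighter than the mean of all cells, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)
        for c in range(size):
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_pair(bytes_a, bytes_b, pixels_a, pixels_b, threshold=5):
    """Label a pair 'exact', 'near' (send to human review), or 'distinct'.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if exact_digest(bytes_a) == exact_digest(bytes_b):
        return "exact"
    if hamming(average_hash(pixels_a), average_hash(pixels_b)) <= threshold:
        return "near"
    return "distinct"
```

A byte-level hash catches only identical files. A resized or re-saved copy has different bytes, so it only shows up under a perceptual comparison, and anything close to the cutoff is the kind of pair a human reviewer would still need to adjudicate.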

At the San Francisco Public Library's main branch on Larkin Street, a parallel digitization project that began converting physical archive photographs in 2021 ran into the same wall. The library's digital collections team identified more than 12,000 duplicate or near-duplicate scans out of approximately 90,000 images processed — a thirteen percent redundancy rate that delayed the public launch of a Depression-era neighborhood photography collection by four months.
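For readers who want to check the percentages, the rates quoted so far can be reproduced with a short calculation. The sketch below uses the article's rounded figures ("more than 12,000" of "approximately 90,000" scans, and "one in every six" fleet images), so the outputs are approximations; the function name is illustrative.

```python
def duplicate_rate(duplicates: float, total: float) -> float:
    """Share of files that are duplicates, as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return duplicates / total


# Rounded figures quoted in the article.
library_rate = duplicate_rate(12_000, 90_000)  # library scans
fleet_rate = duplicate_rate(1, 6)              # SFMTA fleet images
benchmark = 0.03                               # vendor-cited ceiling

print(f"Library: {library_rate:.1%}")
print(f"Fleet:   {fleet_rate:.1%}")
print(f"Fleet rate is {fleet_rate / benchmark:.1f}x the benchmark ceiling")
```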

Why This Matters Right Now

The timing is awkward. City Hall is negotiating a fiscal year 2026–27 budget under significant pressure, with homelessness response programs and housing production initiatives competing for the same pool of discretionary funds. Technology infrastructure, which rarely generates headlines, tends to lose those fights. But the duplicate image problem has a direct line to costs: cloud storage rates for municipal contracts in San Francisco currently run between $0.023 and $0.041 per gigabyte per month depending on the tier, and image files are the single largest contributor to storage volume across most city departments, typically accounting for between fifty-five and seventy percent of total data stored.
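The link between redundant files and storage bills is simple multiplication. The sketch below applies the per-gigabyte rates quoted above to a volume of redundant data; the 50,000 GB volume is a made-up figure for illustration, not a number from city records, and bandwidth charges are left out.

```python
LOW_RATE = 0.023   # dollars per GB per month, lowest tier quoted above
HIGH_RATE = 0.041  # dollars per GB per month, highest tier quoted above


def annual_storage_cost(gigabytes: float, rate_per_gb_month: float) -> float:
    """Yearly storage cost in dollars for a volume held for all twelve months."""
    return gigabytes * rate_per_gb_month * 12


# Hypothetical volume of redundant image data, for illustration only.
redundant_gb = 50_000

low = annual_storage_cost(redundant_gb, LOW_RATE)
high = annual_storage_cost(redundant_gb, HIGH_RATE)
print(f"Annual cost of {redundant_gb:,} GB: ${low:,.0f} to ${high:,.0f}")
```

Because storage is billed monthly, every redundant gigabyte is paid for again each month until it is deleted, which is why cleanup savings recur annually rather than arriving once.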

The DataSF Refresh team has been piloting a machine-learning deduplication tool, developed in partnership with a Mission District-based civic-tech vendor, since January 2026. In early results, the pilot cut redundant files in the Recreation and Parks Department's permits and events image archive at McLaren Lodge in Golden Gate Park by sixty-one percent in the first ninety days. If that rate holds, projected savings across all six target databases would exceed $1.4 million annually in avoided storage and bandwidth expenditure.

City IT officials are expected to present a formal expansion proposal to the Board of Supervisors' Government Audit and Oversight Committee before the end of July. Departments that have not yet joined the initiative — including the Office of Economic and Workforce Development and the Planning Department — will face a decision about whether to opt in voluntarily or wait for a potential citywide mandate. For residents and businesses that rely on those portals for permits, inspections, and transit information, the practical upside is faster load times and fewer errors when pulling up property or vehicle records. For city finance staff, the arithmetic is already clear.

About this article

Published by The Daily San Francisco

This article was produced by The Daily San Francisco editorial desk and covers news in San Francisco. See our editorial standards for how we use AI.
