Can we predict San Francisco car break-ins? - Data Science Question


I was wondering recently about a recurring problem I see in San Francisco: the increasing number of car break-ins where thieves smash the back window. I was talking with my friend about how data science might be used to build some kind of algorithm that tracks all the incidents (locations) in the city. If we also knew roughly what time and day each window was broken, we might be able to find patterns and help people park their cars in safer neighborhoods, at better times of day, etc.

Do you really think this is a feasible idea? Could data science help car owners avoid car break-ins?

Just curious about it! Let me know your thoughts if you have any!


So I guess I see two questions.

The first is to “help people park their car in safer neighborhoods / better times in the day”. An algorithm by itself doesn’t track locations or incidents; it can’t produce data out of nowhere. But I am sure there are publicly available sources for that data, such as police departments posting crime data with the time and location of each incident. In fact, I found that this website does a beautiful visualization of car break-ins in SF.
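To make the "safer times" idea concrete, here is a minimal sketch of counting incidents per neighborhood, weekday, and hour. The records below are made up for illustration, and the (timestamp, neighborhood) layout is an assumption; a real public dataset would have its own column names and format.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical incident records: (ISO timestamp, neighborhood).
# These values are invented for the example, not real crime data.
incidents = [
    ("2018-03-02T14:10:00", "Fisherman's Wharf"),
    ("2018-03-02T14:45:00", "Fisherman's Wharf"),
    ("2018-03-03T22:30:00", "Mission"),
    ("2018-03-09T14:05:00", "Fisherman's Wharf"),
    ("2018-03-10T23:15:00", "Mission"),
]

def count_by_slot(records):
    """Count incidents per (neighborhood, weekday name, hour of day)."""
    counts = Counter()
    for timestamp, neighborhood in records:
        when = datetime.fromisoformat(timestamp)
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        slot = (neighborhood, DAY_NAMES[when.weekday()], when.hour)
        counts[slot] += 1
    return counts

counts = count_by_slot(incidents)
for (neighborhood, weekday, hour), n in counts.most_common(3):
    print(f"{neighborhood}: {weekday} {hour:02d}:00 -> {n} incident(s)")
```

Raw counts like these only say where and when incidents were reported. They don't account for how many cars were parked there, so a busy area can look worse than it really is per car.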

The second is to find the real reasons behind the higher number of break-ins. Maybe it’s a higher number of homeless people on the streets, maybe it’s a rising unemployment rate, or maybe it’s even that people are buying better cars. You don’t know, but you could come up with hypotheses and use data to test them. If your model is accurate, you could do prediction tasks, something like: given these features, how many break-ins will happen next month? I believe it could have applications in the insurance industry.
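As a rough sketch of that kind of prediction task, the snippet below fits a one-feature least-squares line and uses it to estimate next month's count. The numbers are invented purely to show the mechanics, and using unemployment rate as the single feature is just an assumption for the example; a real model would need real data, more features, and proper validation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted monthly break-ins for a given feature value."""
    return slope * x + intercept

# Made-up monthly values (NOT real statistics), one entry per month.
unemployment_rate = [3.0, 3.5, 4.0, 4.5, 5.0]   # percent
break_ins = [2000, 2100, 2200, 2300, 2400]      # incidents per month

slope, intercept = fit_line(unemployment_rate, break_ins)
next_month = predict(slope, intercept, 5.5)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction={next_month:.0f}")
```

Even a good fit here would only show a correlation. It would not by itself prove that the feature causes the break-ins.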


Thanks Siyu for providing the link, it is super insightful. I can also check where I live to see if the number is reasonable. It would also be good for visitors to know, as the break-ins seem to target some tourist locations.

I hope one day we can create “smash-proof windows” or some other way to prevent these crimes. Thanks for your input.

Here is an animated heatmap by time of day that I generated using the data provided in Siyu Tao’s link. The heatmap is generated by hour and only includes theft/larceny-from-vehicle data points.
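For anyone curious how an hourly heatmap like this could be put together, here is a minimal sketch of the binning step: each incident is assigned to a grid cell, and cells are counted separately for each hour of the day. This is not the code used for the animation above; the cell size, the (hour, lat, lon) layout, and the approximate coordinates are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.005  # assumed cell size in degrees (roughly 500 m of latitude)

def grid_cell(lat, lon, cell=CELL_DEG):
    """Map a coordinate to an integer (row, col) grid cell."""
    return (math.floor(lat / cell), math.floor(lon / cell))

def hourly_heatmap(points):
    """Return {hour: Counter({cell: incident count})}, one frame per hour."""
    frames = defaultdict(Counter)
    for hour, lat, lon in points:
        frames[hour][grid_cell(lat, lon)] += 1
    return frames

# Hypothetical points: (hour of day, latitude, longitude).
# Coordinates are approximate and the incidents are made up.
points = [
    (14, 37.8081, -122.4177),  # near Fisherman's Wharf
    (14, 37.8083, -122.4176),  # same block, same hour
    (22, 37.7544, -122.4477),  # near Twin Peaks
]

frames = hourly_heatmap(points)
for hour in sorted(frames):
    print(hour, dict(frames[hour]))
```

Each per-hour frame can then be drawn as one image of the animation. Dropping the hour key and counting all points together gives the non-time-aggregated version.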


This second image is the same data but not aggregated by time.


I don’t know how helpful this is for finding safer places to park, or how helpful any further exploration of the data would be. There are some really hot spots at street corners such as Twin Peaks and Ocean Beach, though, but they are too concentrated to show up meaningfully on large heatmaps. Maybe if we had foot-traffic data, we could see whether thieves are more daring when there’s less chance of people witnessing a crime?


Wow @nguyenhderrick, the visualizations really help. As a lifetime Bay Area resident, with much of that time in SF, I can say car break-ins are a huge pain for everyone with a car :frowning: The first visualization is especially helpful in seeing how car thieves move to completely different areas the moment it becomes nighttime. I suppose focusing heavily on tourists during the day maybe