SQL Analysis of the Astros Sign Stealing Scandal

I grabbed the files from http://signstealingscandal.com/ to analyse them online via Datasette. Feel free to ask SQL questions here to do your own analysis of Tony’s work!

https://trash-can-astros.glitch.me/data

The average across the dataset was 14% of pitches had a trashcan bang prior to it. (How did no one figure this out! That’s a lot!)

Going with that number, I was curious how many runs they got in games where they had more than that average. You could also remove the summary query, and just run the sub-query to look for correlations.

But yes, it definitely looks like there was a benefit overall of around a run per game when they were banging on the trash cans.

Runs with >=14% Games with >=14% Runs with <14% Games with <14%
5.29 28 4.17 30
with pitch_counts as (
select game_pk, cast(sum(case when has_bangs='y' then 1 else 0 end) as real) banged_pitches, 
  cast(count(*) as numeric) total_pitches 
  from astros_bangs_20200127
group by game_pk)

select
    round(sum(case when t.banged_pct >= 0.14 then t.runs else 0 end)/cast(sum(case when t.banged_pct >= 0.14 then 1 else 0 end) as real),2) "Runs with >=14%",
    sum(case when t.banged_pct >= 0.14 then 1 else 0 end) "Games with >=14%",
    round(sum(case when t.banged_pct < 0.14 then t.runs else 0 end)/cast(sum(case when t.banged_pct < 0.14 then 1 else 0 end) as real),2) "Runs with <14%",
    sum(case when t.banged_pct < 0.14 then 1 else 0 end) "Games with <14%"
from (
  select 
    a.game_pk, (case when a.final_home_runs > a.final_away_runs then 'W' else 'L' end) result, 
    (a.final_home_runs - a.final_away_runs) run_diff,
    a.final_home_runs runs,
    p.banged_pitches, p.total_pitches,
    round( (p.banged_pitches/p.total_pitches), 3) banged_pct
from astros_bangs_20200127 a, pitch_counts p
where a.game_pk = p.game_pk
group by a.game_pk
) t

Link to Results