Maffs 'n' graffs - statistical porn


Small sample size, innit?

Not really, it's 100% of the stats. I've used all the available matches that meet both criteria, hence not a sample at all ;) (unless you count matches from before he was even at the club, but that would be silly)

Sammy disputed his place in a 'best 11' in the Soldado thread, but I don't actually think there's much competition or form in either wing position to really say that including him is madness. It's as nonsensical for people to slag him off as it is to praise him, though, IMO.
Based on all available information, we have done better as a team with him in it.
 
Not really, it's 100% of the stats. I've used all the available matches that meet both criteria, hence not a sample at all ;) (unless you count matches from before he was even at the club, but that would be silly)
Hmmm... not sure you follow. Using 100% of the available stats doesn't answer the sample size argument. The very fact that Lamela has only made ten starts makes the stats themselves practically worthless for drawing a conclusion. If he gets some serious game time, then perhaps next season, with 40 or more team starts, we can take an honest look at team win % when Lamela plays versus when he doesn't and have some degree of confidence in the relationship, but not now.
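For anyone wanting to run the comparison described above (team win % when a player starts versus when he doesn't), here's a minimal Python sketch. The win counts are hypothetical, not Lamela's actual record, and the ± figure is a rough normal-approximation 95% margin, which is itself unreliable at sample sizes this small; the point is only to show how wide the uncertainty is with 10 starts.

```python
import math

def win_rate_split(wins_with, games_with, wins_without, games_without):
    """Compare team win rate with and without a player in the starting XI.

    Returns (rate_with, rate_without, difference, half_width), where half_width
    is a rough 95% normal-approximation margin for the difference in rates.
    """
    p1 = wins_with / games_with
    p2 = wins_without / games_without
    diff = p1 - p2
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p1 * (1 - p1) / games_with + p2 * (1 - p2) / games_without)
    return p1, p2, diff, 1.96 * se

# Hypothetical numbers: 6 wins in 10 starts, 18 wins in 37 other matches
p1, p2, diff, hw = win_rate_split(6, 10, 18, 37)
print(f"with: {p1:.0%}, without: {p2:.0%}, difference: {diff:+.0%} ± {hw:.0%}")
```

With these made-up numbers the gap is about 11 points but the margin is about ±34 points, so the data can't distinguish "helps a lot" from "makes no difference".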
 

If it were sampling then I would be selecting some figures out of a batch of data to use for comparison.

In this case, I used all of the available data to show the statistics. We've played 47 games since Lamela joined, and he's played in 17 of them.

The point was to show that people are just as wrong to knock the guy as others are to praise him; a player is part of a team and, technically, the data so far suggests that the team benefits from his inclusion.

Like I said originally in the thread, it's not a "this proves that" display, it's just purely up to the observer as to what conclusions they make.
 
I still think that you're woefully overstating the value of the statistic, though. Like I already explained, I'm not calling you out for problems with your selection. It's the size of the sample that's the problem here. There's really just nothing here to go on yet. You simply cannot tease out the effect that Lamela has on the team from a total of 10 starts. And how many minutes did Lamela play in those 7 sub appearances? I'd venture a guess that it's not enough to make any worthwhile difference to the problem at hand.
Like I said originally in the thread, it's not a "this proves that" display, it's just purely up to the observer as to what conclusions they make.
Wonderful. I'm happy you said that. However, I think that on this count it's just disingenuous to even be offering it up. The data is immature. This is about one step removed from someone looking at Sherwood's first three games and thinking "Oh gee, two wins out of three, Sherwood has the best win percentage of any Spurs manager, ever!!!!!"
 
I still think that you're woefully overstating the value of the statistic, though. Like I already explained, I'm not calling you out for problems with your selection. It's the size of the sample that's the problem here. There's really just nothing here to go on yet. You simply cannot tease out the effect that Lamela has on the team from a total of 10 starts. And how many minutes did Lamela play in those 7 sub appearances? I'd venture a guess that it's not enough to make any worthwhile difference to the problem at hand.

Technically his sub appearances water down the win% and goals per game of his starts :adegrin:

Wonderful. I'm happy you said that. However, I think that on this count it's just disingenuous to even be offering it up. The data is immature. This is about one step removed from someone looking at Sherwood's first three games and thinking "Oh gee, two wins out of three, Sherwood has the best win percentage of any Spurs manager, ever!!!!!"

I don't know; when comparing which wingers have been best this season, none of them has set the world alight, so even with only 10 starts I think there's good justification for considering him, given how bad the others have been.
 
Is this the type of thing you're looking for, Dannyboy?

[Attached image: graph]


There's a noticeable increase in GD since Martin Jol, with a plateau this season (as expected).
The gradient of the Villas-Boas reign (for goals scored) isn't too different from the Redknapp reign, but Harry also had more time to score more goals.

Goal difference is interesting, though. Hoddle is a bit crap... lol

[Attached image: graph]


The x-axis is games played. Hoddle's side just went to utter shit, as you can see.

Cheers mate; I'm a little surprised at how shit things went at the end under Hoddle. I have clearly sugar-coated his tenure.

It also means I miss van der Vaart, Modric, Bale and even Crouch during Redknapp's time.
 

I think a lot of people get nostalgic, but the last 20 games under Hoddle included 12 defeats and 40 goals conceded. Something like 16 points from 60 :/ (over two seasons, mind)

2002/03 PREMIERSHIP Sa 18Jan 2003 Aston Villa 0 - 1 Tottenham
2002/03 PREMIERSHIP We 29Jan 2003 Tottenham 0 - 1 Newcastle
2002/03 PREMIERSHIP Sa 01Feb 2003 Chelsea 1 - 1 Tottenham
2002/03 PREMIERSHIP Sa 08Feb 2003 Tottenham 4 - 1 Sunderland
2002/03 PREMIERSHIP Mo 24Feb 2003 Tottenham 1 - 1 Fulham
2002/03 PREMIERSHIP Sa 01Mar 2003 West Ham 2 - 0 Tottenham
2002/03 PREMIERSHIP Su 16Mar 2003 Tottenham 2 - 3 Liverpool
2002/03 PREMIERSHIP Mo 24Mar 2003 Bolton 1 - 0 Tottenham
2002/03 PREMIERSHIP Sa 05Apr 2003 Tottenham 2 - 1 Birmingham
2002/03 PREMIERSHIP Sa 12Apr 2003 Leeds 2 - 2 Tottenham
2002/03 PREMIERSHIP Fr 18Apr 2003 Tottenham 0 - 2 Man City
2002/03 PREMIERSHIP Mo 21Apr 2003 West Brom 2 - 3 Tottenham
2002/03 PREMIERSHIP Su 27Apr 2003 Tottenham 0 - 2 Man Utd
2002/03 PREMIERSHIP Sa 03May 2003 Middlesbro 5 - 1 Tottenham
2002/03 PREMIERSHIP Su 11May 2003 Tottenham 0 - 4 Blackburn
2003/04 PREMIERSHIP Sa 16Aug 2003 Birmingham 1 - 0 Tottenham
2003/04 PREMIERSHIP Sa 23Aug 2003 Tottenham 2 - 1 Leeds
2003/04 PREMIERSHIP We 27Aug 2003 Liverpool 0 - 0 Tottenham
2003/04 PREMIERSHIP Sa 30Aug 2003 Tottenham 0 - 3 Fulham
2003/04 PREMIERSHIP Sa 13Sep 2003 Chelsea 4 - 2 Tottenham
2003/04 PREMIERSHIP Sa 20Sep 2003 Tottenham 1 - 3 Southampton
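The "12 defeats, 40 conceded, 16 points from 60" figures can be checked against the list above. A small Python sketch that tallies the last 20 of the 21 listed matches from Tottenham's side; the tuple layout is just my own encoding of those lines.

```python
# The 21 league matches listed above: (home, home_goals, away_goals, away)
RESULTS = [
    ("Aston Villa", 0, 1, "Tottenham"), ("Tottenham", 0, 1, "Newcastle"),
    ("Chelsea", 1, 1, "Tottenham"), ("Tottenham", 4, 1, "Sunderland"),
    ("Tottenham", 1, 1, "Fulham"), ("West Ham", 2, 0, "Tottenham"),
    ("Tottenham", 2, 3, "Liverpool"), ("Bolton", 1, 0, "Tottenham"),
    ("Tottenham", 2, 1, "Birmingham"), ("Leeds", 2, 2, "Tottenham"),
    ("Tottenham", 0, 2, "Man City"), ("West Brom", 2, 3, "Tottenham"),
    ("Tottenham", 0, 2, "Man Utd"), ("Middlesbro", 5, 1, "Tottenham"),
    ("Tottenham", 0, 4, "Blackburn"), ("Birmingham", 1, 0, "Tottenham"),
    ("Tottenham", 2, 1, "Leeds"), ("Liverpool", 0, 0, "Tottenham"),
    ("Tottenham", 0, 3, "Fulham"), ("Chelsea", 4, 2, "Tottenham"),
    ("Tottenham", 1, 3, "Southampton"),
]

def record(results, team="Tottenham"):
    """Tally played/won/drawn/lost, goals and points for one team."""
    r = {"P": 0, "W": 0, "D": 0, "L": 0, "GF": 0, "GA": 0, "Pts": 0}
    for home, hg, ag, away in results:
        # Read the score from the chosen team's point of view
        gf, ga = (hg, ag) if home == team else (ag, hg)
        r["P"] += 1
        r["GF"] += gf
        r["GA"] += ga
        if gf > ga:
            r["W"] += 1
            r["Pts"] += 3
        elif gf == ga:
            r["D"] += 1
            r["Pts"] += 1
        else:
            r["L"] += 1
    return r

print(record(RESULTS[-20:]))  # last 20 of the 21 listed
```

This gives 4 wins, 4 draws and 12 defeats, 21 scored and 40 conceded: exactly 16 points from a possible 60.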
 
Instead of lineups, you could take a page from hockey and calculate +/-

That way, the player at each position with the best +/- would be the one you want to start.

But I seem to recall reading somewhere that +/- has terrible predictive power for football (as the number of players increases, how much each player is responsible for a goal scored or conceded gets less obvious).

Could you explain how you determine a player's +/-, please?
 
If it were sampling then I would be selecting some figures out of a batch of data to use for comparison.
But taking the whole population doesn't obviate sample size issues. If I flipped a coin three times and it came up heads every time, you could say you're not sampling when you conclude the coin has a 100% chance of coming up heads; you're measuring the entire population!

But that doesn't mean the coin can only come up heads. It just happened to in your small sample (even if that sample is the entire population). If you bet your life savings that the next throw of the coin would be heads… oi.
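To put a number on the coin example: a perfectly fair coin comes up heads on all three of three tosses with probability 1/8, so a run like that tells you very little. A minimal Python sketch of the binomial formula (`math.comb` needs Python 3.8 or later):

```python
from math import comb

def prob_exactly_k_heads(n, k, p=0.5):
    """Binomial probability of exactly k heads in n tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(prob_exactly_k_heads(3, 3))  # 0.125: three heads from a perfectly fair coin
print(prob_exactly_k_heads(4, 4))  # 0.0625: still 1 in 16 with four tosses
```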

So it's not that you're sampling a small chunk of Fryers's matches and leaving the rest unaccounted for. It's that Fryers simply hasn't played enough (the population from which to draw data isn't big enough) for us to tell what his effect actually is. It takes many tosses of a coin (MoE = 1/sqrt(n)) to determine that it's 50/50 heads/tails, not just three (or four!). Similarly, Fryers has to make many more appearances before we can tell what role he plays in influencing match outcomes.
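The MoE = 1/sqrt(n) rule of thumb is the conservative 95% margin of error for an observed proportion: 1.96 * sqrt(p(1 - p)/n) is largest at p = 0.5, where it works out to about 0.98/sqrt(n). A quick Python sketch of how slowly it shrinks; the n values are just illustrations, with 10, 17 and 47 echoing the starts, appearances and games mentioned earlier in the thread.

```python
import math

def margin_of_error(n):
    """Rough (conservative) 95% margin of error for an observed proportion."""
    return 1 / math.sqrt(n)

for n in (3, 10, 17, 47, 100, 1000):
    print(f"n = {n:4d}: ±{margin_of_error(n):.0%}")
```

At 10 matches the margin is roughly ±32 percentage points; it takes around 100 to get it down to ±10.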

I can't precisely tell what you mean when you say (paraphrasing) "but the statistics show!" Statistics never show. They suggest, since it's impossible to account for all the variables, etc. That's why it's called statistical inference and not statistical proof.

While admiring what you've been up to here, I have to say that Fryers's record is inflated by strength of opposition, for example (4? of 10? matches are Europa League group stage). I can't tell if you're seriously suggesting that if we started Fryers we would be in a better position to win any given match. Because that's not a sample size problem (which VirginiaSpur already highlighted), that's a "correlation does not equal causation" problem. Just because we've had a good record with Fryers doesn't mean that it's because of Fryers that we've had the good record. Teasing out Fryers's specific contribution to that record is far more complicated, despite what 100% of the statistics show.

Your 100% accounts for the matches that have already happened, and has little or no predictive value for the future. Returning to the coin tosses from the top, you can say with 100% certainty that in those three tosses it came up heads every time, but that doesn't prepare you at all for talking about what the fourth throw will yield.
 
? Fryers?

I was talking about the Lamela stats?
 
We might be shit, but we have the cleverest fans. This - is - enough.

That is until you visit White Hart Lane or Facebook of course...

Also, CJJ, can you do something on the contribution of Andros and Dembele? It'd be interesting to see how their ability to carry the ball forward affects our team.
 
Sorry, I was confused and very drunk last night. I think that if you s/Fryers/Lamela/g everything holds true.

Still think you're perhaps not getting what the subject was?

In the Soldado thread, people were saying that he has no place in a best XI. Considering how shit our wingers have been, it seemed harsh. I think the inference was that he has contributed nothing.

Like I said in the original post "statistically we're better when he starts, regardless of his individual performance".

There was no effort to say he was our best winger or anything like that, but purely that the statistics suggest he made up part of a more successful team.

None of the graphs or charts in this thread or any other are intended to prove anything; they only highlight the statistics so people can make their own judgements, rather than listening to unsubstantiated endorsements or denouncements.
 
That is until you visit White Hart Lane or Facebook of course...

Also, CJJ, can you do something on the contribution of Andros and Dembele? It'd be interesting to see how their ability to carry the ball forward affects our team.

Possibly, if you elaborate a bit more on what you want to see. Dembele is probably our second most present player, so I think we would have to start building shots on/off and possession into the stats or something in order to show something meaningful.
 
Instead of lineups, you could take a page from hockey and calculate +/-

But I seem to recall reading somewhere that +/- has terrible predictive power for football (as the number of players increases, how much each player is responsible for a goal scored or conceded gets less obvious).

First post from me. I'm a new Spurs fan (picked a great time to hop on the bandwagon...). Still getting used to the EPL and getting familiar with the team, so I have very little to add elsewhere, but I am very familiar with hockey and with the new statistics that are being used to analyze the NHL.

+/- (goals for minus goals against while on the ice at even strength) is still shown in most stats tables for the NHL, but it is pretty well ignored and hated by the stat geeks. Sample size is too small and it is too heavily influenced by the save percentage of the goalies to be useful at the individual skater level. Obviously in hockey, the "on-ice" distinction is very important because each player is playing less than 1/3 of the game.
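Since someone asked how +/- is worked out: going by the definition above (goals for minus goals against while the player is on), a football version could be sketched in Python like this. The data layout, the hypothetical players and the on/off minute convention are all my own assumptions, and hockey's even-strength filter is ignored because football has no direct equivalent.

```python
import math

def plus_minus(appearances, goals):
    """Football-style +/- for each player.

    appearances: (player, match_id, on_minute, off_minute); off_minute is None
                 if the player was still on at the final whistle.
    goals:       (match_id, minute, scored_by_us), scored_by_us True or False.
    A player gets +1 for each goal we score and -1 for each goal we concede
    while he is on, using on_minute <= minute < off_minute.
    """
    totals = {}
    for player, match_id, on, off in appearances:
        end = math.inf if off is None else off
        totals.setdefault(player, 0)
        for goal_match, minute, scored_by_us in goals:
            if goal_match == match_id and on <= minute < end:
                totals[player] += 1 if scored_by_us else -1
    return totals

# Hypothetical single match: Player B is subbed off for Player C on 60 minutes.
appearances = [
    ("Player A", 1, 0, None),
    ("Player B", 1, 0, 60),
    ("Player C", 1, 60, None),
]
goals = [(1, 30, True), (1, 75, False), (1, 80, True)]
print(plus_minus(appearances, goals))  # {'Player A': 1, 'Player B': 1, 'Player C': 0}
```

It also shows the weakness raised in the thread: everyone who plays the full 90 gets the same number, so in football it says little about any individual.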

What those geeks have moved on to tracking and analyzing is shot differential. Shots correlate pretty strongly with goals, and using them greatly increases the sample size while teasing out goaltender effects. In hockey, there isn't a true measure of "possession", so shot differential is taken as a decent proxy: the team shooting the most holds the puck the most. Some people have taken it further and tried to track where on the ice the shots come from to determine "shot quality", but that data is very difficult to find.
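Shot differential and its percentage form (what NHL analysts often call Corsi when counting all shot attempts) are simple to compute. A minimal Python sketch over per-match shot tallies; the numbers are hypothetical, and the same functions would work on shots taken while a given player is on the pitch.

```python
def shot_totals(matches):
    """matches: list of (shots_for, shots_against) per match."""
    shots_for = sum(f for f, _ in matches)
    shots_against = sum(a for _, a in matches)
    return shots_for, shots_against

def shot_differential(shots_for, shots_against):
    """Shots for minus shots against."""
    return shots_for - shots_against

def shot_share(shots_for, shots_against):
    """Fraction of all shots in the games that were ours (0.5 means even)."""
    total = shots_for + shots_against
    return shots_for / total if total else 0.5

matches = [(15, 8), (12, 14), (20, 5)]  # hypothetical tallies
sf, sa = shot_totals(matches)
print(sf, sa, shot_differential(sf, sa), round(shot_share(sf, sa), 3))
```

Because there are many more shots than goals per match, a share like this settles down over far fewer games than a goal-based +/-.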

I did find one site that has some pretty detailed analysis with shots statistics as the underlying data, but I'm not able to post links in my first post. There is an article on Soldado comparing his shot selection to his Valencia days that was posted two days ago, plus a mid-February article comparing the first 10 games of Sherwood's reign vs AVB. Summary of that second one: 23 points in 10 games was basically a fluke.
 
Possibly, if you elaborate a bit more on what you want to see. Dembele is probably our second most present player, so I think we would have to start building shots on/off and possession into the stats or something in order to show something meaningful.

How it affects our win %, goals conceded and chances created. The things you suggested sound interesting too and, hey, it's your time. May I also add my thanks for all the work you've done.
 
First post from me. I'm a new Spurs fan (picked a great time to hop on the bandwagon...). Still getting used to the EPL and getting familiar with the team, so I have very little to add elsewhere, but I am very familiar with hockey and with the new statistics that are being used to analyze the NHL.

+/- (goals for minus goals against while on the ice at even strength) is still shown in most stats tables for the NHL, but it is pretty well ignored and hated by the stat geeks. Sample size is too small and it is too heavily influenced by the save percentage of the goalies to be useful at the individual skater level. Obviously in hockey, the "on-ice" distinction is very important because each player is playing less than 1/3 of the game.

What those geeks have moved on to tracking and analyzing is shot differential. Shots correlate pretty strongly with goals, and using them greatly increases the sample size while teasing out goaltender effects. In hockey, there isn't a true measure of "possession", so shot differential is taken as a decent proxy: the team shooting the most holds the puck the most. Some people have taken it further and tried to track where on the ice the shots come from to determine "shot quality", but that data is very difficult to find.

I did find one site that has some pretty detailed analysis with shots statistics as the underlying data, but I'm not able to post links in my first post. There is an article on Soldado comparing his shot selection to his Valencia days that was posted two days ago, plus a mid-February article comparing the first 10 games of Sherwood's reign vs AVB. Summary of that second one: 23 points in 10 games was basically a fluke.
Cartilage Free Captain has been doing this kind of shot analysis all season.
 
For anyone who wants to try and wish their tits off for CL Quals/4th place...

Or just the curious ones...

I've made an interactive/'live' league table so you can see how the results affect the table. Much easier than using your head to work it out.

The XLSX version will be a bit better, but in case you can't open newer Office files, the XLS is there too.

All done with formulae, so no worries about VBA and whatnot.

PLPredictor.xlsx
PLPredictor.xls
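For the curious, the same "live table" idea as a Python sketch. This is not a translation of the PLPredictor formulae, just the general approach: combine played results with your predicted ones and re-sort. Team names and scores are hypothetical; the ordering assumed here is points, then goal difference, then goals scored.

```python
from collections import defaultdict

def empty_row():
    return {"P": 0, "W": 0, "D": 0, "L": 0, "GF": 0, "GA": 0, "Pts": 0}

def league_table(results):
    """Build a sorted table from (home, home_goals, away_goals, away) tuples."""
    table = defaultdict(empty_row)
    for home, hg, ag, away in results:
        # Update both sides of the fixture from their own point of view
        for team, gf, ga in ((home, hg, ag), (away, ag, hg)):
            row = table[team]
            row["P"] += 1
            row["GF"] += gf
            row["GA"] += ga
            if gf > ga:
                row["W"] += 1
                row["Pts"] += 3
            elif gf == ga:
                row["D"] += 1
                row["Pts"] += 1
            else:
                row["L"] += 1
    # Highest points first, then goal difference, then goals scored
    return sorted(
        table.items(),
        key=lambda item: (item[1]["Pts"], item[1]["GF"] - item[1]["GA"], item[1]["GF"]),
        reverse=True,
    )

played = [("Team A", 2, 0, "Team B"), ("Team C", 1, 1, "Team A")]
predicted = [("Team B", 3, 1, "Team C")]  # your "what if" results
for team, row in league_table(played + predicted):
    print(team, row["P"], row["GF"] - row["GA"], row["Pts"])
```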
 
Sample size is too small and it is too heavily influenced by the save percentage of the goalies to be useful at the individual skater level. Obviously in hockey, the "on-ice" distinction is very important because each player is playing less than 1/3 of the game.
Duly noted. Even so, the only reason +/- is even in discussion in hockey is that you can assume each player has more to do with a goal, as there are only six of them on the ice. In football, it's probably an even smaller amount of "influence", especially when so many goals are scored on the break (or because of defensive lapses).
Shots correlate to goals pretty strongly, and it greatly increases the sample size while teasing out goaltender effects.
If they do in football, however, we've not been the beneficiaries. All last season we were waiting for our goal tally to start to correlate with our shot tally. We were consistently one of the biggest shooters but had precious little to show for it. Lots was suggested as to why (speculative bombs from outside the box, etc.). On the flip side, Lloris was considered a poor keeper because he had such a low save percentage (for some reason, we gave up few shots, but the ones we did give up would go in!).

Here's a post with some links to that:

http://www.thefightingcock.co.uk/forum/threads/slumpen-hotspurariat.5226/page-3#post-244911

Possession, to which you allude, has no correlation with results in football; that's been demonstrated numerous times, though it could eventually correlate.
23 points in 10 games was basically a fluke.
I wrote about this in a recent thread:

http://www.thefightingcock.co.uk/forum/threads/spursy-luck…-and-its-abundance-so-far.9412/

By Pythagorean expectation, we are absolutely playing out of our minds in terms of points, compared to the number of goals we're scoring (and giving up).
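For reference, the Pythagorean expectation is GF^k / (GF^k + GA^k), where GF and GA are goals for and against and k is an exponent. A minimal Python sketch; k is a tunable assumption (2 is the classic baseball value, and values fitted to football are often reported lower), and turning the result into expected points would need a further assumption about draws, which this sketch doesn't attempt.

```python
def pythagorean_expectation(goals_for, goals_against, exponent=2.0):
    """Expected 'winning' share implied by goals for and against.

    exponent is an assumption, not a fitted value for the Premier League.
    """
    if goals_for == 0 and goals_against == 0:
        return 0.5  # no goals either way: treat as even
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)

print(round(pythagorean_expectation(20, 10), 3))
print(round(pythagorean_expectation(20, 10, exponent=1.3), 3))
```

A team whose actual points haul sits well above what its goal record implies is, by this measure, riding its luck.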
 