Data Analysis of Fandango

Using Python

Fandango

Fandango Media, LLC is an American ticketing company that sells movie tickets and also publishes movie ratings. In 2015, Fandango faced allegations of displaying misleading (inflated) movie ratings, as outlined in an article by FiveThirtyEight (538).

Aim

When a company both displays movie ratings and profits from selling tickets, it has an incentive to bias those ratings upward to boost sales. Transparency and audience awareness are essential to prevent such bias. The goal of this analysis is to examine Fandango's data and compare it with the claims made in the 538 article, to determine whether Fandango's ratings in 2015 were skewed upward, potentially to boost ticket sales.

Technologies Used

Language: Python

Libraries: NumPy, pandas, Matplotlib and seaborn

Data

The dataset is openly available on 538's GitHub. There are two CSV files: one with Fandango's displayed stars and ratings, and the other with aggregate movie-rating data from other sites, such as Metacritic, IMDb, and Rotten Tomatoes.

all_sites_scores.csv

all_sites_scores.csv contains every film that has a Rotten Tomatoes rating, an RT user rating, a Metacritic score, a Metacritic user score, an IMDb score, and at least 30 fan reviews on Fandango. The data from Fandango was pulled on Aug. 24, 2015.

                         

fandango_scrape.csv

fandango_scrape.csv contains every film 538 pulled from Fandango.

         

Part 1: Fandango Site

Importing the libraries and the Fandango data


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fandango = pd.read_csv("fandango_scrape.csv")
fandango.head()
    
                

Output


                    
                    

Is there a relationship between a film's popularity and its rating? The scatterplot below plots RATING against VOTES.


plt.figure(figsize=(10,4),dpi=150)
sns.scatterplot(data=fandango,x='RATING',y='VOTES');
    

Output



                    
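The scatterplot can also be summarized as a single number. Below is a minimal sketch. It assumes the fandango DataFrame has the numeric RATING and VOTES columns loaded above. The helper name rating_vote_correlation and the toy values are hypothetical and do not come from the dataset.

```python
import pandas as pd

def rating_vote_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between a film's RATING and its number of VOTES.

    Hypothetical helper. Values near 0 suggest popularity and rating are
    unrelated; values near 1 suggest more-voted films tend to be rated higher.
    """
    return df['RATING'].corr(df['VOTES'])

# Toy data (hypothetical, not from fandango_scrape.csv)
toy = pd.DataFrame({'RATING': [3.0, 3.5, 4.0, 4.5],
                    'VOTES': [100, 200, 300, 400]})
print(round(rating_vote_correlation(toy), 3))  # 1.0
```

On the real data this would be called as rating_vote_correlation(fandango). Pearson's coefficient only captures linear relationships, so it complements the plot rather than replacing it.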

Create a new column, YEAR, by extracting the year from the title strings (each title ends with the release year in parentheses):


# Titles end with "(YEAR)": keep the text after the last "(" and drop the closing ")"
fandango['YEAR'] = fandango['FILM'].apply(lambda title: title.split('(')[-1].rstrip(')'))
fandango

Output


        
                
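A regular expression is an alternative way to extract the year. The sketch below captures the four digits inside the last pair of parentheses at the end of the title and converts them to integers, which sort numerically. The example titles are illustrative only.

```python
import pandas as pd

# Illustrative titles in the "Title (Year)" form used by the FILM column
films = pd.Series([
    'Fifty Shades of Grey (2015)',
    'Birdman (or The Unexpected Virtue of Ignorance) (2014)',
])

# Match "(dddd)" only at the end of the string, so earlier parentheses are ignored
years = films.str.extract(r'\((\d{4})\)\s*$', expand=False).astype(int)
print(years.tolist())  # [2015, 2014]
```

A title without a trailing year would produce NaN. Such rows would need to be dropped or checked before the conversion to int.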

Number of movies in the Fandango DataFrame per year:


fandango['YEAR'].value_counts()

Output


    
            

A count plot of the number of movies per year:


sns.countplot(data=fandango,x='YEAR')

Output


            
            

The top 10 movies with the highest number of votes:


fandango.nlargest(10,'VOTES')

Output


            
            

Removing any films that have zero votes:


# .copy() avoids a SettingWithCopyWarning when the STARS_DIFF column is added later
fan_reviewed = fandango[fandango['VOTES']>0].copy()


         
            

A KDE plot comparing the distribution of displayed ratings (STARS) with the true ratings computed from user votes (RATING):


plt.figure(figsize=(10,4),dpi=150)
sns.kdeplot(data=fan_reviewed,x='RATING',clip=[0,5],fill=True,label='True Rating')
sns.kdeplot(data=fan_reviewed,x='STARS',clip=[0,5],fill=True,label='Stars Displayed')
            
plt.legend(loc=(1.05,0.5))

Output



            

Calculate the difference as STARS - RATING and round the differences to two decimal places:


fan_reviewed["STARS_DIFF"] = fan_reviewed['STARS'] - fan_reviewed['RATING']
fan_reviewed['STARS_DIFF'] = fan_reviewed['STARS_DIFF'].round(2)
fan_reviewed

Output


        
            
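The conclusion of this analysis notes that Fandango displays a rounded-up version of the rating. One way to check that claim is to round each RATING up to the nearest half star and compare the result with STARS. Below is a minimal sketch. The helper names are hypothetical and the toy rows are not from the dataset.

```python
import numpy as np
import pandas as pd

def round_up_half_star(rating: pd.Series) -> pd.Series:
    """Round each rating UP to the next multiple of 0.5 (e.g. 4.1 -> 4.5)."""
    return np.ceil(rating * 2) / 2

def share_matching_round_up(df: pd.DataFrame) -> float:
    """Fraction of films whose STARS equal RATING rounded up to a half star."""
    return (df['STARS'] == round_up_half_star(df['RATING'])).mean()

# Hypothetical rows: the last one is NOT explained by rounding up
toy = pd.DataFrame({'STARS': [4.5, 4.5, 3.0, 5.0],
                    'RATING': [4.1, 4.5, 3.0, 4.0]})
print(round_up_half_star(toy['RATING']).tolist())  # [4.5, 4.5, 3.0, 4.0]
print(share_matching_round_up(toy))                # 0.75
```

Rounding a one-decimal rating up to the nearest half star adds at most 0.4. Any film whose STARS_DIFF exceeds 0.4 is therefore not explained by rounding alone. On the real data this would be called as share_matching_round_up(fan_reviewed).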

A count plot showing how often each difference occurs:


plt.figure(figsize=(12,4),dpi=150)
sns.countplot(data=fan_reviewed,x='STARS_DIFF',palette='magma')

Output


    
            

The plot shows that one movie was displaying a full 1 star above its true rating!


fan_reviewed[fan_reviewed['STARS_DIFF'] == 1]

Output


        
            
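The count plot can also be read off numerically. Below is a minimal sketch that would be called as summarize_star_diff(fan_reviewed['STARS_DIFF']). The helper name and the toy differences are hypothetical.

```python
import pandas as pd

def summarize_star_diff(diff: pd.Series) -> dict:
    """Summarize STARS - RATING differences (hypothetical helper)."""
    return {
        'share_inflated': (diff > 0).mean(),   # displayed stars above the true rating
        'share_deflated': (diff < 0).mean(),   # displayed stars below the true rating
        'max_diff': diff.max(),                # largest over-statement
        'counts': diff.value_counts().sort_index().to_dict(),  # same numbers as the count plot
    }

toy_diff = pd.Series([0.0, 0.1, 0.2, 0.2, 0.5, 1.0])  # hypothetical differences
summary = summarize_star_diff(toy_diff)
print(round(summary['share_inflated'], 3))  # 0.833
print(summary['max_diff'])                  # 1.0
```

A share_deflated of zero on the real data would mean Fandango never displayed fewer stars than the true rating.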

Part 2: Comparison of Fandango Ratings to Other Sites


all_sites = pd.read_csv("all_sites_scores.csv")        
all_sites.head()       
        

Output



                    


all_sites.info()   
    

Output


                
                

Rotten Tomatoes

RT has two sets of reviews: critics' reviews (ratings published by official critics) and user reviews.


plt.figure(figsize=(10,4),dpi=150)
sns.scatterplot(data=all_sites,x='RottenTomatoes',y='RottenTomatoes_User')
plt.xlim(0,100)
plt.ylim(0,100)    

Output



            

Create a new column, Rotten_Diff, holding the difference between critics' ratings and users' ratings on Rotten Tomatoes, i.e. RottenTomatoes - RottenTomatoes_User:


all_sites['Rotten_Diff']  = all_sites['RottenTomatoes'] - all_sites['RottenTomatoes_User']

Output

The mean absolute difference between RT critics' scores and RT user scores:


all_sites['Rotten_Diff'].apply(abs).mean()

Output


        15.095890410958905
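The absolute value matters here. Signed differences can cancel out: a film critics liked more and a film users liked more would average to a small gap. Taking |critics - users| before averaging prevents that. Below is a minimal sketch. The helper name and the toy scores are hypothetical.

```python
import pandas as pd

def mean_absolute_difference(critics: pd.Series, users: pd.Series) -> float:
    """Average of |critics - users|; signs are dropped so gaps cannot cancel."""
    return (critics - users).abs().mean()

critics = pd.Series([90, 40, 70])  # hypothetical RT critic scores
users = pd.Series([80, 60, 70])    # hypothetical RT user scores
print(mean_absolute_difference(critics, users))  # 10.0
print(round((critics - users).mean(), 2))        # -3.33 (signed mean understates the gap)
```

On the real data, mean_absolute_difference(all_sites['RottenTomatoes'], all_sites['RottenTomatoes_User']) computes the same quantity as the line above.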
        

Histogram of the distribution of differences between the RT critics score and the RT user score:


plt.figure(figsize=(10,4),dpi=200)
sns.histplot(data=all_sites,x='Rotten_Diff',kde=True,bins=25)
plt.title("RT Critics Score minus RT User Score");

Output



        

Distribution of the absolute difference between critics and users on Rotten Tomatoes:


plt.figure(figsize=(10,4),dpi=200)
sns.histplot(x=all_sites['Rotten_Diff'].apply(abs),bins=25,kde=True)
plt.title("Abs Difference between RT Critics Score and RT User Score");

Output


        
        

The top 5 movies that users rated higher than critics (the most negative Rotten_Diff values):



all_sites.nsmallest(5,'Rotten_Diff')[['FILM','Rotten_Diff']]

Output


        
        

The top 5 movies that critics rated higher than users (the largest Rotten_Diff values):


print("Critics love, but Users Hate")
all_sites.nlargest(5,'Rotten_Diff')[['FILM','Rotten_Diff']]

Output


        
        

Metacritic

A scatterplot of the Metacritic rating versus the Metacritic user rating (note the different scales: 0-100 for critics, 0-10 for users):


plt.figure(figsize=(10,4),dpi=150)
sns.scatterplot(data=all_sites,x='Metacritic',y='Metacritic_User')
plt.xlim(0,100)
plt.ylim(0,10)

Output



        

The movie with the highest Metacritic user vote count:


all_sites.nlargest(1,'Metacritic_user_vote_count')

Output


        
        

IMDb

A scatterplot of the relationship between vote counts on Metacritic and vote counts on IMDb:


plt.figure(figsize=(10,4),dpi=150)
sns.scatterplot(data=all_sites,x='Metacritic_user_vote_count',y='IMDB_user_vote_count')

Output



        
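Vote counts are often heavily skewed, with a few very popular films collecting most of the votes. In that situation a rank-based correlation can summarize this scatterplot more robustly than Pearson's coefficient. Below is a minimal sketch. It assumes the all_sites columns Metacritic_user_vote_count and IMDB_user_vote_count are passed in, and the helper name and toy counts are hypothetical. It computes Spearman's coefficient as the Pearson correlation of the ranks. pandas also offers method='spearman' on corr.

```python
import pandas as pd

def spearman_via_ranks(a: pd.Series, b: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    return a.rank().corr(b.rank())

meta_votes = pd.Series([10, 50, 200, 1000])               # hypothetical vote counts
imdb_votes = pd.Series([1_000, 20_000, 90_000, 700_000])  # hypothetical vote counts
print(round(spearman_via_ranks(meta_votes, imdb_votes), 6))  # 1.0 (perfectly monotonic)
print(meta_votes.corr(imdb_votes))  # Pearson: below 1 because the trend is not linear
```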

The movie with the highest IMDb user vote count:


all_sites.nlargest(1,'IMDB_user_vote_count')

Output


        
        

Part 3: Fandango Scores vs. All Sites

Combine the Fandango table with the all-sites table using an inner merge on the FILM column, which keeps only the films present in both DataFrames:


df = pd.merge(fandango,all_sites,on='FILM',how='inner')

Output


            
            
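An inner merge silently drops films that appear in only one table. It also matches only titles spelled identically in both files. To see which films would be dropped, one option is an outer merge with pandas' indicator=True, which labels each row 'both', 'left_only' or 'right_only'. Below is a sketch with hypothetical miniature tables.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
fandango_toy = pd.DataFrame({'FILM': ['A (2015)', 'B (2015)', 'C (2014)'],
                             'STARS': [4.5, 4.0, 3.5]})
all_sites_toy = pd.DataFrame({'FILM': ['A (2015)', 'C (2014)', 'D (2015)'],
                              'IMDB': [6.1, 7.0, 5.5]})

# Outer merge + indicator=True adds a '_merge' column describing each row's origin
check = pd.merge(fandango_toy, all_sites_toy, on='FILM', how='outer', indicator=True)
print(check[check['_merge'] != 'both'])  # films that an inner merge would drop

# The inner merge keeps exactly the 'both' rows
inner = pd.merge(fandango_toy, all_sites_toy, on='FILM', how='inner')
print(inner['FILM'].tolist())  # ['A (2015)', 'C (2014)']
```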

Create normalized columns for all ratings so that they fall within the 0-5 star range used by Fandango:


# RT and Metacritic critic scores are out of 100 (divide by 20); Metacritic user and IMDb scores are out of 10 (divide by 2)
df['RT_Norm'] = np.round(df['RottenTomatoes']/20,1)
df['RTU_Norm'] = np.round(df['RottenTomatoes_User']/20,1)
df['Meta_Norm'] = np.round(df['Metacritic']/20,1)
df['Meta_U_Norm'] = np.round(df['Metacritic_User']/2,1)
df['IMDB_Norm'] = np.round(df['IMDB']/2,1)
df.head()
    

Output



                
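The five normalizations follow one rule: divide by the original scale's maximum and multiply by 5. Below is a hypothetical helper that makes the rule explicit. It divides by max_score / 5, so it reproduces the /20 and /2 used above exactly.

```python
import numpy as np
import pandas as pd

def to_five_star(scores: pd.Series, max_score: float) -> pd.Series:
    """Rescale scores from 0..max_score to 0..5, rounded to one decimal place."""
    return np.round(scores / (max_score / 5), 1)

print(to_five_star(pd.Series([100, 50, 0]), 100).tolist())  # [5.0, 2.5, 0.0]
print(to_five_star(pd.Series([10.0, 5.0]), 10).tolist())    # [5.0, 2.5]
```

For example, df['RT_Norm'] = to_five_star(df['RottenTomatoes'], 100) would be equivalent to the first normalization line.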

Next, create a norm_scores DataFrame that contains only the normalized ratings, including both STARS and RATING from the original Fandango table:


norm_scores = df[['STARS','RATING','RT_Norm','RTU_Norm','Meta_Norm','Meta_U_Norm','IMDB_Norm']]
norm_scores.head()
        

Output


                    
                    

Comparing Distribution of Scores Across Sites


def move_legend(ax, new_loc, **kws):
    """Re-create an axes' existing legend at a new location."""
    old_legend = ax.legend_
    handles = old_legend.legend_handles  # Matplotlib >= 3.7 (older versions: old_legend.legendHandles)
    labels = [t.get_text() for t in old_legend.get_texts()]
    title = old_legend.get_title().get_text()
    ax.legend(handles, labels, loc=new_loc, title=title, **kws)

fig, ax = plt.subplots(figsize=(15,6),dpi=150)
sns.kdeplot(data=norm_scores,clip=[0,5],fill=True,palette='Set1',ax=ax)  # fill replaces the deprecated shade
move_legend(ax, "upper left")
            

Output



                        

Fandango's distribution is clearly uneven and concentrated at the high end, while RT critics have the most uniform distribution. Let's compare these two directly.

A KDE plot comparing the distribution of RT critic ratings with the STARS displayed by Fandango:


fig, ax = plt.subplots(figsize=(15,6),dpi=150)
sns.kdeplot(data=norm_scores[['RT_Norm','STARS']],clip=[0,5],fill=True,palette='Set1',ax=ax)
move_legend(ax, "upper left")
                

Output



                            
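Summary statistics can back up the KDE plots. A higher mean indicates a site that rates films more generously. A smaller standard deviation indicates ratings squeezed into a narrower band. Below is a minimal sketch that would be called as score_summary(norm_scores). The helper name and the toy values are hypothetical.

```python
import pandas as pd

def score_summary(scores: pd.DataFrame) -> pd.DataFrame:
    """Mean and standard deviation of each score column, highest mean first."""
    return scores.agg(['mean', 'std']).T.sort_values('mean', ascending=False)

toy = pd.DataFrame({'STARS': [4.5, 4.5, 4.0, 5.0],     # hypothetical values
                    'RT_Norm': [0.4, 2.5, 4.0, 4.8]})
print(score_summary(toy))
```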

A histogram comparing all normalized scores:


plt.subplots(figsize=(15,6),dpi=150)
sns.histplot(norm_scores,bins=50)
                    
                    

Output



                                

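A clustermap of the normalized scores (films clustered by row, columns kept in their original order). It groups films with similar rating profiles, which makes the worst-rated movies across all platforms easy to spot: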


sns.clustermap(norm_scores,cmap='magma',col_cluster=False)
                        
                        

Output



                                    

The distribution of ratings across all sites for the 10 films rated worst by RT critics:


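plt.figure(figsize=(15,6),dpi=150)
# norm_films: the normalized scores plus the FILM column (same definition as in the Conclusion)
norm_films = df[['STARS','RATING','RT_Norm','RTU_Norm','Meta_Norm','Meta_U_Norm','IMDB_Norm','FILM']]
worst_films = norm_films.nsmallest(10,'RT_Norm').drop('FILM',axis=1)
sns.kdeplot(data=worst_films,clip=[0,5],fill=True,palette='Set1')
plt.title("Ratings for RT Critic's 10 Worst Reviewed Films");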
                            

Output


                        
                                        

Conclusion

Fandango clearly rates movies much higher than the other sites, especially considering that it then displays a rounded-up version of the rating. The 10 worst movies, based on the Rotten Tomatoes critic ratings, are:


norm_films = df[['STARS','RATING','RT_Norm','RTU_Norm','Meta_Norm','Meta_U_Norm','IMDB_Norm','FILM']]
norm_films.nsmallest(10,'RT_Norm')                           
                            

Output



                                        

Here we can clearly see that Taken 3 got 4.5 stars on Fandango, compared with a normalized Rotten Tomatoes critics rating of just 0.4:


norm_films.iloc[25]                                    
                                    

Output


            
                                                


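# Normalized Taken 3 scores: RT critics, RT users, Metacritic critics, Metacritic users, IMDb
result = 0.4+2.3+1.3+2.3+3
avg_review = result/5
round(avg_review, 2)

Output

        1.86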

Hence Fandango is showing around 3-4 star ratings for films that are clearly bad! Notice the biggest offender, Taken 3: Fandango is displaying 4.5 stars on its site for a film with an average rating of 1.86 across the other platforms!
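The Taken 3 calculation can be repeated for every film at once by averaging the five normalized columns row by row. Below is a minimal sketch that would be called as stars_vs_other_sites(norm_films). The helper and the OTHER_AVG and GAP column names are hypothetical. The example row reuses the Taken 3 values from the calculation above.

```python
import pandas as pd

OTHER_SITES = ['RT_Norm', 'RTU_Norm', 'Meta_Norm', 'Meta_U_Norm', 'IMDB_Norm']

def stars_vs_other_sites(films: pd.DataFrame) -> pd.DataFrame:
    """Add the other sites' average normalized score and Fandango's gap above it."""
    out = films.copy()
    out['OTHER_AVG'] = out[OTHER_SITES].mean(axis=1)  # row-wise mean of the five sites
    out['GAP'] = out['STARS'] - out['OTHER_AVG']      # how far Fandango's stars sit above it
    return out.sort_values('GAP', ascending=False)    # biggest offenders first

taken3 = pd.DataFrame({'FILM': ['Taken 3 (2015)'], 'STARS': [4.5],
                       'RT_Norm': [0.4], 'RTU_Norm': [2.3], 'Meta_Norm': [1.3],
                       'Meta_U_Norm': [2.3], 'IMDB_Norm': [3.0]})
result = stars_vs_other_sites(taken3)
print(round(result['OTHER_AVG'].iloc[0], 2), round(result['GAP'].iloc[0], 2))  # 1.86 2.64
```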