In this notebook I am going to analyse a dataset about the famous TV series Friends. The dataset comes from Kaggle: Friends Series Dataset.
All copyrights for the Friends cover image are owned by the official FRIENDS American TV show and its creators. I claim no copyright over the cover image shown!
First of all I will import all the libraries that I am going to need for my analysis.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import emoji
import plotly.express as px
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
%matplotlib inline
sns.set_style('darkgrid')
init_notebook_mode(connected = True)
cf.go_offline()
df = pd.read_csv('.../Datasets/FriendsDataset/friends_episodes_v2.csv')
df.head()
df.tail()
We can see that the last two episodes have exactly the same title (*The Last One*). To keep them apart in this analysis and when describing the results, I will rename them as follows:
- *The Last One (i)*
- *The Last One (ii)*
df.at[233 , 'Episode_Title'] = 'The Last One (i)'
df.at[234 , 'Episode_Title'] = 'The Last One (ii)'
df.tail()
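The two assignments above depend on the row labels 233 and 234 staying where they are. As a hedged alternative, the sketch below numbers repeated titles without hard-coding any index. It runs on a small made-up frame called `toy` (a hypothetical stand-in, not the real dataset), and the suffix mapping only covers a title that appears twice.

```python
import pandas as pd

# Hypothetical miniature stand-in for the episode table (not the real data).
toy = pd.DataFrame({
    "Season": [10, 10, 10],
    "Episode_Title": ["The One Where Estelle Dies", "The Last One", "The Last One"],
})

# Position of each row among rows sharing its title: 0 for the first, 1 for the second.
occurrence = toy.groupby("Episode_Title").cumcount()
# True for every row whose title appears more than once.
is_repeated = toy.duplicated("Episode_Title", keep=False)

# Assumption: a title repeats at most twice, so two suffixes are enough.
suffixes = occurrence.map({0: " (i)", 1: " (ii)"})
toy.loc[is_repeated, "Episode_Title"] = (
    toy.loc[is_repeated, "Episode_Title"] + suffixes[is_repeated]
)

print(toy["Episode_Title"].tolist())
```

Titles that occur only once are left untouched, so the same lines could be applied to the full `Episode_Title` column.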
df.columns
In this dataframe we have the following attributes:
Year_of_prod : Year of production.
Season : Season of the TV Show FRIENDS (1 to 10).
Episode_Title : The title of the episode.
Duration : The duration of an episode.
Summary : A summary of an episode.
Director : Who directed the episode.
Stars : How many stars out of 10 the episode has been awarded.
Votes : How many people voted.
There is no need to change any of the attributes' names.
for i in df['Episode_Title']:
    if i[0:7] == 'The One':
        print('Good')
    else:
        print('Not good as:', i[0:7])
We can see that all the episodes, apart from the last two, have a title starting with "The One". Now let us build a list of the words that come right after that phrase.
Next_title_words = []
for x in df['Episode_Title']:
    if x[8:13] not in Next_title_words:
        Next_title_words.append(x[8:13])
print(Next_title_words)
We can now see the words that come after the usual title phrase "The One".
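The slice `x[8:13]` always takes five characters, so shorter words bring a trailing space along and longer ones get cut. A minimal sketch of another way to do it, using pandas string methods to take the whole third word, is shown below. The `titles` series is a hypothetical sample, not the real column.

```python
import pandas as pd

# Hypothetical sample of titles, standing in for df['Episode_Title'].
titles = pd.Series([
    "The One Where Everybody Finds Out",
    "The One with the Embryos",
    "The One After the Superbowl",
    "The Last One (i)",
])

# Vectorised version of the 'Good' / 'Not good' check.
starts_ok = titles.str.startswith("The One")

# Split on whitespace and keep the third word of every matching title.
next_word = titles[starts_ok].str.split().str[2]

print(next_word.unique().tolist())
```

Because the split is on whitespace, the result holds complete words regardless of their length.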
df[['Season','Episode_Title']].groupby('Season').count()
So according to the data provided we have the following information:
Season 1 has 23 episodes.
Season 2 has 24 episodes.
Season 3 has 25 episodes.
Season 4 has 24 episodes.
Season 5 has 24 episodes.
Season 6 has 25 episodes.
Season 7 has 24 episodes.
Season 8 has 24 episodes.
Season 9 has 24 episodes.
And finally season 10 has only 18 episodes!
In total one has 235 episodes to enjoy from this great TV show called FRIENDS.
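As a quick sanity check on the arithmetic, the per-season counts listed above can be added up in a few lines. The dictionary name `listed_counts` is my own. On the real data the same counts should come from `df['Season'].value_counts().sort_index()`.

```python
# Episode counts per season exactly as listed in the text above.
listed_counts = {1: 23, 2: 24, 3: 25, 4: 24, 5: 24,
                 6: 25, 7: 24, 8: 24, 9: 24, 10: 18}

# Total number of episodes across all ten seasons.
total_episodes = sum(listed_counts.values())

# Season with the fewest episodes.
shortest_season = min(listed_counts, key=listed_counts.get)

print(total_episodes, shortest_season)
```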
df.info()
df.describe()
From the descriptive statistics above, there is no point interpreting the figures for Year_of_prod and Season, since both are labels rather than measurements. For the other attributes, the average episode lasts 22.3 minutes and is rated 8.45 stars out of 10, with 3352.28 votes on average. Pretty good for a TV show, I would say.
I know!!!
(Hopefully you got the joke...) Moving forward, let us see which season is the best according to the votes!
df_best_0 = df[['Season','Votes','Stars']]
df_best_V = df[['Season','Votes']].groupby('Season')['Votes'].sum()
A_data = pd.DataFrame(df_best_V).sort_values('Votes',ascending = False)
A_data
fig = px.pie(A_data, values='Votes', names=A_data.index, title = 'Sum of votes collected per season of the TV Show FRIENDS.')
fig.update_layout(legend_title_text='Season')
fig.show()
From the table and the pie chart just above, we can see that the most voted season is the 1st one, with 95397 votes.
df_best_S = df[['Season','Stars']].groupby('Season')['Stars'].sum()
B_data = pd.DataFrame(df_best_S).sort_values('Stars', ascending = False)
B_data
fig = px.bar(data_frame = B_data, x = B_data.index, y = 'Stars', title = 'Sum of stars collected per Season of the TV Show FRIENDS.')
fig.show()
From the table and the bar plot above, we can see that Season 6 collected the highest total of stars. If the show's overall maximum of 2350 stars (235 episodes times 10) is divided evenly across the ten seasons, each season can earn at most 235, so Season 6 scored 212.4/235, quite a good collection. On the other hand, the season with the lowest total is the last one (Season 10), with 156.2/235. Keep in mind that the seasons differ in length: Season 10 has only 18 episodes, so its actual maximum is 180, while Season 6 has 25 episodes and an actual maximum of 250. I nevertheless decided to divide the full amount of stars evenly.
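Since a sum of stars rewards longer seasons, it may be fairer to compare stars per episode. The sketch below does this using only the totals and episode counts quoted in this notebook. The variable names are my own. On the full data the equivalent would be `df.groupby('Season')['Stars'].mean()`.

```python
# Star totals and episode counts quoted in the text above.
star_sums = {6: 212.4, 10: 156.2}
episode_counts = {6: 25, 10: 18}

# Average stars per episode for each of the two seasons.
mean_stars = {season: star_sums[season] / episode_counts[season]
              for season in star_sums}

for season, mean in mean_stars.items():
    print(f"Season {season}: {mean:.2f} stars per episode")
```

On a per-episode basis Season 10 comes out slightly ahead of Season 6, so the ranking by total stars says as much about season length as about quality.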
Well, the next question that pops into my mind, and many may disagree with the answer, is which episode is the best (according to the voters, of course)?
df[df['Stars'] == df['Stars'].max()]
And so here we are, we have a winner or two 🤷.. The two best episodes according to the IMDb ratings are "The One Where Everybody Finds Out" from Season 5 and "The Last One (ii)" from Season 10. Since public opinion is divided on the top episode, I believe there is an immense need to find out the Top-10 episodes of FRIENDS. So here we go...
Top_10 = df[['Episode_Title','Stars']].sort_values('Stars',ascending = True).tail(10)
Top_10
plt.figure(figsize=(10,5))
sns.barplot(data = Top_10, y = 'Episode_Title', x = 'Stars', palette='viridis')
plt.title('Top 10 Episodes according to IMDB Rating System', fontsize=15)
plt.xlabel('Stars', fontsize=13)
plt.ylabel("Episode's Title", fontsize=13)
plt.xticks(np.arange(9, 9.8, 0.1), fontsize=12)
plt.yticks(fontsize=12)
plt.xlim(9, 9.8)
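Sorting and taking `tail(10)` cuts the list at exactly ten rows, even when several episodes share the rating in last place. A hedged sketch of how `DataFrame.nlargest` with `keep="all"` treats such ties is shown below, on a made-up frame `toy` with placeholder titles.

```python
import pandas as pd

# Hypothetical ratings with a tie at the cut-off (placeholder titles).
toy = pd.DataFrame({
    "Episode_Title": ["A", "B", "C", "D", "E"],
    "Stars": [8.1, 9.7, 9.2, 9.7, 9.2],
})

# Ask for the top 3; keep="all" also returns every row tied with the last kept value.
top = toy.nlargest(3, "Stars", keep="all")

print(top)
```

Here four rows come back instead of three, because two episodes share the third-best rating.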
nr_of_episodes = df[['Director', 'Season']].groupby(by='Director').count()
fig = px.bar(data_frame=nr_of_episodes, x=nr_of_episodes.index, y = 'Season')
fig.update_layout(
    title="Number of Episodes that each Director directed",
    xaxis_title="Directors",
    yaxis_title="Number of Episodes",
    font=dict(
        size=18,
        color="RebeccaPurple"
    ))
fig.show()
And finally, here are the directors. These are the people who, alongside David Crane, Marta Kauffman and Kevin S. Bright (the names we all see at the end of every episode), directed and created this great TV show called FRIENDS.
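The director counts could also be produced with `value_counts`, which returns them sorted from most to fewest episodes so that a bar chart built on it is ordered. This is a minimal sketch on a made-up frame. The director names are placeholders, not people from the dataset.

```python
import pandas as pd

# Hypothetical director column with placeholder names.
toy = pd.DataFrame({
    "Director": ["Director A", "Director B", "Director A",
                 "Director C", "Director A", "Director B"],
})

# Episodes per director, sorted in descending order by default.
episodes_by_director = toy["Director"].value_counts()

print(episodes_by_director)
```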
Well, this has come to an end. I hope that you found out something new through this EDA. One further piece of research that needs to be conducted about this TV series is whether Ross Geller and Rachel Green were on a break or not, but unfortunately the data provided are insufficient!
Namely the Friends are:
🐣🎩🍕👩❤️💋👨 Joey Tribbiani
👛🐱🎤🎸 Phoebe Buffay
👠💄💅👗 Rachel Green
💍👞🙊📚 Ross Geller
🛁👓🎾 Chandler Bing
🎂🧽👜👙 Monica Geller
Hope you've enjoyed my work, stick around for more!