By Cheng-Chun Lee, Sanadhi Sutandi, Skander Hajri.

Background

Music has become a fundamental part of our daily activities. Often without noticing, we listen to music anytime and anywhere: while cooking, sitting in Rolex (in either the silent area or the cafeteria), coding a project, cycling to Vevey, and so on.

A famous violinist once said: “Music transcends words. By exchanging notes, you get to know one another, to understand one another. As if your souls were connected and your hearts were overlapping. It's a conversation through instruments. A miracle that creates harmony. In that moment, music transcends words.” -K.

It is widely known that music with lyrics, better known as a “song”, is a favorite way to express human emotions and feelings. The song is one of the greatest creations of humankind in the course of history, and it has by now grown into a whole music industry.

It is exciting for us to explore which factors influence a song’s popularity the most. Thus we present this analysis of songs’ popularity as our final project for the ADA course!

Datasets

Throughout this project, we mainly use the Million Song Dataset, which contains audio features and metadata for both popular and unpopular songs.

In addition, we use two other datasets:

Important fields of the Million Song Dataset:

track_id
The primary identifier field for all songs in the dataset.
song_hotttnesss
The popularity of a song, measured as a value between 0 and 1.

Observing Songs' Popularity

Using a correlation matrix, we can quickly see which features influence a song’s popularity. Compared with the other features, artist_familiarity, artist_hotttnesss and year correlate more strongly with song_hotttnesss.
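To make the idea concrete, here is a minimal sketch of ranking features by their Pearson correlation with song_hotttnesss. The rows are made-up toy values, not records from the dataset, the duration column is included only as an assumed example of a weak feature, and the helper names pearson and correlations_with are our own for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy rows with invented values (NOT taken from the Million Song Dataset).
songs = [
    {"artist_familiarity": 0.9, "duration": 200.0, "song_hotttnesss": 0.8},
    {"artist_familiarity": 0.7, "duration": 250.0, "song_hotttnesss": 0.7},
    {"artist_familiarity": 0.5, "duration": 180.0, "song_hotttnesss": 0.4},
    {"artist_familiarity": 0.3, "duration": 240.0, "song_hotttnesss": 0.3},
    {"artist_familiarity": 0.2, "duration": 210.0, "song_hotttnesss": 0.1},
]


def correlations_with(rows, target):
    """Correlate every other column with `target`, strongest first."""
    features = [name for name in rows[0] if name != target]
    y = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], y) for f in features}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranking = correlations_with(songs, "song_hotttnesss")
for feature, r in ranking:
    print(f"{feature}: {r:+.2f}")
```

Sorting by absolute value keeps strongly negative correlations near the top as well, which is usually what one wants when scanning a correlation matrix for influential features.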

To obtain more accurate results, we use a random forest classifier to predict whether a song is popular or unpopular. The random forest reaches a high accuracy of 97.52%. We then inspect its feature_importances attribute to see which features matter the most. The two most important features are:

artist_hotttnesss
The popularity of an artist (usually short-term)
artist_familiarity
An indication of how well known an artist is (usually longer-term)
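The classification step can be sketched as follows, assuming scikit-learn is the library in use (the text above does not name it). The data is synthetic: we generate random features and a label that depends only on the two artist features, so the numbers here say nothing about the real dataset. The duration column and all hyperparameters are our own assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
feature_names = ["artist_hotttnesss", "artist_familiarity", "duration"]

# Synthetic features in [0, 1); "duration" is pure noise by construction.
X = rng.random((n, 3))
# Synthetic label: "popular" when the two artist features are jointly high.
signal = 0.6 * X[:, 0] + 0.4 * X[:, 1] + 0.05 * rng.standard_normal(n)
y = (signal > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on held-out data, then the impurity-based importances.
accuracy = model.score(X_test, y_test)
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)

print(f"accuracy: {accuracy:.3f}")
for name in ranked:
    print(f"{name}: {importances[name]:.3f}")
```

In scikit-learn the attribute is spelled feature_importances_ (with a trailing underscore) and its values sum to 1, so each value can be read as a share of the total importance.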

More into Exploratory Data Analysis

Artist and Release Album

To check the previous results, let’s compare how often popular and unpopular songs come from the same artist and the same release album.

On average, at least 4 popular songs come from the same artist, and we see a small but clear distinction between popular and unpopular songs. On average, at least 2 popular songs come from the same release album.

This strengthens our previous analysis that the artist (artist_hotttnesss and artist_familiarity) is significantly correlated with song_hotttnesss.

Popular = Blue, Unpopular = Green

We see that both popular and unpopular songs mostly come from either the United States (eastern part) or the European Union (England). In general, songs from non-English-speaking countries tend to be unpopular. It is quite possible that audiences around the world prefer to listen to songs in English.

Rock was the favorite genre of audiences from 2001 to 2009.

However, in 2010 pop became the most popular genre. This indicates that musical popularity is not constant and can change over time.


Herding Bias in Songs

Have you taken a close look at your playlist? Do you notice that several of its songs are actually from the same few artists?

We define this phenomenon as herding bias. We guess that it exists because once an artist makes a positive impression on users, they are more willing to listen to, and even more likely to love, that artist’s songs. To measure the degree of herding bias, we use the following formula:

Consider a user with M records, where p_m is the play_count of the m-th record, s_m is the singer of the m-th record, and I_m equals 1 when s_m appears more than once among the M records and 0 otherwise (that is, when it appears only once).
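As a minimal sketch of these definitions, we assume here that herding bias is the play-count-weighted share of records whose singer appears more than once, i.e. the sum of p_m * I_m divided by the sum of p_m. That reading is our assumption based on the symbol definitions. The function name and the toy playlist are hypothetical.

```python
from collections import Counter


def herding_bias(records):
    """Assumed measure: sum(p_m * I_m) / sum(p_m) for one user.

    records: list of (singer, play_count) pairs, one pair per song.
    """
    songs_per_singer = Counter(singer for singer, _ in records)
    total_plays = sum(plays for _, plays in records)
    if total_plays == 0:
        return 0.0
    # I_m = 1 only when the singer of record m appears more than once.
    repeated_plays = sum(
        plays for singer, plays in records if songs_per_singer[singer] > 1
    )
    return repeated_plays / total_plays


# Hypothetical playlist: Artist A appears twice, the others once.
playlist = [("Artist A", 5), ("Artist A", 3), ("Artist B", 10), ("Artist C", 2)]
score = herding_bias(playlist)
print(score)  # (5 + 3) / 20 = 0.4
```

Under this reading, a user whose playlist contains every singer exactly once gets a score of 0, and a user who only listens to repeated singers gets a score of 1.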

We analyze the playlists of 1022 users and get the following distribution. (To keep the histogram from being misleading, we use bins = 50 for a higher resolution.)

We find that 160 of the 1022 users (only 16%) show no herding bias, and the median herding bias is 0.38. In other words, for half of the users, at least 38% of their listening goes to songs from certain artists. Is that a good thing or a bad thing? The answer is subjective, but if you love trying new stuff, don’t let herding bias constrain you!

Tendency of Hearing Singers’ Voice, not the Songs

Are songs from popular artists usually popular? We collect 25 popular and 20 unpopular artists in 2010, and analyze the hotness of their songs. Surprisingly, it differs a lot!

This phenomenon resembles “the rich get richer”: the more connections (popularity) you gain, the more likely your songs are to be popular. Now, let’s observe the clickthrough rate of 2 popular artists in 2017:

The Importance of First Performance

Do the songs of the first year matter a lot for an artist? Are they the key to success? Nowadays people can become popular or famous because of a single event (you can always find viral videos to watch when you are bored, right?), so we want to see whether this also leads to career success for a singer. To do so, we choose several popular and unpopular artists from 1995-2000, 2000-2005 and 2005-2010, and observe the hotness of the songs from their first year:

The scatter plot suggests that artists may need to seize the opportunity in their first year, because several recently popular singers were already successful in their first year! Let us give 2 classic examples, Psy and Taylor Swift:

The Korean artist Psy became extremely popular because of the song “Gangnam Style”. On May 31, 2014, the video for “Gangnam Style” hit 2 billion views, and since then, Psy and his new songs have always been popular.
The American artist Taylor Swift started her career in 2006. The four singles released in 2007 and 2008, "Teardrops on My Guitar", "Our Song", "Picture to Burn" and "Should've Said No", were all highly successful on the Billboard Hot Country Songs chart.

Lyrics of Songs

Do people tend to listen to songs that contain certain terms or themes?
Here we only display the figures for the popular songs.

Explanation:

The results are very similar for popular and unpopular songs. For the first two categories, the top word is by far ‘yeah’. Hence, as a first conclusion, we might say that people do not really care about lyrics, as ‘yeah’ isn’t related to any specific topic. Apart from ‘yeah’, many top words concern themes such as youth and the world, or are verbs that express desire (wish, want).

For the two remaining categories, the result is different. Over all the songs, the most recurrent word is ‘love’, and many other high-ranked words evoke feelings (feel, like, want, baby, heart, girl). So emotional, isn’t it?
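Top-word rankings like these can be obtained by summing word counts over a group of songs. Below is a minimal sketch, assuming each song’s lyrics are available as a word-to-count mapping (a bag of words). The two toy songs and the helper name top_words are invented for illustration.

```python
from collections import Counter


def top_words(songs, k=3):
    """Sum per-song word counts and return the k most frequent words."""
    totals = Counter()
    for word_counts in songs:
        totals.update(word_counts)  # adds counts word by word
    return totals.most_common(k)


# Invented bag-of-words lyrics for two songs (not real data).
popular_lyrics = [
    {"yeah": 12, "love": 4, "want": 2},
    {"yeah": 7, "world": 3, "love": 1},
]
top = top_words(popular_lyrics, k=2)
print(top)  # [('yeah', 19), ('love', 5)]
```

Running the same function separately on the popular and the unpopular group gives two rankings that can be compared side by side.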

Sentiment Analysis of Songs

Using a list of positively and negatively connoted words, let’s determine whether a popular song is usually positive (happy) or negative (sad).

Among the tracks with high hotttnesss, about 43.6% of songs are positive and 56.4% are negative. Among the tracks with low hotttnesss, about 40% are positive and 60% are negative. So there is no significant difference between popular and unpopular songs.
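A word-list classification of this kind can be sketched as below. The two tiny word lists are placeholders, not the lists used in the project, and two rules are our own assumptions: a song counts as positive when its positive word count exceeds its negative one, and ties are labelled negative.

```python
# Placeholder word lists (assumptions, not the project's actual lists).
POSITIVE = {"love", "happy", "smile"}
NEGATIVE = {"cry", "hate", "pain"}


def song_sentiment(word_counts):
    """Label one bag-of-words song as 'positive' or 'negative'."""
    pos = sum(c for w, c in word_counts.items() if w in POSITIVE)
    neg = sum(c for w, c in word_counts.items() if w in NEGATIVE)
    # Assumed tie rule: equal counts are labelled negative.
    return "positive" if pos > neg else "negative"


def share_positive(songs):
    """Fraction of songs labelled positive (0.0 for an empty list)."""
    if not songs:
        return 0.0
    labels = [song_sentiment(s) for s in songs]
    return labels.count("positive") / len(labels)


# Invented example songs.
songs = [
    {"love": 3, "cry": 1},
    {"hate": 2, "pain": 2, "smile": 1},
    {"happy": 1, "world": 5},
    {"cry": 4},
]
ratio = share_positive(songs)
print(ratio)  # 2 of 4 songs are positive -> 0.5
```

Computing share_positive once for the high-hotttnesss tracks and once for the low-hotttnesss tracks gives the two percentages to compare.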

Presence of “Slang Words” in Songs

We count how often “slang words”, such as insults or words about controversial subjects, occur in popular and unpopular songs, which gives an estimate of lyrics quality.

We find that 30.6% of the top songs contain bad words. For the unpopular songs, the ratio is lower, at 22.1%. So might people be more interested in borderline songs?
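The ratio itself is simple to compute once a slang list is fixed. The sketch below assumes bag-of-words lyrics and uses harmless placeholder words in place of a real slang list. The function name share_with_slang is hypothetical.

```python
def share_with_slang(songs, slang_words):
    """Fraction of songs containing at least one word from slang_words.

    songs: list of word-to-count mappings, one per song.
    slang_words: set of flagged words.
    """
    if not songs:
        return 0.0
    flagged = sum(1 for word_counts in songs if slang_words & set(word_counts))
    return flagged / len(songs)


# Placeholder slang list and invented songs.
SLANG = {"darn", "heck"}
top_songs = [
    {"yeah": 5, "darn": 1},
    {"love": 3},
    {"heck": 2, "baby": 1},
    {"world": 4},
]
ratio = share_with_slang(top_songs, SLANG)
print(ratio)  # 2 of 4 songs contain a flagged word -> 0.5
```

Note that this counts a song once no matter how many flagged words it contains, which matches a "share of songs containing bad words" rather than a word frequency.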


User Behavior Analysis

Now, we give an example of analyzing one user’s listening behavior in terms of playcount distribution, favorite singer, herding bias, and genres.

User analysis:

Based on the playlist records, 32.1% of this user’s playcounts go to singers with at least 2 songs in the playlist:

Benabar: Les Mots D'Amour, L'Itinéraire, Y'a Une Fille Qu'Habite Chez Moi,

The genre this user loves the most: 1. French


Conclusions

Using the Million Song Dataset, we explored how people react to songs, in particular popular versus unpopular ones. We come to the following conclusions:

That’s all for our project. Thanks for reading, and keep listening to different singers :)