Douban Through 200,000 Albums

Posted by Donche on November 8, 2017

Last time we talked about that my spider had got 60,000 albums. During these days I had been thinking about how to deal with these data. At the same time, the spider was also busy wandering aroud to get more. Untill now, I have 200,000 albums, and we’ll see what can we know from these enormous information

0. Review of Previous Blog

In the previous blog, I wrote a spider(or we can call it a Web crawler as well) to get the information about albums. In fact, there’s not only albums, but also EPs, collections and singles. But we may as well call them albums for convenience.

So last time, the information I got includes the name of the album, name of the artist, music genre, type of compilation(album, EP, single…), release time, ratings and the number of people who rated.

In this blog, comparison will mostly be made based on the difference between the genres which we can easily tell. So this kind of comparison also allows us to take a deep look at the diverse music genres and their listeners (and of course the preferences of the Douban users).

I will not say much about the characteristics of various genres (anyone who have no idea about any genre can check it out in wikipedia). In addition, because the information I’ve got is not complete, many albums do not actually have a genre. Although we have about 200,000, there are actually not many who have genre labels, only about 60,000. So the analyse below concerning the genre will only be done with these 60,000 albums(which is not a small number either).

1. Ratio

Since I started my spider from a rock collection called Guruguru Brain Wash(in fact it’s psychedelic/krautrock, but yes it is kind of rock) and went through the recommends of Douban, so from the begining it would be mostly rock. With more and more information being captured, the proportion of rock has been declining, and the pop has gradually taken a dominant position(which is quite logical). After 60,000 albums, the ratio of other genres became stable, only the pop keeps grabing the territory of the rock. The evolution process of music genres is as follows:

The final proportion of each genre is as follows:

It can be seen that pop, rock, electronic, classical and rap altogether account for more than 80% of the total number, among which the pop even accounts for one third of the total. And according to the first graph, if the spider continues to climb, the ratio of the pop would much likely continue to increase.

2. Genre comparison

This can be really interesting let the fight between genres begins!(No no…just kidding).

2.1 ratio of the album

In my opinion, the form of the album does have some meaning to the music, especially after Beatles’s Sgt. Pepper’s Lonely Hearts Club Band announced the extraordinary status of the concept album in the 1960s, the album became the best way for artists to express their ideas in a complete approach. Through the whole album’s emotional development and atmosphere changing, the artists can fully express the their ideological status at this time. Because of the different characteristics of emotions and different ways of expression, the ratio of albums to total circulation is also very different among the genres, which is shown below:

Although the number of pop is dominant, the proportion of albums is far below others. Latin music which is in the same level as the pop music only accounts for 0.1% of the total number of music, that is, only 60 or more. Therefore, the proportion of Latin music’s album may also be due to incomplete coverage of the Douban music. However, light music and world music, which are also rare species, are far ahead of the ratio of albums.

2.2 ratings

Ratings should be the best way to show the taste of people on the Douban. Not much to say, we can see the result as follows:
In terms of average scores, classical music is far ahead (as it should), and pop music is the lowest number (a matter of course). At the same time, blues, jazz and other high-powered genres also showed their strong strength, one score higher than the pop rating.

When it comes to the average number of scores for each album, the artistic taste of Douban fellows was displayed incisively. The folk sitting at the top of music in China won the first place with no suspense, leaving the second far behind. Classical, as the highest representative of taste, settled the last one and exchanged his best wishes with the least popular Latin music.
Of course, having a second does not mean that pop listeners’ power is inferior to those of the folk. In terms of the total number of scores, pop can be the undoubted first place:

3. Time Series

3.1 the booming in quantity

The 50s of the last century and the beginning of this century can be regarded as two important turning points of modern pop music. Humans established and developed the main forms of pop music at a previous point of time, while in the latter it achieved a quantitative blowout development. In particular, because of the popularity of the Internet, more people have come into contact with a large number of different music styles at a very low cost, thus embarking on the road to music.

3.2 backward-looking attitude(passéisme)

However, after further analysis of the average scores, the various genres showed a striking agreement: the past is always the best. The scores of several major genres evolved as follows:

It can be seen that since the 1990s and the beginning of this century, all genres have suffered great rating losses. Even the classical music can’t escape. Even if it fluctuates on occasion in the timeline, it will not hide the drastic decline in ratings of music albums in the past two decades. Then the question arises. Is it because of the uneven quality of the music outbreak or the romanticism of Douban users?

Reference Materials

  1. Python for Data Analysis