What BTS Sings About: Using NLP to Analyze the Boy Band's Lyrics

Welcome to Huntsy Beats!

While I aim to cover many different topics in music on this platform, this inaugural post focuses on one of my favorite acts in the music industry today: BTS.

Connecting with BTS

A lot has already been written about the rise of BTS, the Korean pop phenom that has amassed millions of fans across the globe and continues to confound and impress the music industry with its ever-growing success. Although the group has been active since 2013, their popularity seems to balloon with each new release. The band’s first fully English release, “Dynamite,” came out on August 21st and debuted at No. 1 on the Billboard Hot 100, making BTS the first Korean artist to achieve that feat. Spotify reported in an episode of its For the Record podcast that “Dynamite” also drove a 300% increase in people listening to BTS for the first time on its platform; all told, BTS songs have been streamed over 11.4 billion times. Not surprisingly, BTS is also often the most-streamed artist in K-pop: of the 100 million unique Spotify listens to K-pop songs in the last 30 days, BTS accounted for almost half.

Kat Moon, a Time journalist interviewed on the BTS For the Record episode, attributes a large part of BTS’ relatability and popularity to their lyrics:

Throughout their songs, BTS have always had quite introspective lyrics. They’re one of the first ones to sing so openly about depression and mental health illness. Because they do have these lyrics that are open and vulnerable, it really allows listeners to connect more with them.

As a frequent listener of BTS, I found myself agreeing with Kat’s assessment. I wanted, however, to see whether a data-driven analysis of the group’s lyrics would support her hypothesis. To do so, I used tools from natural language processing (NLP), a subfield of computer science focused on the interactions between computer programs and natural language data, and followed a process similar to Codecademy’s analysis of Taylor Swift lyrics to determine how BTS’ lyrical content has changed over time.

Analyzing BTS’s Lyrics

My main goals for analyzing BTS’s song lyrics were:

  1. To determine what themes are present in BTS songs and

  2. To explore how these themes have changed over each BTS album release.

For the analysis, I used a Kaggle data set that contains information on the lyrics and songs across BTS’s current album discography. Until the release of “Dynamite,” BTS’s lyrics almost always mixed English and Korean; in order to apply NLP to the text, I used the English translations of the Korean lyrics.

It is also very common for a BTS album to contain repackaged or remixed versions of previously released songs. To prevent these additional versions from over-weighting a particular song and skewing the lyrical data, I attributed each song only to the first album on which it appeared. For example, the song “I NEED U” appears on both The Most Beautiful Moment in Life Pt. 1 and The Most Beautiful Moment in Life: Young Forever, but is included only in the former for my analysis. The albums Skool Luv Affair and Skool Luv Affair (Special Addition) had such similar track listings (only one new track, “Miss Right,” appears on the Special Addition) that I grouped both under the former’s name.

To clean the data, I began by removing all stop words, or words that carry little meaning for the analysis, from the lyrics. The Python Natural Language Toolkit (NLTK) provides a list of stop words to check for, such as “myself,” “having,” and “between.” BTS songs include a lot of onomatopoeia, so I extended my list of stop words with others commonly found in their songs, such as “uhh,” “woo,” and “ay.” I then lemmatized the data so that different forms of the same word could be analyzed as a single item (e.g., “dancing” and “dance”).
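To make the cleaning step concrete, here is a minimal, standard-library-only sketch. It is not the code used for this analysis: the stop-word set is a small excerpt of NLTK's English list inlined so the example runs without downloading corpora, and the `LEMMAS` table is a hypothetical stand-in for a real lemmatizer. With NLTK installed, the equivalent calls would be `stopwords.words("english")` from `nltk.corpus` and `WordNetLemmatizer().lemmatize(word, pos="v")` from `nltk.stem`.

```python
import re

# Assumption: a small excerpt of NLTK's English stop-word list, inlined so the
# sketch runs without downloading any NLTK corpora.
BASE_STOP_WORDS = {"i", "me", "my", "myself", "you", "the", "a", "an", "and",
                   "to", "of", "in", "is", "it", "having", "between"}
# Song-specific filler words mentioned in the post.
EXTRA_STOP_WORDS = {"uhh", "woo", "ay"}
STOP_WORDS = BASE_STOP_WORDS | EXTRA_STOP_WORDS

# Hypothetical lemma lookup table standing in for a real lemmatizer
# such as NLTK's WordNetLemmatizer.
LEMMAS = {"dancing": "dance", "danced": "dance", "dances": "dance",
          "dreams": "dream", "dreaming": "dream", "tears": "tear"}


def clean_lyrics(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and map each token to its lemma."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in kept]


print(clean_lyrics("Woo! I keep dancing, dancing in my dreams, ay"))
# → ['keep', 'dance', 'dance', 'dream']
```

A dictionary lookup keeps the example deterministic; a real lemmatizer such as WordNet's also needs a part-of-speech hint (`pos="v"`) to map "dancing" to "dance."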

After processing the text, the next step is feature extraction: representing the BTS lyrics numerically in a way that still preserves their content with sufficient accuracy. While there are many methods for feature extraction (Bag of Words is a commonly mentioned one), I used term frequency-inverse document frequency (tf-idf), a widely used weighting scheme in NLP, to determine how relevant a given word was to a particular BTS song in the context of BTS’s larger discography. Once I had my features, I applied Non-Negative Matrix Factorization (NMF) to generate a set of topics, or groups of co-occurring words, that efficiently represent the songs across BTS’s discography. The top 10 words in each of the 9 topics were:
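The post does not say which tf-idf variant or library was used, so the sketch below assumes the plain textbook form: the raw count of a term in a song multiplied by log(N / df), where N is the number of songs and df is the number of songs containing the term. Libraries such as scikit-learn's `TfidfVectorizer` apply idf smoothing and L2 normalization by default, so their numbers will differ. The three "songs" are made-up token lists.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document by raw count times log(N / df)."""
    n_docs = len(docs)
    # Document frequency: the number of songs each term appears in at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return weights


# Made-up, already-cleaned songs for illustration only.
songs = [
    ["love", "dream", "dream", "fly"],
    ["love", "fear", "mask"],
    ["love", "music", "rap", "rap"],
]
for song_weights in tf_idf(songs):
    # "love" appears in every song, so its weight is 0.0 everywhere.
    print(song_weights)
```

The idf factor is what lets a word repeated in one song ("dream," "rap") stand out, while a word common to the whole discography ("love" in this toy set) is down-weighted to zero.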

Topic #1: look, get, time, go, day, want, right, one, know, say
Topic #2: touch, wish, know, save, bae, live, fake, universe, painful, love
Topic #3: cause, fear, regret, although, burn, wonder, mask, dream, answer, tear
Topic #4: hurt, higher, smile, walk, want, cry, sky, spread, wing, fly
Topic #5: shadow, drink, cool, dream, pretty, woman, feel, dance, baby, wanna
Topic #6: end, idiots, visible, vain, even, house, dream, collapse, card, way
Topic #7: real, sleep, away, throw, ice, voice, pain, cover, sound, lake
Topic #8: hop, music, hip, hyung, bang, top, beat, go, monster, rap
Topic #9: best, rewind, please, tell, explain, baby, fall, need, know, girl

The corresponding labels I created to summarize these topics are respectively:

Moments in Time 
Love 
Fear & Insecurity 
Growth 
Superficiality 
Meaninglessness 
Reality & Pain 
Music 
Longing & Nostalgia
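The factorization behind topics like these can be sketched in plain Python. This is an illustrative stand-in, not the author's setup: it uses Lee and Seung's multiplicative updates on a made-up 4-song by 6-term matrix, with an assumed iteration count and random seed. In practice, scikit-learn's `sklearn.decomposition.NMF` does the same job (`fit_transform` returns the song-topic matrix W and `components_` holds the topic-term matrix H). Each topic's "top words" are simply the highest-weighted columns in its row of H.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def nmf(v: Matrix, k: int, iterations: int = 500, seed: int = 0, eps: float = 1e-9):
    """Approximate V (songs x terms) as W (songs x topics) times H (topics x terms).

    Lee and Seung's multiplicative updates for squared error keep every entry
    non-negative because they only multiply by non-negative ratios.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def reconstruction_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Sum of squared differences between V and W times H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def top_words(h: Matrix, vocab: list[str], n_words: int) -> list[list[str]]:
    """List the n highest-weighted terms in each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n_words]]
            for row in h]


# Made-up tf-idf-style matrix: 4 songs x 6 terms with two loose themes.
VOCAB = ["love", "dream", "fly", "rap", "music", "beat"]
V = [
    [2, 1, 0, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]
w, h = nmf(V, k=2)
for number, words in enumerate(top_words(h, VOCAB, 3), start=1):
    print(f"Topic #{number}: {', '.join(words)}")
```

The non-negativity constraint is what makes NMF topics readable: a song can only be built up by adding topics, never by subtracting them, so each topic tends to collect words that actually co-occur.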

Every song in the data set receives a set of weights in the NMF coefficient matrix; these are relative scores indicating how strongly each topic is present in that song. To decide whether a topic appears significantly in a song, I set a cutoff threshold of 0.05, counting a topic as present whenever its score is at least 0.05. I then counted, for each album, the number of songs in which each topic met that threshold. Finally, I normalized the totals to account for the variation in the number of songs across BTS albums.
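The counting step can be sketched as follows. The 0.05 threshold comes from the post; normalizing by dividing each album's count by that album's number of songs is an assumption about how "normalized" was done, and the album names and weights are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.05  # from the post: a topic counts as present if its weight is >= 0.05


def topic_share_by_album(weights, albums, n_topics, threshold=THRESHOLD):
    """For each album, return the fraction of its songs in which each topic meets the threshold.

    weights: one row of NMF topic weights per song (the song-topic matrix)
    albums:  the album each song is attributed to, in the same order
    """
    counts = defaultdict(lambda: [0] * n_topics)
    n_songs = defaultdict(int)
    for row, album in zip(weights, albums):
        n_songs[album] += 1
        for topic, weight in enumerate(row):
            if weight >= threshold:
                counts[album][topic] += 1
    # Assumption: normalize by the album's song count so longer albums don't dominate.
    return {album: [c / n_songs[album] for c in counts[album]] for album in n_songs}


# Hypothetical weights for four songs across two topics.
W_TOY = [[0.30, 0.01], [0.02, 0.20], [0.10, 0.06], [0.00, 0.40]]
ALBUMS = ["Album A", "Album A", "Album B", "Album B"]
print(topic_share_by_album(W_TOY, ALBUMS, n_topics=2))
# → {'Album A': [0.5, 0.5], 'Album B': [0.5, 1.0]}
```

Dividing by album length turns raw counts into "share of the album's songs that touch this topic," which is what makes a 7-track and a 15-track album comparable on one chart.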

Analysis Results

With the normalized song counts for each topic, I had what I needed to create the chart below, which depicts how the frequency of different topics changes across BTS albums over time:

The topic Moments in Time features prominently across almost all of BTS’s albums. This appears consistent with Kat’s description of BTS lyrics as introspective, given how often the lyrics describe the boys’ thoughts during particular moments in time. More broadly, the topics themselves, particularly the words grouped under Fear & Insecurity and Reality & Pain, demonstrate a level of vulnerability that runs throughout BTS’s discography.

Fear & Insecurity also displays an interesting trend over time. While the topic was less present on the boy band’s earlier albums, it was a prominent theme on the mid-2018 album Love Yourself: Tear. Given that BTS made a historic international breakthrough with their performance at the American Music Awards in November 2017, it is tempting to suppose that the heightened fear and insecurity the boys expressed in their 2018 music reflected their adjustment to overwhelming global fame, though a topic trend alone cannot establish that link.

Over time, BTS has expressed Music less and less as a lyrical subject. During their rookie years, the boys would often croon about the importance of having music in their lives to help them through difficult times, but this topic has become less prominent on recent albums as the group increasingly uses its lyrics to communicate relatable messages to fans.

Perhaps the most notable trend is that different topics peak sharply on different albums. For example, Longing & Nostalgia is prominent on the album Dark and Wild, where the group sings, “I rewind my girl baby come back to my world” (from “Outro: Do You Think It Makes Sense?”) and “all night girl / I think about you all night girl” (from “24/7=Heaven”). A different topic, Growth, features prominently on the 2017 album You Never Walk Alone; in one of its lead singles, “Not Today,” the seven-member group sings about pushing past adversity. Even Love, a topic that resurfaces on a few different albums, takes a different shape on each. On the 2014 album Skool Luv Affair, Love generally refers to affection for another person; in the more recent case of Love Yourself: Answer’s hit song “Epiphany,” however, Love is all about loving oneself.

Releasing albums that emphasize specific themes or topics seems to be by design. When asked about the songwriting process on the same For the Record episode mentioned earlier, the members of BTS described their inclination to start with a keyword for the album, then create lyrics and songs around it. “It’s like making a puzzle when we try to make an album,” said RM, the band leader and an active songwriter, during the interview.

BTS’s new album, BE, set for release on November 20th, already looks unlikely to be an exception. According to Variety, “The new album imparts a message of healing to the world by declaring, ‘Even in the face of this new normality, our life goes on.’” Given the current global health crisis and the hopelessness it has left many people across the world feeling, “healing” is an ideal keyword to build an album around. What will be interesting to hear, then, is how else this album might differ, whether through more fully English tracks or through new lyrical topics, beyond the pandemic-related ones, that surface between the beats.