Keyword Extraction using RAKE — May 26, 2017

If you’ve ever wanted to know what a document or piece of text is about without reading the entire thing, you’ll be glad to know you can do so using keywords. Keywords, in this context, are words or short phrases that concisely describe the contents of a larger text. This post explains how Rapid Automatic Keyword Extraction (RAKE), a relatively new approach to automatically generating keywords from a given document, works.

Continue reading

Regex – Part 3 (with exercises) — April 6, 2017

(Level: Intermediate)

We’re back! In the previous regex tutorial, we covered character classes and anchors in some detail. We also explained the use of raw strings when defining regular expressions in Python. Today’s post discusses quantifiers in detail and introduces the ideas of alternation and grouping, which are explained by building our own URL regex. It also has several regex challenges based on the concepts covered so far that will test your regex-building skills. Let’s begin.
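As a taste of what quantifiers, alternation, and grouping look like together, here is a minimal sketch of a URL regex in Python. The pattern and the `parse_url` helper are our own simplified illustration, not the regex built in the full post, and they will miss plenty of valid URLs (for instance, hosts without a dot).

```python
import re

# Hypothetical, deliberately simplified URL pattern (an assumption for
# illustration only; real-world URLs need a far more careful regex).
URL_PATTERN = re.compile(
    r"(https?|ftp)://"          # group + alternation: http, https, or ftp ("s?" = optional s)
    r"([\w-]+(?:\.[\w-]+)+)"    # host: one label, then one or more ".label" parts ("+" quantifier)
    r"(/\S*)?"                  # optional path: "/" followed by zero or more non-space characters
)


def parse_url(text):
    """Return (scheme, host, path) for the first URL-like match, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    scheme, host, path = match.groups()
    # The path group is optional, so it may be None when no path is present.
    return scheme, host, path or ""
```

Calling `parse_url("Visit https://www.example.com/docs today")` pulls out the scheme, host, and path as three separate groups, which is the main reason to wrap parts of a pattern in parentheses.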

Continue reading

Stemming — March 19, 2017


(Level: Beginner to Intermediate)

This week’s post is about stemming. It’s a little different from our previous articles because we’ll discuss some English grammar before getting into the technicalities and coding. But even before that, try out this quick experiment (10 seconds, max) and you’ll probably immediately understand what stemming is all about.

Continue reading

Named Entities — March 12, 2017

(Level: Beginner)

“The beginning of wisdom is to call things by their proper name.” – Confucius

Hello there, we apologize for the delay in publishing this article. The last two weeks have been pretty hectic.

Now that you are equipped with the basics of text processing, it is high time we moved on to some NLP-specific concepts. This week’s article is about Named Entities, as the title suggests. You will understand what they are, why they are important, and how to identify them.

Continue reading

Regex – Part 2 — February 22, 2017

Stopwords —


(Level: Beginner)

“Constantly talking isn’t necessarily communicating.” – Charlie Kaufman

So far, we have covered the basics of regular expressions and tokenization. It must be evident by now how simple, yet fundamental, these concepts are. Today’s lesson covers another important concept that is essential to almost any NLP task: stopword filtering. You will understand what stopwords are, why we need to filter them, and how to remove them.

Continue reading

Tokenization — February 4, 2017


(Level: Beginner)

“A computer is only as smart as its programmer.” – Unknown

Last week’s tutorial covered the basics of regular expressions, or regexes, along with some sample code for your understanding. This week is about tokenization. Sounds fancy? It’s easy to understand, yet extremely powerful. By the end of this tutorial, you’ll understand what it is, why you will need it, and how you can build your own tokenizer.
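To give a rough idea of what "building your own tokenizer" can mean, here is a minimal regex-based sketch in Python. The pattern and the `tokenize` function are assumptions made for this illustration and are not necessarily the tokenizer developed in the full tutorial; it only handles straight apostrophes and treats every other punctuation mark as its own token.

```python
import re

# Hypothetical token pattern (an assumption for illustration):
#   \w+(?:'\w+)*  -> a word, optionally with apostrophe parts such as "don't"
#   [^\w\s]       -> any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)
```

With this sketch, a sentence such as `"Don't panic, it's easy!"` is split into words and punctuation marks, while contractions like `Don't` stay in one piece.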

Continue reading