If you’ve ever wanted to know what a document or piece of text is about without reading the entire thing, you’ll be glad to know you can do so using keywords. Keywords, in this context, are words or short phrases that concisely describe the contents of a larger text. This post describes how Rapid Automatic Keyword Extraction (RAKE), a relatively new approach to automatically generating keywords from a given document, works.
We’re back! In the previous regex tutorial, we covered character classes and anchors in some detail. We also explained the use of raw strings when defining regular expressions in Python. Today’s post discusses quantifiers in detail and introduces the ideas of alternation and grouping, which are explained by building our own URL regex. It also has several regex challenges based on the concepts covered so far that will test your regex-building skills. Let’s begin.
(Level: Beginner to Intermediate)
This week’s post is about stemming. It’s a little different from our previous articles because we’ll discuss some English grammar before getting into the technicalities and coding. But even before that, try out this quick experiment (10 seconds, max) and you’ll probably immediately understand what stemming is all about.
“The beginning of wisdom is to call things by their proper name.” – Confucius
Hello there, we apologize for the delay in publishing this article. The last two weeks have been pretty hectic.
Now that you are equipped with the basics of text processing, it is high time we moved on to some NLP-specific concepts. This week’s article is about Named Entities, as the title suggests. You will understand what they are, why they are important, and how to identify them.
In the first regex post, we discussed the concept of regular expressions and some of their applications, and wrote a small program to extract years from a text. We also looked at some important character classes like uppercase letters, word characters, digits and whitespace. In this regex tutorial, we will look at character classes and anchors in greater depth.
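As a quick refresher, here is a minimal sketch of what a year extractor of that kind might look like. It is not the code from the original post: the pattern, the function name `extract_years`, and the sample sentence are all assumptions made for illustration, and the pattern only recognizes four-digit years from 1000 to 2999.

```python
import re

# Assumed pattern: a 1 or 2 followed by three digits, with a word boundary
# (\b) on each side so that longer digit runs are not matched in part.
YEAR_PATTERN = re.compile(r"\b[12]\d{3}\b")

def extract_years(text):
    """Return every four-digit year found in text, in order of appearance."""
    return YEAR_PATTERN.findall(text)

# Made-up sample sentence; 12345 is there to show the word boundaries at work.
sample = "She was born in 1985 and moved abroad in 2003, to house number 12345."
years = extract_years(sample)
print(years)  # ['1985', '2003']
```

The digit class `\d` does the matching, while the `\b` anchors keep the pattern from picking up part of a longer number.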
“Constantly talking isn’t necessarily communicating.” – Charlie Kaufman
So far, we have covered the basics of regular expressions and tokenization. It must be evident by now how simple, yet fundamental, these concepts are. Today’s lesson covers another important concept that is essential to almost any NLP task: stopword filtering. You will understand what stopwords are, why we need to filter them, and how to remove them.
“A computer is only as smart as its programmer.” – Unknown
Last week’s tutorial covered the basics of regular expressions, or regexes, along with some sample code for your understanding. This week is about tokenization. Sounds fancy? It is easy to understand, yet extremely powerful. By the end of this tutorial, you’ll understand what it is, why you will need it, and how you can build your own tokenizer.
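To give a flavour of the idea, here is a minimal sketch of a regex-based tokenizer. It is one simple way to do it under stated assumptions, not necessarily the tokenizer built in the tutorial: the pattern and the name `tokenize` are hypothetical, and the pattern only handles straight apostrophes inside words.

```python
import re

# Assumed pattern: either a run of word characters, optionally followed by an
# apostrophe and more word characters (so "isn't" stays whole), or a single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens, dropping whitespace."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("Sounds fancy? It isn't hard.")
print(tokens)  # ['Sounds', 'fancy', '?', 'It', "isn't", 'hard', '.']
```

Treating punctuation as separate tokens is a design choice; some tasks prefer to drop it altogether.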