2017-05-26T13:32:46+00:00

Keyword Extraction using RAKE

May 26, 2017 / May 27, 2017 / codelingo

If you’ve ever wanted to know what a document or piece of text is about without reading the entire thing, keywords can help. Keywords, in this context, are words or short phrases that concisely describe the contents of a larger text. This post describes how Rapid Automatic Keyword Extraction (RAKE), a relatively new approach to automatically generating keywords from a given document, works.

Regex – Part 3 (with exercises)

April 6, 2017 / January 3, 2019 / codelingo / 1 Comment

(Level: Intermediate)

We’re back! In the previous regex tutorial, we covered character classes and anchors in some detail. We also explained the use of raw strings when defining regular expressions in Python. Today’s post discusses quantifiers in detail and introduces the ideas of alternation and grouping, which are explained by building our own URL regex. It also has several regex challenges based on the concepts covered so far that will test your regex-building skills. Let’s begin.
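
As a rough illustration of how quantifiers, alternation and grouping combine, here is a minimal sketch in Python. The pattern is a simplified, hypothetical URL regex written for this summary; it is not the one built in the post, and the name `find_urls` and the sample text are assumptions.

```python
import re

# Hypothetical, simplified URL pattern (an assumption, not the post's regex).
URL_PATTERN = re.compile(
    r"(?:https?|ftp)://"      # alternation inside a group: http, https or ftp
    r"(?:www\.)?"             # optional group: the ? quantifier means 0 or 1
    r"[\w-]+(?:\.[\w-]+)+"    # domain labels: the + quantifier means 1 or more
    r"(?:/\S*)?"              # optional path made of non-whitespace characters
)


def find_urls(text):
    """Return every substring of text that matches URL_PATTERN."""
    return URL_PATTERN.findall(text)


sample = "Docs live at https://www.example.com/docs and ftp://files.example.org today."
print(find_urls(sample))
```

The groups use `(?:...)` so that `findall` returns whole matches rather than the contents of each group.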

Stemming

March 19, 2017 / March 25, 2017 / codelingo / 7 Comments

(Level: Beginner to Intermediate)

This week’s post is about stemming. It’s a little different from our previous articles because we’ll discuss some English grammar before getting into the technicalities and coding. But even before that, try out this quick experiment (10 seconds, max) and you’ll probably understand right away what stemming is all about.

Named Entities

March 12, 2017 / March 14, 2017 / codelingo

(Level: Beginner)

“The beginning of wisdom is to call things by their proper name.” – Confucius

Hello there, we apologize for the delay in publishing this article. The last two weeks have been pretty hectic.

Now that you are equipped with the basics of text processing, it is high time we moved on to some NLP-specific concepts. This week’s article is about Named Entities, as the title suggests. You will understand what they are, why they are important, and how to identify them.

Regex – Part 2

February 22, 2017 / January 3, 2019 / codelingo

(Level: Beginner)

In the first regex post, we discussed the concept of regular expressions and some of their applications, and wrote a small program to extract years from a text. We also looked at some important character classes like uppercase letters, word characters, digits and whitespace. In this regex tutorial, we will learn about character classes and anchors in greater depth.
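
A year extractor of the kind mentioned above could look roughly like the following sketch. It is not the program from the first post; the name `extract_years` and the assumption that a year is a standalone four-digit number between 1000 and 2999 are ours.

```python
import re

# Assumption: a "year" is a standalone four-digit number from 1000 to 2999.
# [12] is a character class, \d{3} means exactly three digits, and \b is a
# word-boundary anchor that stops matches inside longer numbers.
YEAR_PATTERN = re.compile(r"\b[12]\d{3}\b")


def extract_years(text):
    """Return all four-digit years found in text, as strings."""
    return YEAR_PATTERN.findall(text)


sample = "Python appeared in 1991; Python 3 followed in 2008, not in 20080."
print(extract_years(sample))  # ['1991', '2008']
```

Without the `\b` anchors, the pattern would also pick up `2008` from inside `20080`.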

Stopwords

February 22, 2017 / March 14, 2017 / codelingo / 2 Comments

(Level: Beginner)

“Constantly talking isn’t necessarily communicating.” – Charlie Kaufman

So far, we have covered the basics of regular expressions and tokenization. It must be evident by now how simple yet fundamental these concepts are. Today’s lesson covers another important concept that is essential to almost any NLP task: stopword filtering. You will understand what stopwords are, why we need to filter them, and how to remove them.
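
To make the idea concrete, here is a minimal sketch of stopword filtering in Python. The stopword list is a tiny hand-picked set chosen for illustration (real lists are much longer), and `remove_stopwords` is a hypothetical helper, not code from the post.

```python
# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens whose lowercase form is not a stopword."""
    return [token for token in tokens if token.lower() not in stopwords]


tokens = "The cat is on the mat".split()
print(remove_stopwords(tokens))  # ['cat', 'mat']
```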

Tokenization

February 4, 2017 / March 12, 2017 / codelingo / 7 Comments

(Level: Beginner)

“A computer is only as smart as its programmer.” – Unknown

Last week’s tutorial covered the basics of regular expressions, or regexes, along with some sample code for your understanding. This week is about tokenization. Sounds fancy? It is easy to understand, yet extremely powerful. By the end of this tutorial, you’ll understand what it is, why you will need it, and how you can build your own tokenizer.
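
A very small regex-based tokenizer might look like the sketch below. It is one possible approach under simple assumptions (words may contain one straight apostrophe, and every other non-space symbol is its own token), not the tokenizer developed in the tutorial; `tokenize` is a name chosen for illustration.

```python
import re

# Assumption: a token is either a word, optionally followed by an apostrophe
# and more word characters (as in "don't"), or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Don't panic, it's easy!"))
# ["Don't", 'panic', ',', "it's", 'easy', '!']
```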

Regex – Part 1

January 27, 2017 / April 5, 2017 / codelingo / 2 Comments

(Level: Beginner)

This is our first post in which we’ll really get our hands dirty with some coding. Today’s concept is an extremely useful one: regular expressions, or regex. Once you get started with regex, there’s no turning back. It is an extremely powerful concept that can be used for things like batch renaming of files, checking whether a given bit of text is a valid phone number, scraping useful information from a webpage, correcting a mistake you made repeatedly in a file (or tens, hundreds, even thousands of files at a time), and MUCH more.

The Road Ahead.

January 21, 2017 / February 2, 2017 / codelingo

Greetings!

Most journeys have a destination. Having one helps us plan the journey in advance and get the best out of the experience. That’s what today’s post is about: a little about what we plan on doing over the course of the next few months, how we plan to do it, and what we’ll need to get there. Exploring new places is an exciting activity. Traveling helps open your mind to so many different things that you may not have observed previously. More importantly, it helps you understand a great deal about yourself. If you haven’t tried it already, go ahead and explore a new place as soon as you can. You might have often heard people say that the journey itself is more important than the destination. We strongly believe in that idea, and that’s why we hope to make the journey an exquisite experience for you. Remember, we’re in this together.

Hello World!

January 17, 2017 / February 22, 2017 / codelingo / 2 Comments

This blog is run by two curious people who have always had a fascination with computer science. Over the years, we have explored diverse areas ranging from game development to artificial intelligence, always seeking to expand our knowledge and never being satisfied with incomplete explanations, boring textbooks and uninspired teachers. Learning is supposed to be an enjoyable activity, and that is what this blog aims to achieve.
