Email harvesting uses harvesting bots to obtain email IDs en masse from online sources, although offline sources are used as well. This post looks at email harvesting and email scraping, the methods spammers use to obtain your email ID. We’ll also look at ways to prevent email harvesting, where any exist.
Email Harvesting Definition
Email harvesting is the method by which spammers collect email IDs for their own use. The term covers both online and offline methods.
Normally, people assume they gave out their email ID somewhere on the Internet and the spammers’ bots picked it up. That can be true, but it can equally be true that you never entered the email ID anywhere except on a printed application form, and it still reached the spammers’ database or mailing lists. Let us look at some common methods spammers use to harvest email IDs online and offline.
Bots and Crawlers: Email Scraping
The most widely used and easiest method is to create a bot that crawls or scrapes the Internet for email addresses. Since email addresses follow a specific format, it is easy to create bots that read each phrase and pick up those that match pre-specified formats. Searching for email addresses with bots is known as email scraping. Email scrapers are available on the Internet, but using them is neither profitable nor ethical.
For example, one bot looks for addresses written in the standard format, such as name@example.com, while another looks for formats like “first_word [@] domain [dot] com”. The programmers keep track of the different formats people use nowadays to post their email IDs. Since AT and DOT are required in any email ID, it is easy to build bots for email harvesting.
In their attempt to parse phrases, spambots can pick up wrong email IDs, but that doesn’t matter: the email will bounce back and the wrong address can be removed from the list. For example, if there is a statement somewhere saying “Let’s meet at the cafe of Global dot net institution”, a bot can stitch the words around “at” and “dot net” into something that looks like an email ID and note it down. The address is wrong in this particular case, but it shows how aggressively the bots match. They are built to pick up many different formats, and some may even send a test message to the email ID to make sure it exists before adding it to the spammers’ mailing list.
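To see why both the plain and the spelled-out formats are so easy to match, here is a minimal Python sketch that you could run against text you publish yourself to check what it exposes. The function name `find_exposed_addresses` and the patterns are illustrative assumptions, not taken from any real harvesting tool. The address pattern is deliberately simplified, and only bracketed variants such as `[at]`, `[@]`, `(at)` and `[dot]` are handled.

```python
import re

# Simplified address pattern; real addresses allow more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Bracketed spell-outs such as " [at] ", " [@] ", " (at) " and " [dot] ".
AT = re.compile(r"\s*(?:\[\s*(?:at|@)\s*\]|\(\s*(?:at|@)\s*\))\s*", re.IGNORECASE)
DOT = re.compile(r"\s*(?:\[\s*dot\s*\]|\(\s*dot\s*\))\s*", re.IGNORECASE)


def find_exposed_addresses(text):
    """Return the sorted, lower-cased addresses a simple matcher could read from text."""
    # Undo the spell-outs first, then apply the ordinary address pattern.
    normalised = DOT.sub(".", AT.sub("@", text))
    return sorted({match.group(0).lower() for match in EMAIL.finditer(normalised)})


sample = "Write to jane.doe@example.com or first_word [@] domain [dot] com."
print(find_exposed_addresses(sample))
```

Two short substitutions are enough to turn the spelled-out form back into an ordinary address, which is why this kind of disguise offers only limited protection.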
Forums and Chat Groups
This method is similar to bots crawling the Internet, except that here they look for email IDs in the headers or the body of posts. Many people use email to communicate with forums. In that case, finding the email ID becomes easier, because it is specified in the header of the post, which is not visible to people visiting the forum. The same goes for groups such as IRC and other chat rooms.
Sale of Mailing Lists
Almost everyone likes making some money out of available resources. So if someone places an ad offering to buy email lists, say on Craigslist, smaller websites and mailing companies fall for it. They sell their complete email list, which contains your email ID.
A company might have privacy policies and ethical systems in place, but if an employee turns corrupt, there is nothing to stop that employee from earning a little by selling your email ID. Spammers can also use social engineering to lay their hands on a company’s email database.
Offline sources work the same way. Email IDs are often printed or written on paper, for example on application or admission forms. When the person in charge of those forms sees an ad that offers 5 cents per email ID, the temptation is to go ahead and sell them, since no one is likely to find out. So the person in charge manually types the email IDs from the forms into an Excel sheet and sends it across for some money.
Pinging Business Servers
This is a bit more sophisticated than the above methods of email harvesting. When bots ping business servers, they are practically hacking the business network. With proper procedures in place, such requests will be denied 99.99 times out of 100. But the remaining 0.01 times they can succeed, and that is when your email ID gets into the spammers’ mailing list.
How to prevent email harvesting
Looking at the above email harvesting methods, I don’t think there is much you can do. But since not every company is going to sell its mailing lists or email databases, it is better to have an email ID exclusively for sharing. You can use this one to receive newsletters or to log in to different websites. That way, even if the damage is done, it is restricted to that one email ID.
Some websites ask you to spell out the special characters in your email ID so that spambots cannot read them. But as we’ve seen, bots can look for variations of email ID formats. Spelling out the characters does reduce the chances of being spammed, but it is not completely foolproof.
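As a small illustration of spelling out the special characters, the Python sketch below rewrites an address in the “[at]” and “[dot]” style. The helper name `spell_out` and the exact bracket style are assumptions made for this example, not a standard. Keep in mind that a bot written to recognise this format can reverse it just as easily.

```python
def spell_out(address):
    """Rewrite user@example.com as 'user [at] example [dot] com'."""
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    # Only the dots in the domain are spelled out; the local part is left as typed.
    return f"{local} [at] {domain.replace('.', ' [dot] ')}"


print(spell_out("jane.doe@example.com"))
```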
Using a graphic signature is better. Create a graphic that has your email ID written on it. Since bots cannot read graphics at this point, your ID will be safe. But when you do that, don’t link the graphic to your email ID. The linked address sits in the page as plain text that bots can read, which defeats the purpose of the graphic email address or signature.
These methods can help prevent email harvesting and reduce your chance of being spammed, but are not 100% effective. If you can add to the methods for prevention of spam, please comment and share with us.