We often hear success stories, so today I wanted to share the opposite! Ha, it’s really not a failure because I learned a lot, but you get my point.
The following is my thought process and attempts at discovering “different” data sources to help me find good domain names that differ from what the masses find.
Since most domain investors all look at the very same data (think ExpiredDomains.net), my goal was to change that for myself and see if I could find some different sources to make domains shine that didn’t already shine because of metrics on popular sites.
My idea somewhat worked, but it required a witch’s brew of sources to make it work better.
I started looking at popular, mainly user-generated websites. My goal was to find data related to popularity and tied to a term. Think “likes” and “hashtags”. Some sites would work well, LinkedIn for example, but obtaining the data was a totally different task and the fail point for me and my project.
After a few weeks of looking around the web, I was set on 3 sources that not only had the type of data I was looking for, but also data that was accessible via web scraping or API.
- Instagram was the best. Both for Hashtag and Username data.
- Twitter was closely behind.
- Business Directory (I won’t name the site but it started with an M.)
None of the above required a “login” to obtain the data I was looking for, which was important. There were other sources, but something always hindered the process of obtaining the data.
How I did it:
I would take the raw list of expired domains for the day at GoDaddy Auctions. I filtered the full list down to .com domains only and capped the length at 12 characters; you could filter any way you wish. Then I would do a bulk WHOIS scan via DomainIQ to get the domain age for each name, and keep only the domains that were 5 years or older. This would be my final list (normally around 2,000 domains) to gather additional data on. I often used dictionary tools to help filter the lists, which was helpful but not always perfect.
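The daily filtering step could be sketched in a few lines of Python. This is just an illustration, not the actual workflow (which used DomainIQ exports and spreadsheets); the `ages` dict stands in for whatever the bulk WHOIS scan returns:

```python
def filter_domains(raw_domains, ages, min_age=5, max_len=12):
    """Keep .com domains whose label is <= max_len chars and age >= min_age.

    raw_domains: list of domain names from the auction export.
    ages: dict mapping lowercase domain -> age in years (from bulk WHOIS).
    """
    keep = []
    for d in raw_domains:
        name = d.lower().strip()
        if not name.endswith(".com"):
            continue                      # .com only
        label = name[:-len(".com")]
        if len(label) > max_len:
            continue                      # too long
        if ages.get(name, 0) < min_age:
            continue                      # too young (or age unknown)
        keep.append(name)
    return keep
```

Running this over a 50,000-name auction list typically leaves the couple-thousand candidates described above.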
I had never scraped a website before and had no idea how to do it. I picked ParseHub to do this part, and it took a little bit of playing around and a paid subscription for a month, but it worked.
It was a process to convert all the domains from my list into links to be scraped, but I figured out ways to do it with Excel and Notepad. It all took time, that’s for sure!
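The domain-to-link conversion (done with Excel and Notepad above) amounts to building one scrape URL per domain label. A rough sketch, assuming Instagram’s public `/explore/tags/` URL pattern (which may change and is not a stable API):

```python
def hashtag_urls(domains):
    """Turn filtered .com domains into Instagram hashtag URLs for the scraper."""
    urls = []
    for d in domains:
        label = d.lower().removesuffix(".com")  # str.removesuffix needs Python 3.9+
        urls.append(f"https://www.instagram.com/explore/tags/{label}/")
    return urls
```

The resulting list can be fed into a scraping tool like ParseHub as the set of start pages.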
My focus was on Hashtags because Instagram would display those specifically and a count with each. This was perfect data because hashtags are words and Instagram would add them up and provide a real number with the terms.
Domains were jumping out at me, all showing popularity right along with them! I wasn’t looking for the best of the best (although they were highlighted), I was looking for domains that may fly under the radar. For a tiny example of what was being highlighted to me, all .com: EverydayCounts, KidsFestival, GreatestGifts, GreenCleaner, FreshStore, StoryMakers, TheEffects, FeedYourHunger and so many more.
Instagram hashtag data was very helpful. Really helpful on catchy type marketing terms and common terms. It was good, until it stopped!
Domain investing requires a “mix” of inventory and the common terms are good but so are business/branding names. This is where I used the business directory that started with an M. Again, I used ParseHub and my focus was on a search term URL that showed a “results” total, for the searched term. This would help me rank each term that I searched and the cream would rise to the top.
The business directory search involved some deeper work by me and more fancy domain-related tools. I needed to convert my domain list into search terms with split keywords (keeping the space) and turn each term into a URL string. It was challenging, but I was able to do it. Keep in mind that these lists were 1,500-2,000 domains long, so there was no way for me to do this manually every day.
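The split-and-URL-encode step could look roughly like this. The greedy dictionary split below stands in for the “fancy domain related tools” mentioned above, and the base URL and `q=` parameter are placeholders, not the real directory:

```python
from urllib.parse import quote_plus

# Tiny sample wordlist for illustration; the real tools use full dictionaries.
WORDS = {"kids", "festival", "green", "cleaner", "fresh", "store"}

def split_label(label, words=WORDS):
    """Greedy longest-match split of a domain label into dictionary words.

    Returns a list of words, or None if the label can't be fully split.
    """
    out, i = [], 0
    while i < len(label):
        for j in range(len(label), i, -1):
            if label[i:j] in words:
                out.append(label[i:j])
                i = j
                break
        else:
            return None  # no dictionary word starts at position i
    return out

def search_url(domain, base="https://example-directory.com/search?q="):
    """Build a directory search URL with the keywords split (space kept)."""
    label = domain.lower().removesuffix(".com")
    parts = split_label(label)
    term = " ".join(parts) if parts else label  # fall back to the raw label
    return base + quote_plus(term)              # space becomes '+' in the query
```

`quote_plus` handles the “keeping the space” part by encoding it into the query string, so `KidsFestival.com` becomes a search for `kids festival`.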
Again, the business directory was producing some nice results but this time, the results were more business name or business related terms. This was also good, until it stopped.
Instagram changed its API and really started blocking all scraping attempts. I simply couldn’t get access to the data any longer. I still can’t, and I don’t know how to get it even by paying them for it.
The business directory that I was using would constantly block ParseHub. It wasn’t worth trying any more. I sent the company several emails offering to pay for the type of data I was looking for, but they never replied.
It was a fun experiment. The Instagram hashtag data helped the most, but again, it mainly highlighted common phrases and popular terms. Both often make for good domains though! Seeing how often these terms were used was also very helpful.
Twitter was similar to Instagram but it’s a different site and hashtags are used differently on a photo website compared to Twitter.
The business directory was a needed set of data to mix things up a bit. I really felt there would be many of these business directories, but I had a hard time finding one I could get data from. I didn’t feel I should have to scrape the data, but getting it directly from the companies seemed really hard, expensive, or simply impossible.
If you can obtain this data from popular sites like LinkedIn for example, that would be helpful.
There are datasets around the web that can help you look at domain names differently. I really do not think this has been tapped into much. For people that are more familiar with this kind of data and how to obtain it, it would be interesting to use!
I hinted at this process in a different blog post called: Look At Domain Names with a Different Perspective