I'm Building a Search Engine for the Django Ecosystem

tl;dr: Frustrated with the duopoly of Google and Bing, I decided to build a narrowly focused search engine to see if it can deliver better results. My first experiment covers the Django ecosystem.

See it here: https://django.curaffe.com

Search is a critical part of how I develop software

I picked up Django a few years ago. Recently, I have studied my own habits and workflow. This has led to some interesting insights.

  1. A search engine is a critical part of how I build software. I research issues, find solutions, and use them as entry points into a fragmented ecosystem of blog posts, StackOverflow answers, documentation, packages, and more.
  2. Search results are polluted with ads. Advertising masquerades as content in the "organic results," making the signal-to-noise ratio rather poor.
  3. There is a "namespace" issue that often returns results irrelevant to the ecosystem (e.g., Tarantino's popular film, Django Unchained, often shows up). I'm sure you have experienced the same.

Let's go deeper...

Ads dominate above the fold

Google without an ad-blocker stinks. One can actually get a page full of search results that are nothing but ads. It seems that every year the ads blend into "organic" search results more and more.

SEO ruined search. Full stop.

Google set up the rules, and SEO practitioners have beaten them at their own game. It is a whack-a-mole situation that never ends, and it only hurts the end user.

I also take issue with all the extra "features" like snippets that keep users within the walled garden of Google rather than pointing them on to the actual content owners.

Last, smaller sites that have not optimized their SEO strategy have little to no visibility. This makes discovery a near-impossible task.

Context is everything

When I'm at work, I'm at work. I don't search for recipes, new cameras, or ways to improve the lighting for my streaming setup. I generally come back to search when I look for:

  • Stack traces and fixes
  • Packages and libraries
  • Syntax and algorithm implementation
  • Tutorials and documentation
  • Other miscellaneous things from time to time, like job postings, media, courses, books, etc.

I want focused search results for when I'm working – free from distractions, without ads of questionable relevancy, or junk SEO spam.

I want bespoke results, filters, and sources that current search engines cannot deliver

Google is okay for general use cases, but niche and purpose-built search engines excel at filtering and displaying results within their domain. Google cannot or will not compete here because the niche is too small to move the needle for them.

Developers know how to ask a computer questions

I understand this group (I'm one of them). I don't have to build a system to guess or infer what is asked. The distance between intent and result is more direct. This may not be the case when looking at other niches. It might also be difficult for new entrants into Django to ask the right questions until they learn the terminology.

Human curation beats AI and Automation

This engine is biased by design. I curate only the sites from which I get the most utility. I can add more, but only after vetting them myself.
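To make the curation idea concrete, here is a minimal sketch of an allowlist check a crawler could run before fetching a page. The domain list and the `is_curated` helper are assumptions for illustration only; they are not the actual list or code behind this site.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; the real curated list is not shown in this post.
CURATED_DOMAINS = {
    "docs.djangoproject.com",
    "stackoverflow.com",
    "djangopackages.org",
}


def is_curated(url: str) -> bool:
    """Return True if the URL's host is a curated domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in CURATED_DOMAINS)
```

A crawler built this way would simply skip any link where `is_curated(url)` is false, so nothing enters the index without a human having vetted its site first.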

Also, I can choose to boost certain results over others. For example, I prefer the official Django documentation to the answers from StackOverflow. Let's just say that is how I see the world. As such, users should be able to see that "boost" somewhere in the results. A little transparency in the algorithm.
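One way the transparent boost could look, as a rough sketch: each source gets a multiplier, and every ranked result carries that multiplier alongside its final score so it can be displayed to the user. The names `SOURCE_BOOSTS`, `Result`, and `rank`, and the multiplier values, are all hypothetical and chosen only to mirror the docs-over-StackOverflow preference described above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical multipliers; they only illustrate preferring official docs.
SOURCE_BOOSTS = {
    "docs.djangoproject.com": 2.0,
    "stackoverflow.com": 1.0,
}


@dataclass
class Result:
    url: str
    relevance: float  # base text-match score from the underlying search backend


def rank(results):
    """Apply per-source boosts and keep the boost visible in each returned row."""
    ranked = []
    for result in results:
        host = urlparse(result.url).hostname or ""
        boost = SOURCE_BOOSTS.get(host, 1.0)  # unknown sources get no boost
        ranked.append({
            "url": result.url,
            "relevance": result.relevance,
            "boost": boost,
            "score": result.relevance * boost,
        })
    # Highest boosted score first
    ranked.sort(key=lambda row: row["score"], reverse=True)
    return ranked
```

Because each row keeps `relevance` and `boost` as separate fields, the results page can show both the raw match and how much the source preference moved it.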

I agree – this does not scale. But then again, it does not have to.

Scale is not an issue

Google and Bing have an impossible task in front of them. Every day they crawl millions, if not billions, of pages into their massive indexes, which is an impressive bit of engineering.

I don't need that. I don't have to worry about scaling compute, network, memory, or storage beyond what my well-priced dedicated server can deliver. If I only intend to index a hundred or so websites, all that content packs nicely into a single server.

This eases the engineering burden significantly. I could run crawlers manually if I wanted to (hint: I don't). I can use off-the-shelf tools like Solr or PostgreSQL, depending on my needs.

Yay for simplicity.

Bespoke is too small for Google to pursue but too big an opportunity for me to ignore

It is my firm belief that Google's moat is drying up. There are a growing number of search engines out there, but their niches are too small, too complex, or moving too quickly for the giant to care.

The big elephant does not care about the crumbs, but to ants those crumbs are a feast.

The takeaway

We just went through some of the insights and reasoning behind my desire to test out a niche search engine for the Django ecosystem. Google is "good enough" in some cases, but I believe that with the right niche and execution, this search engine will be helpful to developers at all levels.

Hey, Sam Texas here. If you like what I wrote and want to see more then please consider:
