postgresql - How to handle joins in MongoDB?


In PostgreSQL I have two tables:

urls (table with indexed pages; host is an indexed column; 30 mln rows)
hosts (table with information about hosts; host is an indexed column; 1 mln rows)

One of the most frequent SELECTs in my application is:

  SELECT urls.* FROM urls JOIN hosts ON urls.host = hosts.host WHERE urls.projects_id = ? AND hosts.is_spam IS NULL ORDER BY urls.id DESC LIMIT ?

The query runs very slowly for projects with more than 100 000 rows in the urls table.

Since the tables have grown, the query has become slower and slower. I have read a lot about NoSQL databases (such as MongoDB) that are designed to handle such large tables, and I am considering moving my data to MongoDB. Everything would be easy if selecting from the urls table did not require checking the hosts table. I have heard that MongoDB does not support joins, so my question is how to solve the problem above. I could store the host information in the urls collection, but the hosts.is_spam field can be updated by the user, and then I would have to update the entire urls collection. I am not sure this is the right solution.

I would be grateful for any advice.

You are right that the join is the problem, but my guess is that it is just the wrong type of join. As Frank H. mentioned, PostgreSQL should be able to handle this type of query fairly easily, depending on the frequency of hosts.is_spam. You probably want to cluster the urls table on id to optimize the ordering. Since you are only pulling out urls.*, you can minimize disk I/O by creating a partial index on hosts.host where is_spam is not null, which makes it cheap to rule out a short list of hosts.
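The clustering and partial-index suggestions might look roughly like the sketch below. It has not been run against the real schema: the index name hosts_spam_host_idx is made up, and it assumes urls.id is the primary key with PostgreSQL's default index name urls_pkey.

```sql
-- Partial index: only rows flagged as spam are indexed, so the index
-- stays small when spammy hosts are a minority.
-- (hosts_spam_host_idx is a hypothetical name.)
CREATE INDEX hosts_spam_host_idx
    ON hosts (host)
    WHERE is_spam IS NOT NULL;

-- Physically reorder urls by id. Assumes the primary-key index is named
-- urls_pkey. CLUSTER takes an exclusive lock on the table and is a
-- one-time reordering, so rows written later are not kept in order.
CLUSTER urls USING urls_pkey;

-- Refresh planner statistics after the rewrite.
ANALYZE urls;
```

On a 30 mln row table the CLUSTER step can take a long time and blocks reads and writes while it runs, so it is probably something to schedule for a quiet period.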

Try this:

  SELECT urls.* FROM urls LEFT JOIN hosts ON urls.host = hosts.host AND hosts.is_spam IS NOT NULL WHERE urls.projects_id = ? AND hosts.host IS NULL

or this:

  SELECT * FROM urls WHERE urls.projects_id = ? AND NOT EXISTS (SELECT 1 FROM hosts WHERE hosts.host = urls.host AND hosts.is_spam IS NOT NULL)

Either form lets PostgreSQL use an anti-join. It only excludes hosts that are known to be spammy, so URLs with a null host or a host missing from the hosts table are still returned, and the results can differ from your query.
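To make that difference concrete, here is a small throwaway example. The rows are invented and is_spam is assumed to be a nullable boolean; the temporary tables shadow any real urls and hosts tables for the session, so it is best run in a scratch database.

```sql
-- Throwaway tables with made-up rows, only to show the behaviour.
CREATE TEMP TABLE hosts (host text PRIMARY KEY, is_spam boolean);
CREATE TEMP TABLE urls  (id serial PRIMARY KEY, projects_id int, host text);

INSERT INTO hosts VALUES ('good.example', NULL), ('spam.example', true);
INSERT INTO urls (projects_id, host) VALUES
    (1, 'good.example'),     -- id 1: known host, not spam
    (1, 'spam.example'),     -- id 2: known spam host
    (1, 'unknown.example'),  -- id 3: no matching row in hosts
    (1, NULL);               -- id 4: no host at all

-- Original inner join: returns id 1 only.
SELECT urls.id
FROM urls
JOIN hosts ON urls.host = hosts.host
WHERE urls.projects_id = 1
  AND hosts.is_spam IS NULL
ORDER BY urls.id;

-- Anti-join: returns ids 1, 3 and 4.
SELECT urls.id
FROM urls
WHERE urls.projects_id = 1
  AND NOT EXISTS (SELECT 1
                  FROM hosts
                  WHERE hosts.host = urls.host
                    AND hosts.is_spam IS NOT NULL)
ORDER BY urls.id;
```

If rows like ids 3 and 4 must stay excluded, the anti-join is not a drop-in replacement for the original query.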
