database - How to handle joins between huge tables in PostgreSQL?



I have two tables: urls (a table of indexed pages with an indexed host column, 30 million rows) and hosts (a table with information about hosts, with an indexed host column, 1 million rows).

One of the most frequent queries in my application is:

    SELECT urls.* FROM urls
    JOIN hosts ON urls.host = hosts.host
    WHERE urls.projects_id = ? AND hosts.is_spam IS NULL
    ORDER BY urls.id DESC LIMIT ?

The query runs very slowly once there are more than 100,000 rows in the urls table for a project.

Since performance degrades as the tables grow, I have read a lot about NoSQL databases (such as MongoDB) that are designed to handle large tables like these, but migrating from PostgreSQL to MongoDB would be a big change for me. For now I would rather try to optimize the PostgreSQL solution. Do you have any advice on what I should do?

Add an index on the hosts.host column (important for the join lookup in this case) and a composite index on urls(projects_id, id), then run ANALYZE to update the statistics, and check the resulting query performance.
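Using the table and column names from the question, the suggested indexes and statistics refresh could look like this (index names are arbitrary):

```sql
-- Composite index: the leading column filters by project,
-- the second column matches ORDER BY urls.id DESC.
CREATE INDEX urls_projects_id_id_idx ON urls (projects_id, id);

-- Index to make the join lookup into hosts efficient.
CREATE INDEX hosts_host_idx ON hosts (host);

-- Refresh planner statistics so the new indexes are chosen sensibly.
ANALYZE urls;
ANALYZE hosts;
```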

Performance will still suffer if nearly everything is spam, or if some single projects_id accounts for a huge share of the table. Explanation: updating the statistics lets the planner recognize that both the urls and hosts tables are quite large (well, we do not see the schema, so we do not know your row sizes). The composite index starting with projects_id should (1) exclude most of the urls content, and (2) through its second component immediately return rows in the desired order, so it is quite likely that the planner will base the query plan on an index scan of urls. An index on hosts.host is then needed to make the per-row host lookup efficient; most of that large table will never be touched. This assumes that projects_id is reasonably selective (i.e. it is not the same value throughout the entire table).
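To verify which plan the planner actually chooses after adding the indexes, you can run the query under EXPLAIN with concrete parameter values substituted for the placeholders (the literals 42 and 100 below are arbitrary examples):

```sql
-- Show the chosen plan plus actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT urls.* FROM urls
JOIN hosts ON urls.host = hosts.host
WHERE urls.projects_id = 42 AND hosts.is_spam IS NULL
ORDER BY urls.id DESC LIMIT 100;
```

If the output shows a backward index scan on the composite urls index and an index lookup into hosts, the plan matches the reasoning above; a sequential scan on either table would suggest the statistics are stale or the filter is not selective enough.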


