How to search table recursively by given path for a tree-like structure in postgresql-10

Pratha Source

I have a table like this:

+-----------------------------------+
| id | client_id | main_id |  name  |
|-----------------------------------|
| 1  | 1         | NULL    | hello  |
| 2  | 1         | 1       | hello2 |
| 3  | 1         | 2       | hello3 |
| 4  | 2         | NULL    | hello  |
| 5  | 2         | 4       | hello2 |
| 6  | 2         | 5       | hello3 |
+-----------------------------------+

What I want to get id:3 by giving /hello/hello2/hello3 and client_id because /hello3 belongs to hello2 and hello2 belongs to hello. When I give a full path I want to return last path ID.

This is my table schema:

CREATE TABLE "public"."paths" (
  "id" serial8,
  "client_id" int8,
  "main_id" int8,
  "name" varchar(255) NOT NULL,
  FOREIGN KEY ("main_id") REFERENCES "public"."paths" ("id")
)
;
-- INDEX ON CLIENT ID's.
CREATE INDEX "cid" ON "public"."paths" USING btree (
  "client_id"
);

So far, With recursive I tried this:

WITH RECURSIVE full_paths AS
(SELECT id, name, main_id, CAST(name As varchar(1000)) As fname
FROM paths
WHERE client_id = 1
UNION ALL
SELECT x.id, x.name, x.main_id, CAST(y.fname || '/' || x.name As varchar(1000)) As fname
FROM paths As x
    INNER JOIN full_paths AS y ON (x.main_id = y.id)
)
SELECT id, fname FROM full_paths WHERE fname = '/home/home2/home3';

But I have a million of records in my table and this slow downs the request by querying whole table.

Also see below for EXPLAIN:

CTE Scan on full_paths  (cost=4383987797.32..7008489047.29 rows=583222500 width=40) (actual time=1254.573..1675.192 rows=1 loops=1)
  Filter: (fname = '/home/home2/home3'::text)
  Rows Removed by Filter: 482943
  Buffers: shared hit=23754, temp read=8510 written=13548
  CTE full_paths
    ->  Recursive Union  (cost=0.00..4383987797.32 rows=116644499999 width=61) (actual time=0.015..1476.644 rows=482944 loops=1)
          Buffers: shared hit=23754, temp read=8510 written=10261
          ->  Seq Scan on paths  (cost=0.00..13955.49 rows=482999 width=42) (actual time=0.013..127.433 rows=482943 loops=1)
                Filter: (client_id = 24)
                Rows Removed by Filter: 3
                Buffers: shared hit=7918
          ->  Merge Join  (cost=966864.46..205108384.18 rows=11664401700 width=61) (actual time=600.989..600.990 rows=0 loops=2)
                Merge Cond: (x.main_id = y.id)
                Buffers: shared hit=15836, temp read=8510 written=6974
                ->  Sort  (cost=69904.11..71111.60 rows=482999 width=29) (actual time=276.900..360.597 rows=482946 loops=2)
                      Sort Key: x.main_id
                      Sort Method: external sort  Disk: 19848kB
                      Buffers: shared hit=15836, temp read=4962 written=4962
                      ->  Seq Scan on paths x  (cost=0.00..12747.99 rows=482999 width=29) (actual time=0.010..106.355 rows=482946 loops=2)
                            Buffers: shared hit=15836
                ->  Materialize  (cost=896960.36..921110.31 rows=4829990 width=40) (actual time=192.873..192.876 rows=3 loops=2)
                      Buffers: temp read=3548 written=2012
                      ->  Sort  (cost=896960.36..909035.33 rows=4829990 width=40) (actual time=191.121..191.122 rows=3 loops=2)
                            Sort Key: y.id
                            Sort Method: quicksort  Memory: 25kB
                            Buffers: temp read=3548 written=2012
                            ->  WorkTable Scan on full_paths y  (cost=0.00..96599.80 rows=4829990 width=40) (actual time=0.012..44.830 rows=241472 loops=2)
                                  Buffers: temp read=3289 written=1
Planning time: 0.261 ms
Execution time: 1685.199 ms

How can I write a proper and effective fast query? Do I need to write functions (which I do not know how I'll be happy if you provide a sample function)?

postgresqlrecursive-querypostgresql-10

Answers

answered 6 days ago klin #1

You should filter visited rows by appropriate parts (names) of the wanted path. Add an auxiliary query (pattern) to convert the input path to an array and use elements of the array to shed off unnecessary rows.

with recursive pattern(pattern) as (
    select string_to_array('hello/hello2/hello3', '/') -- input
),
full_paths as (
    select id, main_id, name, 1 as idx
    from paths
    cross join pattern
    where client_id = 1 and name = pattern[1]
union all
    select x.id, x.main_id, x.name, idx+ 1
    from paths as x
    cross join pattern
    inner join full_paths as y 
        on x.main_id = y.id 
        and x.name = pattern[idx+ 1]
)
select id, name
from full_paths
cross join pattern
where idx = cardinality(pattern)

Working example in rextester.

comments powered by Disqus