Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
844 views
in Technique[技术] by (71.8m points)

csv - SQLite: Create File Structure Table from String Path. New query or modify existing?

I have a SQLite table:

FileDataID | Path
0            /FileAtRoot.txt
1            /video/gopro/father/mov001.mp4
2            /pictures/family/father/Oldman.jpg
3            /documents/legal/father/estate/will.doc

Using an elegant solution designed by forpas, a new table consisting of only the directory structure is created:

Directory | Directory_Parent | Value
0           null               root
1           0                  documents
2           1                  legal
3           2                  father
...

Reference: SQLite: Create Directory Structure Table from A List Of Paths

Now that a table of the directory structure exists, I need to link the original files to their parent by using a foreign key Directory_Parent in a new table:

FileDataID | Directory_Parent | Value
0            0                  FileAtRoot.txt
1            19                 mov001.mp4
2            9                  Oldman.jpg
3            4                  will.doc

How can I create this table from the original data using SQLite?

  • Can forpas's solution be modified so it creates both tables at once?
  • Or should this 2nd "file_struct" table be created in a 2nd SQLite query?

    A 1 megabyte example database can be found here:
    A bounty will be awarded for helping with this final question, thank you.
question from:https://stackoverflow.com/questions/65933201/sqlite-create-file-structure-table-from-string-path-new-query-or-modify-existi

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

For this to work, you must have already the table dir_struct (from your previous question) so that the file names can be inserted to the table with the id of the directory they belong to.

First I create the new table files:

CREATE TABLE files(
  FileDataID INTEGER REFERENCES listfile(FileDataID), 
  Directory_Parent INTEGER REFERENCES dir_struct(Directory), 
  Value
);

You must also create a unique index for FileDataID in listfile, because it is not defined as the PRIMARY KEY or UNIQUE, so columns in other tables (like the column FileDataID of files) can reference it.

CREATE UNIQUE INDEX idx_listfile_FileDataID ON listfile(FileDataID);

A recursive CTE is used to query dir_struct and build all the possible paths and it is joined to listfile to match the file names and their paths:

WITH cte AS (
  SELECT Directory, Directory_Parent, Value, '' full_path 
  FROM dir_struct
  WHERE Directory = 0
  UNION ALL
  SELECT d.Directory, d.Directory_Parent, d.Value, full_path || d.Value || '/'
  FROM dir_struct d INNER JOIN cte c
  ON c.Directory = d.Directory_Parent
)
INSERT INTO files(FileDataID, Directory_Parent, Value)
SELECT f.FileDataID, c.Directory, SUBSTR(f.Path, LENGTH(c.full_path) + 1)
FROM listfile f INNER JOIN cte c
ON f.Path LIKE c.full_path || '%' AND INSTR(SUBSTR(f.Path, LENGTH(c.full_path) + 1), '/') = 0 

See the demo, where the code for the insertions in dir_struct has also been modified because now the table listfile contains files at the root, which did not exist in the sample data of your previous question.

So the code in the demo must be executed as a whole.

I used your 1MB sample data and the queries ran very fast.
But, for 1M rows (from the link you first posted), which I also tested (and found duplicates which you must delete before doing anything else), the creation of the table files took about 1.5 hour.
As I mentioned in my answer of your previous question, if this is a one time thing then use it. If you will need it frequently, then you should consider something else.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...