Hi All,

I recently tried out the KDE/Plasma search (Baloo).

  1. Indexing full content was too slow (I have some 100GB of data), and I disabled it.

  2. Indexing filenames only was reasonably quick.

  3. The search was very restrictive (full words only, miscategorized files). To make it usable for me, I had to get a list of all files and dump it to fzf, which worked reasonably well.

  4. Using baloosearch6 to get a long list of files provides almost no noticeable performance improvement over fd:

     > time ( baloosearch6 mimetype:application/pdf | wc -l )
     0.05s user 0.03s system 111% cpu 0.069 total
    
    
     > time ( \fd -H --no-ignore-vcs --xdev -tf -tl '.pdf$' | wc -l ) 
     0.24s user 0.15s system 364% cpu 0.107 total
    

    (Both commands found about 11,000 files. I’m using an SSD with about 500mbps read speed.)

  5. If I try it again with a larger file set:

     > time ( baloosearch6 -d VSync/ '' | wc -l ) 
     0.23s user 0.10s system 123% cpu 0.264 total
    
     > time ( \fd -H --no-ignore-vcs --xdev -tf -tl --base-directory=VSync/ | wc -l )
     0.13s user 0.11s system 456% cpu 0.052 total
    

    This time Baloo found 96,000 files, and fd found 59,000 files. (fd might have run faster because of disk caching.)
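
    Since the two counts differ so much, diffing the two lists would show which files only one tool reports. A minimal sketch, assuming both lists are saved one path per line in the same form (for example, all absolute paths); `only_in_first` and the list file names are made-up names for illustration:

```shell
# only_in_first: print lines present in list file $1 but missing from list file $2.
# Both files hold one path per line; they are sorted here because comm
# needs sorted input.
only_in_first() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$a_sorted"
    LC_ALL=C sort -u "$2" > "$b_sorted"
    # -2 drops lines unique to the second file, -3 drops lines common to both.
    LC_ALL=C comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Possible usage (assumes baloosearch6 prints absolute paths, hence
# fd's --absolute-path option to match):
#   baloosearch6 -d VSync/ '' > baloo.txt
#   fd -H --no-ignore-vcs --xdev -tf -tl --absolute-path . VSync/ > fd.txt
#   only_in_first baloo.txt fd.txt | head
```

    Swapping the two arguments lists the files that only fd reports.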

fd used more CPU, no doubt. But the difference in wall time is so small that using an indexed search no longer makes sense to me.

Any thoughts?

  • gi1242@lemmy.world (OP)
    2 days ago

    My images etc. are on a separate partition (300GB, not indexed). I certainly have tonnes of data in .git folders, which fd ignores. But the exclude_folders setting in baloofilerc seems to ignore most of these by default.

    I agree metadata and context make a huge difference. Looking at my workflow, I’ve put all the data I need into the file names 😄. The metadata is borked for most of them because many were downloaded some 20+ years ago. So I put the author names and title into the filename to make it easy to search…

    Unfortunately the full path is ignored by Baloo. There’s a main.* file in several folders; the parent folder name is ignored by Baloo search, so I dump all Baloo results to fzf and search there…
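
    For scripts, a rough non-interactive stand-in for the fzf step is to filter the dumped list on the full path with grep. This is only a sketch: `search_paths` is a made-up helper name, and it does plain case-insensitive substring matching, not fzf's fuzzy matching.

```shell
# search_paths: read paths on stdin and keep only those whose full path
# contains every term given as an argument (case-insensitive, literal match).
search_paths() {
    if [ "$#" -eq 0 ]; then
        cat            # no terms left: pass the remaining lines through
        return
    fi
    term=$1
    shift
    # Filter on the first term, then recurse on the remaining terms.
    grep -iF -- "$term" | search_paths "$@"
}

# Possible usage (assumes a populated Baloo index; "thesis" is just an
# example folder name):
#   baloosearch6 main | search_paths thesis main.
```

    Because the match runs against the whole line, the parent folder name counts, which is the part the Baloo query ignores.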