Hi All,

I recently tried out the KDE/Plasma search (Baloo).

  1. Indexing full content was too slow (I have some 100GB of data), and I disabled it.

  2. Indexing filenames only was reasonably quick.

  3. The search was very restrictive (full words only, miscategorized files). To make it usable for me, I had to get a list of all files and dump it to fzf, which worked reasonably well.

  4. Using baloosearch6 to get a long list of files provides almost no noticeable performance improvement over fd:

     > time ( baloosearch6 mimetype:application/pdf | wc -l )
     0.05s user 0.03s system 111% cpu 0.069 total
    
    
     > time ( \fd -H --no-ignore-vcs --xdev -tf -tl '.pdf$' | wc -l ) 
     0.24s user 0.15s system 364% cpu 0.107 total
    

    (Both commands found about 11,000 files. I’m using an SSD with about 500mbps read speed.)

  5. If I try it again with a larger file set:

     > time ( baloosearch6 -d VSync/ '' | wc -l ) 
     0.23s user 0.10s system 123% cpu 0.264 total
    
     > time ( \fd -H --no-ignore-vcs --xdev -tf -tl --base-directory=VSync/ | wc -l )
     0.13s user 0.11s system 456% cpu 0.052 total
    

    This time baloo found 96000 files, and fd found 59000 files. (fd might have run faster because of disk caching.)
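One way to see where the gap between the two counts comes from is to compare the two file lists directly. This is only a minimal sketch: it assumes both tools print one path per line in the same (absolute) form, and `only_in_first` is a hypothetical helper name, not part of either tool.

```shell
# Print the lines that appear in file $1 but not in file $2.
# Both inputs are sorted and de-duplicated first, because comm needs sorted input.
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Possible usage (not run here; assumes baloosearch6 prints absolute paths):
#   baloosearch6 -d VSync/ '' > baloo.txt
#   fd -H --no-ignore-vcs --xdev -tf -tl --absolute-path . VSync/ > fd.txt
#   only_in_first baloo.txt fd.txt | head
```

Looking at a handful of the paths that only one tool reports is usually enough to tell what kind of entries account for the difference.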

No doubt fd used more CPU. But the difference in wall time is so small that using an indexed search no longer makes sense to me.

Any thoughts?

  • just_another_person@lemmy.world
    2 days ago

    Metadata and context are the difference.

    Using fd you literally only grab a list of, well…file descriptors. It’s not looking into the content of anything, and some of its built-in defaults skip things such as files listed in gitignore files. See for yourself.

    You’re comparing apples and oranges here.

    If you have 100GB, the question is more about what you want it to scan, and why. If you don’t need to know where media files are, exclude those directories. Same with git repos and such.

    • gi1242@lemmy.worldOP
      2 days ago

      My images etc. are on a separate partition (300GB, not indexed). I certainly have tonnes of data in .git folders, which fd ignores. But the exclude_folders setting in baloofilerc seems to ignore most of these by default.

      I agree metadata and context make a huge difference. Looking at my workflow, I’ve put all the data I need into the file names 😄. The metadata is borked for most of them because many were downloaded some 20+ years ago. So I put the author names and title into the filename to make it easy to search…

      Unfortunately the full path is ignored by Baloo. There’s a main.* file in several folders; the parent folder name is ignored by Baloo search, so I dump all Baloo results to fzf and search there…
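The "dump everything and match on the full path" workflow described above can be sketched roughly as follows. The interactive variant simply pipes the listing into fzf. `path_filter` is a hypothetical, non-interactive stand-in that does plain case-insensitive substring matching (not fuzzy matching) on the whole path, and the `thesis` folder name in the usage comment is made up for illustration.

```shell
# Interactive: list every indexed file under a directory and fuzzy-match in fzf.
# Assumes baloosearch6 and fzf are installed.
baloo_fzf() {
    baloosearch6 -d "${1:-.}" '' | fzf
}

# Non-interactive stand-in: read paths on stdin and keep those whose full path
# contains every given term (case-insensitive, terms may appear in any order).
path_filter() {
    awk -v terms="$*" '
        BEGIN { n = split(tolower(terms), t, " ") }
        {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(line, t[i]) == 0) next
            print
        }'
}

# Possible usage (not run here):
#   baloosearch6 -d VSync/ '' | path_filter thesis main
```

Because the match runs against the whole line, the parent folder name takes part in the search, which is the part Baloo's own filename matching leaves out.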

  • Aiwendil@lemmy.ml
    2 days ago

    For me the real advantages of baloo are metadata search and KDE integration.

    Searching for tags with baloosearch6 tag:<tag> is something I use rather often. I even use the star ratings in baloosearches with rating>=6. Combine that with a mimetype and you have a quick playlist of all music you rated with 4 or more stars in Dolphin: baloosearch rating>=8 AND type:audio.

    I also use baloosearch for images…the width and height keys are really useful for finding textures with specific dimensions…something like baloosearch type:image AND Width>=2048 AND Height>=2048
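When queries like these are typed into a shell, the whole query needs quoting, otherwise the shell treats `>` as output redirection and writes to a file named `=8`. A small sketch with two hypothetical wrapper functions follows. It assumes baloosearch6 accepts the entire query as a single argument, and it takes the rating scale (8 meaning 4 stars) from the description above.

```shell
# Audio files rated at least $1 (default 8, i.e. 4 stars per the comment above).
# The double quotes keep ">=" away from the shell's redirection parsing.
rated_audio() {
    min_rating="${1:-8}"
    baloosearch6 "rating>=${min_rating} AND type:audio"
}

# Images at least $1 pixels wide and high (default 2048).
large_images() {
    min_px="${1:-2048}"
    baloosearch6 "type:image AND Width>=${min_px} AND Height>=${min_px}"
}

# Possible usage (not run here):
#   rated_audio 6
#   large_images 4096
```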

    And then of course there is the KDE integration that makes this really useful…you can use baloosearch queries everywhere in KDE: in open-file dialogs, as bookmarks in Dolphin or file dialogs, for desktop widgets showing folders…you can easily create an activity that has several folder views on the desktop, each showing a different set of files with a specific tag…so the left folder view shows all files tagged “WIP” while the right folder view shows all files tagged “Finished”. (To use queries in KDE you need them in the form baloosearch:/?query=<the query as you would use it in baloosearch6>.)

    Edit:I wrote a reddit post some years ago about this…hope linking reddit is okay here: https://www.reddit.com/r/kde/comments/pmcshj/tip_baloosearch_kioslave/

    • gi1242@lemmy.worldOP
      2 days ago

      Thanks; KDE integration is a big plus. But I haven’t been tagging or rating files … 🙄 so several of these queries are not useful for me. Fuzzy matching on the whole file name seems to fit my workflow best… and there doesn’t seem to be an option in Baloo to fuzzy match or to match on the whole path.

      I was hoping dumping a long list of matches to fzf would give me a performance gain (because of Baloo’s indexing), but right now the performance difference seems negligible.

      • Aiwendil@lemmy.ml
        2 days ago

        Yep, I really don’t understand why people use baloo without content indexing…if all you need is filename search, other tools like your fd or even mlocate will probably be better solutions. KDE integration is really the only advantage left then…and I don’t really see much need for creating bookmarks/folder views with filename searches, since you hardly ever have recurring searches for the same filenames.

        Baloo only makes sense to use with content indexing in my view…and there it hardly has any equal. I personally can’t be without this feature anymore. I have used it actively since the KDE4 days (anyone remember Nepomuk?) and my whole workflow is built on it.

        • gi1242@lemmy.worldOP
          1 day ago

          I would love to have content indexing with full text search in files and emails. I used to run namazu / notmuch mail to index it (back when I would fetch it using offlineimap). But sadly my mail lives on an IMAP server now and isn’t searchable locally. And Baloo was simply taking too long to index all my files.

          I’m really surprised that an indexed search takes roughly as much time as a directory search. I guess it shows how much the filesystem and tools have evolved… back in the day, a full tree search through even 10GB would be slow and noisy…