CatalogSearch

From Netatalk Wiki

Overview

This page explains the details behind the "CatSearch" functionality of netatalk, which matters for file and folder searches from a Mac (OS X) client.

Note that CatSearch *predates* Spotlight: it is a method for fast, efficient filesystem searches and was the search method available since Mac OS 8, before Spotlight came along.

Since Mac OS X 10.4, CatSearch is not used for search queries if the queried filesystem supports Spotlight. For AFP filesystems, the AFP server must support Spotlight, which is still not the case up to Netatalk 3.0 because Apple didn't document the relevant AFP functionality. The Netatalk team has reverse engineered it, so one of the next Netatalk versions should add support for Spotlight.

Background

(Note: Any mention of Mac OS includes OS X; any mention of HFS includes HFS+.)

The preferred file system format used by Macs, HFS, stores a volume's directory in one large file called the catalog file. The records describing the files and folders on an HFS volume are organized in a B*-tree within this catalog file.

The catalog file is stored fairly contiguously on the disk, so reading it sequentially requires little seeking.

These facts make searching the entire volume for files, e.g. by name or date, very efficient: reading all the records into memory and then matching them by name or date is much faster than the classic approach, which reads the root directory's contents first and then recursively descends into every subdirectory.

Therefore, Apple soon added special file system API functions to search an entire HFS volume. The functions are called PBCatSearch (Mac OS 7), FSCatalogSearch (Mac OS 8.5 and later) and searchfs (one of OS X's BSD calls). On the user interface side, Apple offered apps such as Find File and Sherlock to make use of these fast search operations. These tools allow the user to search for files by name, date range and similar file and folder attributes.

When Apple introduced its Spotlight technology in OS X 10.4, which focuses on searching content, the old CatSearch-specific search options were mostly hidden in the user interface. Third-party apps such as EasyFind and Find Any File now give access to the specific CatSearch options on OS X.

Note: Since not all file systems can offer such an optimized search operation efficiently, the CatSearch functionality is optional. An application on Mac OS can query a volume's parameters to tell whether the underlying file system supports CatSearch. If it doesn't, the app can take suitable measures, e.g. perform a classic directory tree walk.
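The capability check and fallback described above can be sketched as follows. This is only an illustration, not how Mac OS or any real API does it: the Volume class, its supports_catsearch flag, and the catsearch, tree_walk and find functions are all hypothetical names, and the "catalog" is modelled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Volume:
    # Hypothetical model of a volume, for illustration only.
    supports_catsearch: bool      # assumed capability flag
    dirs: Dict[str, List[str]]    # directory path -> names of its entries

    def catsearch(self, matches: Callable[[str], bool]) -> List[str]:
        # Stand-in for the optimized search: one flat pass over all records.
        return sorted(
            f"{d}/{n}" for d, names in self.dirs.items() for n in names if matches(n)
        )


def tree_walk(volume: Volume, path: str, matches: Callable[[str], bool]) -> List[str]:
    # Classic approach: list a directory, then descend into each subdirectory.
    found = []
    for name in volume.dirs.get(path, []):
        child = f"{path}/{name}"
        if matches(name):
            found.append(child)
        if child in volume.dirs:  # the entry is itself a directory
            found.extend(tree_walk(volume, child, matches))
    return found


def find(volume: Volume, matches: Callable[[str], bool]) -> List[str]:
    # Ask the volume first; fall back to the tree walk if CatSearch is missing.
    if volume.supports_catsearch:
        return volume.catsearch(matches)
    return sorted(tree_walk(volume, "", matches))


dirs = {"": ["docs", "a.txt"], "/docs": ["b.txt"]}
is_text = lambda name: name.endswith(".txt")
fast = find(Volume(True, dirs), is_text)
slow = find(Volume(False, dirs), is_text)
```

Both strategies return the same items; only the cost of getting there differs.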

Netatalk's role

The CatSearch functionality can also be implemented by file servers over the AFP protocol. Mac OS offers this feature by default via its File Sharing, provided the shared volume supports CatSearch (e.g. is an HFS volume).

Netatalk, being an AFP server, offers CatSearch as well, and has done so since at least version 1.5, if not earlier.

However, since volumes served by netatalk are usually not HFS formatted, and since Linux has no inherent CatSearch support, netatalk needs to employ special strategies that may even lead to complications. These are documented below.

General CatSearch operation

A call to CatSearch can request the results for the following conditions:

All these conditions may be combined in a single search so that all conditions must be met (there is no "or" operation). Furthermore, the entire condition set may be negated, in which case all items that do not match the conditions are returned.
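The way conditions combine can be illustrated with a minimal sketch. The Item record, its fields and the make_matcher helper are assumptions made for this example; they do not mirror the actual CatSearch parameter block.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    # Hypothetical record; a real search works on catalog attributes.
    name: str
    modified: int   # modification time, seconds since some epoch
    is_dir: bool


Condition = Callable[[Item], bool]


def make_matcher(conditions: Iterable[Condition], negate: bool = False) -> Condition:
    conds = list(conditions)

    def matches(item: Item) -> bool:
        # Every condition must hold; there is no "or".
        result = all(cond(item) for cond in conds)
        # Negation applies to the condition set as a whole.
        return not result if negate else result

    return matches


items = [
    Item("report.txt", 1_000, False),
    Item("report-old.txt", 100, False),
    Item("photos", 2_000, True),
]
conds = [lambda i: "report" in i.name, lambda i: i.modified >= 500]

hits = [i.name for i in items if make_matcher(conds)(i)]
misses = [i.name for i in items if make_matcher(conds, negate=True)(i)]
```

Here hits contains only the item that satisfies both conditions, and misses contains everything else, including the item that satisfies just one of them.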

CatSearch support by netatalk

General

By default, netatalk implements CatSearch in a straightforward manner:

It simply performs its own recursive directory walk on the targeted shared directory, matching the conditions. This is pretty fail-safe but slow, because the dir tree has to be walked in a seek-intensive way (provided it's a hard disk and not an SSD medium). Yet, this is still much faster than telling the client that the server doesn't support CatSearch, which would require the client to walk the volume's dir tree itself: every item's properties would then have to be transmitted over the network so that the client can match them, instead of netatalk performing the matching locally on the Linux machine.
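The idea of a server-side walk can be sketched in a few lines. This is not netatalk's code (netatalk is written in C); tree_search and its matches callback are hypothetical names, and the sketch only shows that matching happens next to the data, so only the hits need to cross the network.

```python
import os
from typing import Callable, Iterator


def tree_search(
    root: str, matches: Callable[[str, os.stat_result], bool]
) -> Iterator[str]:
    """Walk the shared directory recursively and yield paths of matching items."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # attributes are read locally on the server
            except OSError:
                continue             # item vanished or is unreadable; skip it
            if matches(name, st):
                yield path


# Example call (the path is a placeholder):
# for hit in tree_search("/srv/share", lambda name, st: "report" in name.lower()):
#     print(hit)
```

Every directory is still visited, which is why this approach is slow on seek-bound media even though it avoids network round trips.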

"searchdb" optimization

Version 2.2 of netatalk introduces a new option, "searchdb", that can be set in the config file.

If enabled, it can speed up CatSearch searches significantly in special cases: netatalk then uses a separate database (dbd), which it normally uses to remember the correlation between a file and its "CNID" (practically a reference number that Mac OS uses in place of a directory path to identify any file or folder on a disk). Netatalk can use that database to look up items by name, which is much faster than performing the dir tree walk to match the names.

The use of the dbd for this is quite limited, though: it can only find names that begin with the name to be matched. Since the CatSearch conditions do not provide a "name begins with" classifier, the only CatSearch condition that can be used here is "name matches exactly", which users rarely choose; they usually search for partial name matches.

In cases where netatalk can't use the dbd to look up a name, it automatically falls back to the classic tree search mode.
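The interplay of a prefix-only name index and the fallback can be sketched as follows. This is an assumption-laden illustration: NameIndex is a sorted in-memory list standing in for the database, and search_names and its parameters are hypothetical names, not netatalk functions.

```python
import bisect
from typing import Callable, Iterable, List


class NameIndex:
    """Hypothetical stand-in for the name lookup in the database."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = sorted(names)

    def with_prefix(self, prefix: str) -> List[str]:
        # Names sharing a prefix are contiguous in sorted order.
        start = bisect.bisect_left(self._names, prefix)
        out = []
        for name in self._names[start:]:
            if not name.startswith(prefix):
                break
            out.append(name)
        return out


def search_names(
    index: NameIndex,
    query: str,
    exact: bool,
    fallback: Callable[[str], List[str]],
) -> List[str]:
    if exact:
        # An exact match is a prefix match whose name has nothing after the prefix.
        return [n for n in index.with_prefix(query) if n == query]
    # Partial matches cannot be answered by a prefix index: use the slow path.
    return fallback(query)


names = ["notes.txt", "report.txt", "report.txt.bak", "old report.txt"]
index = NameIndex(names)
slow_scan = lambda q: sorted(n for n in names if q in n)  # stands in for the tree walk

exact_hits = search_names(index, "report.txt", exact=True, fallback=slow_scan)
partial_hits = search_names(index, "report", exact=False, fallback=slow_scan)
```

Note that a prefix lookup alone would miss "old report.txt" for the query "report", which is why partial-name searches have to take the fallback path.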

Known issues

Incomplete results

If you consistently don't get all the results that you expect, this could be related to a bug that was fixed in 2.2.2.

Name matching works only at beginning of item names

This can happen if the searchdb option is enabled. It is caused by a small bug in the searchdb optimization code, which was fixed on 24 Feb 2012.

The only workaround is to disable searchdb support until the fix is applied.
