Detect and wait until a file has been unzipped to avoid permission errors

In my app the user can select a source folder to be synced with a destination folder. The sync can also happen in response to a change in the source folder detected with FSEventStreamCreate.

If the user unzips an archive in the source folder and the sync process begins before the unzip operation has completed, the sync can fail because of a "Permission denied" error. I assume this is related to the POSIX permissions of the extracted folder being 420 (octal 644) during the unzip operation and (in my case) 511 (octal 777) afterwards.

Is there a way to detect that an unzip operation is in progress and wait until it has completed? I thought that using NSFileCoordinator would solve this issue, but unfortunately that's not the case. Since an unzip operation can last any amount of time, it's not ideal to just delay the sync by a fixed number of seconds and let the user deal with any error if the unzip operation takes longer.

// Reproduction: pick the parent folder, then attempt a coordinated read
// of the "extracted" folder while an unzip into it may still be running.
let openPanel = NSOpenPanel()
openPanel.canChooseDirectories = true
if openPanel.runModal() == .cancel {
    return
}
// "extracted" is a folder, so mark the URL as a directory.
let url = openPanel.urls[0].appendingPathComponent("extracted", isDirectory: true)
var error: NSError?
NSFileCoordinator(filePresenter: nil).coordinate(readingItemAt: url, error: &error) { url in
    do {
        // Print the item's attributes (including NSFilePosixPermissions), sorted by key.
        print(try FileManager.default.attributesOfItem(atPath: url.path)
            .sorted(by: { $0.key.rawValue < $1.key.rawValue })
            .map({ ($0.key.rawValue, $0.value) }))
        // Listing the contents is what fails with "Permission denied"
        // while the unzip is still in progress.
        _ = try FileManager.default.contentsOfDirectory(at: url, includingPropertiesForKeys: nil)
    } catch {
        print(error)
    }
}
if let error = error {
    print("file coordinator error:", error)
}

In my app the user can select a source folder to be synced with a destination folder. The sync can also happen in response to a change in the source folder detected with FSEventStreamCreate.

First of all, these are two radically different use cases. When the user is doing something directly, you can use file coordination to have (hopefully) some control while the operation is occurring. With file system events, you are simply getting a notification about something that happened at some point in the past.

The only real similarity here is that the file system is out of your control. Whatever you're doing, you have to account for unexpected changes to permissions, additions, deletions, etc. Unzipping is always a good test because it exercises all of that.

The short answer is that there is no answer. File sync is a hard problem, in a mathematical sense.

There's nothing wrong with a little delay. That is how FSEvents works, after all. But a delay doesn't solve anything; it just makes the higher-level logic a bit more manageable.

When dealing with file system operations, and especially file system events, there is no real concept of "failure". Your sync simply can't ever fail unless the entire hard drive controller goes up in smoke. Lower-level failures from the file system will be a regular occurrence, as in every few seconds. You simply have to handle them.

Another trick you can do is to take a snapshot of the directory you are working with. By that I mean creating an in-memory representation of the directory tree as it exists in the file system (just the file system structure, not the data). File coordination might help here. Then you work from that snapshot to do your sync. But this too is just a convenience. You have to expect that while you're working through the tree, the on-disk representation can radically change, or go away altogether.
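
A minimal sketch of such a metadata-only snapshot; the names (SnapshotEntry, snapshot(of:)) are illustrative, not from this thread:

import Foundation

// An in-memory record of one item: path and metadata only, no file data.
struct SnapshotEntry {
    let url: URL
    let isDirectory: Bool
    let modificationDate: Date?
}

// Walk the tree once and capture what's there right now. The error
// handler returns true so unreadable items are skipped instead of
// aborting the walk.
func snapshot(of root: URL) -> [SnapshotEntry] {
    let keys: [URLResourceKey] = [.isDirectoryKey, .contentModificationDateKey]
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: keys,
        options: [],
        errorHandler: { _, _ in true }
    ) else { return [] }

    var entries: [SnapshotEntry] = []
    for case let url as URL in enumerator {
        let values = try? url.resourceValues(forKeys: Set(keys))
        entries.append(SnapshotEntry(
            url: url,
            isDirectory: values?.isDirectory ?? false,
            modificationDate: values?.contentModificationDate))
    }
    return entries
}

The snapshot can then drive the sync, keeping in mind that any entry may no longer exist by the time it's processed.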

First of all, these are two radically different use cases.

Thanks for your answer, but I'm not sure I see how they are radically different use cases. If the user starts unzipping and then starts a sync, the sync may find the same unfinished files that it would find if it were triggered by FSEventStreamCreate. That's actually how you can reproduce the permission error (which I forgot to explain earlier):

  1. Run the code I provided, which will show an open panel.
  2. Start unzipping an archive which was previously created by zipping a folder named "extracted".
  3. Before the unzipping completes, select the parent folder in the open panel and click Open.

The only real similarity here is that the file system is out of your control.

True, hence why I was hoping for a cooperative mechanism like NSFileCoordinator, which would make it possible to coordinate writing to the extracted folder on zip's side, and later reading it on my side.

You simply have to handle them.

Not so simple :-) In this case, I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

Unzipping is a good test case, but there are other operations that could trigger the same problem. The end user could act as root and create an unreadable directory, leaving it there forever.

Unless this is an operation that you initiated, with a reliable completion mechanism, then there is no way to tell when it completes. Maybe the user runs out of disk space, or the file is corrupted, and it never completes.

The end user could act as root and create an unreadable directory, leaving it there forever.

In this case it would be expected that the sync results in an error. I'm only looking to handle temporary permission errors caused by running operations such as unzipping.

Maybe the user runs out of disk space, or the file is corrupted, and it never completes.

Ideally the unzipping operation would still allow me to detect that it has completed / aborted, e.g. by deleting the files extracted so far.

A few different thoughts/comments here:

True, hence why I was hoping for a cooperative mechanism like NSFileCoordinator, which would make it possible to coordinate writing to the extracted folder on zip's side, and later reading it on my side.

One thing to understand here is that the word "cooperative" is a huge part of the issue for an app that's trying to do what you're describing. NSFileCoordinator is a great solution in theory, but if you try to build on it as the "exclusive" solution, you'll find that there are way too many cases which simply don't "cooperate".

As a side comment on that point, even when file coordination is being used, that doesn't mean it will do what you want. Your expectation here seems to be that the unzip routine would issue a coordinated write against the entire directory which ends when it's "done", but that's the kind of very long running operation we specifically warn against.

More broadly, part of what you're fighting here is an example of "Inferring High-Level Semantics from Low-Level Operations". You want to know that an unzip (or other long operation) is occurring, but all the system can tell you at that API level is basically "stuff is changing" and the current permission state.

Note that these issues are what directly led to the File Provider architecture. It's not necessarily that our API can do this in an inherently "better" way, it's that it's better positioned to provide coherent behavior.

Not so simple :-) In this case, I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

I think what's really helpful here is to shift how you think about what the "problem" here actually is and what "handling" actually means. The problem here isn't "permission denied". It's normal and expected that a process could encounter files it doesn't have access to. Similarly, "postponing the sync until the <operation> has completed" isn't really a solution. The time here is TOTALLY unbounded and, more importantly, nothing in the system will tell you how long it might take. Most importantly, there's a good chance the user already knows exactly what's going on and doesn't have an issue with it.

Jumping back to your original statement here:

If the user unzips an archive in the source folder and the sync process begins before the unzip operation has completed, the sync can fail because of a "Permission denied" error.

In terms of the user experience, the worst case for that flow looks like this:

  1. User clicks "Sync" button.

  2. App posts "Sync failed: Permission Denied".

  3. User dismisses error dialog and returns to #1, repeating until sync finishes.

The problem in that flow isn't that the sync failed, it's the repetition of the same pointless error.

Similarly, the solution here is often about improving the interface experience and "flow", NOT actually fixing/handling any problem. That means thinking about this in terms of:

  • How do you inform the user of issues in a way that minimizes/eliminates any disruption?

  • Using things like waiting and retrying to "hide" short-term disruptions from the user (see the sketch after this list).

  • Deciding what your app should do for really LONG disruptions ("an hour") and then building that solution.
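
As an illustration of the second point, here's a minimal retry-with-backoff sketch; syncItem, the error chosen, and the bounds are assumptions for the example:

import Foundation

// Retry a failing sync with exponential backoff, but only for the
// "permission denied" case; other errors go straight to the error list.
func syncWithRetry(_ url: URL,
                   attempt: Int = 0,
                   maxAttempts: Int = 5,
                   syncItem: @escaping (URL) throws -> Void) {
    do {
        try syncItem(url)
    } catch let error as CocoaError
        where error.code == .fileReadNoPermission && attempt < maxAttempts {
        let delay = pow(2.0, Double(attempt))  // 1s, 2s, 4s, ...
        DispatchQueue.global().asyncAfter(deadline: .now() + delay) {
            syncWithRetry(url, attempt: attempt + 1,
                          maxAttempts: maxAttempts, syncItem: syncItem)
        }
    } catch {
        // A long disruption: surface it through the app's existing
        // notification/error-list mechanism instead of retrying forever.
        print("sync error for \(url.path):", error)
    }
}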

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for your input.

My idea of file coordination was that it would allow processes to make "atomic" changes to files and folders so that they are left in a consistent state for other processes that use file coordination as well. Unzipping seemed like an ideal fit to me, regardless of how long it takes (after all, the user should be aware that an unzip operation is in progress and shouldn't worry about scan operations on folders containing the zip archive apparently hanging for the duration of the unzip operation).

As I mentioned, I'm not trying to avoid permission errors in general, but only those caused by temporary operations such as unzipping. My app keeps syncing, sends a notification when an error happens and allows the user to see the list of errors, but the user isn't forced to do anything for the app to keep syncing. But even if the user is aware that the errors are caused by the unzip operation, they would have to go through the list of errors (which could be quite long) to make sure that they haven't overlooked an unrelated error. What I could do to mitigate this is to mark an error as "solved" if a subsequent sync of the same file succeeds.

If file coordination isn't the answer and the unzipped files are not meant to be accessed before the unzip operation has completed, perhaps it would make sense for unzip to write the temporary files to a standard temporary folder and then move them to the final location only at the end.

Accepted Answer

Thanks for your input. My idea of file coordination was that it would allow processes to make "atomic" changes to files and folders so that they are left in a consistent state for other processes that use file coordination as well.

Yes, that's generally its goal, particularly with its broader role in file versioning and syncing. However, the problem in your particular situation is:

  1. File coordination is inherently "opt in", so it only helps if the writing app (which you don't control) chooses to implement it.

  2. It looks like you want to operate in the "general case", meaning you're expecting to work with whatever directories the user specifies and with whatever apps/files they happen to be using.

Those two factors mean you simply cannot rely on file coordination. You can certainly choose to implement it, and there are definitely cases where it may be helpful, but you still need to figure out a solution that works for all of the other cases where the writing process doesn't implement file coordination.

Unzipping seemed like an ideal fit to me, regardless of how long it takes

No, this is not what file coordination is "for". It is NOT acceptable for an app to block inside a file coordination call for an "extended" period of time. File coordination calls are intended to be very brief (<~1s) "low level" I/O calls, not a tool for blocking long running operations. The problem with doing this:

(after all, the user should be aware that an unzip operation is in progress and shouldn't worry about scan operations on folders containing the zip archive apparently hanging for the duration of the unzip operation).

...is that you're basically setting up a "trap" for other apps running on the system. Apps expect file coordination calls to block for a very limited duration, and now those calls will end up blocking for far longer than they were ever expecting.

Note that this means that correctly using file coordination for large operations is more complicated than simply calling coordinate(writingItemAt:) and writing whatever you want. In practice, large operations should generally be implemented using some variation of the following approach (a sketch follows the list):

  1. The app uses NSFileManager.url(for:in:appropriateFor:create:) to establish a private location on the same volume as the final destination.

  2. The app writes whatever it needs to write out to that location.

  3. The app starts a coordinated write, then uses NSFileManager.replaceItemAt(_:withItemAt:backupItemName:options:) to safely replace the existing file with its new objects.
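
A sketch of those three steps, assuming `produceData` stands in for whatever long-running work (unzip, download, etc.) generates the output:

import Foundation

// Stage the output privately, then swap it into place with a brief
// coordinated write around the (fast, atomic-on-APFS) replace.
func stagedWrite(to finalURL: URL, produceData: (URL) throws -> Void) throws {
    let fm = FileManager.default

    // #1: a private directory on the same volume as the destination.
    let stagingDir = try fm.url(for: .itemReplacementDirectory,
                                in: .userDomainMask,
                                appropriateFor: finalURL,
                                create: true)
    let stagingURL = stagingDir.appendingPathComponent(finalURL.lastPathComponent)

    // #2: the long-running work happens outside any coordinated call.
    try produceData(stagingURL)

    // #3: a short coordinated write around the replace.
    var coordinationError: NSError?
    var replaceError: Error?
    NSFileCoordinator(filePresenter: nil).coordinate(
        writingItemAt: finalURL,
        options: .forReplacing,
        error: &coordinationError
    ) { url in
        do {
            _ = try fm.replaceItemAt(url, withItemAt: stagingURL)
        } catch {
            replaceError = error
        }
    }
    if let error = coordinationError ?? replaceError {
        throw error
    }
}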

FYI, this same issue can also apply when reading data. Particularly on APFS (where file cloning makes file duplication extraordinarily fast), an app doing bulk copying might be better off using a coordinated read to create a "private" copy on the same volume, then using that private copy as the source for its copy operation.

That leads to here:

If file coordination isn't the answer and the unzipped files are not meant to be accessed before the unzip operation has completed, perhaps it would make sense for unzip to write the temporary files to a standard temporary folder and then move them to the final location only at the end.

I won't try to provide a full/specific justification, but here are two specific issues:

  • This approach has issues outside of APFS (and HFS+). On APFS and HFS+, the replace operation in #3 is an atomic operation internal to the file system, which means it's both extremely fast and requires no meaningful storage. On other file systems, it's going to require copying and will (temporarily) require double the total storage.

  • For some operations (copying and unzip included), there can be value in ensuring that whatever data the operation produces is accessible, even if the full operation never completes. This could be done by recovering data out of the temporary location, but the most straightforward solution is to just write directly to the final target.

As I mentioned, I'm not trying to avoid permission errors in general, but only those caused by temporary operations such as unzipping. My app keeps syncing, sends a notification when an error happens and allows the user to see the list of errors, but the user isn't forced to do anything for the app to keep syncing. But even if the user is aware that the errors are caused by the unzip operation, they would have to go through the list of errors (which could be quite long) to make sure that they haven't overlooked an unrelated error. What I could do to mitigate this is to mark an error as "solved" if a subsequent sync of the same file succeeds.

The other thing I would add here is that simple heuristics can add significant value. For example:

  • The permission set used is distinct and not particularly common.

  • Lots of decompression happens in the same directory as the source archive.

...so if "foo" is set to "420" and in the same directory as "foo.zip", then there is a decent chance that this is an in-progress unzip operation. Not guaranteed of course, but the way I would think about this is that your goal here is to better manage work and present what's actually going on, not "perfection". Related to that point, going back to here:

sends a notification when an error happens and allows the user to see the list of errors,

Unless you've gone out of your way (at significant performance cost) to impose a specific file ordering, the order you process files in isn't going to be meaningful to the user. On HFS+, the catalog ordering behavior makes bulk iteration roughly alphabetical, but on most other file systems (particularly APFS) the order is going to look basically arbitrary/random. That's important here because it means there isn't really any reason your app HAS to tell the user about any particular issue at the moment you encounter it; you could just defer that location/file to later processing and silently keep going.
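
A sketch of that deferral idea; syncItem, reportError, and the single retry pass are illustrative:

import Foundation

// First pass: silently defer anything that fails. Second pass: only
// what still fails gets reported to the user.
func sync(items: [URL],
          syncItem: (URL) throws -> Void,
          reportError: (URL, Error) -> Void) {
    var deferred: [URL] = []
    for item in items {
        do { try syncItem(item) }
        catch { deferred.append(item) }  // keep going, tell no one yet
    }
    for item in deferred {
        do { try syncItem(item) }
        catch { reportError(item, error) }  // now it's worth surfacing
    }
}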

Now, there's obviously a balancing act there (you don't necessarily want your app to dump all of its errors at the very end), but it can certainly be another tool you use to improve the overall experience.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the discussion, it will be helpful in deciding how to handle long operations.

I just wanted to clarify one thing. I've been using NSFileCoordinator for copying files for quite some time now, i.e. blocking for the duration of the file copy operation. Did you mean that I shouldn't be doing that, since copying a file can potentially take more than 1 second?

I just wanted to clarify one thing. I've been using NSFileCoordinator for copying files for quite some time now, i.e. blocking for the duration of the file copy operation. Did you mean that I shouldn't be doing that, since copying a file can potentially take more than 1 second?

Yes, that's something I'd try to avoid. Building on our conversation here, what I would actually do is something like this:

  1. Create temporary/working directories on both the source and destination volumes. The "temporary" case would come from "url(for:in:appropriateFor:create:)", while the "working" case would probably be something I'd create in an obvious way ("<app name> in progress files") or have the user select one.

  2. Perform a coordinated read and clone the source object to the source working directory.

  3. The file is copied from the source working directory to the destination working directory. Theoretically this could also be coordinated; practically, it doesn't really matter, as this operation is between "your" directories.

  4. Perform a coordinated write and use replaceItem(at:withItemAt:backupItemName:options:resultingItemURL:) to replace the destination object with the destination working object.

Note that the coordinated operations at #2 (clone files) and #4 (atomic replace or series of moves) are all very short lived operations. The exception here is if a non-APFS source forces the clone to be a full copy, but that's a case which forces a much broader consideration and dropping the initial duplicate entirely.
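
A sketch of step #2, assuming an APFS source volume, where FileManager's same-volume copy is a near-instant clone (sourceURL and workingDir are placeholders):

import Foundation

// Briefly coordinate a read while cloning the source into the app's
// working directory on the same volume; the long copy to the
// destination then reads from this private clone instead.
func cloneForCopying(sourceURL: URL, workingDir: URL) throws -> URL {
    let workingCopy = workingDir.appendingPathComponent(sourceURL.lastPathComponent)
    var coordinationError: NSError?
    var copyError: Error?
    NSFileCoordinator(filePresenter: nil).coordinate(
        readingItemAt: sourceURL, error: &coordinationError
    ) { url in
        do {
            try FileManager.default.copyItem(at: url, to: workingCopy)
        } catch {
            copyError = error
        }
    }
    if let error = coordinationError ?? copyError { throw error }
    return workingCopy
}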

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

I didn't receive some notifications for recent replies to my posts. Were notifications disabled during the WWDC25 week?

I'm not sure I understand the difference between "temporary" and "working (with in progress files)". Wouldn't the temporary one be for in progress files too? And do you mean that I shouldn't use the temporary folder in a production app?

I didn't receive some notifications for recent replies to my posts. Were notifications disabled during the WWDC25 week?

Not that I'm aware of, however, I'm also not sure they've ever been entirely reliable.

I'm not sure I understand the difference between "temporary" and "working (with in progress files)".

Here's what I meant by that:

  1. "Temporary"-> One of the system provided directories (retrieved through "url(for:in:appropriateFor:create:)") which are typically invisible to the user.

  2. "Working"-> A user visible directory that the user either directly specifies/creates.

  3. "Working-ish"-> (variant of #2) A user visible directory your app creates "for" this, probably within the destination hierarchy. For example, creating a directory like "<app name> In Progress" at the top level of the hierarchy you're creating.

Wouldn't the temporary one be for in progress files too?

In terms of how your app uses these directories, they'd all work exactly the same way.

The difference between the two (and which is the "right" choice) really depends on the details of what your app is actually "for" and who's using it. The advantage of #1 ("Temporary") is that it's the "easy" choice, since the user doesn't "do" anything. More importantly, if your app copies between lots of different locations (so there isn't a consistent destination), it means you don't have to keep asking the same question over and over again.

The advantage of #2 ("Working") is that it gives the user control over what's actually going on, which can be important in an app that's more "expert" oriented. For example, while #1 "should" give you a directory that's "always" writable, that's also not a guarantee an API like that could ever give you. The world is a complicated, fairly broken place, particularly when you're talking about network volumes, so this approach gives you an "escape hatch" if/when you run into one of those "weird"* cases.

*Note that I'm not talking about any particular failure case or suggesting that "url(for:in:appropriateFor:create:)" is unreliable. Frankly, the most common "weird" case is likely to be "Grumpy Old Unix Head™ who happens to hate <insert arbitrary thing>". Similarly, the main reason something like "url(for:in:appropriateFor:create:)" would fail is that said Grumpy Old Unix Head™ has configured** their server in a way that breaks it.

**Why he did that is the sort of question people who haven't met enough Grumpy Old Unix Heads™ ask.

Finally, for both #2 and #3, using a user visible directory gives the user direct access to your intermediate data, potentially allowing them to recover/access that intermediate data if/when "something" goes wrong.

A few final notes on this:

  • None of these choices exclude the others. An app could use #1 as the default, then offer #2 as an option and/or fall back to #3 if "something" goes wrong with #1.

  • Understanding who your users are and how they'll be using your app is critical. In an app that's designed to quickly copy data to a bunch of different destinations, #2 can end up inserting an extra file dialog into every copy, which gets pretty annoying, pretty fast. In an app where you're always syncing to a specifically configured target, it's a perfectly reasonable one-time setup. Lastly, if your app ends up being what's extracting data off a dodgy source that's nearing failure, making the intermediate data visible can be the sort of thing that makes someone very, very happy.

And do you mean that I shouldn't use the temporary folder in a production app?

I think what I was referring to here is the difference between:

  1. Fixed directories like /tmp/ or the app container temp directory.

  2. The "volume level temporary directory" that url(for:in:appropriateFor:create:) returns.

I think using #2 is entirely reasonable (see the discussion above), but I'm not a fan of #1 (for the sort of thing you're describing). The problem with #1 is that "most" systems are simple enough that it works, but "enough" systems are "different" that I don't think you can really rely on it. Relying on fixed paths like #1 makes it all too easy to make something that sort of works, then spend a bunch of time patching edge cases that slowly trickle in.
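
In code, the difference between the two looks like this (destinationURL is a placeholder, possibly on an external or network volume):

import Foundation

let destinationURL = URL(fileURLWithPath: "/Volumes/External/Sync", isDirectory: true)

// #1: a fixed location, always in the app's container on the boot
// volume, so a later "move" to another volume degrades into a full copy.
let fixedTemp = FileManager.default.temporaryDirectory

// #2: the volume-level temporary directory, created on the same volume
// as wherever the data is actually going, so the final replace stays cheap.
let volumeTemp = try FileManager.default.url(
    for: .itemReplacementDirectory,
    in: .userDomainMask,
    appropriateFor: destinationURL,
    create: true)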

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Makes perfect sense now. Thank you for your insights.
