ios - How to efficiently write large files to disk on background thread (Swift)

Question

asked Oct 7, 2021 in Technique by 深蓝 (71.8m points)

Update

I have resolved and removed the distracting error. Please read the entire post and feel free to leave comments if any questions remain.

Background

I am attempting to write relatively large files (video) to disk on iOS using Swift 2.0, GCD, and a completion handler, and I would like to know whether there is a more efficient way to perform this task. The write must not block the main (UI) thread, must report completion, and should finish as quickly as possible. My custom objects have an NSData property, so I am currently experimenting with an extension on NSData. An alternative solution might, for example, use NSFileHandle or NSStream together with some form of thread-safe behavior, giving much faster throughput than the NSData writeToURL method on which the current solution is based.

What's wrong with NSData Anyway?

Please note the following discussion, taken from the NSData Class Reference (Saving Data). I do perform writes to my temp directory; however, the main reason I have an issue is that I can see a noticeable lag in the UI when dealing with large files. This lag occurs precisely because NSData is not asynchronous (and the Apple docs note that atomic writes can cause performance issues on "large" files, roughly > 1 MB). So when dealing with large files, one is at the mercy of whatever internal mechanism is at work within the NSData methods.

I did some more digging and found this from Apple: "This method is ideal for converting data:// URLs to NSData objects, and can also be used for reading short files synchronously. If you need to read potentially large files, use inputStreamWithURL: to open a stream, then read the file a piece at a time." (NSData Class Reference, Objective-C, +dataWithContentsOfURL). This seems to imply that I could try using streams to write the file out on a background thread if moving the writeToURL call to a background thread (as suggested by @jtbandes) is not sufficient.

The NSData class and its subclasses provide methods to quickly and easily save their contents to disk. To minimize the risk of data loss, these methods provide the option of saving the data atomically. Atomic writes guarantee that the data is either saved in its entirety, or it fails completely. The atomic write begins by writing the data to a temporary file. If this write succeeds, then the method moves the temporary file to its final location.

While atomic write operations minimize the risk of data loss due to corrupt or partially-written files, they may not be appropriate when writing to a temporary directory, the user’s home directory or other publicly accessible directories. Any time you work with a publicly accessible file, you should treat that file as an untrusted and potentially dangerous resource. An attacker may compromise or corrupt these files. The attacker can also replace the files with hard or symbolic links, causing your write operations to overwrite or corrupt other system resources.

Avoid using the writeToURL:atomically: method (and the related methods) when working inside a publicly accessible directory. Instead initialize an NSFileHandle object with an existing file descriptor and use the NSFileHandle methods to securely write the file.
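
A minimal sketch of that advice (open the file descriptor yourself, then write through a file handle) in current Swift, where NSFileHandle is spelled FileHandle. The helper name secureWrite, the O_CREAT | O_EXCL flags, the 0o600 permissions, and the 1 MiB chunk size are illustrative assumptions, not something the quoted documentation prescribes. Because of O_EXCL the call fails if anything already exists at the path, so a caller reusing a name such as shared.mp4 would have to delete the old file first. Note that the older FileHandle.write(_:) used here raises an Objective-C exception instead of throwing on failure; on iOS 13.4 and later, write(contentsOf:) is the throwing alternative.

```swift
import Foundation

/// Hypothetical helper (name, flags, and chunk size are assumptions):
/// creates `path` through a file descriptor opened by this process, wraps the
/// descriptor in a FileHandle, and writes `data` one chunk at a time.
func secureWrite(_ data: Data, toPath path: String, chunkSize: Int = 1 << 20) throws {
    precondition(chunkSize > 0, "chunkSize must be positive")
    // O_CREAT | O_EXCL makes open() fail if anything, including a symbolic
    // link, already exists at `path`, so a pre-planted file or link cannot
    // redirect the write.
    let fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0o600)
    guard fd >= 0 else {
        throw NSError(domain: NSPOSIXErrorDomain, code: Int(errno))
    }
    // The handle takes ownership of the descriptor; closeFile() releases it.
    let handle = FileHandle(fileDescriptor: fd, closeOnDealloc: true)
    defer { handle.closeFile() }

    // Write the data in fixed-size slices rather than one huge call.
    var offset = data.startIndex
    while offset < data.endIndex {
        let end = min(offset + chunkSize, data.endIndex)
        handle.write(data.subdata(in: offset..<end))
        offset = end
    }
}
```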

Other Alternatives

One article on Concurrent Programming at objc.io provides interesting options on "Advanced: File I/O in the Background". Some of the options involve use of an InputStream as well. Apple also has some older references to reading and writing files asynchronously. I am posting this question in anticipation of Swift alternatives.

Example of an appropriate answer

Here is an example of an appropriate answer that might satisfy this type of question. (Taken from the Stream Programming Guide, Writing To Output Streams.)

Using an NSOutputStream instance to write to an output stream requires several steps:

1. Create and initialize an instance of NSOutputStream with a repository for the written data. Also set a delegate.
2. Schedule the stream object on a run loop and open the stream.
3. Handle the events that the stream object reports to its delegate.
4. If the stream object has written data to memory, obtain the data by requesting the NSStreamDataWrittenToMemoryStreamKey property.
5. When there is no more data to write, dispose of the stream object.
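
The guide's steps rely on a delegate and run-loop scheduling. A simpler variant, sketched below in current Swift (OutputStream rather than NSOutputStream), skips the delegate and calls write(_:maxLength:) in a loop from a background queue, blocking until each chunk is accepted. This swaps the guide's event-driven flow for plain blocking writes; the helper name streamWrite, the StreamWriteError type, and the 64 KiB buffer size are illustrative assumptions.

```swift
import Foundation

enum StreamWriteError: Error {
    case cannotCreate(URL)
    case writeFailed(Error?)
}

/// Hypothetical helper: writes `data` to `url` through an OutputStream with
/// blocking write calls (no delegate, no run loop). Run it off the main thread.
func streamWrite(_ data: Data, to url: URL, bufferSize: Int = 64 * 1024) throws {
    precondition(bufferSize > 0, "bufferSize must be positive")
    guard let stream = OutputStream(url: url, append: false) else {
        throw StreamWriteError.cannotCreate(url)
    }
    stream.open()
    defer { stream.close() }

    try data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Void in
        // An empty buffer has no base address; there is nothing to write.
        guard let base = raw.bindMemory(to: UInt8.self).baseAddress else { return }
        var written = 0
        while written < raw.count {
            let chunk = min(bufferSize, raw.count - written)
            // write(_:maxLength:) may accept fewer bytes than offered, so loop.
            let n = stream.write(base + written, maxLength: chunk)
            guard n > 0 else { throw StreamWriteError.writeFailed(stream.streamError) }
            written += n
        }
    }
}
```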

I am looking for the most efficient algorithm for writing extremely large files to disk on iOS. Swift, Apple APIs, or even C/Objective-C would suffice; I can transpose the algorithm into appropriate Swift-compatible constructs.

Nota Bene

This question is asking whether there is a better algorithm for writing large files to disk with a guaranteed dependency sequence (e.g. NSOperation dependencies). If there is, please provide enough information (a description or sample) for me to reconstruct pertinent Swift 2.0-compatible code. Please advise if I am missing any information that would help answer the question.
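
One way to express such a dependency sequence, sketched in current Swift (Operation/OperationQueue rather than the Swift 2.0 NSOperation names): a write operation plus a dependent operation that receives the file URL only after the write has finished, successfully or not. WriteFileOperation and enqueueWrite are hypothetical names, and doing the write itself with an atomic Data.write call is an assumption. The consumer runs on the operation queue's thread, so any UIKit work inside it still has to hop to the main queue.

```swift
import Foundation

/// Hypothetical operation that writes `data` atomically to `url`.
/// Operations that depend on it will not start until it has finished.
final class WriteFileOperation: Operation {
    let data: Data
    let url: URL
    /// Read this only from a dependent operation (or after the queue drains).
    private(set) var succeeded = false

    init(data: Data, url: URL) {
        self.data = data
        self.url = url
        super.init()
    }

    override func main() {
        guard !isCancelled else { return }
        succeeded = (try? data.write(to: url, options: .atomic)) != nil
    }
}

/// Wires a write and a dependent consumer onto `queue`; `consume` is called
/// with the URL and the success flag only after the write has finished.
func enqueueWrite(_ data: Data, to url: URL, on queue: OperationQueue,
                  then consume: @escaping (URL, Bool) -> Void) {
    let write = WriteFileOperation(data: data, url: url)
    let use = BlockOperation { [write] in consume(write.url, write.succeeded) }
    use.addDependency(write)
    queue.addOperations([write, use], waitUntilFinished: false)
}
```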

Note on the extension

I've added a completion handler to the base writeToURL to ensure that no unintended resource sharing occurs. My dependent tasks that use the file should never face a race condition.

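extension NSData {

    // Writes the data to `named` inside the temporary directory on a background
    // queue. The completion handler is called on that background queue, so
    // callers must dispatch to the main queue before touching UIKit.
    func writeToURL(named: String, completion: (result: Bool, url: NSURL?) -> Void) {

        let filePath = NSTemporaryDirectory() + named
        let tmpURL = NSURL(fileURLWithPath: filePath)

        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), {
            // Capture self strongly so the data stays alive until the write
            // finishes; force-unwrapping a weak reference would crash if the
            // object were released first.
            // Write to URL atomically, then confirm the file exists.
            if self.writeToURL(tmpURL, atomically: true)
                && NSFileManager.defaultManager().fileExistsAtPath(filePath) {
                completion(result: true, url: tmpURL)
            } else {
                // Always report failure so dependent tasks are never left waiting.
                completion(result: false, url: tmpURL)
            }
        })
    }
}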

This method is called from a controller to process the custom object's data:

var items = [AnyObject]()
if let video = myCustomClass.data {

    // video is of type NSData
    video.writeToURL("shared.mp4", completion: { (result, url) -> Void in
        // The completion runs on a background queue, so hop to the main
        // queue before touching UIKit.
        dispatch_async(dispatch_get_main_queue()) {
            if result {
                items.append(url!)
                if items.count > 0 {

                    let sharedActivityView = UIActivityViewController(activityItems: items, applicationActivities: nil)

                    self.presentViewController(sharedActivityView, animated: true) { () -> Void in
                        // finished
                    }
                }
            }
        }
    })
}

Conclusion

The Apple Docs on Core Data Performance provide some good advice on dealing with memory pressure and managing BLOBs. This is really one heck of an article, with a lot of clues to behavior and to moderating the issue of large files within your app. Although it is specific to Core Data rather than files, its warning on atomic writing tells me that I ought to implement methods that write atomically with great care.

With large files, the only safe way to manage writing seems to be adding in a completion handler (to the write method) and showing an activity view on the main thread. Whether one does that with a stream or by modifying an existing API to add completion logic is up to the reader. I've done both in the past and am in the midst of testing for best performance.

Until then, I'm changing the solution to remove all binary data properties from Core Data and replacing them with strings to hold asset URLs on disk. I am also leveraging the built in functionality from Assets Library and PHAsset to grab and store all related asset URLs. When or if I need to copy any assets I will use standard API methods (export methods on PHAsset/Asset Library) with completion handlers to notify user of finished state on the main thread.

(Really useful snippets from the Core Data Performance article)

Reducing Memory Overhead

It is sometimes the case that you want to use managed objects on a temporary basis, for example to calculate an average value for a particular attribute. This causes your object graph, and memory consumption, to grow. You can reduce the memory overhead by re-faulting individual managed objects that you no longer need, or you can reset a managed object context to clear an entire object graph. You can also use patterns that apply to Cocoa programming in general.

You can re-fault an individual managed object using NSManagedObjectContext’s refreshObject:mergeChanges: method. This has the effect of clearing its in-memory property values thereby reducing its memory overhead. (Note that this is not the same as setting the property values to nil—the values will be retrieved on demand if the fault is fired—see Faulting and Uniquing.)

When you create a fetch request you can set includesPropertyValues to NO to reduce memory overhead by avoiding creation of objects to represent the property values. You should typically only do so, however, if you are sure that either you will not need the actual property data or you already have the information in the row cache, otherwise you will incur multiple trips to the persistent store.

You can use the reset method of NSManagedObjectContext to remove all managed objects associated with a context and "start over" as if you'd just created it. Note that any managed object associated with that context will be invalidated, and so you will need to discard any references to and re-fetch any objects associated with that context in which you are still interested. If you iterate over a lot of objects, you may need to use local autorelease pool blocks to ensure temporary objects are deallocated as soon as possible.

If you do not intend to use Core Data’s undo functionality, you can reduce your application's resource requirements by setting the context’s undo manager to nil. This may be especially beneficial for background worker threads, as well as for large import or batch operations.


1 Answer

深蓝 · Answer 1 · 2021-10-06T17:35:12+0000

Performance depends on whether or not the data fits in RAM. If it does, then you should use NSData writeToURL with the atomically feature turned on, which is what you're doing.

Apple's notes about this being dangerous when "writing to a public directory" are completely irrelevant on iOS because there are no public directories. That section only applies to OS X. And frankly it's not really important there either.

So, the code you've written is as efficient as possible as long as the video fits in RAM (about 100MB would be a safe limit).

For files that don't fit in RAM, you need to use a stream or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.
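
When the source is already on disk (for example an exported asset), the "use a stream" advice can be sketched as a bounded-memory copy: read a fixed-size buffer from an InputStream and write it to an OutputStream until the input ends, so memory use stays at one buffer regardless of file size. This is an illustration in current Swift, not code from the answer; streamCopy and the 1 MiB default buffer are assumptions, and it does not cover the download case, for which the answer recommends NSURLSessionDownloadTask.

```swift
import Foundation

/// Hypothetical helper: copies `source` to `destination` through a fixed-size
/// buffer, so memory use stays at `bufferSize` bytes however large the file is.
func streamCopy(from source: URL, to destination: URL, bufferSize: Int = 1 << 20) throws {
    precondition(bufferSize > 0, "bufferSize must be positive")
    guard let input = InputStream(url: source),
          let output = OutputStream(url: destination, append: false) else {
        throw CocoaError(.fileNoSuchFile)
    }
    input.open(); output.open()
    defer { input.close(); output.close() }

    var buffer = [UInt8](repeating: 0, count: bufferSize)
    while true {
        let read = input.read(&buffer, maxLength: bufferSize)
        if read < 0 { throw input.streamError ?? CocoaError(.fileReadUnknown) }
        if read == 0 { break }  // end of input
        // Drain this buffer completely; write may accept only part of it.
        var offset = 0
        while offset < read {
            let n = buffer.withUnsafeBufferPointer {
                output.write($0.baseAddress! + offset, maxLength: read - offset)
            }
            if n <= 0 { throw output.streamError ?? CocoaError(.fileWriteUnknown) }
            offset += n
        }
    }
}
```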

In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(), so don't use a stream unless you need to. All operations on NSData are extremely fast; it is perfectly capable of dealing with files that are multiple terabytes in size with excellent performance on OS X (iOS obviously can't have files that large, but it's the same class with the same performance).

There are a few issues in your code.

This is wrong:

let filePath = NSTemporaryDirectory() + named

Instead always do:

let filePath = (NSTemporaryDirectory() as NSString).stringByAppendingPathComponent(named)

But that's not ideal either: you should avoid using paths (they are buggy and slow). Instead, use a URL like this:

let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())
let fileURL = tmpDir.URLByAppendingPathComponent(named)

Also, you're using a path to check if the file exists... don't do this:

if NSFileManager.defaultManager().fileExistsAtPath( filePath ) {

Instead use NSURL to check if it exists:

if fileURL.checkResourceIsReachableAndReturnError(nil) {
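
Putting these suggestions together, the question's helper might look like this in current Swift, where NSData and NSURL have become Data and URL and GCD is exposed as DispatchQueue. It is a sketch rather than the answer's own code: the method name writeToTemporaryFile(named:completionQueue:completion:), the .userInitiated quality of service, and delivering the result on the main queue by default are assumptions.

```swift
import Foundation

extension Data {
    /// Hypothetical helper: builds the destination with URL APIs (no string
    /// paths), writes atomically on a background queue, checks reachability
    /// through the URL, and reports back on `completionQueue`.
    func writeToTemporaryFile(named name: String,
                              completionQueue: DispatchQueue = .main,
                              completion: @escaping (Result<URL, Error>) -> Void) {
        let fileURL = FileManager.default.temporaryDirectory
            .appendingPathComponent(name)
        DispatchQueue.global(qos: .userInitiated).async {
            let result: Result<URL, Error>
            do {
                try self.write(to: fileURL, options: .atomic)
                // Throws if the file is not reachable; returns true otherwise.
                _ = try fileURL.checkResourceIsReachable()
                result = .success(fileURL)
            } catch {
                result = .failure(error)
            }
            // Main queue by default, so UIKit work in `completion` is safe.
            completionQueue.async { completion(result) }
        }
    }
}
```

With the default completionQueue, presenting a UIActivityViewController from the completion needs no extra dispatch.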
