We will compare several different ways of reading a file in Rust.
Apart from "wc -l", we run each function 10 times using criterion and report the mean.
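A minimal sketch of how such a harness might look with criterion (the group and benchmark names here are illustrative, and count_newlines_standard is the Method 1 function shown later; sample_size(10) is the knob that sets 10 measurements per method):

use criterion::{criterion_group, criterion_main, Criterion};

// Sketch of the benchmark harness; names are made up for illustration.
fn bench_line_counting(c: &mut Criterion) {
    let mut group = c.benchmark_group("count-newlines");
    group.sample_size(10); // 10 runs per method, as described above
    group.bench_function("bufreader_lines_count", |b| {
        b.iter(|| count_newlines_standard("data").unwrap())
    });
    group.finish();
}

criterion_group!(benches, bench_line_counting);
criterion_main!(benches);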
Framework 16 with a Ryzen 7840HS and 64 GB of RAM, plugged in and with performance mode enabled.
SSD: WD_BLACK SN850X 4000GB. A test using GNOME Disks shows a read speed of 3.6 GB/s (sample size 1000 MB, 100 samples).
Filesystem: btrfs
Uname string: Linux fedora 6.8.8-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Apr 27 17:53:31 UTC 2024 x86_64 GNU/Linux
Details about the text file
Uncompressed size: 22G
Number of lines: 200,000,000
Compressed size after btrfs compression (zstd): 5.3G
For the impatient: an overview of the results
Method                              Time (seconds)
Mmap with AVX512                              2.61
Mmap with AVX2                                2.64
io_uring with Vectored IO                     2.86
Vectored IO                                   2.89
Mmap                                          3.43
io_uring                                      5.26
wc -l (baseline)                              8.01
Direct IO                                    10.56
BufReader without appends                    15.94
BufReader with lines().count()               33.50
Benchmark results
An interesting observation: the AVX512 run takes 2.61 seconds, yet the file is ~22 GB and the SSD benchmarks at 3.6 GB/s, so just reading the bytes should take about 6 seconds. The AVX512 implementation is apparently reading at about 8.4 GB/s. What gives? It turns out Fedora uses btrfs, which enables zstd compression by default, so far fewer bytes actually come off the disk: at 5.3 GB on disk, 5.3 GB / 2.61 s ≈ 2 GB/s, comfortably within the drive's bandwidth. The actual on-disk size can be found using compsize.
opdroid@box:~/tmp$ sudo compsize data
Processed 1 file, 177437 regular extents (177437 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       24%      5.3G         21G          21G
none       100%       32K          32K          32K
zstd        24%      5.3G         21G          21G
Thanks to these fine folks
@alextjensen - for pointing me to sane defaults for BufReader and to compile to the native arch.
@aepau2 - for spotting a glaring error in the wc numbers. I had forgotten to drop the cache before measuring with wc.
@rflaherty71 - for suggesting using more, larger buffers (64 x 64k).
It is always a good idea to use code we did not write as a baseline.
Baseline: wc -l
opdroid@box:~/tmp$ time wc -l data
200000000 data
real 0m8.010s
user 0m0.193s
sys 0m7.591s
We reset the file caches at the end of each function using the following helper. I have yet to figure out how to use a teardown function in criterion so that this doesn't get counted in the measured time.
use std::process::Command;

// TODO: move to a teardown function in criterion
fn reset_file_caches() {
    // Drop the page cache, dentries, and inodes
    let output = Command::new("sudo")
        .arg("sh")
        .arg("-c")
        .arg("echo 3 > /proc/sys/vm/drop_caches")
        .output()
        .expect("Failed to reset file caches");

    // Check if the command executed successfully
    if !output.status.success() {
        panic!("Failed to reset file caches: {:?}", output);
    }
}
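One possible workaround (an untested sketch, not from the original measurements): criterion's iter_batched runs a setup closure outside the timed region, so the cache drop could happen there instead, assuming the reset_file_caches() call is removed from the function being measured:

use criterion::{BatchSize, Criterion};

// Sketch: drop caches in the untimed setup closure so the measured
// body sees a cold cache without the drop being counted in the time.
fn bench_cold_cache(c: &mut Criterion) {
    c.bench_function("bufreader_cold_cache", |b| {
        b.iter_batched(
            || reset_file_caches(),                       // untimed setup
            |_| count_newlines_standard("data").unwrap(), // timed body
            BatchSize::PerIteration,
        )
    });
}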
Method 1: Read the file using BufReader and use reader.lines().count()
use std::fs::File;
use std::io::{BufRead, BufReader};

fn count_newlines_standard(filename: &str) -> Result<usize, std::io::Error> {
    let file = File::open(filename)?;
    let reader = BufReader::with_capacity(16 * 1024, file);
    // lines() allocates a new String for every line
    let newline_count = reader.lines().count();
    reset_file_caches();
    Ok(newline_count)
}
This takes about 36.5 seconds on my machine: lines() yields a freshly allocated String for every line (and validates UTF-8 along the way), which accounts for much of the overhead.
Method 2: Read the file using BufReader and avoid string appends