use strict;
use warnings FATAL => 'all';
use bytes;

package File::Hashset;

use base qw(DynaLoader);

our $VERSION = 1.00;

bootstrap File::Hashset;

1;

=pod

=encoding utf8

=head1 NAME

File::Hashset - handle compact collections of hashes

=head1 SYNOPSIS

This package contains a set of functions to create and query files that consist of consecutive fixed-length binary hashes.
To create such a file, write the binary hashes to a file back to back and call the sortfile method described below.
This sorts and deduplicates the file so that it can be loaded with load() and queried with exists().
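
The preparation step can be sketched in pure Perl.
This is an illustration under stated assumptions, not part of the module: it uses Digest::SHA to produce 32-byte SHA-256 digests (a valid power-of-two length), File::Temp for a scratch file, and a hypothetical list of items.

```perl

    use strict;
    use warnings;
    use Digest::SHA qw(sha256);
    use File::Temp qw(tempfile);

    # Hypothetical input: the items whose hashes we want to store.
    my @items = ('apple', 'banana', 'cherry', 'apple');

    # Write one raw 32-byte SHA-256 digest per item, back to back, with no
    # separators. binmode keeps Perl from translating any bytes.
    my ($fh, $filename) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} sha256($_) for @items;
    close $fh or die "close $filename: $!";

    # The duplicate 'apple' digest is still present at this point;
    # File::Hashset->sortfile($filename, 32) is what removes such duplicates.

```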

All functions and methods die() when they encounter an error.
Wrap calls in eval {} where you need to recover from such errors.

=head1 USAGE

=head2 $hs = File::Hashset->load($filename)

Creates a new File::Hashset object backed by the specified file.
The file must already be sorted and deduplicated, for example by sortfile().
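
Before calling load(), the sorted-and-deduplicated precondition can be checked with a small pure-Perl helper.
The is_sorted_unique function below is hypothetical and not provided by File::Hashset; it assumes the module orders hashes bytewise (as memcmp() would) and that the caller knows the hash length.

```perl

    use strict;
    use warnings;

    # Hypothetical helper, not part of File::Hashset: returns 1 if the file's
    # $hashlen-byte records are strictly increasing (sorted, no duplicates).
    # ASSUMPTION: bytewise ordering; Perl's string comparison on byte strings
    # (outside "use locale") compares byte by byte like memcmp().
    sub is_sorted_unique {
        my ($filename, $hashlen) = @_;
        open my $fh, '<:raw', $filename or die "open $filename: $!";
        my $prev;
        while (1) {
            my $n = read($fh, my $rec, $hashlen);
            die "read $filename: $!" unless defined $n;
            last if $n == 0;
            die "$filename: trailing partial record\n" if $n != $hashlen;
            # Equal or descending neighbours mean a duplicate or unsorted data.
            return 0 if defined $prev && $prev ge $rec;
            $prev = $rec;
        }
        close $fh;
        return 1;
    }

```

For example, is_sorted_unique($filename, 32) returning false suggests running sortfile() on the file first.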

=head2 $hs = File::Hashset->new($hashes, $hashlen)

Creates a new File::Hashset object from $hashes, a string of concatenated binary hashes that are each $hashlen bytes long.
The string is copied, sorted and deduplicated for you.
The $hashlen parameter must be a power of 2 between 8 and 4096 inclusive.
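
A string suitable for new() can be built by concatenating raw digests.
This sketch assumes Digest::SHA and a hypothetical list of items; the length check at the end is an extra safeguard, not a documented requirement of new().

```perl

    use strict;
    use warnings;
    use Digest::SHA qw(sha256);

    # Hypothetical input items.
    my @items = qw(apple banana cherry apple);

    # Concatenate the raw 32-byte SHA-256 digests; no separators are needed
    # because every record has the same length.
    my $hashlen = 32;
    my $hashes  = join '', map { sha256($_) } @items;

    # Extra safeguard (an assumption, not documented): the string should
    # hold a whole number of records.
    die "string is not a multiple of $hashlen bytes\n"
        if length($hashes) % $hashlen;

    # With the module installed, the set would then be created with:
    #   my $hs = File::Hashset->new($hashes, $hashlen);

```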

=head2 $hs->exists($hash)

Returns true if the specified hash exists in the hashset.
If the length of $hash does not match the hash length of the set, or the underlying file is not sorted, the result is undefined: the call may die() or return false.
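
The following pure-Perl model shows how a lookup over sorted fixed-length records can use binary search, which is presumably why unsorted input gives undefined results.
It is not the module's implementation: model_exists is a hypothetical name, and bytewise ordering is assumed.

```perl

    use strict;
    use warnings;

    # Hypothetical model, not File::Hashset's code: binary search for $hash
    # in $set, a sorted, deduplicated string of $hashlen-byte records.
    # ASSUMPTION: bytewise ordering (Perl's cmp on byte strings).
    sub model_exists {
        my ($set, $hashlen, $hash) = @_;
        # The real module's behaviour here is undefined; the model says no.
        return 0 unless length($hash) == $hashlen;
        my ($lo, $hi) = (0, int(length($set) / $hashlen));    # half-open [lo, hi)
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            my $cmp = substr($set, $mid * $hashlen, $hashlen) cmp $hash;
            return 1 if $cmp == 0;
            if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid }
        }
        return 0;
    }

```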

=head2 File::Hashset->sortfile($filename, $hashlen)

Sorts and deduplicates the specified file in place.
The $hashlen parameter must be a power of 2 between 8 and 4096 inclusive.
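
The result that sortfile() is documented to produce can be modelled in pure Perl for small files.
model_sortfile is hypothetical: it slurps the whole file into memory and is only meant to show the expected output, not how the module does it, and the bytewise sort order is an assumption.

```perl

    use strict;
    use warnings;

    # Hypothetical model, not File::Hashset's code: sort and deduplicate a
    # file of $hashlen-byte records in place. Returns the record count.
    # ASSUMPTION: bytewise ordering, as with memcmp().
    sub model_sortfile {
        my ($filename, $hashlen) = @_;
        open my $in, '<:raw', $filename or die "open $filename: $!";
        my $data = do { local $/; <$in> };
        close $in;
        $data = '' unless defined $data;
        die "$filename: size is not a multiple of $hashlen\n"
            if length($data) % $hashlen;

        # Split into records ("a" keeps NUL bytes), sort, drop adjacent repeats.
        my @recs = sort { $a cmp $b } unpack("(a$hashlen)*", $data);
        my @uniq;
        for my $r (@recs) {
            push @uniq, $r unless @uniq && $uniq[-1] eq $r;
        }

        open my $out, '>:raw', $filename or die "open $filename: $!";
        print {$out} @uniq;
        close $out or die "close $filename: $!";
        return scalar @uniq;
    }

```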

=head2 File::Hashset->merge($outfile, $hashlen, [$hs1, [$hs2, ...]])

Merges the specified hashsets into $outfile, a new file suitable for loading with load().
The output file will be sorted and deduplicated.
The number of input hashsets may be zero, in which case the output file will be empty.
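
Because every input is already sorted and deduplicated, merging can be done as a k-way merge that repeatedly takes the smallest current record and skips repeats.
The pure-Perl model_merge below is a hypothetical sketch that works on in-memory strings rather than hashset objects and files, assumes bytewise ordering, and is not necessarily how merge() works internally.

```perl

    use strict;
    use warnings;

    # Hypothetical model, not File::Hashset's code: merge sorted, deduplicated
    # strings of $hashlen-byte records into one sorted, deduplicated string.
    # ASSUMPTION: bytewise ordering.
    sub model_merge {
        my ($hashlen, @sets) = @_;
        my @pos = (0) x @sets;    # current byte offset into each input
        my $out = '';
        my $last;
        while (1) {
            my $best;             # index of the input with the smallest head
            for my $i (0 .. $#sets) {
                next if $pos[$i] >= length($sets[$i]);
                $best = $i
                    if !defined($best)
                    || substr($sets[$i], $pos[$i], $hashlen)
                       lt substr($sets[$best], $pos[$best], $hashlen);
            }
            last unless defined $best;    # all inputs exhausted
            my $rec = substr($sets[$best], $pos[$best], $hashlen);
            $pos[$best] += $hashlen;
            # Output is non-decreasing, so duplicates across inputs are adjacent.
            next if defined $last && $last eq $rec;
            $out .= $rec;
            $last = $rec;
        }
        return $out;
    }

```

With no inputs the model returns an empty string, mirroring the documented behavior for zero hashsets.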

=head1 LIMITATIONS

Hash lengths must be a power of two, currently at least 8 bytes and at most 4096 bytes.
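
The length constraint (a power of two from 8 to 4096 bytes, as stated for new() and sortfile()) can be checked up front with a bit trick: a power of two has exactly one bit set, so n & (n - 1) is zero.
valid_hashlen is a hypothetical helper, not part of File::Hashset.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: 1 if $n is a power of two between 8 and 4096
    # inclusive, else 0.
    sub valid_hashlen {
        my ($n) = @_;
        return 0 unless defined $n && $n =~ /\A[0-9]+\z/;
        my $v = 0 + $n;    # force numeric so & is a numeric bitwise AND
        # A power of two has exactly one bit set, so v & (v - 1) is zero.
        return ($v >= 8 && $v <= 4096 && ($v & ($v - 1)) == 0) ? 1 : 0;
    }

```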

This package makes liberal use of the mmap() system call.
If your operating system does not support separate mappings of the same file (possibly Solaris), opening the same file twice with File::Hashset will probably crash.

Modifying a file (with File::Hashset->sortfile() or otherwise) while it is used as the backend of a File::Hashset object is probably not a good idea, because the object reads the file through a memory mapping.
Appending to the file should not be a problem.

=head1 COPYRIGHT

Copyright (c) 2014 Wessel Dankers <wsl@fruit.je>.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
