You can use PL/Perl (CREATE FUNCTION langof(text) LANGUAGE plperlu AS ...) with the Lingua::Identify CPAN module.
Perl script:
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::Identify qw(langof);

undef $/;                        # slurp mode: read all input at once
my $textstring = <>;             # warning: loads the whole file into memory
my $lang = langof($textstring);  # gives the most probable language
print "$lang\n";
And the function:
create or replace function langof( text ) returns varchar(2)
immutable returns null on null input
language plperlu as $perlcode$
use Lingua::Identify qw(langof);
return langof( shift );
$perlcode$;
Works for me:
filip@filip=# select langof('Pójdź, kiń-że tę chmurność w głąb flaszy');
langof
--------
pl
(1 row)
Time: 1.801 ms
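Once the function exists, it can also be applied to a whole column rather than a single literal. This is only a sketch: the table articles and its text column body are hypothetical names, not part of the setup above, so adjust them to your schema.

```sql
-- Hypothetical table and column names (articles, body).
-- Count rows per detected language code.
SELECT langof(body) AS lang, count(*) AS n
FROM articles
GROUP BY 1
ORDER BY n DESC;

-- Because langof is declared IMMUTABLE, it can back an expression index,
-- so filters such as WHERE langof(body) = 'pl' need not rerun Perl per row.
CREATE INDEX articles_lang_idx ON articles (langof(body));
```

The index stores the result computed at insert time, so if you later upgrade Lingua::Identify and its answers change, a REINDEX would be needed to keep the index consistent.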
PL/Perl on Windows
The PL/Perl language library (plperl.dll) comes preinstalled with the latest Windows installer of PostgreSQL.
To use PL/Perl, though, you also need the Perl interpreter itself, specifically Perl 5.14 (at the time of this writing). The most common installer is ActiveState's, but it's not free. A free one comes from Strawberry Perl. Make sure you have PERL514.DLL in place.
After installing Perl, log in to your PostgreSQL database and try to run
CREATE LANGUAGE plperlu;
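To check that the language was actually registered, you can query the system catalog. A small sketch, assuming you are connected as a superuser (plperlu is the untrusted variant, so only superusers can create functions in it):

```sql
-- On PostgreSQL 9.1 and later, the extension form is the documented
-- equivalent of CREATE LANGUAGE:
-- CREATE EXTENSION plperlu;

-- Verify that untrusted PL/Perl is registered in the current database.
SELECT lanname, lanpltrusted
FROM pg_language
WHERE lanname = 'plperlu';
```

If the query returns one row with lanpltrusted set to f, the language is installed. If CREATE LANGUAGE fails instead, a missing or mismatched Perl DLL is a likely cause.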
Language identification library
If quality is your concern, you have some options: you can improve Lingua::Identify yourself (it's open source), or you could try another library. I found this one, which is commercial but looks promising.