воскресенье, 27 декабря 2015 г.

Fuzzy substring searching with the pg_trgm extension

Introduction

In the previous topic I mentioned that we have some operators for the pattern matching (LIKE (~~), ILIKE (~~*), ~, ~* and SIMILAR TO) and an operator for the text search (@@). If you know exactly what you want to find this operators are good choise.

But if you know a search query only approximately then a trigram matching can help you. A trigram matching is provided by the pg_trgm module in PostgreSQL. For example, you can execute the following query:

CREATE TABLE clients(name varchar);
INSERT INTO clients VALUES('Devereaux'), ('Devereux');
SELECT * FROM clients WHERE name % 'Devereux';

And you will get the following people with the Devereux surname:

   name  
-----------
 Devereaux
 Devereux
(2 rows)

вторник, 22 декабря 2015 г.

Dictionaries and PostgreSQL FTS. Part 2

This is the second part of the topic about using Ispell and Hunspell dictionaries within PostgreSQL. In this topic I want to give some information about FLAG and AF parameters of Hunspell and about a patch which helps PostgreSQL to load dictionaries with this parameters.

понедельник, 7 декабря 2015 г.

Dictionaries and PostgreSQL FTS

Introduction

There are some text search operator in PostgreSQL: LIKE (~~), ILIKE (~~*), ~, ~* and SIMILAR TO. But they have important disadvantage - they do not provide linguistic support. For example, you will not find the string "great abilities", if you will execute the following query:

CREATE TABLE table1 (col1 varchar);
INSERT INTO table1 VALUES ('great abilities');
SELECT col1 FROM table1 WHERE col1 LIKE '%ability%';

To avoid this drawback use tsvector and tsquery data types. These data types consist of lexemes. Moreover tsvector can store a position and a rank of lexemes.