
Issue 174: The Sailor

Object: The Sailor (Dance)
Submitter: Steve Smith
Assigned to: Anselm Lingnau
Priority: Normal
Disposition: Fixed
Description

For some reason, The Sailor is not found by your search system. If you enter just sailor, several dances come up, and among the tunes is one called The Sailor, which links back to the right page.

I note that there is also a problem with words containing the ’ symbol, e.g. Mairi’s Wedding. I have a Perl program that extracts dances from minicrib, written before I found the Strathspey Server. It suffers from the same problem, and I ended up deleting all the ’ symbols during the search process. I have so far found two characters that websites use for the ’ character. I think there may be a third.

More details on request if it is of any interest.

Regards

Steve Smith

Previous Actions

  • Date  Jan. 1, 2013, 5:07 p.m.
  • User  Unknown

New issue submitted

  • Date  Jan. 1, 2013, 6:43 p.m.
  • User  Anselm Lingnau (anselm)

Assigned changed to »anselm« (previously »None«)
Disposition changed to »Fixed« (previously »New«)

The issues in question have been fixed in the code (sort of, anyway).

In particular, characters that look like apostrophes but aren’t (i.e., the crud that Microsoft insisted on stuffing into the 128–159 range of the CP1252 code page), among other »gremlins«, are canonicalised for search purposes in the same manner as for data entry purposes. We also now consider the Unicode ACUTE ACCENT character (character U+00B4) an apostrophe.
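As a rough illustration of the canonicalisation described above, here is a minimal Python sketch. It is not the server's actual code: the function name `canonicalise` and the exact set of characters in the table are assumptions made for the example. Only the CP1252 128–159 range and U+00B4 are mentioned in this issue; the full list of »gremlins« handled by the real code is not given here.

```python
# Hypothetical mapping of apostrophe look-alikes to a plain ASCII apostrophe.
# The real list used by the server is not given in this issue.
APOSTROPHE_LIKE = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK (CP1252 byte 0x91)
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (CP1252 byte 0x92)
    "\u0091": "'",  # C1 control left over when byte 0x91 is read as Latin-1
    "\u0092": "'",  # C1 control left over when byte 0x92 is read as Latin-1
    "\u00b4": "'",  # ACUTE ACCENT
}

_TABLE = str.maketrans(APOSTROPHE_LIKE)


def canonicalise(text):
    """Replace apostrophe look-alikes so that stored names and search
    terms can be compared in the same form."""
    return text.translate(_TABLE)
```

Applying the same function to both the stored name and the search term means that »Mairi’s Wedding«, »Mairi´s Wedding« and »Mairi's Wedding« all compare equal.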

The »The Sailor« search problem stems from the fact that the name of the dance is actually stored as »Sailor, The« in the database, so one would have to enter that to obtain an exact match. As a sort-of fix, a search for »The X« now also searches for exact matches for »X, The«, which fixes the »The Sailor« issue but could lead to somewhat counterintuitive results for approximate searches for strings like »The Reel of«, which would look for »Reel of, The«. Hence, for the approximate results, a leading article such as »The« is simply ignored. This leads to extra matches, but that is probably better than finding nothing at all.
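
The two behaviours described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code that was deployed: the function names `exact_candidates` and `approximate_term` are made up, and the `ARTICLES` tuple contains only »The« because that is the only article named in this issue.

```python
# Hypothetical list of leading articles; only "the" is confirmed by the issue.
ARTICLES = ("the",)


def _split_article(query):
    """Return (article, rest) if the query starts with a known article
    followed by more text, otherwise (None, query)."""
    q = query.strip()
    head, _, rest = q.partition(" ")
    rest = rest.strip()
    if head.lower() in ARTICLES and rest:
        return head, rest
    return None, q


def exact_candidates(query):
    """Strings to try for an exact match: the query as typed, plus the
    'X, The' form if the query looks like 'The X'."""
    article, rest = _split_article(query)
    candidates = [query.strip()]
    if article is not None:
        candidates.append(f"{rest}, {article}")
    return candidates


def approximate_term(query):
    """Term to use for approximate matching: a leading article is dropped."""
    _, rest = _split_article(query)
    return rest
```

With this sketch, `exact_candidates("The Sailor")` yields both »The Sailor« and »Sailor, The«, while `approximate_term("The Reel of")` yields »Reel of« rather than »Reel of, The«.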