#include <boost/locale/boundary.hpp>
Public Types | |
typedef RangeIterator | iterator |
typedef RangeIterator::base_iterator | base_iterator |
typedef std::iterator_traits < base_iterator >::value_type | char_type |
Public Member Functions | |
mapping (boundary_type type, base_iterator begin, base_iterator end, std::locale const &loc=std::locale()) | |
mapping (boundary_type type, base_iterator begin, base_iterator end, unsigned mask, std::locale const &loc=std::locale()) | |
void | map (boundary_type type, base_iterator begin, base_iterator end, std::locale const &loc=std::locale()) |
void | map (boundary_type type, base_iterator begin, base_iterator end, unsigned mask, std::locale const &loc=std::locale()) |
mapping () | |
template<typename ORangeIterator> | |
mapping (mapping< ORangeIterator > const &other) | |
template<typename ORangeIterator> | |
void | swap (mapping< ORangeIterator > &other) |
template<typename ORangeIterator> | |
mapping const & | operator= (mapping< ORangeIterator > const &other) |
unsigned | mask () const |
void | mask (unsigned u) |
RangeIterator | begin () const |
RangeIterator | end () const |
Friends | |
class | break_iterator |
class | token_iterator |
class | mapping |
When the object is created, it builds an index of the boundaries in the text and provides access to it through iterators. It is used mostly together with break_iterator and token_iterator. For each boundary point it stores a mark that describes the boundary and makes it possible to distinguish between different types of boundaries. For example, it records whether a sentence ends because a terminator such as "?" or "." was found or because a newline symbol is present in the text.
These marks can be read with the token_iterator::mark() and break_iterator::mark() member functions.
This class stores iterators to the original text, so you should be careful with iterator invalidation. If the iterators into the original text become invalid, the mapping can no longer be used.
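The invalidation rule can be illustrated with a minimal, self-contained sketch. It does not use Boost.Locale: toy_mapping and append_and_remap are hypothetical names introduced here, and the stand-in only records the two text iterators without doing any boundary analysis. It shows the safe pattern of rebuilding the mapping after the text changes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in, NOT the Boost.Locale class: it only remembers the
// iterators it was given, which is the property relevant to invalidation.
struct toy_mapping {
    std::string::const_iterator first, last;

    // Record iterators into the caller's text; nothing is copied.
    void map(const std::string& text) { first = text.begin(); last = text.end(); }

    // Only meaningful while the recorded iterators are still valid.
    std::size_t length() const
    {
        assert(first <= last);
        return static_cast<std::size_t>(last - first);
    }
};

// Modify the text, then rebuild the mapping before using it again.
std::size_t append_and_remap(std::string& text, toy_mapping& m, const std::string& extra)
{
    text += extra;  // may reallocate: iterators stored in m must be treated as invalid
    m.map(text);    // rebuild from the current text
    return m.length();
}
```

Calling m.length() between the append and the second map() would read through possibly dangling iterators, which is the situation the warning above describes.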
Examples: boundary.cpp and wboundary.cpp.
typedef RangeIterator boost::locale::boundary::mapping< RangeIterator >::iterator |
Iterator type that is used to iterate over boundaries
typedef RangeIterator::base_iterator boost::locale::boundary::mapping< RangeIterator >::base_iterator |
Underlying iterator that is used to iterate over the original text.
typedef std::iterator_traits<base_iterator>::value_type boost::locale::boundary::mapping< RangeIterator >::char_type |
The character type of the text
boost::locale::boundary::mapping< RangeIterator >::mapping (boundary_type type, base_iterator begin, base_iterator end, std::locale const &loc=std::locale()) [inline]
boost::locale::boundary::mapping< RangeIterator >::mapping (boundary_type type, base_iterator begin, base_iterator end, unsigned mask, std::locale const &loc=std::locale()) [inline]
boost::locale::boundary::mapping< RangeIterator >::mapping () [inline]
Default constructor of empty mapping
boost::locale::boundary::mapping< RangeIterator >::mapping (mapping< ORangeIterator > const &other) [inline]
Copy the mapping. Note that you can copy a mapping that is used for token_iterator to one that is used for break_iterator and vice versa.
void boost::locale::boundary::mapping< RangeIterator >::map (boundary_type type, base_iterator begin, base_iterator end, std::locale const &loc=std::locale()) [inline]
void boost::locale::boundary::mapping< RangeIterator >::map (boundary_type type, base_iterator begin, base_iterator end, unsigned mask, std::locale const &loc=std::locale()) [inline]
void boost::locale::boundary::mapping< RangeIterator >::swap (mapping< ORangeIterator > &other) [inline]
Swap the mappings. Note that you can swap a mapping that is used for token_iterator with one that is used for break_iterator and vice versa. This operation invalidates all iterators.
mapping const& boost::locale::boundary::mapping< RangeIterator >::operator= (mapping< ORangeIterator > const &other) [inline]
Copy the mapping. Note that you can copy a mapping that is used for token_iterator to one that is used for break_iterator and vice versa.
unsigned boost::locale::boundary::mapping< RangeIterator >::mask () const [inline]
Get current boundary mask
void boost::locale::boundary::mapping< RangeIterator >::mask (unsigned u) [inline]
Set current boundary mask.
This mask provides fine-grained control over the types of boundaries and tokens that are taken into account. For example, if you want to find sentence breaks that are caused only by a terminator like "." or "?" and ignore new lines, you can set the mask to sentence_term, and the break iterator will iterate only over boundaries that match this mask.
Note: the beginning and the end of the text are always considered legal boundaries, regardless of whether they have a mark that fits the mask.
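The filtering rule described above can be modelled with a small standalone sketch. It does not call Boost.Locale: toy_boundary, select_breaks, and the toy_sentence_* constants are hypothetical names, and their bit values are arbitrary rather than the library's. The sketch only shows the idea of keeping boundaries whose mark intersects the mask while always keeping the first and last boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marks for illustration; the bit values are arbitrary and are
// NOT the constants defined by Boost.Locale.
constexpr unsigned toy_sentence_term = 1u << 0;  // break after "." or "?"
constexpr unsigned toy_sentence_sep  = 1u << 1;  // break after a newline

// Hypothetical index entry: where a boundary is and why it exists.
struct toy_boundary {
    std::size_t offset;
    unsigned mark;
};

// Offsets a break iterator would visit under `mask` in this toy model.
// The first and last entries are kept regardless of their marks.
std::vector<std::size_t> select_breaks(const std::vector<toy_boundary>& index, unsigned mask)
{
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < index.size(); ++i) {
        // The index is expected to be sorted by offset.
        assert(i == 0 || index[i - 1].offset <= index[i].offset);
        const bool is_edge = (i == 0) || (i + 1 == index.size());
        if (is_edge || (index[i].mark & mask) != 0)
            out.push_back(index[i].offset);
    }
    return out;
}
```

For a text such as "Hello! How\nare you?" with boundaries at offsets 0, 7, 11, and 19, a terminator-only mask skips the newline boundary at 11 but still reports 0 and 19.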
For a token iterator, the mask determines which kinds of tokens are selected. Please note that a token iterator generally selects the largest run of text that has the specified mark. This is especially relevant for word boundary analysis.
For example, if you set the mask to word_any (selects numbers and letters), then iterating over "To be, or not to be?" yields "To", "be", "or", "not", "to", "be". You can ask the token iterator to use a wider selection by calling token_iterator::full_select(true); it would then select the tokens "To", " be", ", or", " not", " to", " be". Which one to use depends on your actual needs. For word selection you would probably want the first (default) behavior, and for sentence selection the second.
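The two selection modes can be reproduced with a toy segmenter that is independent of Boost.Locale. Everything here is an assumption made for illustration: toy_segment, toy_word_segments, select_tokens, and the toy_word_* constants are hypothetical, and the segmentation rule (runs of ASCII letters and digits are words, every other character is its own segment) is far simpler than real word boundary analysis.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical marks; NOT the Boost.Locale constants.
constexpr unsigned toy_word_none = 1u << 0;  // spaces, punctuation
constexpr unsigned toy_word_any  = 1u << 1;  // letters and digits

struct toy_segment {
    std::size_t begin, end;  // half-open range [begin, end) in the text
    unsigned mark;
};

// Toy segmentation: alphanumeric runs are words, anything else is a
// one-character segment marked toy_word_none.
std::vector<toy_segment> toy_word_segments(const std::string& text)
{
    std::vector<toy_segment> segs;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t j = i + 1;
        unsigned mark = toy_word_none;
        if (std::isalnum(static_cast<unsigned char>(text[i]))) {
            mark = toy_word_any;
            while (j < text.size() && std::isalnum(static_cast<unsigned char>(text[j])))
                ++j;
        }
        segs.push_back({i, j, mark});
        i = j;
    }
    return segs;
}

// Select segments matching `mask`. With full_select, each token also absorbs
// the unselected text since the previous selected token ended.
std::vector<std::string> select_tokens(const std::string& text, unsigned mask, bool full_select)
{
    std::vector<std::string> out;
    std::size_t prev_end = 0;
    for (const toy_segment& s : toy_word_segments(text)) {
        if ((s.mark & mask) == 0)
            continue;
        assert(s.begin >= prev_end);
        const std::size_t from = full_select ? prev_end : s.begin;
        out.push_back(text.substr(from, s.end - from));
        prev_end = s.end;
    }
    return out;
}
```

In this model the trailing "?" is never returned in either mode, because no selected token follows it.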
Changing the mask does not invalidate existing iterators, but newly created iterators are not compatible with the old ones, so you must not compare them.
RangeIterator boost::locale::boundary::mapping< RangeIterator >::begin () const [inline]
Get begin iterator used when object was created
RangeIterator boost::locale::boundary::mapping< RangeIterator >::end () const [inline]
Get end iterator used when object was created