org.codehaus.jackson.sym
Class CharsToNameCanonicalizer

java.lang.Object
  extended by org.codehaus.jackson.sym.CharsToNameCanonicalizer

public final class CharsToNameCanonicalizer
extends Object

This class is a specialized, type-safe map from char arrays to String values. The specialization consists of type safety, specific access patterns (the key is a char array, the value is an optionally interned String, and values are added on access if necessary), and support for concurrent use, provided that the concurrently usable instances are obtained through well-defined mechanisms. The class is mainly used to store symbol table information for things like compilers and parsers, especially when the number of symbols (keywords) is limited.

For optimal performance, matches should be very common (especially after "warm-up") and, as with most hash-based maps and sets, hash codes should be uniformly distributed. Collisions are slightly more expensive than with HashMap or HashSet, since hash codes are not used in resolving collisions; that is, an equals() comparison is done with all symbols in the same bucket index.
Finally, rehashing is also more expensive: hash codes are not stored, so rehashing requires all entries' hash codes to be recalculated. Hash codes are not stored in order to reduce memory usage, in the hope of better memory locality.
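
The trade-offs described above can be illustrated with a minimal sketch. TinyCanonicalizer below is a hypothetical toy class, not the Jackson implementation; its hash function, initial sizes and 0.75 fill factor are assumptions chosen for illustration. It keeps a primary symbol array plus overflow chains, stores no hash codes, compares characters on collision, and recomputes every hash when it grows.

```java
/** Hypothetical toy table for illustration; not the Jackson implementation. */
final class TinyCanonicalizer {
    /** Overflow chain node; like the class described above, it stores no hash code. */
    private static final class Bucket {
        final String symbol;
        final Bucket next;
        Bucket(String symbol, Bucket next) { this.symbol = symbol; this.next = next; }
    }

    private String[] symbols = new String[4]; // primary slots; length is a power of two
    private Bucket[] buckets = new Bucket[2]; // overflow chains; half as many
    private int size;

    /** Simple multiplicative hash over a char range (an assumed scheme). */
    static int hash(char[] buf, int start, int len) {
        int h = 0;
        for (int i = 0; i < len; i++) h = h * 31 + buf[start + i];
        return h;
    }

    private static boolean matches(String s, char[] buf, int start, int len) {
        if (s.length() != len) return false;
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) != buf[start + i]) return false;
        }
        return true;
    }

    /** Returns the canonical String for the given chars, adding it on a miss. */
    String findSymbol(char[] buf, int start, int len) {
        int ix = hash(buf, start, len) & (symbols.length - 1);
        String primary = symbols[ix];
        if (primary != null) {
            if (matches(primary, buf, start, len)) return primary;
            // Collision: compare against every symbol in the overflow chain
            for (Bucket b = buckets[ix >> 1]; b != null; b = b.next) {
                if (matches(b.symbol, buf, start, len)) return b.symbol;
            }
        }
        if (size >= symbols.length * 3 / 4) rehash(); // 0.75 fill factor (assumed)
        String sym = new String(buf, start, len);
        insert(sym);
        size++;
        return sym;
    }

    private void insert(String sym) {
        char[] chars = sym.toCharArray();
        // No stored hash, so it is recomputed from the characters
        int ix = hash(chars, 0, chars.length) & (symbols.length - 1);
        if (symbols[ix] == null) {
            symbols[ix] = sym;
        } else {
            buckets[ix >> 1] = new Bucket(sym, buckets[ix >> 1]);
        }
    }

    /** Doubles both arrays; every entry's hash has to be recalculated. */
    private void rehash() {
        String[] oldSymbols = symbols;
        Bucket[] oldBuckets = buckets;
        symbols = new String[oldSymbols.length * 2];
        buckets = new Bucket[oldBuckets.length * 2];
        for (String s : oldSymbols) {
            if (s != null) insert(s);
        }
        for (Bucket head : oldBuckets) {
            for (Bucket b = head; b != null; b = b.next) insert(b.symbol);
        }
    }

    int size() { return size; }
}
```

Looking up the same characters twice returns the same String instance, even when the two lookups read from different buffers or offsets and even after the table has been rehashed.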

The usual usage pattern is to create a single "master" instance, and either use that instance sequentially, or create derived "child" instances which, after use, are asked to return possible symbol additions to the master instance. In either case the benefit is that the symbol table gets initialized so that further uses are more efficient, as eventually all symbols needed will already be in the symbol table. At that point no more symbol String allocations are needed, nor are any changes to the symbol table itself.

Note that while individual symbol table instances are NOT thread-safe (much like generic collection classes), separate "child" instances can be used concurrently without synchronization. However, the master table can be used concurrently with child instances only if access to the master instance is read-only (i.e. no modifications are made).
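
The master/child pattern can be sketched as follows. SharedSymbols is a hypothetical class written for this illustration, not the Jackson implementation: it uses a HashMap keyed by String instead of char arrays, and its merge rule (the parent adopts the child's table if it is larger) is an assumption. Only the shape of the pattern (createRoot, makeChild, use, release) follows the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the master/child pattern; not the Jackson implementation. */
final class SharedSymbols {
    private final SharedSymbols parent;  // null for the master instance
    private Map<String, String> table;   // shared with the parent until the first write
    private boolean dirty;               // true once this instance owns a private copy

    private SharedSymbols(SharedSymbols parent, Map<String, String> table) {
        this.parent = parent;
        this.table = table;
    }

    static SharedSymbols createRoot() {
        return new SharedSymbols(null, new HashMap<>());
    }

    /** Child starts out reading the parent's current table; nothing is copied yet. */
    synchronized SharedSymbols makeChild() {
        return new SharedSymbols(this, table);
    }

    /** Returns the canonical instance for key, adding it on a miss. Not thread-safe. */
    String findSymbol(String key) {
        String sym = table.get(key);
        if (sym == null) {
            if (!dirty) {
                table = new HashMap<>(table); // copy-on-write: first change copies
                dirty = true;
            }
            sym = key;
            table.put(sym, sym);
        }
        return sym;
    }

    /** Offers additions back to the parent; the child must copy again before writing. */
    void release() {
        if (parent != null && dirty) {
            parent.mergeChild(this);
            dirty = false;
        }
    }

    private synchronized void mergeChild(SharedSymbols child) {
        // Assumed merge rule: adopt the child's table only if it holds more symbols
        if (child.table.size() > table.size()) {
            table = child.table;
        }
    }

    int size() { return table.size(); }
}
```

In this sketch the master only hands out tables and receives merged results; each child is used by a single thread, and a later child finds the symbols an earlier child released.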


Field Summary
protected  org.codehaus.jackson.sym.CharsToNameCanonicalizer.Bucket[] _buckets
          Overflow buckets; if primary doesn't match, lookup is done from here.
protected  boolean _dirty
          Flag that indicates whether any changes have been made to the data; used both to determine whether the bucket array needs to be copied when the (first) change is made, and potentially whether the updated bucket list is to be resynced back to the master instance.
protected  int _indexMask
          Mask used to get index from hash values; equal to _buckets.length - 1, when _buckets.length is a power of two.
protected  CharsToNameCanonicalizer _parent
          Sharing of learnt symbols is done by optional linking of symbol table instances with their parents.
protected  int _size
          Current size (number of entries); needed to know if and when to rehash.
protected  int _sizeThreshold
          Limit that indicates maximum size this instance can hold before it needs to be expanded and rehashed.
protected  String[] _symbols
          Primary matching symbols; most matches are expected to occur here.
protected static int DEFAULT_TABLE_SIZE
          Default initial table size.
protected static boolean INTERN_STRINGS
          Config setting that determines whether Strings to be added need to be interned before being added or not.
 
Constructor Summary
CharsToNameCanonicalizer(int initialSize)
          Main method for constructing a master symbol table instance; will be called by other public constructors.
 
Method Summary
static int calcHash(char[] buffer, int start, int len)
          Implementation of a hashing method for variable length Strings.
static int calcHash(String key)
           
static CharsToNameCanonicalizer createRoot()
           
 String findSymbol(char[] buffer, int start, int len, int hash)
           
 CharsToNameCanonicalizer makeChild()
          "Factory" method; will create a new child instance of this symbol table.
 boolean maybeDirty()
           
 void release()
           
 int size()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_TABLE_SIZE

protected static final int DEFAULT_TABLE_SIZE
Default initial table size. Shouldn't be minuscule (as there's a cost to both array reallocation and rehashing), but let's keep it reasonably small nonetheless. For systems that properly reuse factories it doesn't matter either way; but when factories are recreated often, the initial overhead may dominate.

See Also:
Constant Field Values

INTERN_STRINGS

protected static final boolean INTERN_STRINGS
Config setting that determines whether Strings to be added need to be interned before being added or not. Forcing intern()ing will add some overhead when adding new Strings, but may be beneficial if such Strings are generally used by other parts of the system. Note that even without interning, all String instances returned by this class are guaranteed to be comparable with the equality (==) operator; it's just that no such guarantee is made for Strings that other classes return.
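
The difference interning makes can be shown with plain java.lang.String calls. InternDemo and its toSymbol method are hypothetical names used only for this illustration; the boolean parameter mirrors the idea behind INTERN_STRINGS and is not how the class applies the setting.

```java
/** Hypothetical helper; mirrors the idea of INTERN_STRINGS, not its implementation. */
final class InternDemo {
    /** Builds a String from a char range, optionally interning it. */
    static String toSymbol(char[] buf, int start, int len, boolean intern) {
        String s = new String(buf, start, len);
        return intern ? s.intern() : s;
    }

    public static void main(String[] args) {
        char[] buf = {'n', 'a', 'm', 'e'};
        String literal = "name"; // string literals are always interned
        System.out.println(toSymbol(buf, 0, 4, false) == literal);      // false: distinct object
        System.out.println(toSymbol(buf, 0, 4, true) == literal);       // true: same pooled instance
        System.out.println(toSymbol(buf, 0, 4, false).equals(literal)); // true: same contents
    }
}
```

Without interning, a symbol is only guaranteed to be == to other symbols returned by the same table; with interning it is also == to equal literals and interned Strings elsewhere in the program.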

See Also:
Constant Field Values

_parent

protected CharsToNameCanonicalizer _parent
Sharing of learnt symbols is done by optional linking of symbol table instances with their parents. When parent linkage is defined, and child instance is released (call to release), parent's shared tables may be updated from the child instance.


_symbols

protected String[] _symbols
Primary matching symbols; most matches are expected to occur here.


_buckets

protected org.codehaus.jackson.sym.CharsToNameCanonicalizer.Bucket[] _buckets
Overflow buckets; if primary doesn't match, lookup is done from here.

Note: The number of buckets is half the number of symbol entries, on the assumption that there is less need for buckets.


_size

protected int _size
Current size (number of entries); needed to know if and when to rehash.


_sizeThreshold

protected int _sizeThreshold
Limit that indicates maximum size this instance can hold before it needs to be expanded and rehashed. Calculated using fill factor passed in to constructor.


_indexMask

protected int _indexMask
Mask used to get index from hash values; equal to _buckets.length - 1, when _buckets.length is a power of two.


_dirty

protected boolean _dirty
Flag that indicates whether any changes have been made to the data; used both to determine whether the bucket array needs to be copied when the (first) change is made, and potentially whether the updated bucket list is to be resynced back to the master instance.

Constructor Detail

CharsToNameCanonicalizer

public CharsToNameCanonicalizer(int initialSize)
Main method for constructing a master symbol table instance; will be called by other public constructors.

Parameters:
initialSize - Minimum initial size for bucket array; internally will always use a power of two equal to or bigger than this value.
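
The power-of-two sizing matters because it allows an index to be derived from a hash with a single bit mask, as described for _indexMask. The following is a minimal sketch of that arithmetic; TableSizing and both method names are hypothetical, and the constructor's actual rounding code may differ.

```java
/** Hypothetical helper showing power-of-two sizing and mask-based indexing. */
final class TableSizing {
    /** Smallest power of two that is equal to or bigger than min. */
    static int roundUpToPowerOfTwo(int min) {
        if (min > (1 << 30)) {
            throw new IllegalArgumentException("size too large: " + min);
        }
        int size = 1;
        while (size < min) {
            size += size; // double until large enough
        }
        return size;
    }

    /** Maps any hash, including a negative one, to a valid index. */
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1); // tableLength must be a power of two
    }
}
```

For example, a requested size of 1000 is rounded up to 1024, giving a mask of 1023; a hash of -1 then maps to index 1023 rather than to a negative index, which a plain remainder operation would not guarantee.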
Method Detail

createRoot

public static CharsToNameCanonicalizer createRoot()

makeChild

public CharsToNameCanonicalizer makeChild()
"Factory" method; will create a new child instance of this symbol table. It will be a copy-on-write instance, i.e. it will initially only use a read-only copy of the parent's data, and will create its own copy when changes are needed.

Note: while this method is synchronized, it is generally not safe both to use makeChild/mergeChild AND to use the instance actively. Instead, a separate 'root' instance should be used, on which only makeChild/mergeChild are called, and which is not itself used as a symbol table.


release

public void release()

size

public int size()

maybeDirty

public boolean maybeDirty()

findSymbol

public String findSymbol(char[] buffer,
                         int start,
                         int len,
                         int hash)

calcHash

public static int calcHash(char[] buffer,
                           int start,
                           int len)
Implementation of a hashing method for variable-length Strings. Most of the time the intention is that this calculation is done by the caller during parsing, not here; however, sometimes it needs to be done for an already parsed "String" too.

Parameters:
len - Length of String; has to be at least 1 (caller guarantees this pre-condition)
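
To illustrate why the hash is usually computed by the caller, the sketch below accumulates it in the same loop that scans a quoted name, so the characters are visited only once. NameScanner, scanName and the multiply-by-31 scheme are assumptions made for this example; the hash function the library actually uses may differ.

```java
/** Hypothetical scanner; the hashing scheme is an assumption, not the library's. */
final class NameScanner {
    /** Stand-alone hash over a char range; len must be at least 1. */
    static int calcHash(char[] buf, int start, int len) {
        int hash = buf[start];
        for (int i = 1; i < len; i++) {
            hash = hash * 31 + buf[start + i];
        }
        return hash;
    }

    /**
     * Scans from start up to the closing double quote (or end of buffer),
     * computing the hash on the fly. Returns {length, hash}.
     */
    static int[] scanName(char[] buf, int start) {
        int hash = 0;
        int i = start;
        while (i < buf.length && buf[i] != '"') {
            hash = hash * 31 + buf[i]; // same recurrence as calcHash
            i++;
        }
        return new int[] { i - start, hash };
    }
}
```

The length and hash returned by scanName could then be passed straight to a lookup such as findSymbol(buffer, start, len, hash), avoiding a second pass over the characters.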

calcHash

public static int calcHash(String key)