edu.stanford.nlp.parser.lexparser
Class ChineseUnknownWordModel
java.lang.Object
edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel
edu.stanford.nlp.parser.lexparser.ChineseUnknownWordModel
- All Implemented Interfaces:
- UnknownWordModel, java.io.Serializable
public class ChineseUnknownWordModel
- extends BaseUnknownWordModel
Stores, trains, and scores with an unknown word model. A couple
of filters deterministically force rewrites for certain proper
nouns, dates, and cardinal and ordinal numbers; when none of these
filters applies, the model uses either the distribution of terminals
with the same first character or Good-Turing smoothing. Although
this class was developed for Chinese, the training and storage
methods could be used cross-linguistically.
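The fallback described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the class's actual implementation: the class name FirstCharUnknownWordSketch, its methods, the ASCII-digit filter, and the tag names "CD", "NN", and "NR" are all assumptions chosen for the example. It shows one deterministic filter and the first-character tag distribution; where a first character was never seen, it simply returns negative infinity at the point where a smoothed estimate such as Good-Turing could be substituted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the Stanford implementation. */
final class FirstCharUnknownWordSketch {

    // tagCounts.get(firstChar).get(tag): training words starting with firstChar and carrying tag.
    private final Map<String, Map<String, Integer>> tagCounts = new HashMap<>();
    // totals.get(firstChar): all training words starting with firstChar.
    private final Map<String, Integer> totals = new HashMap<>();

    /** First Unicode code point of a non-empty word, returned as a String. */
    static String firstChar(String word) {
        return word.substring(0, word.offsetByCodePoints(0, 1));
    }

    /** Assumed stand-in for a deterministic filter: ASCII digit strings count as cardinals. */
    static boolean looksLikeCardinal(String word) {
        return word.matches("[0-9]+");
    }

    /** Records one (word, tag) observation under the word's first character. */
    void train(String word, String tag) {
        if (word.isEmpty()) {
            return;
        }
        String first = firstChar(word);
        tagCounts.computeIfAbsent(first, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
        totals.merge(first, 1, Integer::sum);
    }

    /** Log P(tag | first character of word) for a word assumed to be unseen in training. */
    double logProb(String word, String tag) {
        if (word.isEmpty()) {
            return Double.NEGATIVE_INFINITY;
        }
        if (looksLikeCardinal(word)) {
            // Filter fired: force the assumed cardinal tag "CD" and rule out every other tag.
            return "CD".equals(tag) ? 0.0 : Double.NEGATIVE_INFINITY;
        }
        String first = firstChar(word);
        Integer total = totals.get(first);
        if (total == null) {
            // Unseen first character: a smoothed estimate (e.g. Good-Turing) could go here.
            return Double.NEGATIVE_INFINITY;
        }
        int count = tagCounts.get(first).getOrDefault(tag, 0);
        return count == 0 ? Double.NEGATIVE_INFINITY : Math.log((double) count / total);
    }

    public static void main(String[] args) {
        FirstCharUnknownWordSketch m = new FirstCharUnknownWordSketch();
        // Three training words sharing the first character U+4E2D, written as Unicode escapes.
        m.train("\u4e2d\u56fd", "NR");
        m.train("\u4e2d\u95f4", "NN");
        m.train("\u4e2d\u5fc3", "NN");
        // Score an unseen word that starts with the same character.
        System.out.println(Math.exp(m.logProb("\u4e2d\u6587", "NN")));
        // A digit string is handled by the filter, not by the distribution.
        System.out.println(m.logProb("2024", "CD"));
    }
}
```

Keying the counts on the first code point rather than on `charAt(0)` keeps the sketch correct for characters outside the Basic Multilingual Plane.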
- Author:
- Roger Levy
- See Also:
- Serialized Form
Fields inherited from class edu.stanford.nlp.parser.lexparser.BaseUnknownWordModel |
NULL_ITW, nullTag, nullWord, tagHash, tagIndex, trainOptions, unknown, unknownLevel, unSeenCounter, useFirst, useGT, VERBOSE, wordIndex |
Constructor Summary |
ChineseUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex)
This constructor creates an unknown word model (UWM) with empty data structures. |
ChineseUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex,
ClassicCounter<IntTaggedWord> unSeenCounter,
java.util.HashMap<Label,ClassicCounter<java.lang.String>> tagHash,
java.util.HashMap<java.lang.String,java.lang.Float> unknownGT,
boolean useGT,
java.util.Set<java.lang.String> seenFirst)
|
Method Summary |
java.lang.String |
getSignature(java.lang.String word,
int loc)
Signature for a specific word; loc parameter is ignored. |
static void |
main(java.lang.String[] args)
|
float |
score(IntTaggedWord itw,
java.lang.String word)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
ChineseUnknownWordModel
public ChineseUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex,
ClassicCounter<IntTaggedWord> unSeenCounter,
java.util.HashMap<Label,ClassicCounter<java.lang.String>> tagHash,
java.util.HashMap<java.lang.String,java.lang.Float> unknownGT,
boolean useGT,
java.util.Set<java.lang.String> seenFirst)
ChineseUnknownWordModel
public ChineseUnknownWordModel(Options op,
Lexicon lex,
Index<java.lang.String> wordIndex,
Index<java.lang.String> tagIndex)
- This constructor creates an unknown word model (UWM) with empty data
structures. Use it only when the data is loaded separately, such as by
reading in text lines containing the data.
TODO: useGT would need to be set correctly if you saved a model with
useGT and then wanted to recover it from text.
score
public float score(IntTaggedWord itw,
java.lang.String word)
- Overrides:
score in class BaseUnknownWordModel
main
public static void main(java.lang.String[] args)
getSignature
public java.lang.String getSignature(java.lang.String word,
int loc)
- Description copied from class: BaseUnknownWordModel
- Signature for a specific word; loc parameter is ignored.
- Specified by:
getSignature in interface UnknownWordModel
- Overrides:
getSignature in class BaseUnknownWordModel
- Parameters:
word - The word
loc - Its sentence position
- Returns:
- A "signature" (which represents an equivalence class of Strings), e.g., a suffix of the string
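To make the idea of a signature as an equivalence class concrete, here is a small sketch built on the suffix example given in the Returns text. It is not this class's getSignature: the class SuffixSignatureSketch and its method are hypothetical, and using the final code point as the suffix is an assumption made for illustration. As in the documented method, the loc argument is accepted but ignored.

```java
/** Hypothetical illustration only; not the Stanford implementation. */
final class SuffixSignatureSketch {

    /**
     * Maps a word to its last Unicode code point, so that all words sharing
     * that suffix fall into the same equivalence class. loc is ignored.
     */
    static String getSignature(String word, int loc) {
        if (word.isEmpty()) {
            return "";
        }
        // Step back one code point from the end to find where the suffix starts.
        int start = word.offsetByCodePoints(word.length(), -1);
        return word.substring(start);
    }

    public static void main(String[] args) {
        // Both words end in the same character, so they share a signature.
        System.out.println(getSignature("walking", 0).equals(getSignature("talking", 5)));
    }
}
```

Statistics collected per signature can then be shared by every unseen word that maps to the same class.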
Stanford NLP Group