cbJisho is an English-to-Japanese dictionary that sorts results by their relative frequency in blogs, newspapers and novels. You can filter words by JLPT level (1-5), by whether they are JDIC "common" words and by the character length of the definition. You can also search using a regular expression or by whole word only. Search text can be in English (in which case the definitions are searched) or in Japanese (in which case the kanji and kana fields are searched), or you can use a simple SQL query. Results can be saved to the clipboard or to a file.
If you want to change the default settings, you may edit settings.txt.
If you want to change the colors/fonts/styles of the results window, you may edit Template/results_template.html.
If you want to remove or re-order the fields of the results window, you may edit Template/single_result_template.html.
Key | Description |
---|---|
ENTER | Perform search |
ESC | Clear the search box |
CTRL-L | Send focus to the search box |
UP | Move back through the search history |
DOWN | Move forward through the search history |
CTRL-D | Copy the results to the clipboard. See Saving Results to File/Clipboard. |
CTRL-S | Save the results to a file. See Saving Results to File/Clipboard. |
CTRL-W | Toggle the Whole word checkbox |
CTRL-R | Toggle the RegEx checkbox |
CTRL-J | Toggle all of the JLPT buttons |
CTRL-P | Toggle the (P) checkbox |
CTRL-H | Show this help page |
CTRL-Q | Show the SQL help page |
Press CTRL-D to copy the results to the clipboard, or CTRL-S to save them to a file.
You can specify the format of the saved results by editing the "SaveFormat" setting in settings.txt with the following tokens:
Token | Description |
---|---|
$s | Sequence Number |
$o | Overall Frequency |
$b | Blog Frequency |
$n | Newspaper Frequency |
$v | Novel Frequency |
$j | JLPT Number |
$p | (P) |
$k | Kanji |
$a | Kana |
$d | Definition |
Example:
To save the overall frequency, the kanji, the kana and the definition, use the following:
SaveFormat = $o$k$a$d
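To make the token table concrete, here is a minimal Python sketch of how a SaveFormat string could be expanded for one result. It is not cbJisho's own code: the `expand_save_format` function, the `TOKEN_FIELDS` mapping and the sample entry (including its frequency value) are made up for illustration, and the sketch simply concatenates the fields, whereas the real program may separate them differently.

```python
import re

# Hypothetical mapping from SaveFormat tokens to field names (illustration only).
TOKEN_FIELDS = {
    "$s": "sequence",
    "$o": "overall",
    "$b": "blog",
    "$n": "news",
    "$v": "novel",
    "$j": "jlpt",
    "$p": "common",
    "$k": "kanji",
    "$a": "kana",
    "$d": "def",
}

def expand_save_format(fmt, entry):
    """Replace each known $-token in fmt with the matching field of entry."""
    def substitute(match):
        token = match.group(0)
        field = TOKEN_FIELDS.get(token)
        if field is None:
            return token  # leave unknown tokens untouched
        return str(entry.get(field, ""))
    return re.sub(r"\$[a-z]", substitute, fmt)

# Made-up sample entry; the frequency is not real data.
entry = {"overall": 95.2, "kanji": "辞書", "kana": "じしょ", "def": "dictionary"}
line = expand_save_format("$o$k$a$d", entry)
print(line)
```

In this sketch any text that is not a token is copied through unchanged, so a format such as `$k / $a` would keep the slash between the two fields.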
You may use the following columns:
Column | Type | Range | Description |
---|---|---|---|
kanji | TEXT | n/a | The kanji of an entry |
kana | TEXT | n/a | The kana of an entry |
def | TEXT | n/a | The definition of an entry |
overall | REAL | [0-100] | The overall frequency of the entry (based on blog, newspaper and novel frequencies) |
blog | REAL | [0-100] | The blog frequency. It is set to -1 if no frequency is associated with the entry. |
news | REAL | [0-100] | The newspaper frequency. It is set to -1 if no frequency is associated with the entry. |
novel | REAL | [0-100] | The novel frequency. It is set to -1 if no frequency is associated with the entry. |
jlpt | INTEGER | [1-5] | The JLPT level |
common | INTEGER | [0-1] | 1 = the entry is an EDICT common "(P)" word |
Note: The "Whole Word" and "RegEx" options have no effect when using SQL.
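As a rough illustration of conditions built from the columns above, the following Python sketch runs a query against an in-memory SQLite table. The table name `dict`, the sample rows and all of their values are made up, and the sketch assumes an SQLite-style database (suggested by the TEXT/REAL/INTEGER types); it does not show how cbJisho itself accepts a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table using the column names and types listed above.
conn.execute(
    "CREATE TABLE dict (kanji TEXT, kana TEXT, def TEXT, overall REAL, "
    "blog REAL, news REAL, novel REAL, jlpt INTEGER, common INTEGER)"
)
# Made-up rows; frequencies and JLPT levels are not real data.
rows = [
    ("辞書", "じしょ", "dictionary", 90.0, 88.0, 91.0, 91.0, 5, 1),
    ("犬", "いぬ", "dog", 97.0, 98.0, 95.0, 98.0, 5, 1),
    ("語彙", "ごい", "vocabulary", 60.0, 70.0, 50.0, -1, 1, 0),
]
conn.executemany("INSERT INTO dict VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Common JLPT 5 words that have a novel frequency, most frequent first.
# "novel >= 0" excludes entries whose novel frequency is -1 (none recorded).
query = (
    "SELECT kanji, kana FROM dict "
    "WHERE jlpt = 5 AND common = 1 AND novel >= 0 "
    "ORDER BY overall DESC"
)
results = conn.execute(query).fetchall()
for kanji, kana in results:
    print(kanji, kana)
```

Because a missing frequency is stored as -1, a condition such as `novel >= 0` is a simple way to keep such entries out of frequency-based comparisons.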
LIKE wildcard | Description |
---|---|
% | Wildcard substitute for zero or more characters |
_ | Wildcard substitute for exactly one character |
GLOB wildcard | Description |
---|---|
* | Wildcard substitute for zero or more characters |
? | Wildcard substitute for exactly one character |
[charlist] | Wildcard substitute for any character in charlist |
[^charlist] | Wildcard substitute for any character not in charlist |
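Assuming the two wildcard tables above describe SQL's LIKE operator and SQLite's GLOB operator respectively, the sketch below shows each wildcard in use. The table name `dict` and the sample definitions are hypothetical; only the pattern syntax is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (def TEXT)")  # hypothetical one-column table
conn.executemany(
    "INSERT INTO dict VALUES (?)",
    [("cat",), ("cut",), ("coat",), ("dog",)],
)

def matches(where_clause):
    """Return the definitions selected by the given WHERE clause, sorted."""
    sql = "SELECT def FROM dict WHERE " + where_clause + " ORDER BY def"
    return [row[0] for row in conn.execute(sql)]

like_any = matches("def LIKE 'c%t'")      # % stands for zero or more characters
like_one = matches("def LIKE 'c_t'")      # _ stands for exactly one character
glob_any = matches("def GLOB 'c*t'")      # * stands for zero or more characters
glob_one = matches("def GLOB 'c?t'")      # ? stands for exactly one character
glob_set = matches("def GLOB 'c[au]t'")   # one character from the list
glob_not = matches("def GLOB 'c[^a]t'")   # one character not in the list

print(like_any, like_one, glob_set, glob_not)
```

In SQLite, LIKE ignores case for ASCII letters by default while GLOB is case-sensitive, which may matter when searching English definitions.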
Question:
Where did these frequencies come from?
Answer:
The blog and newspaper frequency lists were obtained from The Monash Nihongo FTP Archive.
Blog frequencies were obtained by using Google search result counts within Goo Blog. For details, see the readme.
Newspaper frequencies are based on a large number of articles from the online versions of the Yomiuri and Mainichi newspapers. For details, see the readme.
Novel frequencies are based on 5109 novels. The list of novels used: http://pastebin.com/VLJpTREd. The first 50 lines and last 20 lines were removed from each file so that things like table of contents, copyright and publisher information were not parsed. The readings (between 《 and 》) were also removed.
Overall frequencies are an average of the blog, newspaper and novel frequencies.
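The novel preprocessing described in the answer above (dropping the first 50 and last 20 lines of each file and removing the readings between 《 and 》) could be sketched in Python as follows. The function name `clean_novel_text` and the sample text are hypothetical; this is not the script that was actually used to build the frequency list.

```python
import re

def clean_novel_text(text, head=50, tail=20):
    """Drop the first `head` and last `tail` lines, then strip 《...》 readings."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return []  # nothing is left once front and back matter are removed
    body = lines[head:len(lines) - tail]
    # Remove each reading together with its 《 》 brackets.
    return [re.sub(r"《[^》]*》", "", line) for line in body]

# Made-up file: 50 header lines, one line of body text, 20 footer lines.
sample = ["header"] * 50 + ["吾輩《わがはい》は猫《ねこ》である"] + ["footer"] * 20
cleaned = clean_novel_text("\n".join(sample))
print(cleaned)
```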