Dear Rafael,
I have successfully run your code to create a language model. However, when I try to run the code "completion" scenario, I get the following error:

Note: My tokenized files do not contain the end-of-line character ‘\n’, so I changed the lines at https://github.com/mast-group/OpenVocabCodeNLM/blob/master/code_nlm.py#L981 to the following:
#end_file_id = test_dataset.vocab["-eod-"]
start_index = 0
file_start_index = 0
while data_covered+10 < data_len:
    # Stop when 1000000 test tokens have been scored.
    if tokens_done > 1000000:
        break
    # Create minibatches for the next file
    # while raw_data[data_covered] != end_file_id:
    #     data_covered += 1
    #data_covered += 1 # eod symbol
    data_covered=data_covered+10
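To make the change concrete, here is a minimal standalone sketch of the two ways of cutting the token stream, as I understand them. It assumes `raw_data` is a flat list of token ids; the helper names `split_on_marker` and `split_fixed` are hypothetical and do not appear in code_nlm.py.

```python
def split_on_marker(raw_data, end_file_id):
    """Original behaviour (as I understand it): one span per file,
    each span ending just after the end-of-file marker."""
    spans = []
    start = 0
    for i, tok in enumerate(raw_data):
        if tok == end_file_id:
            spans.append((start, i + 1))  # include the marker itself
            start = i + 1
    return spans


def split_fixed(raw_data, step=10):
    """My modification: advance in fixed steps of `step` tokens,
    ignoring file boundaries entirely."""
    spans = []
    data_covered = 0
    while data_covered + step < len(raw_data):
        spans.append((data_covered, data_covered + step))
        data_covered += step
    return spans


# Hypothetical toy data: 0 plays the role of the "-eod-" id.
by_marker = split_on_marker([1, 2, 0, 3, 0], end_file_id=0)
by_step = split_fixed(list(range(25)), step=10)
```

With the fixed-step version the chunks no longer line up with file boundaries, and any tokens after the last full step are never visited, so this may be related to the error.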
I am attaching sample files (sample BPE files.zip). The configuration settings are as follows:
# Scenario options. Training is default so, no option for it.
flags.DEFINE_boolean("predict", False, "Set to True for computing predictability.")
flags.DEFINE_boolean("test", False, "Set to True for computing test perplexity.")
flags.DEFINE_boolean("dynamic_test", False, "Set to True for performing dynamic train-testing perplexity calculation (only one train epoch).")
flags.DEFINE_boolean("maintenance_test", False, "Set to True for performing maintenance train-testing perplexity simulation (only one train epoch).")
flags.DEFINE_boolean("completion", True, "Set to True to run code completion experiment.")
flags.DEFINE_boolean("maintenance_completion", False, "Set to True to run maintenance code completion experiment")
flags.DEFINE_boolean("dynamic", False, "Set to True to run dynamic code completion experiment.")
Please let me know about your concerns.