Help wanted: training on https://cloud.vast.ai/

I trained on this instance for 14 hours.

After 14 hours I logged in and saw nothing.

Help, how can I find the log to see what happened?

[image]

Look in the run/ directory


See the first picture, the last 3 lines.

In the first picture.

I see. What does the training progress look like after you run argos-train?

In config.yml there’s this config:

save_checkpoint_steps: 1000

So no checkpoints will be saved until you’ve completed 1000 train steps.
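
A rough sketch, in case it helps to check what was actually saved: with save_checkpoint_steps: 1000, a run that reached 5000 steps should have left checkpoints for steps 1000 through 5000. The script below lists the expected steps and looks for the newest checkpoint file. The file name pattern (ending in `_step_<N>.pt`) and the `run` directory default are assumptions here, and the helper names are made up for illustration, so adjust them to whatever is really in your run/ directory.

```python
import re
from pathlib import Path

# Assumption: checkpoint file names end in "_step_<N>.pt".
# Change this pattern if the files in your run/ directory look different.
STEP_PATTERN = re.compile(r"_step_(\d+)\.pt$")


def expected_checkpoint_steps(completed_steps, save_every=1000):
    """Steps at which a checkpoint should have been written so far."""
    return list(range(save_every, completed_steps + 1, save_every))


def latest_checkpoint(run_dir="run"):
    """Return (step, path) for the highest-numbered checkpoint, or None."""
    directory = Path(run_dir)
    if not directory.is_dir():
        return None
    found = []
    for path in directory.glob("*.pt"):
        match = STEP_PATTERN.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Pick the entry with the largest step number.
    return max(found, key=lambda item: item[0], default=None)


if __name__ == "__main__":
    print(expected_checkpoint_steps(5000))  # [1000, 2000, 3000, 4000, 5000]
    print(latest_checkpoint("run"))
```

If this prints None, either no checkpoint was written yet or the files use a different naming scheme than the one assumed above.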

So when I had completed 5000 train steps, it broke. How can I restart training at 5000 steps?

I need your help: when I had completed 5000 train steps, it broke. How can I restart training at 5000 steps? :sob: