In the PyTorch RNN implementation, there are two bias terms, b_ih and b_hh.
Why is this? Is it different from using a single bias? If so, how? Does it affect performance or efficiency?
The formula in the PyTorch documentation for RNN is self-explanatory. That is

h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)

You can find both b_ih and b_hh in the equation.
You may think that
b_ih is the bias for the input (paired with
w_ih, the weight for the input) and
b_hh is the bias for the hidden state (paired with
w_hh, the weight for the hidden state).
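Mathematically, though, the two biases simply add inside the tanh, so a single bias b = b_ih + b_hh produces identical outputs. A minimal NumPy sketch of the cell equation (all sizes and values here are made up for illustration, not taken from PyTorch internals) checks this:

```python
import numpy as np

# Hypothetical dimensions for illustration only.
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

W_ih = rng.standard_normal((hidden_size, input_size))
W_hh = rng.standard_normal((hidden_size, hidden_size))
b_ih = rng.standard_normal(hidden_size)
b_hh = rng.standard_normal(hidden_size)

x_t = rng.standard_normal(input_size)       # input at step t
h_prev = rng.standard_normal(hidden_size)   # hidden state h_{t-1}

# Two-bias form, as in the documented equation:
# h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
h_two_bias = np.tanh(W_ih @ x_t + b_ih + W_hh @ h_prev + b_hh)

# Single-bias form, folding b = b_ih + b_hh:
h_one_bias = np.tanh(W_ih @ x_t + W_hh @ h_prev + (b_ih + b_hh))

print(np.allclose(h_two_bias, h_one_bias))  # True
```

So the split does not add expressive power; it only introduces a small redundancy in the parameters (one extra vector of size hidden_size per layer and direction), which is negligible for performance.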