I recently started reading Charles Petzold book titled Code, which so far covers exactly the kinds of things I assume you are curious about. But I have not gotten all the way through so thumb through the book first before buying/borrowing.
This is my relatively short answer, not Petzolds...and hopefully in line with what you were curios about.
You have heard of the transistor I assume. The original way to use a transistor was for things like a transistor radio. it is an amplifier basically, take the tiny little radio signal floating in air and feed it into the input of the transistor which opens or closes the flow of current on a circuit next to it. And you wire that circuit with higher power, so you can take a very small signal, amplify it and feed it into a speaker for example and listen to the radio station (there is more to it isolating the frequency and keeping the transistor balanced, but you get the idea I hope).
Now that the transistor exists that lead to was a way to use a transistor as a switch, like a light switch. The radio is like a dimmer light switch you can turn it to anywhere from all the way on to all the way off. A non-dimmer light switch is either all on or all off, there is some magic place in the middle of the switch where it changes over. We use transistors the same way in digital electronics. Take the output of one transistor and feed it into another transistors input. The output of the first is certainly not a small signal like the radio wave, it forces the second transistor all the way on or all the way off. that leads to the concept of TTL or transistor-transistor logic. Basically you have one transistor that drives a high voltage or lets call it a 1, and on that sinks a zero voltage, lets call that a 0. And you arrange the inputs with other electronics so that you can create AND gates (if both inputs are a 1 then the output is a 1), OR gates (if either one or the other input is a 1 then the output is a one). Inverters, NAND, gates, NOR gates (an or with an inverter) etc. There used to be a TTL handbook and you could buy 8 or so pin chips that had one or two or four of some kind of gate (NAND, NOR, AND, etc) functions inside, two inputs and an output for each. Now we dont need those it is cheaper to create programmable logic or dedicated chips with many millions of transistors. But we still think in terms of AND, OR, and NOT gates for hardware design. (usually more like nand and nor).
I dont know what they teach now but the concept is the same, for memory a flip flop can be thought of as two of these TTL pairs (NANDS) tied together with the output of one going to the input of the other. Lets leave it at that. That is basically a single bit in what we call SRAM, or static ram. sram takes basically 4 transistors. Dram or dynamic ram the memory sticks you put in your computer yourself take one transistor per bit, so for starters you can see why dram is the thing you buy gigabytes worth of. Sram bits remember what you set them to so long as the power doesnt go out. Dram starts to forget what you told it as soon as you tell it, basically dram uses the transistor in yet a third different way, there is some capacitance (as in capacitor, wont get into that here) that is like a tiny rechargeable battery, as soon as you charge it and unplug the charger it starts to drain. Think of a row of glasses on a shelf with little holes in each glass, these are your dram bits, you want some of them to be ones so you have an assistant fill up the glasses you want to be a one. That assistant has to constantly fill up the pitcher and go down the row and keep the "one" bit glasses full enough with water, and let the "zero" bit glasses remain empty. So that at any time you want to see what your data is you can look over and read the ones and zeros by looking for water levels that are definitely above the middle being a one and levels definitely below the middle being a zero.. So even with the power on, if the assistant is not able to keep the glasses full enough to tell a one from a zero they will eventually all look like zeros and drain out. Its the trade off for more bits per chip. So short story here is that outside the processor we use dram for our bulk memory, and there is assistant logic that takes care of keeping the ones a one and zeros a zero. But inside the chip, the AX register and DS registers for example keep your data using flip flops or sram. And for every bit you know about like the bits in the AX register, there are likely hundreds or thousands or more that are used to get the bits into and out of that AX register.
You know that processors run at some clock speed, these days around 2 gigahertz or two billion clocks per second. Think of the clock, which is generated by a crystal, another topic, but the logic sees that clock as a voltage that goes high and zero high and zero at this clock rate 2ghz or whatever (gameboy advances are 17mhz, old ipods around 75mhz, original ibm pc 4.77mhz).
So transistors used as switches allow us to take voltage and turn it into the ones and zeros we are familiar with both as hardware engineers and software engineers, and go so far as to give us AND, OR, and NOT logic functions. And we have these magic crystals that allow us to get an accurate oscillation of voltage.
So we can now do things like say, if the clock is a one, and my state variable says I am in the fetch instruction state, then I need to switch some gates so that the address of the instruction I want, which is in the program counter, goes out on the memory bus, so that the memory logic can give me my instruction for MOV AL,61h. You can look this up in a x86 manual, and find that some of those opcode bits say this is a mov operation and the target is the lower 8 bits of the EAX register, and the source of the mov is an immediate value which means it is in the memory location after this instruction. So we need to save that instruction/opcode somewhere and fetch the next memory location on the next clock cycle. so now we have saved the mov al, immediate and we have the value 61h read from memory and we can switch some transistor logic so that bit 0 of that 61h is stored in the bit 0 flipflop of al and bit 1 to bit 1, etc.
How does all that happen you ask? Think about a python function performing some math formula. you start at the top of the program with some inputs to the formula that come in as variables, you have individual steps through the program that might add a constant here or call the square root function from a library, etc. And at the bottom you return the answer. Hardware logic is done the same way, and today programming languages are used one of which looks a lot like C. The main difference is your hardware functions might have hundreds or thousands of inputs and the output is a single bit. On every clock cycle, bit 0 of the AL register is being computed with a huge algorithm depending how far out you want to look. Think about that square root function you called for your math operation, that function itself is one of these some inputs produce an output, and it may call other functions maybe a multiply or divide. So you likely have a bit somewhere that you can think of as the last step before bit 0 of the AL register and its function is: if clock is one then AL[0] = AL_next[0]; else AL[0] = AL[0]; But there is a higher function that contains that next al bit computed from other inputs, and a higher function and a higher function and much of these are created by the compiler in the same way that your three lines of python can turn into hundreds or thousands of lines of assembler. A few lines of HDL can become hundreds or thousands or more transistors. hardware folks dont normally look at the lowest level formula for a particular bit to find out all the possible inputs and all the possible ANDs and ORs and NOTs that it takes to compute any more than you probably inspect the assembler generated by your programs. but you could if you wanted to.
A note on microcoding, most processors do not use microcoding. you get into it with the x86 for example because it was a fine instruction set for its day but on the surface struggles to keep up with modern times. other instruction sets do not need microcoding and use logic directly in the way I described above. You can think of microcoding as a different processor using a different instruction set/assembly language that is emulating the instruction set that you see on the surface. Not as complicated as when you try to emulate windows on a mac or linux on windows, etc. The microcoding layer is designed specifically for the job, you may think of there only being the four registers AX, BX, CX, DX, but there are many more inside. And naturally that one assembly program somehow can get executed on multiple execution paths in one core or multiple cores. Just like the processor in your alarm clock or washing machine, the microcode program is simple and small and debugged and burned into the hardware hopefully never needing a firmware update. At least ideally. but like your ipod or phone for example you sometimes do want a bug fix or whatever and there is a way to upgrade your processor (the bios or other software loads a patch on boot). Say you open the battery compartment to your TV remote control or calculator, you might see a hole where you can see some bare metal contacts in a row, maybe three or 5 or many. For some remotes and calculators if you really wanted to you could reprogram it, update the firmware. Normally not though, ideally that remote is perfect or perfect enough to outlive the TV set. Microcoding provides the ability to get the very complicated product (millions, hundreds of millions of transistors) on the market and fix the big and fixable bugs in the field down the road. Imagine a 200 million line python program your team wrote in say 18 months and having to deliver it or the company will fail to the competitions product. Same kind of thing except only a small portion of that code you can update in the field the rest has to remain carved in stone. for the alarm clock or toaster, if there is a bug or the thing needs help you just throw it out and get another.
If you dig through wikipedia or just google stuff you can look at the instruction sets and machine language for things like the 6502, z80, 8080, and other processors. There may be 8 registers and 250 instructions and you can get a feel from the number of transistors that that 250 assembly instructions is still a very high level language compared to the sequence of logic gates it takes to compute each bit in a flip flop per clock cycle. You are correct in that assumption. Except for the microcoded processors, this low level logic is not re-programmable in any way, you have to fix the hardware bugs with software (for hardware that is or going to be delivered and not scrapped).
Look up that Petzold book, he does an excellent job of explaining stuff, far superior to anything I could ever write.