

The spirit of the spirits · 10,917 Posts · Discussion Starter #21
Found an awesome book in my uni's library today with a comprehensible explanation of the CPU pipeline and other goodies, so I want to share some stuff. (If anyone is wondering, the book was released around the Athlon XP and Pentium 4 era.)

Before reading, I want to mention that the information isn't given in actual chronological order and has been adjusted to make the explanation easier.

Days before CPU cache and pipelining

The CPU got its information from RAM, and RAM got the information it needed from the hard drive. That was slow, because hard drives were slow and there was very little RAM. The CPU ran at a constant clock speed, often the same as the RAM's. If the RAM had no information loaded, the CPU couldn't do anything and the whole computer was held back by the speed of the hard drive. That wasn't ideal. CPUs also ran at much lower clock speeds than today. On top of that, IPC (instructions per clock cycle) wasn't really a thing yet, as the CPU needed several cycles to execute a single instruction.

The clock speed improvements

CPUs were getting faster and faster. To achieve that, manufacturers mostly increased clock speed. Higher clock speeds required more voltage to switch between 0 and 1 quickly and reliably; in electrical terms, to keep the oscillations (the 0s and 1s) strong enough to be detected as 0s and 1s by the rest of the CPU's internals. If there isn't enough voltage, the CPU's internals fail to detect the 0 or 1 state and make incorrect calculations, eventually leading to crashes.

Die shrinking and lithography

To increase computational power, more transistors were needed. While in theory it's possible to just make the CPU bigger, in practice that isn't good at all. First, you can only grow the chip up to a point: with a bigger die, electrical signals have to travel longer distances and arrive at their destinations later, which slows down processing. Second, you can only fit so many transistors on a given lithography before the CPU gets too hot to operate (because the electrical properties have to be ramped up for signals to travel reliably). To counter these problems, CPU engineers started thinking about solutions even before hitting either limit, and they came up with die shrinking using a smaller lithography.

Making CPUs on smaller lithographies enabled manufacturers to increase the transistor count per given area. Besides that, CPU makers could reduce voltage requirements, since each signal had a much shorter distance to travel. And as consumers we no longer needed huge CPUs, which was mostly a big concern for portable devices.

Longer signal travel distances would have meant higher voltage requirements. When a manufacturer can choose between more voltage and a smaller lithography, the choice is lithography, as you need less power to complete the same tasks faster, making the CPU much more efficient.

It's worth mentioning that, R&D costs aside, a smaller CPU also costs less to make, which makes both the manufacturer and the buyer happier.

CPU cache

CPUs got faster and faster with ever-increasing clock speeds, and RAM always had to play catch-up. At some point RAM couldn't get faster at the same rate as the CPU did. Engineers had a problem to solve: how are you supposed to make CPUs with higher clock speeds without fast RAM? They could have just increased the CPU's clock speed, but then there would be a big problem. For some cycles the CPU would be fed data, but for others it would have to wait for RAM to supply it. The CPU would need to run empty cycles while information was being loaded from RAM. That would have been wasteful and inefficient, as the CPU would sip power and do nothing, and the user would waste time while the loading from RAM was happening.

At the time it was possible to split multipliers and not rely on the classical 1:1 ratio between CPU and RAM clock speeds, but at a performance penalty. Soon they came up with a solution: the cache.

What it enabled was this: while RAM was slower than the CPU in terms of refresh rate, the cache could store a small amount of data very fast and let the CPU keep performing tasks while RAM loaded the next data in the background. This way efficiency was greatly improved and there was no need to keep the clock speed ratio at 1:1 for optimal performance. That split could be made bigger and bigger, and CPU clock speed could be improved further and further.
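To give a feel for why even a small cache helps so much, here is a tiny sketch using the usual average-memory-access-time idea. This is my own illustration, not from the book, and the cycle counts and hit rates are made-up numbers.

```python
# Toy average-memory-access-time model.
# Assumed illustration numbers: the cache answers in 1 cycle, RAM in 100 cycles.

def average_access_cycles(hit_rate: float, cache_cycles: int = 1, ram_cycles: int = 100) -> float:
    """Average cycles per memory access: hits are served by the cache, misses go to RAM."""
    return hit_rate * cache_cycles + (1.0 - hit_rate) * ram_cycles

for hit_rate in (0.0, 0.9, 0.99):
    cycles = average_access_cycles(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {cycles:.2f} cycles per access on average")

# 0% hit rate leaves every access at RAM speed (100 cycles);
# 99% hits brings the average down to about 2 cycles.
```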

The IPC problem

While all those improvements are nice and highly helpful, there was another problem. The CPU's cycling rate could be improved, but how much work got done in each cycle was still limited. For example, say you needed 12 cycles to execute one instruction. Let's say our theoretical CPU runs at 1 Hz (one cycle per second), so it does this operation in 12 seconds. If we double the clock speed, the same task can be done in 6 seconds. And if we double it once again, the task that took 6 seconds can now be performed in just 3 seconds. Very impressive.

Still, we would be limited by the 12 cycles needed to complete an instruction, and increasing clock speed wasn't an easy task.
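To put that arithmetic in one place, here is a tiny sketch of the basic relation (my own illustration, not from the book): time per instruction = cycles per instruction / clock frequency.

```python
# Toy model: seconds per instruction = cycles per instruction / clock speed in Hz.
# Same numbers as the example above: 12 cycles per instruction.

def seconds_per_instruction(cycles_per_instruction: int, clock_hz: float) -> float:
    return cycles_per_instruction / clock_hz

for clock_hz in (1, 2, 4):
    t = seconds_per_instruction(12, clock_hz)
    print(f"{clock_hz} Hz -> {t:.0f} seconds per instruction")

# 1 Hz -> 12 seconds, 2 Hz -> 6 seconds, 4 Hz -> 3 seconds:
# doubling the clock halves the time, but the 12-cycle cost itself never goes away.
```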

The pipelining

If in one CPU clock cycle we could only process one unit of information per time frame, then with pipelining we can try to process several units of information in a single time frame. That's awesome! But how does that work?

For this explanation it helps to imagine a work situation with humans. Imagine one man who wants to screw in light bulbs in a factory. The factory is big and there are 1000 bulb sockets in total. Let's say it takes 1 minute to install each bulb. That would take 1000 minutes for the whole job, almost 17 hours of pure work. A man understands what screwing in a light bulb means, but a computer's CPU needs instructions telling it what to do. Those could be broken down into:
Take the ladder
Place it under the light bulb socket
Climb up
Screw in the light bulb
Climb down
Take the ladder and walk away
For the CPU, one abstract task means at least 6 smaller ones. A human thinks of those automatically.

Well, a human might not want to work almost 17 hours, so he can ask more people to help him. Then everyone does a smaller task in the same time frame. Some people, after finishing their step, could just move on to another light bulb socket, but they would have to wait for the other workers to finish theirs. People would be less exhausted overall, but the job would still take almost 17 hours.

If the work in a CPU pipeline is arranged well, the CPU can get tasks done faster. That brings us back to the question: how? For that we must know that some steps in the CPU can be done without waiting for others to complete. For visualization this table is very useful (I = instruction, S = step, so I2-S1 means instruction 2, step 1):

Cycle 1: I1-S1
Cycle 2: I1-S2, I2-S1
Cycle 3: I1-S3, I2-S2, I3-S1
Cycle 4: I1-S4, I2-S3, I3-S2, I4-S1
Cycle 5: I1-S5, I2-S4, I3-S3, I4-S2, I5-S1
Cycle 6: I2-S5, I3-S4, I4-S3, I5-S2
Cycle 7: I3-S5, I4-S4, I5-S3
Cycle 8: I4-S5, I5-S4
Cycle 9: I5-S5

In this table we can see that we have 5 instructions to complete and each of them consists of 5 smaller steps. In total it takes 9 CPU cycles. In the first cycle the CPU can execute only one small step, but in the second cycle it can start the next instruction alongside the previous one. This peaks at 5 instructions being worked on in the same cycle, so in the steady state the pipeline finishes one instruction per cycle instead of one every 5 cycles.

A pipeline means the CPU has a "task completion line" in which, under ideal conditions, all the workers can work at the same time in a given time frame. If each stage of our production line is utilized efficiently, we can perform 5 smaller steps per time frame and work on 5 instructions in each cycle, without waiting for one to finish before starting another. That would be awesome, and indeed it is.
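Here is a small sketch (my own, matching the 5-instruction, 5-step table above) of the ideal cycle counts with and without pipelining: with no stalls, a pipeline finishes in stages + instructions - 1 cycles.

```python
# Ideal pipeline timing (no stalls, no dependencies between instructions).

def cycles_without_pipeline(instructions: int, stages: int) -> int:
    # Each instruction must finish all of its stages before the next one starts.
    return instructions * stages

def cycles_with_pipeline(instructions: int, stages: int) -> int:
    # The first instruction needs `stages` cycles to fill the pipeline,
    # then one more instruction finishes in every following cycle.
    return stages + (instructions - 1)

print(cycles_without_pipeline(5, 5))   # 25 cycles
print(cycles_with_pipeline(5, 5))      # 9 cycles, as in the table above
print(cycles_with_pipeline(1000, 5))   # 1004 cycles instead of 5000
```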

If our light bulb workers did the same, imagine one person placing the ladder and moving on to another bulb socket without letting another person climb up. Ouch!

Now someone might say that my light bulb worker example is a waste of time and a poor one. That would be both correct and incorrect at the same time.

Correct in the sense that it doesn't show the theoretical workings of a CPU pipeline, but very accurate in terms of what happens in practice.

If the CPU were strictly pipelined and the "bulb task" were forced to run like that, it would crash. The thing is that many tasks can only be executed in sequence, while others can be broken down into smaller pieces and done independently.

So the question remains: what can be done? The answer is to execute several instructions at the same time, each in sequence, without breaking them apart; a pipelined CPU acts like a manager and assigns several people to work on light bulbs independently. So we are back to a single worker per instruction, but instead of a group of people working on one action sequence, several people work through their own sequences without waiting for the others, increasing efficiency, speed and so on.

Now we can reliably execute several instructions per time frame. Therefore, for the same number of CPU cycles, multiple instructions can now be executed.

Sadly, a CPU's pipeline length is fixed in hardware and cannot be changed per task. So if the most frequently performed task takes 5 stages and your CPU has a 12-stage pipeline, 5 stages will be used and the others will keep doing nothing until the whole pipeline's work is completed. So there are dangers in making the pipeline too long, while making it too short works fine but gives a poor maximum throughput. The improvements mentioned earlier are unconditional improvements; pipeline improvements only pay off if the pipeline is actually utilized. If it isn't, it's a loss. Therefore pipelining is a bit of a gamble.

Branch prediction

For the CPU to know how to arrange tasks in the pipeline, we can use a thing called branch prediction. What it does is predict what the work will look like before it is executed. It can help data get loaded before processing, therefore improving processing speed. That works well if the predictions are right; if they aren't, the CPU's pipeline will either be poorly filled or the speculatively loaded work has to be thrown away and redone. Either way, if the prediction is incorrect we pay a processing speed penalty. So we have more gambling going on. The good thing is that branch prediction is correct far more often than not, so usually we get the acceleration of some tasks without the penalties.
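To show why the gamble usually pays off, here is a rough sketch of the cost model. The branch frequency, prediction accuracy and flush penalty are made-up illustration numbers, and the flush penalty grows with pipeline length, which ties back to the "too long a pipeline" danger above.

```python
# Toy model of the cost of branch misprediction.
# All numbers are made-up illustrations, not measurements of any real CPU.

def average_cpi(base_cpi: float, branch_fraction: float,
                prediction_accuracy: float, flush_penalty_cycles: int) -> float:
    """Base cycles per instruction plus the cycles wasted on mispredicted branches."""
    mispredict_rate = 1.0 - prediction_accuracy
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles

# Say 20% of instructions are branches and a flush costs 12 cycles
# (a longer pipeline means a bigger flush penalty).
for accuracy in (0.5, 0.9, 0.99):
    print(f"prediction accuracy {accuracy:.0%} -> average CPI {average_cpi(1.0, 0.2, accuracy, 12):.2f}")

# 50% -> 2.20, 90% -> 1.24, 99% -> 1.02: a good predictor makes the gamble worth it.
```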

Core count increase

While CPUs can get more clock speed, cache, pipelining and branch prediction, what if we want several big tasks done at the same time? Normally that would happen slowly, but if we have more than one data factory (core), or more than one processor, things can be done side by side. Two serial tasks can be executed side by side in parallel. To do this we could reuse the same cache, but we need two instruction pipelines working at the same rate. As was said before, not all tasks can be broken down into smaller pieces; those must be done in series, with no way to run them in parallel. Meanwhile, some tasks can be broken into smaller ones and spread across several processing pipelines, or we can run two programs at once and utilize several cores at once. In theory we could see big speed gains if all cores are being used, but in reality not everything can be broken into smaller pieces, meaning core count doesn't guarantee a speed improvement; it only offers the ability to improve speed if the program's maker decides to use it.
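That last point, that extra cores only help as far as a program can be split up, is usually summarized by Amdahl's law. Here is a small sketch with made-up parallel fractions (my own illustration, not from the book).

```python
# Amdahl's law: the speedup from N cores when only part of the work can run in parallel.

def speedup(parallel_fraction: float, cores: int) -> float:
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (2, 4, 8):
    print(f"{cores} cores: "
          f"50% parallel -> {speedup(0.5, cores):.2f}x, "
          f"95% parallel -> {speedup(0.95, cores):.2f}x")

# Even with 8 cores, a task that is only 50% parallelizable gets less than a 2x speedup.
```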

Hopefully this gave some idea of how a CPU works. Sorry for the messy pipeline explanation; I rather mixed it up with core count, but I tried to make clear which is which and what happens in each. There's still no explanation of what data width means in processing, or of some other things like Hyper-Threading.
 

The spirit of the spirits · 10,917 Posts · Discussion Starter #24
You're an excellent teacher.
Thanks, but I really thought the pipeline description was unintelligible. In the book it was explained compactly and elegantly, without weird examples that barely fit the situation.

And I still have no solid understanding of electricity. These days that doesn't matter much unless you repair or design electronics, but back in the early computer era it was a must, and if you truly want to understand older CPUs, memory and other components, an understanding of electricity would be very helpful.

Another weird thing is that with a smaller CPU lithography, since all the components get smaller, there should be more resistance compared to a bigger lithography. Yet smaller CPUs eat less electricity and often run cooler. I can only guess that the losses from electricity traveling over long distances must be greater than the losses from the higher resistance. Or maybe I'm totally wrong, and smaller CPUs simply require less voltage because signals travel shorter distances, so even with higher resistance there's just less electricity flowing everywhere, and the effect of the resistance gets much smaller.
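For what it's worth, the usual hand-wavy explanation is that switching power scales roughly with voltage squared (the standard P ≈ C·V²·f approximation), so the voltage drop that a smaller lithography allows outweighs everything else. A back-of-the-envelope sketch with made-up numbers:

```python
# Rough dynamic-power scaling: P is proportional to C * V^2 * f
# (switched capacitance * voltage squared * clock frequency).
# All numbers below are made up purely for illustration.

def relative_power(capacitance: float, voltage: float, frequency: float) -> float:
    return capacitance * voltage ** 2 * frequency

old_node = relative_power(capacitance=1.0, voltage=1.5, frequency=1.0)
# Smaller node: somewhat less switched capacitance, noticeably lower voltage, higher clock.
new_node = relative_power(capacitance=0.7, voltage=1.1, frequency=1.3)

print(f"new node uses about {new_node / old_node:.0%} of the old node's power")  # roughly 49%
```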
 

The spirit of the spirits · 10,917 Posts · Discussion Starter #33
I've learned that IC's have lower power consumption than DC's, and that power consumption is highly correlated to temperature.
Not always. A phone CPU can be small and low power (5 watts) yet heat up to 80°C, while a desktop CPU can reach the same 80°C while using many times more power.

Sure, the cooling solutions are different, but I highly doubt that if we used the same cooling for both we would see a nice correlation.

I have also overclocked my FX 6300 a lot. With the same cooling, the maximum temperature at stock was around 40°C at full load, while at around 4.9 GHz the maximum temperature was 55°C, averaging 52°C. And the heat came mostly from the added voltage and much less from the added clock speed.

Now that I've settled on a 4.1 GHz overclock, my heat output is very low and the computer's power usage is very low too. I think the CPU itself only eats 100 or 120 watts at absolutely full load, yet there's very little heat. So little that even when the computer is loaded and I put my hand near the exhaust fan, I barely feel any warmth coming out. I can only feel it if the GPU is fully loaded too, and even then not a lot.

As far as I know, heat is a byproduct of doing the work. The more inefficient the task becomes, the more heat you get and the less work gets done. Something like that. I think it would be measured in joules.
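On the joules point: energy is power multiplied by time (1 watt for 1 second is 1 joule), so a quick comparison with made-up numbers looks like this.

```python
# Energy in joules = power in watts * time in seconds. Made-up numbers for illustration.

def energy_joules(power_watts: float, seconds: float) -> float:
    return power_watts * seconds

# The same job done efficiently vs inefficiently:
print(energy_joules(100, 60))  # 6000 J  (100 W for one minute)
print(energy_joules(150, 90))  # 13500 J (more power for longer = much more heat to get rid of)
```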
 

Banned · 17,616 Posts
As far as I know, heat is a byproduct of doing the work. The more inefficient the task becomes, the more heat you get and the less work gets done. Something like that. I think it would be measured in joules.
Ah, I came across this as well...

Many complex circuits are fabricated on a single chip and hence this simplifies the designing of a complex circuit. And also it improves the performance of the system.
https://www.elprocus.com/difference-between-discrete-circuits-integrated-circuits/
 

The spirit of the spirits · 10,917 Posts · Discussion Starter #38 (Edited)
The deathmatch of the cartridges: AMD Athlon vs Pentium III in a mind-blowing race to 1 GHz

The late 90s and early 2000s were interesting times. Like never before, CPUs came on cartridges, and AMD tried for the first time to seriously take on Intel. Both companies did everything they could to deliver the fastest CPUs they could. Intel had a good run with the Pentium II, while AMD took the value crown with its K6 series.

Soon one of the biggest milestones in CPU history was reached: a 1 GHz clock speed. As unreal as it may sound, AMD won this title only by releasing the first 1 GHz CPU 2 days earlier.

After the speed record was broken, the only thing left was to see how fast 1 GHz really was, and this is the deathmatch:
https://www.anandtech.com/show/500/7
https://www.anandtech.com/show/498/6
https://www.tomshardware.com/reviews/athlon-processor,121.html

So which one was the right choice in 2000?


AMD Athlon!

Soon after the Athlon, AMD dominated with the Athlon XP, Athlon 64, Athlon 64 FX and Athlon 64 X2, while making some really important advancements such as 64-bit support, an integrated memory controller and multi-core designs, things never seen before in the consumer market.

Later, with less success, AMD broke other barriers: the first quad-core CPU, the first six-core CPU, the first 5 GHz CPU, and yet again some AMD CPUs were even the fastest on Earth (the FX 8150 could, for a short time, serve as an example).

After the Athlon vs Pentium III battle, AMD became a serious competitor to Intel like never before.
 
