Keep imparting your wisdom here! I'm a budding computer engineer trying to learn about parallel processing! Haha.
lw1990, in case these answers haven't been sufficient: a processor has a number of cores (usually a power of 2). For the sake of example, say each core completes one instruction per clock cycle (in reality an instruction can take longer).
So if you have two cores, they can complete two instructions every clock cycle. Seems like 13.6 GHz worth of compute power is within reach, right? Not quite...
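To make the "two cores, two instructions at once" idea concrete, here's a minimal Python sketch using `ProcessPoolExecutor` (the function name and workload are made up for illustration; real cores parallelize individual instructions, not whole Python functions):

```python
from concurrent.futures import ProcessPoolExecutor

def count_down(n):
    # Stand-in for a batch of independent work (independent "instructions").
    while n > 0:
        n -= 1
    return n

if __name__ == "__main__":
    # Two independent tasks can run on two cores at the same time,
    # roughly doubling throughput for this kind of workload.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(count_down, [100_000, 100_000]))
    print(results)  # [0, 0]
```

The key word is "independent": this only works because neither task needs the other's result, which is exactly where the next point comes in.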
Code is often written in a way where the next operation relies on the result of the previous one. Say the code looks like this:
add x and y, and save the result to z
now add z and n, and save the result to m.
Well, the computer can't do (x + y = z) at the same time as (z + n = m), because the value of z wouldn't be updated yet.
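The same dependency chain, written out in Python (values chosen arbitrarily for illustration):

```python
x, y, n = 2, 3, 4

# Step 2 depends on step 1's result, so they must run in order:
z = x + y        # step 1: z depends on x and y
m = z + n        # step 2: m depends on z -- can't start until z exists

# By contrast, these two adds are independent of each other,
# so a processor *could* run them on separate cores at once:
a = x + y
b = n + n
```

Compilers and CPUs spend a lot of effort finding independent operations like `a` and `b` so they can overlap them; a chain like `z` then `m` forces them to wait.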
Now, to make things even more fun, you have shared "caches", or memory if you like. These caches hold redundant copies of data and instructions from main memory (think RAM) so that the information can be accessed quickly. There are multiple levels of cache: cores share the larger, slower levels (usually L3) but have their own smaller, faster caches for super fast access (usually L1 and L2). If one core changes a value in its L1 cache, the copies of that value in the other cores' L1 caches are now invalid. It takes time for the new value to reach the other L1 caches, so there are even more real-world delays.
Sorry for the super long answer, but caching really gets me going. The main reason computers feel "so slow" anymore has to do with how long it takes to access certain information, and caches aim to reduce that time. Perfect caching, while impossible right now and maybe always, would result in much faster start times, response times, and the like.
Hope this helps!