A data set is sharded when it's broken into pieces and distributed across nodes. A simple (if flawed) example would be sharding the names in a data set about people across 26 nodes, one for each letter of the alphabet. Unfortunately that scheme balances badly: your Z node will be underused and your S node might be swamped.
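To make the imbalance concrete, here is a rough Python sketch that compares the first-letter scheme with hashing the whole name. The function names, the sample names, and the choice of MD5 as the hash are all made up for illustration, and the first-letter function assumes names start with an ASCII letter; real systems pick their own hash and shard count.

```python
import hashlib
from collections import Counter


def shard_by_first_letter(name: str) -> int:
    """Naive scheme: one shard per letter, A=0 ... Z=25 (ASCII letters only)."""
    return ord(name[0].upper()) - ord("A")


def shard_by_hash(name: str, num_shards: int = 26) -> int:
    """Hash the whole name, then reduce the hash to a shard number."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical sample data, deliberately heavy on names starting with S.
names = ["Smith", "Sanchez", "Singh", "Suzuki", "Schmidt", "Zhang", "Adams", "Stewart"]

by_letter = Counter(shard_by_first_letter(n) for n in names)
by_hash = Counter(shard_by_hash(n) for n in names)

print("first letter:", dict(by_letter))
print("hashed:      ", dict(by_hash))
```

With the first-letter scheme, six of the eight sample names land on the S shard. Hashing tends to spread keys more evenly, though you give up the ability to find "all names starting with S" on a single node.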
A shard is replicated when more than one copy of it exists, normally with each copy on a different node.
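As a sketch of what that can look like, the Python below places each shard's copies on consecutive nodes and reads from the first copy whose node is still up. The placement rule and the names replica_nodes and pick_node are assumptions for illustration only, not how any particular system does it.

```python
from typing import FrozenSet, List


def replica_nodes(shard_id: int, num_nodes: int, replication_factor: int = 2) -> List[int]:
    """Place the copies of a shard on consecutive nodes, wrapping around."""
    if replication_factor > num_nodes:
        raise ValueError("cannot place more copies than there are nodes")
    return [(shard_id + i) % num_nodes for i in range(replication_factor)]


def pick_node(shard_id: int, num_nodes: int,
              down: FrozenSet[int] = frozenset(),
              replication_factor: int = 2) -> int:
    """Return the first node holding the shard that is not marked as down."""
    for node in replica_nodes(shard_id, num_nodes, replication_factor):
        if node not in down:
            return node
    raise RuntimeError("every copy of shard %d is unavailable" % shard_id)


# Shard 3 on a 4-node cluster lives on nodes 3 and 0.
print(replica_nodes(3, 4))                     # [3, 0]
# If node 3 is down, the read falls over to the copy on node 0.
print(pick_node(3, 4, down=frozenset({3})))    # 0
```

A real scheduler would usually also weigh load or locality when choosing among the live copies, rather than always taking the first one.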
Shards enable parallel processing on separate nodes. Replicas improve throughput, since you have more choices about where to process a given piece of data, and they improve availability, since a shard stays reachable when one of the nodes holding it goes down. Once you have the basics, there are many good discussions of sharding and replication strategies on this list.