Wednesday, January 6, 2016

Boundaries and limits for SharePoint 2013

Web Application Limits

The following are the recommended guidelines for web applications. Each entry lists the limit, its maximum value, the limit type (Boundary, Supported, or Threshold), and notes.

Web application: 20 per farm (Supported)
We recommend limiting the number of web applications as much as possible. Create additional host-named site collections where possible instead of adding web applications.

Zone: 5 per web application (Boundary)
The number of zones defined for a farm is hard-coded to 5. Zones include Default, Intranet, Extranet, Internet, and custom.

Managed path for host-named site collections: 20 per farm (Supported)
Managed paths for host-named site collections apply at the farm level. Each managed path that is created can be applied in any web application.

Managed path for path-based site collections: 20 per web application (Supported)
Managed paths are cached on the web server, and CPU resources are used to process incoming requests against the managed path list. Managed paths for path-based site collections apply at the web application level; you can create a different set of managed paths for each web application. Exceeding 20 managed paths per web application adds more load to the web server for each request. If you plan to exceed 20 managed paths in a given web application, we recommend that you test for acceptable system performance.

Solution cache size: 300 MB per web application (Threshold)
The solution cache allows InfoPath Forms Services to hold solutions in cache in order to speed up retrieval of the solutions. If the cache size is exceeded, solutions are retrieved from disk, which may slow down response times. You can configure the size of the solution cache by using the Windows PowerShell cmdlet Set-SPInfoPathFormsService.
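The exact parameter that controls the cache size varies between builds, so rather than guess at it, the safest route is to inspect the service settings first; this is a minimal sketch, assuming a SharePoint 2013 Management Shell session.

```powershell
# List the current InfoPath Forms Services settings, including cache-related values:
Get-SPInfoPathFormsService | Format-List *

# Review the available parameters before changing the solution cache size:
Get-Help Set-SPInfoPathFormsService -Detailed
```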

Application pools: 10 per web server (Threshold)
The maximum number is determined by hardware capabilities. This limit is dependent largely upon:
  • The amount of memory allocated to the web servers
  • The workload that the farm is serving, that is, the user base and the usage characteristics (a single highly active application pool can utilize 10 GB or more)
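To see how many distinct application pools your SharePoint web applications actually use, something like the following should work from the SharePoint 2013 Management Shell (a sketch, not farm-specific guidance):

```powershell
# List each web application and the IIS application pool it runs in:
Get-SPWebApplication -IncludeCentralAdministration |
    Select-Object DisplayName, Url,
        @{ Name = "AppPool"; Expression = { $_.ApplicationPool.Name } }
```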

Content Database Limits

The following are the recommended guidelines for content databases.

 

Number of content databases: 500 per farm (Supported)
The maximum number of content databases per farm is 500. With 500 content databases per web application, end-user operations such as opening a site or site collection are not affected, but administrative operations such as creating a new site collection will see decreased performance. We recommend that you use Windows PowerShell to manage the web application when a large number of content databases are present, because the management interface can become slow and difficult to navigate.
At 200 GB per content database and 500 content databases per farm, SharePoint Server 2013 supports 100 TB of data per farm.
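As a quick way to check where you stand against these limits, a sketch like the following counts the content databases in the farm and reports their approximate size on disk (run from the SharePoint 2013 Management Shell):

```powershell
$dbs = Get-SPContentDatabase
"Content databases in farm: $($dbs.Count)"
$dbs | Select-Object Name,
    @{ Name = "SizeGB"; Expression = { [math]::Round($_.DiskSizeRequired / 1GB, 2) } },
    @{ Name = "SiteCollections"; Expression = { $_.CurrentSiteCount } }
```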
Content database size (general usage scenarios): 200 GB per content database (Supported)
We strongly recommend limiting the size of content databases to 200 GB, except when the circumstances in the following rows of this table apply. Multiple site collections can share a single content database, but each site collection must be stored entirely within one content database.
If you are using Remote BLOB Storage (RBS), the total volume of remote BLOB storage and metadata in the content database must not exceed the 200 GB limit.
Content database size (all usage scenarios): 4 TB per content database (Supported)
Content databases of up to 4 TB are supported when the following requirements are met:
·         Disk sub-system performance of 0.25 IOPS per GB; 2 IOPS per GB is recommended for optimal performance.
·         You must have developed plans for high availability, disaster recovery, future capacity, and performance testing.
You should also carefully consider the following factors:
·         Requirements for backup and restore may not be met by the native SharePoint Server 2013 backup for content databases larger than 200 GB. We recommend that you evaluate and test SharePoint Server 2013 backup and alternative backup solutions to determine the best solution for your specific environment.
·         We strongly recommend proactive management of the SharePoint Server 2013 and SQL Server installations by skilled administrators.
·         The complexity of customizations and configurations on SharePoint Server 2013 may necessitate refactoring (or splitting) of data into multiple content databases. Seek advice from a skilled professional architect and perform testing to determine the optimum content database size for your implementation. Examples of complexity may include custom code deployments, use of more than 20 columns in property promotion, or features listed as not to be used in the over-4-TB section below.
·         Refactoring site collections across multiple content databases allows a SharePoint Server 2013 implementation to scale out indefinitely. This refactoring will be easier and faster when content databases are smaller than 200 GB.
·         For ease of backup and restore, we suggest limiting individual site collections within a content database to 100 GB.
·         We do not recommend content databases that exceed 4 TB, except in document archive scenarios (described in the next row of this table). If, in the future, you need to upgrade your SharePoint Server 2013 installation, upgrading the site collections within such content databases can be very difficult and time consuming.
It is strongly recommended that you scale out across multiple content databases rather than exceed 4 TB of data in a single content database.
Content database size (document archive scenario): no explicit content database limit (Supported)
Content databases with no explicit size limit are supported for document archive scenarios when the following requirements are met:
·         You must meet all requirements from the "Content database size (all usage scenarios)" limit earlier in this table, and you should ensure that you have carefully considered all the factors discussed in the notes for that limit.
·         SharePoint Server 2013 sites must be based on the Document Center or Records Center site templates.
·         Less than 5% of the content in the content database is accessed each month on average, and less than 1% of the content is modified or written each month on average.
·         Do not use alerts, workflows, link fix-ups, or item-level security on any SharePoint Server 2013 objects in the content database.
Note: Document archive content databases can be configured to accept documents from Content Routing workflows.
Content database items: 60 million items, including documents and list items (Supported)
The largest number of items per content database that has been tested on SharePoint Server 2013 is 60 million items, including documents and list items. If you plan to store more than 60 million items in SharePoint Server 2013, you must deploy multiple content databases.
Site collections per content database: 10,000 maximum (2,500 non-Personal site collections and 7,500 Personal Sites, or 10,000 Personal Sites alone) (Supported)
We strongly recommend limiting the number of site collections in a content database to 5,000; however, up to 10,000 site collections in a database are supported. In a content database with up to 10,000 total site collections, a maximum of 2,500 can be non-Personal site collections; 10,000 Personal site collections are possible only when they are the only site collections in the content database.
These limits relate to the speed of upgrade: the larger the number of site collections in a database, the slower both the database upgrade and the site collection upgrades.
The limit on the number of site collections in a database is subordinate to the limit on the size of a content database that has more than one site collection. Therefore, as the number of site collections in a database increases, the average size of the site collections it contains must decrease.
Exceeding the 5,000 site collection limit puts you at risk of longer downtimes during upgrades. If you plan to exceed 5,000 site collections, we recommend that you have a clear upgrade strategy to address outage length and operations impact, and that you obtain additional hardware to speed up the software updates and upgrades that affect databases.
To set the warning and maximum levels for the number of sites in a content database, use the Windows PowerShell cmdlet Set-SPContentDatabase with the -WarningSiteCount parameter, as in the example below. For more information, see Set-SPContentDatabase.
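A minimal example of that cmdlet follows; the database name is a placeholder, and the counts shown match the 5,000 recommendation discussed above:

```powershell
# Warn at 2,000 site collections and block new site creation at 5,000:
Set-SPContentDatabase -Identity "WSS_Content_Teams" `
    -WarningSiteCount 2000 -MaxSiteCount 5000
```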
Remote BLOB Storage (RBS) storage subsystem on Network Attached Storage (NAS): time to first byte of any response from the NAS should remain within 40 milliseconds 95% of the time (Boundary)
When SharePoint Server 2013 is configured to use RBS, and the BLOBs reside on NAS storage, the following limit applies: from the time that SharePoint Server 2013 requests a BLOB until it receives the first byte from the NAS, no more than 40 milliseconds can pass 95% of the time.

Site Collection Limits
The following are the recommended guidelines for site collections.

 

Site collections per farm: 750,000 (500,000 Personal Sites and 250,000 other sites per farm) (Supported)
The maximum recommended number of site collections per farm is 500,000 Personal Sites plus 250,000 for all other site templates. The sites can all reside in one web application or be distributed across multiple web applications.
Note that this limit is affected by other factors that might reduce the effective number of site collections that a given content database can support. Care must be exercised to avoid exceeding supported limits when a container object, such as a content database, contains a large number of other objects. For example, if a farm contains a smaller total number of content databases, each of which contains a large number of site collections, farm performance might be adversely affected long before the supported limit for the number of site collections is reached.
For example, Farm A contains a web application that has 200 content databases, a supported configuration. If each of these content databases contains 1,000 site collections, the total number of site collections in the web application will be 200,000, which falls within supported limits. However, if each content database contains 10,000 site collections, even though this number is supported for a content database, the total number of site collections in the farm will be 2,000,000, which exceeds the site collections per farm limit.
Memory usage on the web servers should be monitored, as memory usage depends on usage patterns and how many sites are being accessed in a given timeframe. Similarly, the crawl targets might also exhibit memory pressure; if so, the application pool should be configured to recycle before available memory on any web server drops below 2 GB.
Web site: 250,000 per site collection (Supported)
The maximum recommended number of sites and subsites is 250,000.
You can create a very large total number of web sites by nesting subsites. For example, a shallow hierarchy with 100 sites, each with 1,000 subsites, totals 100,000 web sites, and a deep hierarchy with 100 sites, each with 10 levels of subsites, would also contain a total of 100,000 web sites.
Note: Deleting or creating a site or subsite can significantly affect a site's availability. Access to the site and subsites will be limited while the site is being deleted. Attempting to create many subsites at the same time may also fail.
Site collection size: maximum size of the content database (Supported)
A site collection can be as large as the content database size limit for the applicable usage scenario.
In general, we strongly recommend limiting the size of site collections to 100 GB for the following reasons:
·         Certain site collection actions, such as site collection backup/restore or the Windows PowerShell cmdlet Move-SPSite, cause large SQL Server operations that can affect performance or fail if other site collections are active in the same database. For more information, see Move-SPSite (a sketch follows after this list).
·         SharePoint site collection backup and restore is only supported for a maximum site collection size of 100 GB. For larger site collections, the complete content database must be backed up. If multiple site collections larger than 100 GB are contained in a single content database, backup and restore operations can take a long time and are at risk of failure.
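For reference, here is what the two operations mentioned above look like in PowerShell; the URLs, database name, and backup path are placeholders for this sketch:

```powershell
# Site collection backup (supported up to 100 GB per site collection):
Backup-SPSite -Identity "http://portal/sites/projects" -Path "E:\Backups\projects.bak"

# Move a site collection to another content database; this causes large
# SQL Server operations, so run it outside business hours:
Move-SPSite "http://portal/sites/projects" -DestinationDatabase "WSS_Content_Projects2"
```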
Number of device channels per publishing site collection: 10 (Boundary)
The maximum allowed number of device channels per publishing site collection is 10.

List and Library Limits

List row size: 8,000 bytes per row (Boundary)
Each list or library item can only occupy 8,000 bytes in total in the database. 256 bytes are reserved for built-in columns, which leaves 7,744 bytes for end-user columns.
File size: 2 GB (Boundary)
The default maximum file size is 250 MB. This is a configurable limit that can be increased up to 2 GB (2,047 MB). However, a large volume of very large files can affect farm performance.
Documents: 30,000,000 per library (Supported)
You can create very large document libraries by nesting folders, or using standard views and site hierarchy. This value may vary depending on how documents and folders are organized, and by the type and size of documents stored.
Major versions: 400,000 (Supported)
If you exceed this limit, basic file operations, such as opening or saving a file, deleting, and viewing the version history, may not succeed.
Minor versions: 511 (Boundary)
The maximum number of minor file versions is 511. This limit cannot be exceeded.
Items: 30,000,000 per list (Supported)
You can create very large lists using standard views, site hierarchies, and metadata navigation. This value may vary depending on the number of columns in the list and the usage of the list.
Row size limit: 6 table rows internal to the database used for a list or library item (Supported)
Specifies the maximum number of table rows internal to the database that can be used for a list or library item. To accommodate wide lists with many columns, each item may be wrapped over several internal table rows, up to six rows by default. This is configurable by farm administrators through the object model only, via the SPWebApplication.MaxListItemRowStorage property.
Bulk operations: 100 items per bulk operation (Boundary)
The user interface allows a maximum of 100 items to be selected for bulk operations.
List view lookup threshold: 8 join operations per query (Threshold)
Specifies the maximum number of joins allowed per query, such as those based on lookup, person/group, or workflow status columns. If the query uses more than eight joins, the operation is blocked. This does not apply to single-item operations. When using the maximal view via the object model (by not specifying any view fields), SharePoint will return up to the first eight lookups.
Note: After applying the SharePoint Server 2013 cumulative update package released on August 13, 2013 (https://support.microsoft.com/en-us/kb/2817616), the default value is increased from 8 to 12.
List view threshold: 5,000 (Threshold)
Specifies the maximum number of list or library items that a database operation, such as a query, can process at the same time outside the daily time window set by the administrator, during which queries are unrestricted.

List view threshold for auditors and administrators: 20,000 (Threshold)
Specifies the maximum number of list or library items that a database operation, such as a query, can process at the same time when it is performed by an auditor or administrator with appropriate permissions. This setting works with Allow Object Model Override. Both thresholds, and the lookup threshold above, can be inspected and changed through the object model, as shown below.
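A sketch of reading and adjusting these thresholds through the object model, assuming a placeholder web application URL:

```powershell
$wa = Get-SPWebApplication "http://portal"

$wa.MaxItemsPerThrottledOperation           # list view threshold (default 5,000)
$wa.MaxItemsPerThrottledOperationOverride   # auditor/administrator threshold (default 20,000)
$wa.MaxQueryLookupFields                    # list view lookup threshold (default 8)

# Example: raise the lookup threshold to 12 and persist the change:
$wa.MaxQueryLookupFields = 12
$wa.Update()
```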
Subsite: 2,000 per site view (Threshold)
The interface for enumerating subsites of a given web site does not perform well as the number of subsites surpasses 2,000. Similarly, the performance of the All Site Content page and the Tree View Control decreases significantly as the number of subsites grows.
Coauthoring in Word and PowerPoint for .docx, .pptx, and .ppsx files: 10 concurrent editors per document (Threshold)
The recommended maximum number of concurrent editors is 10; the boundary is 99.
If 99 co-authors have a single document open for concurrent editing, each additional user sees a "File in use" error and can only open a read-only copy.
More than 10 co-editors will lead to a gradually degraded user experience, with more conflicts, and users might have to go through more iterations to successfully upload their changes to the server.
Security scope: 50,000 per list (Threshold)
The maximum number of unique security scopes set for a list cannot exceed 50,000.
For most farms, we recommend that you consider lowering this limit to 5,000 unique scopes. For large lists, consider a design that uses as few unique permissions as possible.
When the number of unique security scopes for a list exceeds the value of the list view threshold (set by default at 5,000 list items), additional SQL Server round trips take place when the list is viewed, which can adversely affect list view performance.
A scope is the security boundary for a securable object and any of its children that do not have a separate security boundary defined. A scope contains an Access Control List (ACL), but unlike NTFS ACLs, a scope can include security principals that are specific to SharePoint Server 2013. The members of an ACL for a scope can include Windows users, user accounts other than Windows users (such as forms-based accounts), Active Directory groups, or SharePoint groups.

Security Limits
Number of SharePoint groups a user can belong to: 5,000 (Supported)
This is not a hard limit, but it is consistent with Active Directory guidelines. Several things affect this number:
  • The size of the user token
  • The groups cache: SharePoint Server 2013 has a table that caches the number of groups a user belongs to as soon as those groups are used in access control lists (ACLs)
  • The security check time: as the number of groups that a user is a member of increases, the time required for the access check also increases
Users in a site collection: 2 million per site collection (Supported)
You can add millions of people to your web site by using Microsoft Windows security groups to manage security instead of using individual users.
This limit is based on manageability and ease of navigation in the user interface.
When you have many entries (security groups or users) in the site collection (more than one thousand), you should use Windows PowerShell to manage users instead of the UI; this will provide a better management experience (a sketch follows).
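As a starting point, something like this sketch (the URL and account are placeholders) enumerates and adds users from the SharePoint 2013 Management Shell:

```powershell
# List all users and groups in a site collection:
Get-SPUser -Web "http://portal/sites/hr" -Limit All |
    Select-Object UserLogin, DisplayName, IsSiteAdmin

# Prefer adding a domain security group rather than individual accounts:
New-SPUser -UserAlias "CONTOSO\HR-Staff" -Web "http://portal/sites/hr"
```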
Active Directory principals/users in a SharePoint group: 5,000 per SharePoint group (Supported)
SharePoint Server 2013 enables you to add users or Active Directory groups to a SharePoint group. Having up to 5,000 members (users or Active Directory groups) in a SharePoint group provides acceptable performance.
The activities most affected by this limit are as follows:
  • Fetching users to validate permissions. This operation takes incrementally longer as the number of users in a group grows.
  • Rendering the membership of the view. This operation always requires time.
SharePoint groups: 10,000 per site collection (Supported)
Above 10,000 groups, the time to execute operations increases significantly. This is especially true of adding a user to an existing group, creating a new group, and rendering group views.
Security principal: size of the security scope: 5,000 per Access Control List (ACL) (Supported)
The size of the scope affects the data that is used for a security check calculation. This calculation occurs every time the scope changes. There is no hard limit, but the bigger the scope, the longer the calculation takes.

Reference: Microsoft documentation (software boundaries and limits for SharePoint 2013)

14 hive and 15 hive folders in SharePoint


SharePoint 2013: The 15 Hive and other important directories


The 15 Hive is a special folder that is created during SharePoint 2013 installation. All the important files supporting the SharePoint framework, such as web.config, Features, Web Parts, user controls, and help files, are stored in the SharePoint 2013 server file system inside the 15 Hive.



The 15 Hive - Folder Location
The 15 Hive folder is located at the following path:
C:\Program Files\Common files\Microsoft Shared\Web Server Extensions\15
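Assuming a default installation path, a quick way to browse the hive from PowerShell:

```powershell
# Build the hive path from the Common Files environment variable:
$hive = Join-Path $env:CommonProgramFiles "Microsoft Shared\Web Server Extensions\15"
Get-ChildItem $hive | Where-Object { $_.PSIsContainer } | Select-Object Name
```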

The 15 Hive - Folder Structure
The 15 Hive has a definite folder structure which holds the core SharePoint server files.
  • ADMISAPI: Contains SOAP services for Central Administration. If this directory is altered, remote site creation and other methods exposed in the service will not function correctly.
  • Bin: Contains the core binary files and utilities used by SharePoint services. Command-line tools such as STSADM.EXE are also present in this folder.
  • Client: Contains files that are used for creating Office apps.
  • Config: Contains files used to extend IIS web sites with SharePoint Server. If this directory or its contents are altered, web applications will not function correctly.
  • HCCab: Contains a set of .cab files with content used by the SharePoint help system.
  • Help: Contains the HTML help file (.chm) used by the configuration wizard.
  • ISAPI: Contains all the standard web services for SharePoint, plus the resources and configuration files that the web services use.
  • Logs: The folder where the SharePoint diagnostic (ULS) logs are written. When a problem or error occurs in SharePoint, trace the log files in this folder to find the error messages.
  • Policy: Contains SharePoint 2013 Server policy files.
  • Resources: Contains the core.resx file used for creating language packs for SharePoint, by which SharePoint sites with different languages and cultures can be created.
  • Template: Contains the core web site functionality: the features, templates, configurations, and resources of a web site.
  • UserCode: Contains files used to support sandboxed solutions.
  • Web Clients: Contains files related to the Client Object Model.
  • Web Services: The root directory where SharePoint back-end web services are hosted, for example, Excel and Search.

Other Important Directories In SharePoint 2013 
1) C:\Inetpub\wwwroot\wss - This directory (or the corresponding directory under the Inetpub root on the server) is used as the default location for IIS web sites.
2) C:\Program Files\Microsoft Office Servers\15.0 - The installation location for SharePoint Server 2013 binaries and data. The directory can be changed during installation.
3) C:\Program Files\Microsoft Office Servers\15.0\WebServices - The root directory where SharePoint back-end web services are hosted, for example, Excel and Search.
4) C:\Program Files\Microsoft Office Servers\15.0\Data - The root location where local data is stored, including search indexes.
5) C:\Program Files\Microsoft Office Servers\15.0\Logs - The location where run-time diagnostic logging is generated.
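Rather than opening the ULS log files by hand, recent entries can be pulled from the SharePoint 2013 Management Shell; a sketch (the time window and output path are arbitrary choices):

```powershell
# Show the last 15 minutes of ULS trace entries:
Get-SPLogEvent -StartTime (Get-Date).AddMinutes(-15) |
    Select-Object Timestamp, Area, Category, Level, Message -First 20

# Merge logs from all servers in the farm into a single file:
Merge-SPLogFile -Path "C:\Temp\FarmLog.log" -Overwrite
```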

SharePoint 2010 - The 14 Hive and other important directories

The 14 Hive

The 14 Hive is a special folder that is created during SharePoint 2010 installation. All the important files supporting the SharePoint framework, such as web.config, Features, Web Parts, user controls, and help files, are stored in the SharePoint 2010 server file system inside the 14 Hive.



The 14 Hive - Folder Location
The 14 Hive folder is located at the following path:
C:\Program Files\Common files\Microsoft Shared\Web Server Extensions\14

The 14 Hive - Folder Structure
The 14 Hive has a definite folder structure which holds the core SharePoint server files.
  • ADMISAPI: Contains SOAP services for Central Administration. If this directory is altered, remote site creation and other methods exposed in the service will not function correctly.
  • Bin: Contains the core binary files and utilities used by SharePoint services. Command-line tools such as STSADM.EXE are also present in this folder.
  • Config: Contains files used to extend IIS web sites with SharePoint Server. If this directory or its contents are altered, web applications will not function correctly.
  • HCCab: Contains a set of .cab files with content used by the SharePoint help system.
  • Help: Contains the HTML help file (.chm) used by the configuration wizard.
  • ISAPI: Contains all the standard web services for SharePoint, plus the resources and configuration files that the web services use.
  • Logs: The folder where the SharePoint diagnostic (ULS) logs are written. When a problem or error occurs in SharePoint, trace the log files in this folder to find the error messages.
  • Resources: Contains the core.resx file used for creating language packs for SharePoint, by which SharePoint sites with different languages and cultures can be created.
  • Template: Contains the core web site functionality: the features, templates, configurations, and resources of a web site.
  • UserCode: Contains files used to support sandboxed solutions.
  • Web Clients: Contains files related to the Client Object Model.
  • Web Services: The root directory where SharePoint back-end web services are hosted, for example, Excel and Search.

Other Important SharePoint 2010 Directories
1) C:\Inetpub\wwwroot\wss - This directory (or the corresponding directory under the Inetpub root on the server) is used as the default location for IIS web sites.
2) C:\Program Files\Microsoft Office Servers\14.0 - The installation location for SharePoint Server 2010 binaries and data. The directory can be changed during installation.
3) C:\Program Files\Microsoft Office Servers\14.0\WebServices - The root directory where SharePoint back-end web services are hosted, for example, Excel and Search.
4) C:\Program Files\Microsoft Office Servers\14.0\Data - The root location where local data is stored, including search indexes.
5) C:\Program Files\Microsoft Office Servers\14.0\Logs - The location where run-time diagnostic logging is generated.


Reference: Microsoft site

Saturday, January 2, 2016

Troubleshoot Search Crawl Issues (Best Possible Ways)

To View Crawl Log
  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. In Central Administration, in the Quick Launch, click Application Management.
  3. On the Application Management page, under Service Applications, click Manage service applications.
  4. On the Service Applications page, in the list of service applications, click the Search service application that you want.
  5. On the Search Administration page, in the Quick Launch, under Crawling, click Crawl Log.
  6. On the Crawl Log – Content Source page, click the view that you want.
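The same crawl status can also be checked from the SharePoint 2013 Management Shell; a minimal sketch:

```powershell
# Show each content source and its current crawl state:
$ssa = Get-SPEnterpriseSearchServiceApplication
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa |
    Select-Object Name, CrawlState, CrawlStarted, CrawlCompleted
```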

Fields in the Crawl Log
The Content Source, Host Name, and Crawl History views show data in the following columns:
  • Successes. Items that were successfully crawled and searchable.
  • Warnings. Items that might not have been successfully crawled and might not be searchable.
  • Errors. Items that were not successfully crawled and might not be searchable.
  • Deletes. Items that were removed from the index and are no longer searchable.
  • Top Level Errors. Errors in top-level documents, including start addresses, virtual servers, and content databases. Every top-level error is counted as an error, but not all errors are counted as top-level errors. Because the Errors column includes the count from the Top Level Errors column, top-level-errors are not counted again in the Host Name view.
  • Not Modified. Items that were not modified between crawls.
  • Security Update. Items whose security settings were crawled because they were modified.

Crawl Log Timer Job
By default, the data for each view in the crawl log is refreshed every five minutes by the timer job Crawl Log Report for Search Application <Search Service Application name>. You can change the refresh rate for this timer job, but in general, this setting should remain as is.
To check the status of the crawl log timer job
  1. Verify that the user account that is performing this procedure is a member of the Farm Administrators SharePoint group.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, find Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want and review the status.
To change the refresh rate for the crawl log timer job
  1. Verify that the user account that is performing this procedure is a member of the Farm Administrators SharePoint group.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, click Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want.
  5. On the Edit Timer Job page, in the Recurring Schedule section, change the timer job schedule to the interval that you want.
  6. Click OK.
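The same change can be scripted; this is a sketch, and the schedule string is just an example value:

```powershell
# Find the crawl log report timer job and change its schedule:
$job = Get-SPTimerJob | Where-Object { $_.DisplayName -like "Crawl Log Report*" }
$job | Format-List DisplayName, Schedule, LastRunTime
Set-SPTimerJob -Identity $job -Schedule "every 10 minutes between 0 and 59"
```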

Troubleshoot Crawl Problems:

First, confirm that all servers in the farm are at the same patch level (the same cumulative updates and service packs on every server in the farm) and review the most recent upgrade error log in the ULS logs. If you find any errors in the log file, resolve those issues and run PSConfig again.

This section provides information about common crawl log errors, crawler behavior, and actions to take to maintain a healthy crawling environment.

When an Item Is Deleted from the Index

When a crawler cannot find an item that exists in the index because the URL is obsolete or it cannot be accessed due to a network outage, the crawler reports an error for that item in that crawl. If this continues during the next three crawls, the item is deleted from the index. For file-share content sources, items are immediately deleted from the index when they are deleted from the file share.

"Object could not be found" error for a file share
This error can result from a crawled file-share content source that contains a valid host name but an invalid file name. For example, with a host name and file name of \\ValidHost\files\file1, \\ValidHost exists, but the file file1 does not. In this case, the crawler reports the error "Object could not be found" and deletes the item from the index. The Crawl History view shows:
  • Errors: 1
  • Deletes: 1
  • Top Level Errors: 1 (\\ValidHost\files\file1 shows as a top-level error because it is a start address)
The Content Source view shows:
  • Errors: 0
  • Deletes: 0
  • Top Level Errors: 0
The Content Source view will show all zeros because it only shows the status of items that are in the index, and this start address was not entered into the index. However, the Crawl History view shows all crawl transactions, whether or not they are entered into the index.

"Network path for item could not be resolved" error for a file share
This error can result from a crawled file-share content source that contains an invalid host name and an invalid file name. For example, with a host name and file name of \\InvalidHost\files\file1, neither \\InvalidHost nor the file file1 exists. In this case, the crawler reports the error "Network path for item could not be resolved" and does not delete the item from the index. The Crawl History view shows:
  • Errors: 1
  • Deletes: 0
  • Top Level Errors: 1 (\\InvalidHost\files\file1 shows as a top-level error because it is a start address)
The Content Source view shows:
  • Errors: 0
  • Deletes: 0
  • Top Level Errors: 0
The item is not deleted from the index, because the crawler cannot determine if the item really does not exist or if there is a network outage that prevents the item from being accessed.

Obsolete Start Address

The crawl log reports top-level errors for top-level documents, or start addresses. To ensure healthy content sources, you should take the following actions:
  • Always investigate non-zero top-level errors.
  • Always investigate top-level errors that appear consistently in the crawl log.
  • Otherwise, we recommend that you remove obsolete start addresses every two weeks after contacting the owner of the site.
To troubleshoot and delete obsolete start addresses
  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. When you have determined that a start address might be obsolete, first determine whether it exists or not by pinging the site. If you receive a response, determine which of the following issues caused the problem:
    • If you can access the URL from a browser, the crawler could not crawl the start address because there were problems with the network connection.
    • If the URL is redirected from a browser, you should change the start address to be the same as the new address.
    • If the URL receives an error in a browser, try again at another time. If it still receives an error after multiple tries, contact the site owner to ensure that the site is available.
  3. If you do not receive a response from pinging the site, the site does not exist and should be deleted. Confirm this with the site owner before you delete the site.

Access Denied

When the crawl log continually reports an "Access Denied" error for a start address, the content access account might not have Read permissions to crawl the site. If you are able to view the URL with an administrative account, there might be a problem with how the permissions were updated. In this case, you should contact the site owner to request permissions. 

Numbers set to zero in Content Source view during host distribution
During a host distribution, the numbers in all columns in the Content Source view are set to zero. This happens because the numbers in the Content Source view come directly from the crawl database tables. During a host distribution, the data in these tables is being moved, so the values remain at zero for the duration of the host distribution.
After the host distribution is complete, run an incremental crawl of the content sources to restore the original numbers; a sketch follows.
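A sketch of starting that incremental crawl from PowerShell, assuming the default "Local SharePoint sites" content source name:

```powershell
$ssa = Get-SPEnterpriseSearchServiceApplication
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
          -Identity "Local SharePoint sites"
# Only start a crawl if one is not already running:
if ($cs.CrawlState -eq "Idle") { $cs.StartIncrementalCrawl() }
```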

File-share deletes in Content Source view
When documents are deleted from a file-share content source that was successfully crawled, they are immediately deleted from the index during the next full or incremental crawl. These items will show as errors in the Content Source view of the crawl log, but will show as deletes in other views.

Stopping or restarting the SharePoint server search service causes crawl log transaction discrepancy

The SharePoint Server Search service (OSearch14 in SharePoint Server 2010; OSearch15 in SharePoint Server 2013) might be reset or restarted due to administrative operations or server functions. When this occurs, a discrepancy can appear in the Crawl History view of the crawl log: you may notice a difference between the number of transactions reported per crawl and the actual number of transactions performed per crawl. This can occur because the search service stores active transactions in memory and writes them to the crawl log database after they are completed. If the service is stopped, reset, or restarted before the in-memory transactions have been written to the crawl log database, the number of transactions per crawl will be shown incorrectly.

"Deleted by the gatherer (this item was deleted because its parent was deleted)"
The best possible ways to troubleshoot this issue are described below.

Multiple overlapping crawls cause results to be deleted
I have read in one article/blog that this error can occur if multiple crawls overlap, which may lead to a collision when the results are put into the search index. If this happens too often, the content is removed as being unseen: there is a three-day limit on keeping results, and if the content is not found again within that time, it is removed. As an example, the content crawl and the people crawl may be running at the same time. I'm not convinced by this proposition: I had the system set up to have just one incremental crawl running, and the problem still occurred; the system also always waited for each incremental crawl to finish before starting another one (e.g., when there were just a few minutes between them in the schedule).
SharePoint crawler can't access the SharePoint website to crawl it
By design, the local SharePoint sites crawl uses the default URL to access the SharePoint sites; the crawler (running from the index server) will attempt to visit the public/default URL. Some blogs/forums explain that "deleted by the gatherer" mostly occurs when the crawler is not able to access the SharePoint website at the time of crawling. In addition, I've seen it stated that the indexer visits the default view for a list in order to index that list.
There might be a number of reasons why the indexer can't reach a SharePoint URL:
·         Connectivity issues, DNS issues
You could get the issue if you have DNS problems when the indexer is trying to crawl, if the indexer cannot resolve links correctly, or if it cannot verify credentials in AD.

I've seen a couple of forums suggesting the cause is an intermittent DNS resolution issue and nothing to do with SharePoint configuration. Another user noticed an issue whereby the SharePoint crawler was not crawling any sites that had a DNS alias to the server itself; e.g., the server name was myserver01 and there was a DNS alias called myportal, and the crawler would not crawl http://myportal or anything under it.
·         SharePoint server under heavy load
The problem may be caused when the server is under heavy load or when very large documents are being indexed, thus requiring more time for the indexer to access the site.

Check the index server can reach the URL

I've had inter-server communication issues before; when the problem occurs, make sure all the SharePoint servers can ping each other, and make sure the index server can reach the public URL (open IE on the index server and check it out). Alternatively, set up or write a monitoring tool on the index server that checks connectivity regularly and logs the result until the issue appears; a minimal sketch follows.
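A minimal probe along those lines, to be scheduled on the index server (the URL and log path are placeholders):

```powershell
$url = "http://myportal"
$log = "C:\Temp\CrawlConnectivity.log"
try {
    # Request the site as the current (crawl) account and log the status code:
    $response = Invoke-WebRequest -Uri $url -UseDefaultCredentials -UseBasicParsing -TimeoutSec 30
    "$(Get-Date -Format s) OK $($response.StatusCode)" | Add-Content $log
}
catch {
    "$(Get-Date -Format s) FAIL $($_.Exception.Message)" | Add-Content $log
}
```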
Increase time-out values?
One forum suggested increasing your time-out values for search, since the problem may be caused by the indexer being too busy or having difficulty reaching the URL. If it fails to complete indexing of a document, for example, it will remove it from the gatherer process as incomplete. The setting in Central Administration 2013 can be found at:
Application Management > Manage Services on Server > SharePoint Server Search > Search Service Application, and then on the Farm Search Administration page there is a Time-Out (Seconds) setting which you can try to increase.

Normally these are set to 60 seconds. By changing these to 120 or longer, you give the search service some extra time to connect to and index an item. I tried increasing this substantially but found it did not help.
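If you want to script the change instead of using Central Administration, the search service time-outs can be read and set from the Management Shell; a sketch, assuming these map to the Time-Out (Seconds) page setting:

```powershell
# Inspect the current time-out values (in seconds):
Get-SPEnterpriseSearchService | Select-Object ConnectionTimeout, AcknowledgementTimeout

# Raise both to 120 seconds:
Set-SPEnterpriseSearchService -ConnectionTimeout 120 -AcknowledgementTimeout 120
```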
Check Alternate Access Mappings
Certain users have reported the problem happening as a result of incorrect configuration of the Alternate Access Mappings in Central Admin.  If your default Alternate Access Mapping is not the same as the one in your Content Source you could have this issue. Check the Alternate Access Mappings and the Content Source are the same.

Check that the site or list is configured to allow it to appear in search results

For each site, check that "Allow this site to appear in search results?" is set to Yes in the Search and Offline Availability options. If the problem is a list or document library, also check that the library is allowed to be indexed, under the advanced settings of the library/list, and click the button to reindex the list on its next run. These settings can also be inspected from PowerShell, as sketched below.
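A sketch of checking the same flags from the object model (the URL and list title are placeholders):

```powershell
$web = Get-SPWeb "http://portal/sites/hr"
$web.NoCrawl                 # $true means the site is excluded from search results

$list = $web.Lists["Documents"]
$list.NoCrawl                # $true means the list/library is excluded from indexing
$web.Dispose()
```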

Change the search schedule?

If there are issues with overlapping crawls (see "Multiple overlapping crawls" above), adjust the schedules, restart the services, and see whether the content is collected into the search index again.

DNS Loopback issue

Certain sites have recommended disabling the loopback check. Please note that I advise you to read around thoroughly on disabling loopback checks on production SharePoint servers before you decide whether or not to do this; there are various pages of advice on this.
·         Open regedit
·         Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
·         Under Lsa, create a DWORD value called DisableLoopbackCheck
·         In Value, type 1
·         Close regedit
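The same registry change in PowerShell, if you decide the trade-off is acceptable (run elevated; a restart of IIS or the server is typically needed afterwards):

```powershell
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" `
    -Name "DisableLoopbackCheck" -Value 1 -PropertyType DWord -Force
```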

Recreate the content source

One suggested option is to create a new content source and run a full crawl, which fixed the issue for some, although again this is not ideal in a production environment. Alternatively, delete and re-create the content source and run a full crawl. I don't think you can reset the index for a specific content source unless you have a dedicated crawl just for that source, which of course the architecture permits.

Create an include rule

Create an include rule (crawl rule) in Central Administration to force the inclusion of this site collection/site.

Kerberos issues

If you've set up anything aside from Integrated Windows Authentication, you'll have to work harder to get your crawler working. Some issues are related to Kerberos: if you don't have the infrastructure update applied, SharePoint will not be able to use Kerberos authentication for web sites on non-default (non-80/443) ports.

Recommendation from a Microsoft person

I was contacted by another SharePoint user via this blog who got Microsoft to investigate their issue, and they came up with the following proposed steps to fix it:
The list of Alternate Access Mappings in Central Administration must match the bindings list in IIS exactly. There should be no extra bindings in IIS that don't have AAMs.
1.     Remove the extra binding from IIS.
2.     Run IISRESET /noforce.
3.     Go to Central Administration; Manage Service Applications; click your Search App ("Search Service Application 1" in their case), then click "Crawl Log" and notice the Deleted column.
4.     Click the link in the Deleted column that lists the number of deleted items.
5.     Look for any rows that have the "Deleted by the gatherer (This item was deleted because its parent was deleted)" message and note the URL.
6.     Navigate to that site and to Site Contents.
7.     Add a document library and a document into it. (This is necessary!)
8.     Retry search.
9.     Repeat the steps for each site collection.
9.     Repeat steps for each site collection.