To view the crawl log
- Verify that the user account that is performing this procedure is an administrator for the Search service application.
- In Central Administration, in the Quick Launch, click Application Management.
- On the Application Management page, under Service Applications, click Manage service applications.
- On the Service Applications page, in the list of service applications, click the Search service application that you want.
- On the Search Administration page, in the Quick Launch, under Crawling, click Crawl Log.
- On the Crawl Log – Content Source page, click the view that you want.
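You can also query the same data from the SharePoint Management Shell through the CrawlLog object. A minimal sketch; the service application name and the URL filter ("Search Service Application", http://myportal) are placeholders for your own values:

    # Load the SharePoint snap-in if it is not already loaded
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"

    # The CrawlLog object exposes the data behind the crawl log pages
    $crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa

    # Up to 100 logged URLs matching the filter; the -1 arguments mean
    # "any content source, any error level, any error ID"
    $crawlLog.GetCrawledUrls($false, 100, "http://myportal", $true, -1, -1, -1, [DateTime]::MinValue, [DateTime]::MaxValue)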
The Content Source, Host Name, and Crawl History views of the crawl log show data in the following columns:
- Successes.
Items that were successfully crawled and searchable.
- Warnings.
Items that might not have been successfully crawled and might not be
searchable.
- Errors.
Items that were not successfully crawled and might not be searchable.
- Deletes.
Items that were removed from the index and are no longer searchable.
- Top Level Errors.
Errors in top-level documents, including start addresses, virtual servers,
and content databases. Every top-level error is counted as an error, but
not all errors are counted as top-level errors. Because the Errors column
includes the count from the Top Level Errors column,
top-level errors are not counted again in the Host Name view.
- Not Modified.
Items that were not modified between crawls.
- Security Update.
Items whose security settings were crawled because they were modified.
Crawl Log Timer Job
By default,
the data for each view in the crawl log is refreshed every five minutes by the
timer job Crawl Log Report for Search Application <Search Service Application name>. You can change the refresh rate for this timer
job, but in general, this setting should remain as is.
To check the status of
the crawl log timer job
- Verify that the user account that is performing this
procedure is a member of the Farm Administrators SharePoint group.
- In Central Administration, in the Monitoring section,
click Check job status.
- On the Timer Job Status page, click Job History.
- On the Job History page, find Crawl Log Report for
Search Application <Search Service Application name> for
the Search service application that you want and review the status.
To change the refresh
rate for the crawl log timer job
- Verify that the user account that is performing this
procedure is a member of the Farm Administrators SharePoint group.
- In Central Administration, in the Monitoring section,
click Check job status.
- On the Timer Job Status page, click Job History.
- On the Job History page, click Crawl Log Report
for Search Application <Search Service Application name> for
the Search service application that you want.
- On the Edit Timer Job page, in the Recurring
Schedule section, change the timer job schedule to the interval
that you want.
- Click OK.
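Both the status check and the schedule change can also be done from the SharePoint Management Shell. A minimal sketch, assuming the default job name pattern; the 10-minute schedule string is just an example, and in general the default should stay:

    # Find the crawl log report job for the Search service application
    $job = Get-SPTimerJob | Where-Object { $_.Name -like "Crawl Log Report*" }

    # Review its status and last run time
    $job | Select-Object Name, Status, LastRunTime, Schedule

    # Change the refresh rate (default is every five minutes)
    Set-SPTimerJob -Identity $job -Schedule "every 10 minutes between 0 and 59"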
Troubleshoot Crawl Problems
First, verify that all servers in the farm are at the same patch level (the same cumulative updates and service packs are installed across the whole farm), and review the most recent upgrade error log file in the ULS logs. If you find any errors in the log file, resolve those issues and run PSConfig again.
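A minimal sketch for checking patch levels from the SharePoint Management Shell; treat the PSConfig switches shown as the standard build-to-build upgrade command, not the only option:

    # Farm-wide build number; it should be the same on every server
    (Get-SPFarm).BuildVersion

    # Per-server product and patch status; anything reported as needing
    # an upgrade should be resolved before you trust crawl results
    Get-SPProduct -Local

    # After fixing the errors found in the upgrade log, re-run PSConfig
    # from an elevated SharePoint Management Shell on the affected server:
    PSConfig.exe -cmd upgrade -inplace b2b -wait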
This section provides information about
common crawl log errors, crawler behavior, and actions to take to maintain a
healthy crawling environment.
When an Item Is Deleted from the Index
When a crawler cannot find an item that
exists in the index because the URL is obsolete or it cannot be accessed due to
a network outage, the crawler reports an error for that item in that crawl. If
this continues during the next three crawls, the item is deleted from the
index. For file-share content sources, items are immediately deleted from the
index when they are deleted from the file share.
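The "error on three consecutive crawls" delete behavior is controlled by crawler delete-policy properties on the Search service application. A sketch for inspecting and raising the threshold; the property name ErrorDeleteCountAllowed is as documented for SharePoint 2013, so verify it for your version, and 10 is just an example value:

    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Number of consecutive access errors before an item is deleted
    $ssa.GetProperty("ErrorDeleteCountAllowed")

    # Raise the threshold if transient outages delete items too eagerly
    $ssa.SetProperty("ErrorDeleteCountAllowed", 10)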
“Object could not be found” Error for a File Share
This error can result
from a crawled file-share content source that contains a valid host name but an
invalid file name. For example, with a host name and file name of
\\ValidHost\files\file1, \\ValidHost exists, but the file file1 does not. In
this case, the crawler reports the error "Object could not be found"
and deletes the item from the index. The Crawl History view shows:
- Errors: 1
- Deletes: 1
- Top Level Errors: 1 (\\ValidHost\files\file1 shows as a
top-level error because it is a start address)
The Content Source
view shows:
- Errors: 0
- Deletes: 0
- Top Level Errors: 0
The Content Source
view will show all zeros because it only shows the status of items that are in
the index, and this start address was not entered into the index. However, the
Crawl History view shows all crawl transactions, whether or not they are
entered into the index.
“Network path for item could not be resolved” Error for a File Share
This error can result
from a crawled file-share content source that contains an invalid host name and
an invalid file name. For example, with a host name and file name of \\InvalidHost\files\file1,
neither \\InvalidHost nor the file file1 exists. In this case, the crawler
reports the error "Network path for item could not be resolved" and
does not delete the item from the index. The Crawl History view shows:
- Errors: 1
- Deletes: 0
- Top Level Errors: 1 (\\InvalidHost\files\file1 shows as
a top-level error because it is a start address)
The Content Source
view shows:
- Errors: 0
- Deletes: 0
- Top Level Errors: 0
The item is not
deleted from the index, because the crawler cannot determine if the item really
does not exist or if there is a network outage that prevents the item from
being accessed.
Obsolete Start Address
The crawl log reports
top-level errors for top-level documents, or start addresses. To ensure healthy
content sources, you should take the following actions:
- Always investigate non-zero top-level errors.
- Always investigate top-level errors that appear
consistently in the crawl log.
- Otherwise, we recommend that you remove obsolete start
addresses every two weeks after contacting the owner of the site.
To troubleshoot and
delete obsolete start addresses
- Verify that the user account that is performing this
procedure is an administrator for the Search service application.
- When you have determined that a start address might be obsolete, first determine whether the site exists by pinging it (a PowerShell sketch follows this list). If you receive a response, determine which of the following issues caused the problem:
- If you can access the URL from a browser, the crawler
could not crawl the start address because there were problems with the
network connection.
- If the URL is redirected from a browser, you should
change the start address to be the same as the new address.
- If the URL receives an error in a browser, try again
at another time. If it still receives an error after multiple tries,
contact the site owner to ensure that the site is available.
- If you do not receive a response from pinging the site,
the site does not exist and should be deleted. Confirm this with the site
owner before you delete the site.
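A minimal sketch of these checks, assuming http://myportal is the start address under investigation (Invoke-WebRequest requires PowerShell 3.0 or later):

    $url = "http://myportal"            # placeholder start address
    $hostName = ([Uri]$url).Host

    # Does the host respond to ping at all?
    Test-Connection -ComputerName $hostName -Count 2

    # Can the URL be fetched, and was it redirected? ResponseUri differs
    # from $url when a redirect occurred
    $response = Invoke-WebRequest -Uri $url -UseDefaultCredentials
    $response.StatusCode
    $response.BaseResponse.ResponseUri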
When the crawl log continually reports an
"Access Denied" error for a start address, the content access account
might not have Read permissions to crawl the site. If you are able to view the
URL with an administrative account, there might be a problem with how the
permissions were updated. In this case, you should contact the site owner to
request permissions.
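To confirm or change the default content access account from PowerShell, a sketch with a placeholder account name (CONTOSO\svc_crawl):

    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Read the current crawl account via the Content object
    $content = New-Object Microsoft.Office.Server.Search.Administration.Content($ssa)
    $content.DefaultGatheringAccount

    # Change the account
    $password = Read-Host -Prompt "Crawl account password" -AsSecureString
    Set-SPEnterpriseSearchServiceApplication -Identity $ssa -DefaultContentAccessAccountName "CONTOSO\svc_crawl" -DefaultContentAccessAccountPassword $password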
Numbers Set to Zero in Content Source View During Host Distribution
During a host
distribution, the numbers in all columns in Content Source view are set to
zero. This happens because the numbers in Content Source view are sourced
directly from the crawl database tables. During a host distribution, the data in these tables is being moved, so the values remain at zero for the duration of the host distribution.
After the host
distribution is complete, run an incremental crawl of the content sources in
order to restore the original numbers.
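For example, to kick off the incremental crawls from PowerShell (a sketch; add an -Identity filter to target a single source):

    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Start an incremental crawl of every content source
    Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa | ForEach-Object { $_.StartIncrementalCrawl() }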
Deleted File-Share Documents Show as Errors in Content Source View
When documents are deleted from a file-share content source that was successfully crawled, they are immediately deleted from the index during the next full or incremental crawl. These items show as errors in the Content Source view of the crawl log, but show as deletes in other views.
Stopping or restarting the SharePoint server search service causes crawl log transaction discrepancy
The SharePoint Server Search service
(OSearch14) might be reset or restarted due to administrative operations or
server functions. When this occurs, a discrepancy in the crawl history view of
the crawl log can occur. You may notice a difference between the number of
transactions reported per crawl and the actual number of transactions performed
per crawl. This can occur because the OSearch14 service stores active
transactions in memory and writes these transactions after they are completed.
If the OSearch14 service is stopped, reset, or restarted before the in-memory
transactions have been written to the crawl log database, the number of
transactions per crawl will be shown incorrectly.
“Deleted by the Gatherer (This Item Was Deleted Because Its Parent Was Deleted)”
The best ways to troubleshoot this issue are described below.
Multiple Overlapping Crawls Cause Results to Be Deleted
I have read in one article/blog that this error can occur if multiple crawls that run at the same time overlap. It may lead to a collision as the results are put into the search index. If this happens too often, the content will be removed as being unseen. There is a three-day limit on keeping results, and if the content is not found again within that time it will be removed. As an example, the content and the people crawls may be running at the same time. I'm not convinced by this proposition: I had the system set up to run just one incremental crawl, and the problem still occurred; the system also always waited for each incremental crawl to finish before starting another one (e.g., when there were just a few minutes between them in the schedule).
SharePoint
crawler can’t access SharePoint website to crawl it
By design, the local SharePoint sites crawl uses the default URL to access SharePoint sites: the crawler (running from the index server) attempts to visit the public/default URL. Some blogs/forums explain that "deleted by the gatherer" mostly occurs when the crawler cannot access the SharePoint website at the time of crawling. In addition, I've seen it stated that the indexer visits the default view of a list in order to index that list.
There might be a number of reasons why the indexer can't reach a SharePoint URL:
- Connectivity issues, DNS issues
You could get the issue if you have DNS problems when the indexer is trying to crawl, if the indexer cannot resolve links correctly, or if it cannot verify credentials in AD. I've seen a couple of forums suggest that it comes down to an intermittent DNS resolution issue and has nothing to do with SharePoint configuration. Another user noticed an issue whereby the SharePoint crawler was not crawling any sites that had a DNS alias to the server itself; e.g., the server name was myserver01 and there was a DNS alias called myportal. The crawler would not crawl http://myportal or anything under it.
- SharePoint server under heavy load
The problem may be caused when the server is under heavy load or when very large documents are being indexed, thus requiring more time for the indexer to access the site.
Check
the index server can reach the URL
I've
had inter-server communication issues before; when the problem occurs make sure
all the SharePoint servers can ping each other, and make sure the index server
can reach the public URL (open IE on the index server and check it out).
Alternatively setup or write some monitoring tool on the index server that
checks connectivity regularly and logs it until the issue appears.
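A minimal sketch of such a monitoring loop, with a hypothetical URL and log path; run it on the index server until the issue reappears:

    $url = "http://myportal"                    # placeholder public URL
    $log = "C:\Temp\crawl-connectivity.log"     # placeholder log path

    while ($true) {
        try {
            $code = (Invoke-WebRequest -Uri $url -UseDefaultCredentials).StatusCode
            Add-Content $log "$(Get-Date -Format s)  OK    $code"
        }
        catch {
            Add-Content $log "$(Get-Date -Format s)  FAIL  $($_.Exception.Message)"
        }
        Start-Sleep -Seconds 300                # check every five minutes
    }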
Increase time-out values?
One forum suggested increasing your timeout values for search, since the problem may be caused by the indexer being too busy or having difficulty reaching the URL. If it fails to complete indexing of a document, for example, it will remove it from the gatherer process as incomplete. The setting in Central Administration 2013 can be found at:
Application Management > Manage Services on Server > SharePoint Server Search > Search Service Application, and then on the Farm Search Administration page there is a Time-Out (Seconds) setting that you can try to increase.
Normally these are set to 60 seconds. By changing these to 120 or longer you will give the search service some extra time to connect and index an item. I tried increasing this substantially but found it did not help.
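The same setting can be changed from PowerShell; a sketch (120 seconds is just an example value):

    # The connection and acknowledgement timeouts back the Farm Search
    # Administration "Time-Out (Seconds)" setting
    Get-SPEnterpriseSearchService | Select-Object ConnectionTimeout, AcknowledgementTimeout

    Set-SPEnterpriseSearchService -ConnectionTimeout 120 -AcknowledgementTimeout 120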
Check
Alternate Access Mappings
Certain users have reported the problem happening as a
result of incorrect configuration of the Alternate Access Mappings in Central
Admin. If your default Alternate Access Mapping is not the same as the
one in your Content Source you could have this issue. Check the
Alternate Access Mappings and the Content Source are the same.
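A quick way to compare the two from PowerShell; the web application URL is a placeholder:

    # AAMs for the web application
    Get-SPAlternateURL -WebApplication "http://myportal" | Select-Object IncomingUrl, Zone

    # Start addresses configured on the content sources
    $ssa = Get-SPEnterpriseSearchServiceApplication
    Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa | Select-Object Name, StartAddresses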
Check
that the site or list is configured to allow it to be in search results
For each site, check that "Allow this site to appear in search results?" is enabled in the Search and Offline Availability options. If the problem is with a list or document library, also check that the library is allowed to be indexed, under the advanced settings of the library/list, and click the button to reindex the list on its next crawl.
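Both switches map to the NoCrawl property in the object model; a sketch with placeholder site URL and list name:

    # Site level: NoCrawl mirrors "Allow this site to appear in search results?"
    $web = Get-SPWeb "http://myportal/sites/team"
    $web.NoCrawl            # $true means the site is excluded from search
    $web.NoCrawl = $false
    $web.Update()

    # List/library level: the equivalent of the advanced settings option
    $list = $web.Lists["Documents"]
    $list.NoCrawl = $false
    $list.Update()
    $web.Dispose()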
Change
the search schedule?
If there are issues with overlapping crawls (see Multiple Overlapping Crawls above), adjust the schedules, restart the services, and check whether the content is collected into the search index again.
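Schedules can also be adjusted from PowerShell. A sketch, assuming a content source named "Local SharePoint sites"; the interval values are examples, so check the Set-SPEnterpriseSearchCrawlContentSource documentation for your version:

    $ssa = Get-SPEnterpriseSearchServiceApplication
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

    # Example: one incremental crawl per day, so runs cannot overlap
    $cs | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1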
DNS Loopback issue
Certain sites have recommended disabling the loopback check. Please note that I advise you to read around thoroughly on disabling loopback checks on production SharePoint servers before you decide whether or not to do this; there are various pages of advice on this.
- Open regedit
- Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
- Under Lsa, create a DWORD value called DisableLoopbackCheck
- In Value, type 1
- Close regedit
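The same registry change can be scripted; a sketch (reboot the server, or at least restart IIS, for it to take effect):

    # Create the DisableLoopbackCheck value under the Lsa key
    New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" -Name "DisableLoopbackCheck" -Value 1 -PropertyType DWord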
Recreate the content source
One suggested option is to create a new content source and run a full crawl, which reportedly fixed the issue; again, however, this is not ideal in a production environment. Alternatively, delete and re-create the content source and run a full crawl. I don't think you can reset the index for a specific content source unless you have a dedicated crawl just for that source, which of course the architecture permits.
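A sketch of re-creating a content source from PowerShell; the names and URL are placeholders:

    $ssa = Get-SPEnterpriseSearchServiceApplication

    # Remove the suspect content source
    Remove-SPEnterpriseSearchCrawlContentSource -Identity "Local SharePoint sites" -SearchApplication $ssa -Confirm:$false

    # Re-create it and start a full crawl
    $cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Name "Local SharePoint sites" -Type SharePoint -StartAddresses "http://myportal"
    $cs.StartFullCrawl()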
Create an include rule
Create an include rule in Central Administration to force the inclusion of this site collection/site.
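From PowerShell, the equivalent is a crawl rule of type InclusionRule; the path is a placeholder:

    $ssa = Get-SPEnterpriseSearchServiceApplication
    New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Path "http://myportal/sites/team/*" -Type InclusionRule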
Kerberos issues
If you've set up anything
aside from Integrated Windows Authentication, you'll have to work harder to get
your crawler working. Some issues are related to Kerberos. If you don't
have the infrastructure update applied, then SharePoint will not be able to use
Kerberos auth to web sites with non-default (80/443) ports.
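When Kerberos is in play, a usual first check is that the SPNs for the web application's application pool account are registered, especially for non-default ports. A sketch using setspn; the account and host names are examples:

    # List the SPNs registered for the app-pool account
    setspn -L CONTOSO\svc_apppool

    # Register an SPN for a site on a non-default port
    setspn -S HTTP/myportal:8080 CONTOSO\svc_apppool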
Recommendation
from a Microsoft person
I was contacted by another SharePoint user via this blog who got Microsoft to investigate their issue, and they came up with the following proposed steps to fix it:
The list of Alternate Access Mappings in Central Administration must match the bindings list in IIS exactly. There should be no extra bindings in IIS that don’t have AAMs (a comparison sketch follows the steps below).
1. Remove the extra binding statement from IIS.
2. IISRESET /noforce
3. Go to central admin;
Manage Service Applications; click your Search App – "Search Service
Application 1", click "Crawl log" – notice the Deleted column.
4. Click the link – the
one in the Deleted column that lists the number of deleted items.
5. Look for any rows that
have the Deleted by the gatherer (This item was deleted because its parent was
deleted) message and note the url.
6. Navigate to that site
and to site contents.
7. Add a document library
and a document into it. (This is necessary!)
8. Retry search.
9. Repeat steps for each
site collection.
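A sketch for comparing the two lists on a given server; the web application URL is a placeholder, and Get-WebBinding requires the WebAdministration module:

    # AAMs for the web application
    Get-SPAlternateURL -WebApplication "http://myportal" | Select-Object IncomingUrl, Zone

    # IIS bindings on this server; anything here without a matching AAM is suspect
    Import-Module WebAdministration
    Get-WebBinding | Select-Object protocol, bindingInformation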