Saturday, January 2, 2016

Troubleshoot Search Crawl Issues (Best Possible Ways)

To View the Crawl Log
·         Verify that the user account that is performing this procedure is an administrator for the Search service application.
·         In Central Administration, in the Quick Launch, click Application Management.
·         On the Application Management page, under Service Applications, click Manage service applications.
·         On the Service Applications page, in the list of service applications, click the Search service application that you want.
·         On the Search Administration page, in the Quick Launch, under Crawling, click Crawl Log.
·         On the Crawl Log – Content Source page, click the view that you want.






The following fields appear in the crawl log.
The Content Source, Host Name, and Crawl History views show data in the following columns:
  • Successes. Items that were successfully crawled and searchable.
  • Warnings. Items that might not have been successfully crawled and might not be searchable.
  • Errors. Items that were not successfully crawled and might not be searchable.
  • Deletes. Items that were removed from the index and are no longer searchable.
  • Top Level Errors. Errors in top-level documents, including start addresses, virtual servers, and content databases. Every top-level error is counted as an error, but not all errors are counted as top-level errors. Because the Errors column includes the count from the Top Level Errors column, top-level errors are not counted again in the Host Name view.
  • Not Modified. Items that were not modified between crawls.
  • Security Update. Items whose security settings were crawled because they were modified.

Crawl Log Timer Job
By default, the data for each view in the crawl log is refreshed every five minutes by the timer job Crawl Log Report for Search Application <Search Service Application name>. You can change the refresh rate for this timer job, but in general this setting should remain as is. (A PowerShell sketch for checking and changing this schedule follows the two procedures below.)
To check the status of the crawl log timer job
  1. Verify that the user account that is performing this procedure is a member of the Farm Administrators SharePoint group.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, find Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want and review the status.
To change the refresh rate for the crawl log timer job
  1. Verify that the user account that is performing this procedure is a member of the Farm Administrators SharePoint group.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, click Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want.
  5. On the Edit Timer Job page, in the Recurring Schedule section, change the timer job schedule to the interval that you want.
  6. Click OK.
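If you prefer PowerShell over the Central Administration UI, the same check and schedule change can be scripted. This is a minimal sketch: the display-name filter "Crawl Log Report*" and the example schedule string are assumptions, so adjust them to match your own Search service application.

```powershell
# Run from the SharePoint 2013 Management Shell (or load the snap-in first)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Find the crawl log report timer job; adjust the filter to match your job's display name
$job = Get-SPTimerJob | Where-Object { $_.DisplayName -like "Crawl Log Report*" }

# Review when it last ran and its current schedule
$job | Select-Object DisplayName, LastRunTime, Schedule

# Change the refresh interval, for example to every 10 minutes (the default is every 5 minutes)
Set-SPTimerJob -Identity $job -Schedule "every 10 minutes between 0 and 59"
```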





Troubleshoot Crawl Problems

First, verify that all servers in the farm are at the same patch level (the same cumulative updates and service packs applied across the whole farm), and review the most recent upgrade error log in the ULS logs. If you find any errors in the log file, resolve them and run PSConfig again.
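As a quick way to compare patch levels, the following sketch reads the farm build number and the per-server product and patch status; the PSConfig command in the comment is the standard build-to-build upgrade syntax.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Farm-wide build number; compare it against the build expected for your CU or service pack
(Get-SPFarm).BuildVersion

# Per-server product and patch status (SharePoint 2013 and later)
Get-SPProduct -Local

# If the upgrade/ULS logs show errors, resolve them and then re-run the configuration wizard
# from an elevated SharePoint Management Shell on each server:
#   PSConfig.exe -cmd upgrade -inplace b2b -wait
```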

This section provides information about common crawl log errors, crawler behavior, and actions to take to maintain a healthy crawling environment.

When an Item Is Deleted from the Index

When a crawler cannot find an item that exists in the index because the URL is obsolete or it cannot be accessed due to a network outage, the crawler reports an error for that item in that crawl. If this continues during the next three crawls, the item is deleted from the index. For file-share content sources, items are immediately deleted from the index when they are deleted from the file share.

“Object could not be found” Error for a File Share
This error can result from a crawled file-share content source that contains a valid host name but an invalid file name. For example, with a host name and file name of \\ValidHost\files\file1, \\ValidHost exists, but the file file1 does not. In this case, the crawler reports the error "Object could not be found" and deletes the item from the index. The Crawl History view shows:
  • Errors: 1
  • Deletes: 1
  • Top Level Errors: 1 (\\ValidHost\files\file1 shows as a top-level error because it is a start address)
The Content Source view shows:
  • Errors: 0
  • Deletes: 0
  • Top Level Errors: 0
The Content Source view will show all zeros because it only shows the status of items that are in the index, and this start address was not entered into the index. However, the Crawl History view shows all crawl transactions, whether or not they are entered into the index.

“Network path for item could not be resolved” Error for a File Share
This error can result from a crawled file-share content source that contains an invalid host name and an invalid file name. For example, with a host name and file name of \\InvalidHost\files\file1, neither \\InvalidHost nor the file file1 exists. In this case, the crawler reports the error "Network path for item could not be resolved" and does not delete the item from the index. The Crawl History view shows:
  • Errors: 1
  • Deletes: 0
  • Top Level Errors: 1 (\\InvalidHost\files\file1 shows as a top-level error because it is a start address)
The Content Source view shows:
  • Errors: 0
  • Deletes: 0
  • Top Level Errors: 0
The item is not deleted from the index, because the crawler cannot determine if the item really does not exist or if there is a network outage that prevents the item from being accessed.

Obsolete Start Address

The crawl log reports top-level errors for top-level documents, or start addresses. To ensure healthy content sources, you should take the following actions:
  • Always investigate non-zero top-level errors.
  • Always investigate top-level errors that appear consistently in the crawl log.
  • Otherwise, we recommend that you remove obsolete start addresses every two weeks after contacting the owner of the site.
To troubleshoot and delete obsolete start addresses
  1. Verify that the user account that is performing this procedure is an administrator for the Search service application.
  2. When you have determined that a start address might be obsolete, first determine whether it exists or not by pinging the site. If you receive a response, determine which of the following issues caused the problem:
    • If you can access the URL from a browser, the crawler could not crawl the start address because there were problems with the network connection.
    • If the URL is redirected from a browser, you should change the start address to be the same as the new address.
    • If the URL receives an error in a browser, try again at another time. If it still receives an error after multiple tries, contact the site owner to ensure that the site is available.
  3. If you do not receive a response from pinging the site, the site does not exist and should be deleted. Confirm this with the site owner before you delete the site. (A PowerShell sketch for removing the obsolete start address follows.)
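For reference, here is a hedged PowerShell sketch that tests the host and then rewrites the content source's start-address list without the obsolete entry. The SSA name, content source name, and address are placeholders; substitute your own values.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Quick reachability test for the host in a suspect start address
Test-Connection -ComputerName "ValidHost" -Count 2

# List the start addresses of a content source
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"
$cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "File shares"
$cs.StartAddresses | ForEach-Object { $_.AbsoluteUri }

# Rewrite the start-address list without the obsolete entry
$keep = $cs.StartAddresses | Where-Object { $_.AbsoluteUri -notlike "*ValidHost*" } |
        ForEach-Object { $_.AbsoluteUri }
Set-SPEnterpriseSearchCrawlContentSource -Identity $cs -SearchApplication $ssa `
    -StartAddresses ($keep -join ",")
```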

Access Denied

When the crawl log continually reports an "Access Denied" error for a start address, the content access account might not have Read permissions to crawl the site. If you are able to view the URL with an administrative account, there might be a problem with how the permissions were updated. In this case, you should contact the site owner to request permissions. 

Numbers Set to Zero in Content Source View During Host Distribution
During a host distribution, the numbers in all columns of the Content Source view are set to zero. This happens because the numbers in the Content Source view come directly from the crawl database tables; during a host distribution the data in those tables is being moved, so the values remain at zero for the duration of the host distribution.
After the host distribution is complete, run an incremental crawl of the content sources in order to restore the original numbers.
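A sketch along these lines can kick off the incremental crawls once the host distribution has finished; the SSA name is a placeholder.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Start an incremental crawl of every content source that is currently idle
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"
Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa | ForEach-Object {
    if ($_.CrawlStatus -eq "Idle") {   # skip sources that are already crawling
        $_.StartIncrementalCrawl()
    }
}
```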

File-Share Deletes in the Content Source View
When documents are deleted from a file-share content source that was successfully crawled, they are immediately deleted from the index during the next full or incremental crawl. These items will show as errors in the Content Source view of the crawl log, but will show as deletes in other views.

Stopping or restarting the SharePoint server search service causes crawl log transaction discrepancy

The SharePoint Server Search service (OSearch14) might be reset or restarted due to administrative operations or server functions. When this occurs, a discrepancy in the crawl history view of the crawl log can occur. You may notice a difference between the number of transactions reported per crawl and the actual number of transactions performed per crawl. This can occur because the OSearch14 service stores active transactions in memory and writes these transactions after they are completed. If the OSearch14 service is stopped, reset, or restarted before the in-memory transactions have been written to the crawl log database, the number of transactions per crawl will be shown incorrectly.

Deleted by Gatherer: "This item was deleted because its parent was deleted"
The best ways to troubleshoot this issue are described below.

Multiple Crawls overlapping cause results to be deleted
I have read in one article/blog that this error can occur if multiple crawls run at the same time and overlap. That may lead to a collision when the results are written to the search index. If this happens too often, the content is removed as being unseen: there is a three-day limit on keeping results, and if the content is not found again within that time it is removed. As an example, the content crawl and the people crawl may be running at the same time. I'm not convinced by this proposition, though; I had the system set up to run just one incremental crawl, and the problem still occurred. The system also always waited for each incremental crawl to finish before starting another one (for example, when there were just a few minutes between them in the schedule).
SharePoint crawler can’t access SharePoint website to crawl it
By design, the local SharePoint sites crawl uses the default URL to access the SharePoint sites; the crawler (running from the index server) will attempt to visit the public/default URL. Some blogs and forums explain that "deleted by the gatherer" mostly occurs when the crawler is not able to access the SharePoint website at the time of crawling. In addition, I've seen it stated that the indexer visits the default view of a list in order to index that list.
There might be a number of reasons why the indexer can't reach a SharePoint URL:
·         Connectivity issues, DNS issues
You could see this issue if there are DNS problems when the indexer is trying to crawl, if the indexer cannot resolve links correctly, or if it cannot verify credentials in Active Directory.

I've seen a couple of forums suggest that the cause is an intermittent DNS resolution issue and has nothing to do with SharePoint configuration. Another user noticed that the SharePoint crawler was not crawling any sites that had a DNS alias to the server itself; for example, the server name was myserver01 and there was a DNS alias called myportal, and the crawler would not crawl http://myportal or anything under it.
·         SharePoint server under heavy load
The problem may be caused when the server is under heavy load or when very large documents are being indexed, thus requiring more time for the indexer to access the site.

Check the index server can reach the URL

I've had inter-server communication issues before; when the problem occurs, make sure all the SharePoint servers can ping each other, and make sure the index server can reach the public URL (open Internet Explorer on the index server and check it out). Alternatively, set up or write a monitoring tool on the index server that checks connectivity regularly and logs the results until the issue appears.
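If you want to script that connectivity check instead of doing it by hand, something like the following can run on the index server. Test-NetConnection requires Windows Server 2012 R2 / PowerShell 4.0 or later, and the host name is a placeholder.

```powershell
# Replace the host name with the public/default URL host of your web application
Resolve-DnsName "myportal.contoso.com"                             # verify DNS resolution
Test-Connection -ComputerName "myportal.contoso.com" -Count 2      # basic ping
Test-NetConnection -ComputerName "myportal.contoso.com" -Port 443  # TCP reachability

# Confirm the page itself responds to a request made with the server's credentials
Invoke-WebRequest -Uri "https://myportal.contoso.com" -UseDefaultCredentials -UseBasicParsing |
    Select-Object StatusCode
```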
Increase time out values?
One forum suggested increasing your timeout values for search, since the problem may be caused by the indexer being too busy or having difficulty reaching the URL. If it fails to complete indexing of a document, for example, it will remove it from the gatherer process as incomplete. The setting in Central Administration 2013 can be found at:
Application Management > Manage Services on Server > SharePoint Server Search > Search Service Application; then, on the Farm Search Administration page, there is a Time-Out (Seconds) setting that you can try to increase.

Normally these are set to 60 seconds. By changing these to 120 or longer you will give the search service some extra time to connect and index an item.  I tried increasing this substantially but found it did not help.
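For what it's worth, the same values can be read and changed from PowerShell. To the best of my knowledge the ConnectionTimeout and AcknowledgementTimeout settings are the ones surfaced as Time-Out (Seconds) on the Farm Search Administration page, but treat this as a hedged sketch and verify it on a test farm first.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Review the current farm-wide search time-out values (defaults are 60 seconds)
Get-SPEnterpriseSearchService | Select-Object ConnectionTimeout, AcknowledgementTimeout

# Raise both values to 120 seconds
Set-SPEnterpriseSearchService -ConnectionTimeout 120 -AcknowledgementTimeout 120
```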
Check Alternate Access Mappings
Certain users have reported the problem occurring as a result of incorrect configuration of the Alternate Access Mappings in Central Administration. If your default Alternate Access Mapping is not the same as the URL in your content source, you could have this issue. Check that the Alternate Access Mappings and the content source start addresses match.
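A quick way to compare the two is to list the AAMs and the content source start addresses side by side; the SSA and content source names below are placeholders.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Alternate access mappings for every web application
Get-SPAlternateURL

# Start addresses configured on the content source; they should use the default/public URL
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"
(Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites").StartAddresses
```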

Check that the site or list is configured to allow it to be in search results

For each site, check that "Allow this site to appear in search results?" is set to Yes in the Search and Offline Availability options. If the problem is with a list or document library, also check that the library is allowed to be indexed (under the advanced settings of the library/list), and click the button there to reindex the list on its next crawl.
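These switches can also be checked in bulk with PowerShell via the NoCrawl flag on webs and lists; the URL and library name below are placeholders.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# NoCrawl = True means the web or list is excluded from search
$web = Get-SPWeb "http://myportal/sites/teamsite"
$web.NoCrawl

$list = $web.Lists["Shared Documents"]
$list.NoCrawl

# To allow the content to be crawled again:
$web.NoCrawl = $false;  $web.Update()
$list.NoCrawl = $false; $list.Update()
$web.Dispose()
```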

Change the search schedule?

If there are issues with overlapping crawls (see "Multiple crawls overlapping" above), adjust the schedules, restart the services, and check whether the content is collected into the search index again.

DNS Loopback issue

Certain sites have recommended disabling the loopback check. Please note that I advise you to read around thoroughly on disabling loopback checks on production SharePoint servers before you decide whether or not to do this; there are various pages of advice on the subject. The registry steps are below, followed by a PowerShell equivalent.
·         Open regedit
·         Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
·         Under Lsa, create a DWORD value called DisableLoopbackCheck
·         In Value, type 1
·         Close regedit
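If you do decide to make the change, the registry steps above can be done in one command from an elevated PowerShell prompt (again, weigh the security implications first):

```powershell
# Create (or overwrite) the DisableLoopbackCheck DWORD value
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Lsa" `
                 -Name "DisableLoopbackCheck" -Value 1 -PropertyType DWord -Force
```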

Recreate the content source

One suggested option is to create a new content source and run a full crawl, which reportedly fixed the issue; again, however, this is not ideal in a production environment. Alternatively, delete and re-create the content source and run a full crawl. I don't think you can reset the index for a specific content source unless you have a dedicated crawl just for that source, which of course the architecture permits.
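A rough sketch of the delete/re-create approach in PowerShell is below. The SSA name, content source name, type, and start address are placeholders, and a full crawl on a production farm should be planned for off-peak hours.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Delete and re-create a content source, then run a full crawl
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"

Remove-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Intranet" -Confirm:$false

$cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Name "Intranet" `
        -Type SharePoint -StartAddresses "http://myportal"
$cs.StartFullCrawl()
```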

Create an include rule

Create an inclusion rule in Central Administration to force the inclusion of this site collection/site.
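The same rule can be created with PowerShell; the SSA name and URL pattern below are placeholders.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Inclusion rule that forces the crawler to include everything under the site
$ssa = Get-SPEnterpriseSearchServiceApplication "Search Service Application"
New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa -Path "http://myportal/sites/teamsite/*" `
    -Type InclusionRule -FollowComplexUrls $true
```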

Kerberos issues

If you've set up anything aside from Integrated Windows Authentication, you'll have to work harder to get your crawler working. Some issues are related to Kerberos: if you don't have the infrastructure update applied, SharePoint will not be able to use Kerberos authentication for web sites on non-default (80/443) ports.

Recommendation from a Microsoft person

I was contacted by another SharePoint user via this blog who got Microsoft to investigate their issue, and they came up with the following proposed steps to fix it:
The list of Alternate Access Mappings in Central Administration must match the bindings list in IIS exactly. There should be no extra bindings in IIS that don’t have AAMs.
1.     Remove the extra binding from IIS.
2.     Run IISRESET /noforce.
3.     Go to Central Administration > Manage Service Applications; click your Search service application (for example, "Search Service Application 1"), then click "Crawl Log" and notice the Deleted column.
4.     Click the link in the Deleted column that lists the number of deleted items.
5.     Look for any rows that have the "Deleted by the gatherer (This item was deleted because its parent was deleted)" message and note the URL.
6.     Navigate to that site and to site contents.
7.     Add a document library and a document into it. (This is necessary!)
8.     Retry search.
9.     Repeat steps for each site collection.


Wednesday, December 30, 2015

OWA for SP2013 Part 1 (Introduction)

When used together with SharePoint Server 2013, Office Web Apps Server provides updated versions of Word Web App, Excel Web App, PowerPoint Web App, and OneNote Web App. Users can view and, in some cases, edit Office documents in SharePoint libraries by using a supported web browser on computers and on many mobile devices, such as Windows Phones, iPhones, iPads, Windows 8 tablets, and Android devices.
Figure: The viewing and editing capabilities of Office Web Apps on different kinds of devices

Office Web Apps Server is now installed as a stand-alone server

Office Web Apps is not installed on the same servers that run SharePoint 2013. Instead, you deploy one or more physical or virtual servers that run Office Web Apps Server. Then you configure the SharePoint 2013 farm to use the Office Web Apps Server farm to provide Office Web Apps functionality to users who create or open Office files from SharePoint libraries.

Theory about Office Web App Server

Office Web Apps Server is an Office server product that provides browser-based file viewing and editing services for Office files. Office Web Apps Server works with products and services that support WOPI, the Web Application Open Platform Interface protocol. These products, known as hosts, include SharePoint 2013, Lync Server 2013, and Exchange Server 2013. An Office Web Apps Server farm can provide Office services to multiple on-premises hosts, and you can scale out the farm from one server to multiple servers as your organization’s needs grow. Although Office Web Apps Server requires dedicated servers that run no other server applications, you can install Office Web Apps Server on virtual machine instances instead.

It is easier to deploy and manage Office Web Apps within your organization now that it is a stand-alone product. If you deploy SharePoint 2013, for example, you no longer have to optimize the SharePoint infrastructure to support Office Web Apps, which in earlier versions was tightly integrated with SharePoint Server 2010. You can also apply updates to the Office Web Apps Server farm separately and at a different frequency than you update SharePoint, Exchange, or Lync Server. Having a stand-alone Office Web Apps Server farm also means that users can view or edit Office files that are stored outside SharePoint Server, such as those in shared folders or other websites. This functionality is provided by a feature known as Online Viewers.



How SharePoint 2013 uses Office Web App server for Viewing and Editing Office Documents
As noted in the introduction, when used with SharePoint Server 2013, Office Web Apps Server provides updated versions of Word Web App, Excel Web App, PowerPoint Web App, and OneNote Web App for viewing and, in some cases, editing Office documents in SharePoint libraries. Among the many new features in Office Web Apps, improved touch support and editing capabilities enable users of iPads and Windows 8 tablets to edit and view Office documents directly from their devices.

The following illustration summarizes the viewing and editing capabilities of Office Web Apps on different kinds of devices.

Viewing and editing capabilities of Office Web Apps
Differences between Excel Web App and Excel Services in SharePoint
Excel Web App and Excel Services in SharePoint have a lot in common, but they are not the same. Excel Services is available only in the Enterprise edition of SharePoint Server 2013. Excel Web App is available in SharePoint Server 2013 and SharePoint Foundation 2013. Both applications enable you to view workbooks in a browser window, and both enable you to interact with and explore data.

But there are certain differences between Excel Web App and Excel Services in SharePoint. For example, Excel Services supports external data connections, data models, and the ability to interact with items that use data models (such as PivotChart reports, PivotTable reports and timeline controls). Excel Services provides more business intelligence functionality than Excel Web App, but Excel Services does not enable users to create or edit workbooks in a browser window.

Deploy Office Web Apps Server

First, here are a few things you should NOT do when deploying Office Web Apps Server.

  • Don’t install any other server applications on the server that’s running Office Web Apps Server. This includes Exchange Server, SharePoint Server, Lync Server, and SQL Server. If you have a shortage of servers, consider running Office Web Apps Server in a virtual machine instance on one of the servers you have.
  • Don’t install any services or roles that depend on the Web Server (IIS) role on ports 80, 443, or 809, because Office Web Apps Server periodically removes web applications on these ports.
  • Don’t install any version of Office. If it’s already installed, you’ll need to uninstall it before you install Office Web Apps Server.
  • Don’t install Office Web Apps Server on a domain controller. It won’t run on a server with Active Directory Domain Services (AD DS).

Make sure the following ports aren’t blocked by firewalls on either the server that runs Office Web Apps Server or the load balancer (a quick connectivity check follows the list):
·         Port 443 for HTTPS traffic
·         Port 80 for HTTP traffic
·         Port 809 for private traffic between the servers that run Office Web Apps Server (if you’re setting up a multi-server farm)
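To confirm that these ports are reachable from another farm member or from the SharePoint servers, a check along these lines can help. Test-NetConnection requires Windows Server 2012 R2 / PowerShell 4.0 or later, and the server name is a placeholder.

```powershell
# Replace wac01.contoso.com with the name of a server that runs Office Web Apps Server
Test-NetConnection -ComputerName "wac01.contoso.com" -Port 443
Test-NetConnection -ComputerName "wac01.contoso.com" -Port 80
Test-NetConnection -ComputerName "wac01.contoso.com" -Port 809

# If Windows Firewall is blocking the intra-farm port 809, a rule such as this opens it:
New-NetFirewallRule -DisplayName "Office Web Apps intra-farm (809)" -Direction Inbound `
    -Protocol TCP -LocalPort 809 -Action Allow
```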

Downloads, server roles, and features that are required for Office Web Apps Server


The required downloads, server roles, and features differ depending on the operating system (Windows Server 2008 R2, Windows Server 2012, or Windows Server 2012 R2):

Downloads
  • Office Web Apps Server
  • Office Web Apps Server SP1 (recommended)
  • The correct version of the .NET Framework (.NET Framework 4.5 is already installed on Windows Server 2012 and later)
  • Update for Windows Server 2008 R2 x64 Edition (not applicable on Windows Server 2012 or Windows Server 2012 R2)
  • Windows PowerShell 3.0 (already installed on Windows Server 2012 and Windows Server 2012 R2)

Server role: Web Server (IIS)
On Windows Server 2008 R2, the minimum role services required for the Web Server (IIS) server role are:
Common HTTP Features
  • Static Content
  • Default Document
Application Development
  • ASP.NET
  • .NET Extensibility
  • ISAPI Extensions
  • ISAPI Filters
  • Server Side Includes
Security
  • Windows Authentication
  • Request Filtering
Management Tools
  • IIS Management Console
The following options are recommended but not required:
Performance
  • Static Content Compression
  • Dynamic Content Compression

On Windows Server 2012 and Windows Server 2012 R2, the minimum role services required for the Web Server (IIS) server role are:
Management Tools
  • IIS Management Console
Web Server
  • Common HTTP Features
  • Default Document
  • Static Content
Security
  • Request Filtering
  • Windows Authentication
Application Development
  • .NET Extensibility 4.5
  • ASP.NET 4.5
  • ISAPI Extensions
  • ISAPI Filters
  • Server Side Includes
The following services are recommended but not required:
Performance
  • Static Content Compression
  • Dynamic Content Compression

Feature: Ink and Handwriting Services
  • On Windows Server 2008 R2, install the Ink Support role service.
  • On Windows Server 2012 and Windows Server 2012 R2, Ink Support is not required.

Deploying Office Web Apps Server involves installing some prerequisite software and running a few Windows PowerShell commands, but overall the process is designed to be pretty straightforward.
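To give a feel for those commands, here is a minimal single-server sketch, assuming Windows Server 2012, an HTTPS farm, and placeholder names (owa.contoso.com and the certificate friendly name); check the official deployment guidance for your own topology before running anything.

```powershell
# 1. Install the required role services and features (Windows Server 2012 example)
Add-WindowsFeature Web-Server,Web-Mgmt-Tools,Web-Mgmt-Console,Web-WebServer,Web-Common-Http, `
    Web-Default-Doc,Web-Static-Content,Web-Performance,Web-Stat-Compression,Web-Dyn-Compression, `
    Web-Security,Web-Filtering,Web-Windows-Auth,Web-App-Dev,Web-Net-Ext45,Web-Asp-Net45, `
    Web-ISAPI-Ext,Web-ISAPI-Filter,Web-Includes,InkandHandwritingServices

# 2. After installing the Office Web Apps Server bits, create the farm on the OWA server
Import-Module OfficeWebApps
New-OfficeWebAppsFarm -InternalUrl "https://owa.contoso.com" -ExternalUrl "https://owa.contoso.com" `
    -CertificateName "OWA Certificate" -EditingEnabled

# 3. On the SharePoint 2013 farm, bind SharePoint to the new Office Web Apps farm:
#    New-SPWOPIBinding -ServerName "owa.contoso.com"
```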